Write me a numba implementation of levensthin distance, taking two numpy int arrays. Call it levenshtein. Run it in nopython mode. Character swaps are cost 2.
Write me a numba implementation of levensthin distance, taking two numpy int arrays. Call it levenshtein. Run it in nopython mode. Character swaps are cost 2.
Below is an implementation using Numba in “nopython” mode. This version assumes that insertions and deletions cost 1 and that a “character swap” (i.e. substitution) costs 2 (unless the elements are equal, in which case the cost is 0): -------------------------------------------------- from numba import njit import numpy as np @njit def levenshtein(a, b): """ Compute the Levenshtein edit distance between two numpy int arrays a and b. Insertions and deletions cost 1, while substitutions (swaps) cost 2. (If a character at the same position is equal, there is no cost.) """ m = a.shape[0] n = b.shape[0] # Create the DP table. dp = np.empty((m + 1, n + 1), dtype=np.int64) # Initialize first row and first column. for i in range(m + 1): dp[i, 0] = i for j in range(n + 1): dp[0, j] = j # Fill in the DP table. for i in range(1, m + 1): for j in range(1, n + 1): if a[i - 1] == b[j - 1]: sub_cost = dp[i - 1, j - 1] else: sub_cost = dp[i - 1, j - 1] + 2 # substitution cost is 2. # Compute the minimum of delete, insert, and substitute. dp[i, j] = min(dp[i - 1, j] + 1, # deletion dp[i, j - 1] + 1, # insertion sub_cost) # substitution return dp[m, n] # Example usage: if __name__ == "__main__": # Example arrays (could represent, for example, encoded characters) x = np.array([1, 2, 3, 4], dtype=np.int64) y = np.array([1, 3, 4, 5], dtype=np.int64) dist = levenshtein(x, y) print("Levenshtein distance:", dist) -------------------------------------------------- Explanation: 1. The function levenshtein takes two numpy int arrays as input. 2. We preallocate a DP table “dp” of size (m+1)×(n+1) where m and n are the lengths of the input arrays. 3. The first row and column are filled with index values corresponding to the cost of inserting or deleting characters. 4. For each position (i, j), if the two elements match then no substitution cost is incurred; if they differ then we add a substitution cost of 2. 5. The function returns the computed distance from dp[m, n]. Compile and run this with Numba and it will execute in nopython mode.
I extracted the following code from that output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | from numba import njit import numpy as np @njit def levenshtein(a, b): """ Compute the Levenshtein edit distance between two numpy int arrays a and b. Insertions and deletions cost 1, while substitutions (swaps) cost 2. (If a character at the same position is equal, there is no cost.) """ m = a.shape[0] n = b.shape[0] # Create the DP table. dp = np.empty((m + 1, n + 1), dtype=np.int64) # Initialize first row and first column. for i in range(m + 1): dp[i, 0] = i for j in range(n + 1): dp[0, j] = j # Fill in the DP table. for i in range(1, m + 1): for j in range(1, n + 1): if a[i - 1] == b[j - 1]: sub_cost = dp[i - 1, j - 1] else: sub_cost = dp[i - 1, j - 1] + 2 # substitution cost is 2. # Compute the minimum of delete, insert, and substitute. dp[i, j] = min(dp[i - 1, j] + 1, # deletion dp[i, j - 1] + 1, # insertion sub_cost) # substitution return dp[m, n] |
Running the following program:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 | from numba import njit import numpy as np @njit def levenshtein(a, b): """ Compute the Levenshtein edit distance between two numpy int arrays a and b. Insertions and deletions cost 1, while substitutions (swaps) cost 2. (If a character at the same position is equal, there is no cost.) """ m = a.shape[0] n = b.shape[0] # Create the DP table. dp = np.empty((m + 1, n + 1), dtype=np.int64) # Initialize first row and first column. for i in range(m + 1): dp[i, 0] = i for j in range(n + 1): dp[0, j] = j # Fill in the DP table. for i in range(1, m + 1): for j in range(1, n + 1): if a[i - 1] == b[j - 1]: sub_cost = dp[i - 1, j - 1] else: sub_cost = dp[i - 1, j - 1] + 2 # substitution cost is 2. # Compute the minimum of delete, insert, and substitute. dp[i, j] = min(dp[i - 1, j] + 1, # deletion dp[i, j - 1] + 1, # insertion sub_cost) # substitution return dp[m, n] answer = levenshtein(np.array([3, 5, 2, 4, 8, 7]), np.array([3, 4, 2, 4, 8, 7, 9])) expected = 3 assert answer == expected, f'Wrong answer; got {answer} instead of {expected}' print('All tests passed') |
All tests passed
Testing if the previous output contains the string All tests passed
: True
Testing if the previous output contains the string njit
: True
Testing if the previous output contains the string nopython
: False
Testing if the previous output contains the string import numba
: False
Testing if the previous output contains the string from numba
: True
Write me a numba implementation of levensthin distance, taking two numpy int arrays. Call it levenshtein. Run it in nopython mode. Character swaps are cost 2.
Write me a numba implementation of levensthin distance, taking two numpy int arrays. Call it levenshtein. Run it in nopython mode. Character swaps are cost 2.
Below is one solution. In this version, insertion and deletion have a cost of 1, and replacing (i.e. swapping a mismatched character) has a cost of 2. The function takes two numpy integer arrays and computes the Levenshtein distance in nopython mode using Numba. ------------------------------------------------ from numba import njit import numpy as np @njit def levenshtein(a, b): n = a.shape[0] m = b.shape[0] # Create a DP table with dimensions (n+1) x (m+1) dp = np.empty((n + 1, m + 1), dtype=np.int64) # Base cases: distance from empty sequence to a sequence of length i or j for i in range(n + 1): dp[i, 0] = i for j in range(m + 1): dp[0, j] = j # DP computation: # For each position, we take the minimum cost from: # - Deleting an element (cost 1), # - Inserting an element (cost 1), or # - Replacing (swapping) a mismatched element (cost 2, or 0 if they match) for i in range(1, n + 1): for j in range(1, m + 1): if a[i - 1] == b[j - 1]: cost = 0 else: cost = 2 deletion = dp[i - 1, j] + 1 insertion = dp[i, j - 1] + 1 substitution = dp[i - 1, j - 1] + cost dp[i, j] = deletion if deletion < insertion else insertion if substitution < dp[i, j]: dp[i, j] = substitution return dp[n, m] # Example usage: if __name__ == "__main__": # Let's compare two sequences represented as numpy int arrays. # For example, a corresponds to [1, 3, 5, 7] and b corresponds to [1, 4, 5, 9] a = np.array([1, 3, 5, 7], dtype=np.int64) b = np.array([1, 4, 5, 9], dtype=np.int64) print("Levenshtein distance:", levenshtein(a, b)) ------------------------------------------------ Explanation: 1. We use the decorator @njit to compile the function in nopython mode. 2. The DP table dp is created with shape (n+1, m+1). Each cell dp[i,j] holds the minimal cost to convert the first i elements of a into the first j elements of b. 3. Base cases: converting an empty array to a length‑j array costs j (via j insertions) and vice‑versa. 4. In the main loop, if the current elements match (i.e. a[i-1]==b[j-1]), no cost is incurred. Otherwise, substituting one for the other costs 2. 5. Finally, we return the value in dp[n, m] which is the Levenshtein distance. This implementation meets the requirements and runs fully in nopython mode with Numba.
I extracted the following code from that output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 | from numba import njit import numpy as np @njit def levenshtein(a, b): n = a.shape[0] m = b.shape[0] # Create a DP table with dimensions (n+1) x (m+1) dp = np.empty((n + 1, m + 1), dtype=np.int64) # Base cases: distance from empty sequence to a sequence of length i or j for i in range(n + 1): dp[i, 0] = i for j in range(m + 1): dp[0, j] = j # DP computation: # For each position, we take the minimum cost from: # - Deleting an element (cost 1), # - Inserting an element (cost 1), or # - Replacing (swapping) a mismatched element (cost 2, or 0 if they match) for i in range(1, n + 1): for j in range(1, m + 1): if a[i - 1] == b[j - 1]: cost = 0 else: cost = 2 deletion = dp[i - 1, j] + 1 insertion = dp[i, j - 1] + 1 substitution = dp[i - 1, j - 1] + cost dp[i, j] = deletion if deletion < insertion else insertion if substitution < dp[i, j]: dp[i, j] = substitution return dp[n, m] |
Running the following program:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 | from numba import njit import numpy as np @njit def levenshtein(a, b): n = a.shape[0] m = b.shape[0] # Create a DP table with dimensions (n+1) x (m+1) dp = np.empty((n + 1, m + 1), dtype=np.int64) # Base cases: distance from empty sequence to a sequence of length i or j for i in range(n + 1): dp[i, 0] = i for j in range(m + 1): dp[0, j] = j # DP computation: # For each position, we take the minimum cost from: # - Deleting an element (cost 1), # - Inserting an element (cost 1), or # - Replacing (swapping) a mismatched element (cost 2, or 0 if they match) for i in range(1, n + 1): for j in range(1, m + 1): if a[i - 1] == b[j - 1]: cost = 0 else: cost = 2 deletion = dp[i - 1, j] + 1 insertion = dp[i, j - 1] + 1 substitution = dp[i - 1, j - 1] + cost dp[i, j] = deletion if deletion < insertion else insertion if substitution < dp[i, j]: dp[i, j] = substitution return dp[n, m] answer = levenshtein(np.array([3, 5, 2, 4, 8, 7]), np.array([3, 4, 2, 4, 8, 7, 9])) expected = 3 assert answer == expected, f'Wrong answer; got {answer} instead of {expected}' print('All tests passed') |
All tests passed
Testing if the previous output contains the string All tests passed
: True
Testing if the previous output contains the string njit
: True
Testing if the previous output contains the string nopython
: False
Testing if the previous output contains the string import numba
: False
Testing if the previous output contains the string from numba
: True