Write me a numba implementation of levensthin distance, taking two numpy int arrays. Call it levenshtein. Run it in nopython mode. Character swaps are cost 2.
Write me a numba implementation of levensthin distance, taking two numpy int arrays. Call it levenshtein. Run it in nopython mode. Character swaps are cost 2.
To solve this problem, we need to compute the Levenshtein distance between two numpy integer arrays using Numba in nopython mode, where character substitutions cost 2 instead of the usual 1. The Levenshtein distance is a measure of the difference between two sequences, calculated as the minimum number of single-element edits (insertions, deletions, or substitutions) required to change one sequence into the other. ### Approach The approach uses dynamic programming (DP) to efficiently compute the Levenshtein distance. We create a 2D DP table where each cell `dp[i][j]` represents the minimum edit distance between the first `i` elements of the first array and the first `j` elements of the second array. The key operations considered are: 1. **Insertion** (cost 1) 2. **Deletion** (cost 1) 3. **Substitution** (cost 2 if elements differ, 0 if they are the same) The DP table is initialized such that the first row and column represent the cumulative cost of inserting or deleting elements. The rest of the table is filled by considering the minimum cost of each possible operation at each step. ### Solution Code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | import numba import numpy as np @numba.njit(nogil=True, cache=True) def levenshtein(a, b): len_a = len(a) len_b = len(b) # Initialize a DP table with dimensions (len_a + 1) x (len_b + 1) dp = np.zeros((len_a + 1, len_b + 1), dtype=np.int64) # Base cases: transforming an empty string to a[0..i] for i in range(len_a + 1): dp[i, 0] = i # Base cases: transforming an empty string to b[0..j] for j in range(len_b + 1): dp[0, j] = j # Fill the DP table for i in range(1, len_a + 1): for j in range(1, len_b + 1): if a[i-1] == b[j-1]: substitution_cost = 0 else: substitution_cost = 2 # Compute the minimum of deletion, insertion, and substitution dp[i, j] = min( dp[i-1, j] + 1, # Deletion dp[i, j-1] + 1, # Insertion dp[i-1, j-1] + substitution_cost # Substitution ) return dp[len_a, len_b] |
I extracted the following code from that output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | import numba import numpy as np @numba.njit(nogil=True, cache=True) def levenshtein(a, b): len_a = len(a) len_b = len(b) # Initialize a DP table with dimensions (len_a + 1) x (len_b + 1) dp = np.zeros((len_a + 1, len_b + 1), dtype=np.int64) # Base cases: transforming an empty string to a[0..i] for i in range(len_a + 1): dp[i, 0] = i # Base cases: transforming an empty string to b[0..j] for j in range(len_b + 1): dp[0, j] = j # Fill the DP table for i in range(1, len_a + 1): for j in range(1, len_b + 1): if a[i-1] == b[j-1]: substitution_cost = 0 else: substitution_cost = 2 # Compute the minimum of deletion, insertion, and substitution dp[i, j] = min( dp[i-1, j] + 1, # Deletion dp[i, j-1] + 1, # Insertion dp[i-1, j-1] + substitution_cost # Substitution ) return dp[len_a, len_b] |
Running the following program:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 | import numba import numpy as np @numba.njit(nogil=True, cache=True) def levenshtein(a, b): len_a = len(a) len_b = len(b) # Initialize a DP table with dimensions (len_a + 1) x (len_b + 1) dp = np.zeros((len_a + 1, len_b + 1), dtype=np.int64) # Base cases: transforming an empty string to a[0..i] for i in range(len_a + 1): dp[i, 0] = i # Base cases: transforming an empty string to b[0..j] for j in range(len_b + 1): dp[0, j] = j # Fill the DP table for i in range(1, len_a + 1): for j in range(1, len_b + 1): if a[i-1] == b[j-1]: substitution_cost = 0 else: substitution_cost = 2 # Compute the minimum of deletion, insertion, and substitution dp[i, j] = min( dp[i-1, j] + 1, # Deletion dp[i, j-1] + 1, # Insertion dp[i-1, j-1] + substitution_cost # Substitution ) return dp[len_a, len_b] answer = levenshtein(np.array([3, 5, 2, 4, 8, 7]), np.array([3, 4, 2, 4, 8, 7, 9])) expected = 3 assert answer == expected, f'Wrong answer; got {answer} instead of {expected}' print('All tests passed') |
All tests passed
Testing if the previous output contains the string All tests passed
: True
Testing if the previous output contains the string njit
: True
Testing if the previous output contains the string nopython
: False
Testing if the previous output contains the string import numba
: True
Testing if the previous output contains the string from numba
: False