Write me a numba implementation of levensthin distance, taking two numpy int arrays. Call it levenshtein. Run it in nopython mode. Character swaps are cost 2.
Write me a numba implementation of levensthin distance, taking two numpy int arrays. Call it levenshtein. Run it in nopython mode. Character swaps are cost 2.
Certainly! Below is a Numba implementation of the Levenshtein distance algorithm that takes two NumPy integer arrays as input. The cost for character swaps is set to 2, and the function is compiled in `nopython` mode.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 | import numpy as np import numba from numba import njit @njit def levenshtein(arr1, arr2): len1 = len(arr1) len2 = len(arr2) # Create a distance matrix dist_matrix = np.zeros((len1 + 1, len2 + 1), dtype=np.int32) # Initialize the first row and column for i in range(len1 + 1): dist_matrix[i, 0] = i for j in range(len2 + 1): dist_matrix[0, j] = j # Fill the distance matrix for i in range(1, len1 + 1): for j in range(1, len2 + 1): if arr1[i - 1] == arr2[j - 1]: cost = 0 else: cost = 1 dist_matrix[i, j] = min( dist_matrix[i - 1, j] + 1, # Deletion dist_matrix[i, j - 1] + 1, # Insertion dist_matrix[i - 1, j - 1] + cost # Substitution ) # Check for swaps (transpositions) if i > 1 and j > 1 and arr1[i - 1] == arr2[j - 2] and arr1[i - 2] == arr2[j - 1]: dist_matrix[i, j] = min( dist_matrix[i, j], dist_matrix[i - 2, j - 2] + 2 # Swap cost is 2 ) return dist_matrix[len1, len2] # Example usage arr1 = np.array([1, 2, 3, 4], dtype=np.int32) arr2 = np.array([1, 3, 2, 4], dtype=np.int32) distance = levenshtein(arr1, arr2) print("Levenshtein distance:", distance) |
I extracted the following code from that output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 | import numpy as np import numba from numba import njit @njit def levenshtein(arr1, arr2): len1 = len(arr1) len2 = len(arr2) # Create a distance matrix dist_matrix = np.zeros((len1 + 1, len2 + 1), dtype=np.int32) # Initialize the first row and column for i in range(len1 + 1): dist_matrix[i, 0] = i for j in range(len2 + 1): dist_matrix[0, j] = j # Fill the distance matrix for i in range(1, len1 + 1): for j in range(1, len2 + 1): if arr1[i - 1] == arr2[j - 1]: cost = 0 else: cost = 1 dist_matrix[i, j] = min( dist_matrix[i - 1, j] + 1, # Deletion dist_matrix[i, j - 1] + 1, # Insertion dist_matrix[i - 1, j - 1] + cost # Substitution ) # Check for swaps (transpositions) if i > 1 and j > 1 and arr1[i - 1] == arr2[j - 2] and arr1[i - 2] == arr2[j - 1]: dist_matrix[i, j] = min( dist_matrix[i, j], dist_matrix[i - 2, j - 2] + 2 # Swap cost is 2 ) return dist_matrix[len1, len2] # Example usage arr1 = np.array([1, 2, 3, 4], dtype=np.int32) arr2 = np.array([1, 3, 2, 4], dtype=np.int32) distance = levenshtein(arr1, arr2) print("Levenshtein distance:", distance) |
Running the following program:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 | import numpy as np import numba from numba import njit @njit def levenshtein(arr1, arr2): len1 = len(arr1) len2 = len(arr2) # Create a distance matrix dist_matrix = np.zeros((len1 + 1, len2 + 1), dtype=np.int32) # Initialize the first row and column for i in range(len1 + 1): dist_matrix[i, 0] = i for j in range(len2 + 1): dist_matrix[0, j] = j # Fill the distance matrix for i in range(1, len1 + 1): for j in range(1, len2 + 1): if arr1[i - 1] == arr2[j - 1]: cost = 0 else: cost = 1 dist_matrix[i, j] = min( dist_matrix[i - 1, j] + 1, # Deletion dist_matrix[i, j - 1] + 1, # Insertion dist_matrix[i - 1, j - 1] + cost # Substitution ) # Check for swaps (transpositions) if i > 1 and j > 1 and arr1[i - 1] == arr2[j - 2] and arr1[i - 2] == arr2[j - 1]: dist_matrix[i, j] = min( dist_matrix[i, j], dist_matrix[i - 2, j - 2] + 2 # Swap cost is 2 ) return dist_matrix[len1, len2] # Example usage arr1 = np.array([1, 2, 3, 4], dtype=np.int32) arr2 = np.array([1, 3, 2, 4], dtype=np.int32) distance = levenshtein(arr1, arr2) print("Levenshtein distance:", distance) answer = levenshtein(np.array([3, 5, 2, 4, 8, 7]), np.array([3, 4, 2, 4, 8, 7, 9])) expected = 3 assert answer == expected, f'Wrong answer; got {answer} instead of {expected}' print('All tests passed') |
Levenshtein distance: 2
Traceback (most recent call last):
File "/usr/src/app/main.py", line 56, in <module>
assert answer == expected, f'Wrong answer; got {answer} instead of {expected}'
^^^^^^^^^^^^^^^^^^
AssertionError: Wrong answer; got 2 instead of 3
Testing if the previous output contains the string All tests passed
: False
Testing if the previous output contains the string njit
: True
Testing if the previous output contains the string nopython
: False
Testing if the previous output contains the string import numba
: True
Testing if the previous output contains the string from numba
: True