Write me a numba implementation of levensthin distance, taking two numpy int arrays. Call it levenshtein. Run it in nopython mode. Character swaps are cost 2.
Write me a numba implementation of levensthin distance, taking two numpy int arrays. Call it levenshtein. Run it in nopython mode. Character swaps are cost 2.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 | import numpy as np from numba import njit @njit(nopython=True) def levenshtein(a: np.ndarray, b: np.ndarray) -> int: """ Calculate the Levenshtein distance between two numpy int arrays. Parameters: a (np.ndarray): The first array. b (np.ndarray): The second array. Returns: int: The Levenshtein distance between the two arrays. """ # Get the lengths of the arrays m, n = len(a), len(b) # Create a 2D array to store the distances dp = np.zeros((m + 1, n + 1), dtype=np.int32) # Initialize the first row and column for i in range(m + 1): dp[i, 0] = i for j in range(n + 1): dp[0, j] = j # Calculate the Levenshtein distance for i in range(1, m + 1): for j in range(1, n + 1): # If the current elements are equal, there is no cost if a[i - 1] == b[j - 1]: cost = 0 # If the current elements are not equal, the cost is 1 (insertion or deletion) or 2 (substitution) else: cost = 2 # Character swaps are cost 2 # Calculate the minimum distance dp[i, j] = min( dp[i - 1, j] + 1, # Deletion dp[i, j - 1] + 1, # Insertion dp[i - 1, j - 1] + cost # Substitution or no operation ) # Return the Levenshtein distance return dp[m, n] # Example usage a = np.array([1, 2, 3]) b = np.array([1, 3, 2]) distance = levenshtein(a, b) print(f"The Levenshtein distance between the two arrays is: {distance}") |
I extracted the following code from that output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 | import numpy as np from numba import njit @njit(nopython=True) def levenshtein(a: np.ndarray, b: np.ndarray) -> int: """ Calculate the Levenshtein distance between two numpy int arrays. Parameters: a (np.ndarray): The first array. b (np.ndarray): The second array. Returns: int: The Levenshtein distance between the two arrays. """ # Get the lengths of the arrays m, n = len(a), len(b) # Create a 2D array to store the distances dp = np.zeros((m + 1, n + 1), dtype=np.int32) # Initialize the first row and column for i in range(m + 1): dp[i, 0] = i for j in range(n + 1): dp[0, j] = j # Calculate the Levenshtein distance for i in range(1, m + 1): for j in range(1, n + 1): # If the current elements are equal, there is no cost if a[i - 1] == b[j - 1]: cost = 0 # If the current elements are not equal, the cost is 1 (insertion or deletion) or 2 (substitution) else: cost = 2 # Character swaps are cost 2 # Calculate the minimum distance dp[i, j] = min( dp[i - 1, j] + 1, # Deletion dp[i, j - 1] + 1, # Insertion dp[i - 1, j - 1] + cost # Substitution or no operation ) # Return the Levenshtein distance return dp[m, n] # Example usage a = np.array([1, 2, 3]) b = np.array([1, 3, 2]) distance = levenshtein(a, b) print(f"The Levenshtein distance between the two arrays is: {distance}") |
Running the following program:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 | import numpy as np from numba import njit @njit(nopython=True) def levenshtein(a: np.ndarray, b: np.ndarray) -> int: """ Calculate the Levenshtein distance between two numpy int arrays. Parameters: a (np.ndarray): The first array. b (np.ndarray): The second array. Returns: int: The Levenshtein distance between the two arrays. """ # Get the lengths of the arrays m, n = len(a), len(b) # Create a 2D array to store the distances dp = np.zeros((m + 1, n + 1), dtype=np.int32) # Initialize the first row and column for i in range(m + 1): dp[i, 0] = i for j in range(n + 1): dp[0, j] = j # Calculate the Levenshtein distance for i in range(1, m + 1): for j in range(1, n + 1): # If the current elements are equal, there is no cost if a[i - 1] == b[j - 1]: cost = 0 # If the current elements are not equal, the cost is 1 (insertion or deletion) or 2 (substitution) else: cost = 2 # Character swaps are cost 2 # Calculate the minimum distance dp[i, j] = min( dp[i - 1, j] + 1, # Deletion dp[i, j - 1] + 1, # Insertion dp[i - 1, j - 1] + cost # Substitution or no operation ) # Return the Levenshtein distance return dp[m, n] # Example usage a = np.array([1, 2, 3]) b = np.array([1, 3, 2]) distance = levenshtein(a, b) print(f"The Levenshtein distance between the two arrays is: {distance}") answer = levenshtein(np.array([3, 5, 2, 4, 8, 7]), np.array([3, 4, 2, 4, 8, 7, 9])) expected = 3 assert answer == expected, f'Wrong answer; got {answer} instead of {expected}' print('All tests passed') |
The Levenshtein distance between the two arrays is: 2
All tests passed
/usr/local/lib/python3.12/dist-packages/numba/core/decorators.py:246: RuntimeWarning: nopython is set for njit and is ignored
warnings.warn('nopython is set for njit and is ignored', RuntimeWarning)
Testing if the previous output contains the string All tests passed
: True
Testing if the previous output contains the string njit
: True
Testing if the previous output contains the string nopython
: True
Testing if the previous output contains the string import numba
: False
Testing if the previous output contains the string from numba
: True