Write me a numba implementation of levensthin distance, taking two numpy int arrays. Call it levenshtein. Run it in nopython mode. Character swaps are cost 2.
Write me a numba implementation of levensthin distance, taking two numpy int arrays. Call it levenshtein. Run it in nopython mode. Character swaps are cost 2.
# Numba Implementation of Levenshtein Distance Here's a Numba implementation of the Levenshtein distance algorithm that takes two numpy integer arrays as input and assigns a cost of 2 for character substitutions:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 | import numpy as np from numba import njit @njit def levenshtein(arr1, arr2): """ Calculate the Levenshtein distance between two numpy int arrays. Parameters: ----------- arr1, arr2 : numpy arrays of integers Returns: -------- int : the Levenshtein distance Notes: ------ - Substitutions (character swaps) have a cost of 2 - Insertions and deletions have a cost of 1 """ m, n = len(arr1), len(arr2) # Create a matrix of size (m+1) x (n+1) dp = np.zeros((m+1, n+1), dtype=np.int32) # Initialize the first row and column for i in range(m+1): dp[i, 0] = i for j in range(n+1): dp[0, j] = j # Fill the matrix for i in range(1, m+1): for j in range(1, n+1): # If characters match, no cost for substitution if arr1[i-1] == arr2[j-1]: cost = 0 else: cost = 2 # Cost of 2 for character swaps dp[i, j] = min( dp[i-1, j] + 1, # Deletion dp[i, j-1] + 1, # Insertion dp[i-1, j-1] + cost # Substitution ) return dp[m, n] |
1 2 3 4 5 6 7 8 9 | # Example usage if __name__ == "__main__": # Example arrays a = np.array([1, 2, 3, 4, 5], dtype=np.int32) b = np.array([1, 3, 4, 5, 6], dtype=np.int32) # Calculate Levenshtein distance distance = levenshtein(a, b) print(f"Levenshtein distance: {distance}") |
I extracted the following code from that output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 | import numpy as np from numba import njit @njit def levenshtein(arr1, arr2): """ Calculate the Levenshtein distance between two numpy int arrays. Parameters: ----------- arr1, arr2 : numpy arrays of integers Returns: -------- int : the Levenshtein distance Notes: ------ - Substitutions (character swaps) have a cost of 2 - Insertions and deletions have a cost of 1 """ m, n = len(arr1), len(arr2) # Create a matrix of size (m+1) x (n+1) dp = np.zeros((m+1, n+1), dtype=np.int32) # Initialize the first row and column for i in range(m+1): dp[i, 0] = i for j in range(n+1): dp[0, j] = j # Fill the matrix for i in range(1, m+1): for j in range(1, n+1): # If characters match, no cost for substitution if arr1[i-1] == arr2[j-1]: cost = 0 else: cost = 2 # Cost of 2 for character swaps dp[i, j] = min( dp[i-1, j] + 1, # Deletion dp[i, j-1] + 1, # Insertion dp[i-1, j-1] + cost # Substitution ) return dp[m, n] |
Running the following program:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 | import numpy as np from numba import njit @njit def levenshtein(arr1, arr2): """ Calculate the Levenshtein distance between two numpy int arrays. Parameters: ----------- arr1, arr2 : numpy arrays of integers Returns: -------- int : the Levenshtein distance Notes: ------ - Substitutions (character swaps) have a cost of 2 - Insertions and deletions have a cost of 1 """ m, n = len(arr1), len(arr2) # Create a matrix of size (m+1) x (n+1) dp = np.zeros((m+1, n+1), dtype=np.int32) # Initialize the first row and column for i in range(m+1): dp[i, 0] = i for j in range(n+1): dp[0, j] = j # Fill the matrix for i in range(1, m+1): for j in range(1, n+1): # If characters match, no cost for substitution if arr1[i-1] == arr2[j-1]: cost = 0 else: cost = 2 # Cost of 2 for character swaps dp[i, j] = min( dp[i-1, j] + 1, # Deletion dp[i, j-1] + 1, # Insertion dp[i-1, j-1] + cost # Substitution ) return dp[m, n] answer = levenshtein(np.array([3, 5, 2, 4, 8, 7]), np.array([3, 4, 2, 4, 8, 7, 9])) expected = 3 assert answer == expected, f'Wrong answer; got {answer} instead of {expected}' print('All tests passed') |
All tests passed
Testing if the previous output contains the string All tests passed
: True
Testing if the previous output contains the string njit
: True
Testing if the previous output contains the string nopython
: False
Testing if the previous output contains the string import numba
: False
Testing if the previous output contains the string from numba
: True
Write me a numba implementation of levensthin distance, taking two numpy int arrays. Call it levenshtein. Run it in nopython mode. Character swaps are cost 2.
Write me a numba implementation of levensthin distance, taking two numpy int arrays. Call it levenshtein. Run it in nopython mode. Character swaps are cost 2.
# Numba Implementation of Levenshtein Distance Here's a Numba implementation of the Levenshtein distance algorithm that works with two numpy integer arrays. It runs in nopython mode and assigns a cost of 2 to character substitutions (swaps):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 | import numba import numpy as np @numba.njit(nopython=True) def levenshtein(s1, s2): """ Calculate the Levenshtein distance between two numpy integer arrays. Parameters: s1 (numpy.ndarray): First array of integers s2 (numpy.ndarray): Second array of integers Returns: int: Levenshtein distance with substitutions costing 2 """ # Get lengths of both arrays len_s1 = len(s1) len_s2 = len(s2) # Create a distance matrix dp = np.zeros((len_s1 + 1, len_s2 + 1), dtype=np.int32) # Initialize first row and column for i in range(len_s1 + 1): dp[i, 0] = i for j in range(len_s2 + 1): dp[0, j] = j # Fill in the distance matrix for i in range(1, len_s1 + 1): for j in range(1, len_s2 + 1): if s1[i-1] == s2[j-1]: # No change needed if elements are the same dp[i, j] = dp[i-1, j-1] else: # Calculate costs for operations: deletion = dp[i-1, j] + 1 # Delete from s1 insertion = dp[i, j-1] + 1 # Insert into s1 substitution = dp[i-1, j-1] + 2 # Substitute (costs 2) # Choose the minimum cost operation dp[i, j] = min(deletion, insertion, substitution) # Return the final distance return dp[len_s1, len_s2] |
1 2 3 4 5 6 7 | # Example arrays a = np.array([1, 2, 3, 4], dtype=np.int32) b = np.array([1, 2, 5, 4], dtype=np.int32) # Calculate distance distance = levenshtein(a, b) print(f"Levenshtein distance: {distance}") # Output: 2 (one substitution at cost 2) |
I extracted the following code from that output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 | import numba import numpy as np @numba.njit(nopython=True) def levenshtein(s1, s2): """ Calculate the Levenshtein distance between two numpy integer arrays. Parameters: s1 (numpy.ndarray): First array of integers s2 (numpy.ndarray): Second array of integers Returns: int: Levenshtein distance with substitutions costing 2 """ # Get lengths of both arrays len_s1 = len(s1) len_s2 = len(s2) # Create a distance matrix dp = np.zeros((len_s1 + 1, len_s2 + 1), dtype=np.int32) # Initialize first row and column for i in range(len_s1 + 1): dp[i, 0] = i for j in range(len_s2 + 1): dp[0, j] = j # Fill in the distance matrix for i in range(1, len_s1 + 1): for j in range(1, len_s2 + 1): if s1[i-1] == s2[j-1]: # No change needed if elements are the same dp[i, j] = dp[i-1, j-1] else: # Calculate costs for operations: deletion = dp[i-1, j] + 1 # Delete from s1 insertion = dp[i, j-1] + 1 # Insert into s1 substitution = dp[i-1, j-1] + 2 # Substitute (costs 2) # Choose the minimum cost operation dp[i, j] = min(deletion, insertion, substitution) # Return the final distance return dp[len_s1, len_s2] |
Running the following program:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | import numba import numpy as np @numba.njit(nopython=True) def levenshtein(s1, s2): """ Calculate the Levenshtein distance between two numpy integer arrays. Parameters: s1 (numpy.ndarray): First array of integers s2 (numpy.ndarray): Second array of integers Returns: int: Levenshtein distance with substitutions costing 2 """ # Get lengths of both arrays len_s1 = len(s1) len_s2 = len(s2) # Create a distance matrix dp = np.zeros((len_s1 + 1, len_s2 + 1), dtype=np.int32) # Initialize first row and column for i in range(len_s1 + 1): dp[i, 0] = i for j in range(len_s2 + 1): dp[0, j] = j # Fill in the distance matrix for i in range(1, len_s1 + 1): for j in range(1, len_s2 + 1): if s1[i-1] == s2[j-1]: # No change needed if elements are the same dp[i, j] = dp[i-1, j-1] else: # Calculate costs for operations: deletion = dp[i-1, j] + 1 # Delete from s1 insertion = dp[i, j-1] + 1 # Insert into s1 substitution = dp[i-1, j-1] + 2 # Substitute (costs 2) # Choose the minimum cost operation dp[i, j] = min(deletion, insertion, substitution) # Return the final distance return dp[len_s1, len_s2] answer = levenshtein(np.array([3, 5, 2, 4, 8, 7]), np.array([3, 4, 2, 4, 8, 7, 9])) expected = 3 assert answer == expected, f'Wrong answer; got {answer} instead of {expected}' print('All tests passed') |
All tests passed
/usr/local/lib/python3.12/dist-packages/numba/core/decorators.py:246: RuntimeWarning: nopython is set for njit and is ignored
warnings.warn('nopython is set for njit and is ignored', RuntimeWarning)
Testing if the previous output contains the string All tests passed
: True
Testing if the previous output contains the string njit
: True
Testing if the previous output contains the string nopython
: True
Testing if the previous output contains the string import numba
: True
Testing if the previous output contains the string from numba
: False