public final class NeedlemanWunschEditDistance
The Needleman-Wunsch algorithm for finding least-cost string edit distances between pairs of
strings. Like Levenshtein, but this version allows for a sequence of adjacent
deletions/insertions to cost less than the total cost of each individual deletion/insertion, so
that, for example editing Christopher into Chris (dropping 6 characters) is not 6
times as expensive as editing Christopher into Christophe.
public static int getEditDistance(String source,
Returns the edit distance between two strings. Levenshtein charges the same cost for each
insertion or deletion. This algorithm is slightly more general in that it charges a sequence of
adjacent insertions/deletions an up-front cost plus an incremental cost per insert/delete
operation. The idea is that Christopher -> Chris should be less than 6 times as expensive as
Christopher -> Christophe. The algorithm used to calculate this distance takes time and
space proportional to the product of source.length() and target.length() to
build the 3 arrays.
source - source string.
target - target string
caseSensitive - if true, case is used in comparisons and 'a' != 'A'.
changeCost - cost of changing one character
openGapCost - cost to open a gap to insert or delete some characters.
continueGapCost - marginal cost to insert or delete next character.
edit distance between the source and target strings.
public static int getWorstCaseEditDistance(int sourceLength,
Return the worst case edit distance between strings of this length
public static double getNormalizedEditDistance(String source,
Returns a normalized edit distance between 0 and 1. This is useful if you are comparing or
aggregating distances of different pairs of strings