Class NeedlemanWunschEditDistance
java.lang.Object
com.google.errorprone.names.NeedlemanWunschEditDistance
The Needleman-Wunsch algorithm for finding least-cost string edit distances between pairs of
strings. Like Levenshtein, but this version allows for a sequence of adjacent
deletions/insertions to cost less than the total cost of each individual deletion/insertion, so
that, for example editing
Christopher
into Chris
(dropping 6 characters) is not 6
times as expensive as editing Christopher
into Christophe
.
See http://en.wikipedia.org/wiki/Needleman-Wunsch_algorithm
- Author:
- alanw@google.com (Alan Wendt)
-
Method Summary
Modifier and TypeMethodDescriptionstatic int
getEditDistance
(String source, String target, boolean caseSensitive, int changeCost, int openGapCost, int continueGapCost) Returns the edit distance between two strings.static double
getNormalizedEditDistance
(String source, String target, boolean caseSensitive, int changeCost, int openGapCost, int continueGapCost) Returns a normalized edit distance between 0 and 1.static int
getWorstCaseEditDistance
(int sourceLength, int targetLength, int changeCost, int openGapCost, int continueGapCost) Return the worst case edit distance between strings of this length
-
Method Details
-
getEditDistance
public static int getEditDistance(String source, String target, boolean caseSensitive, int changeCost, int openGapCost, int continueGapCost) Returns the edit distance between two strings. Levenshtein charges the same cost for each insertion or deletion. This algorithm is slightly more general in that it charges a sequence of adjacent insertions/deletions an up-front cost plus an incremental cost per insert/delete operation. The idea is that Christopher -> Chris should be less than 6 times as expensive as Christopher -> Christophe. The algorithm used to calculate this distance takes time and space proportional to the product ofsource.length()
andtarget.length()
to build the 3 arrays.- Parameters:
source
- source string.target
- target stringcaseSensitive
- if true, case is used in comparisons and 'a' != 'A'.changeCost
- cost of changing one characteropenGapCost
- cost to open a gap to insert or delete some characters.continueGapCost
- marginal cost to insert or delete next character.- Returns:
- edit distance between the source and target strings.
-
getWorstCaseEditDistance
public static int getWorstCaseEditDistance(int sourceLength, int targetLength, int changeCost, int openGapCost, int continueGapCost) Return the worst case edit distance between strings of this length -
getNormalizedEditDistance
-