Class NeedlemanWunschEditDistance

java.lang.Object
com.google.errorprone.names.NeedlemanWunschEditDistance

public final class NeedlemanWunschEditDistance
extends Object
The Needleman-Wunsch algorithm for finding least-cost string edit distances between pairs of strings. Like Levenshtein, but this version allows for a sequence of adjacent deletions/insertions to cost less than the total cost of each individual deletion/insertion, so that, for example editing Christopher into Chris (dropping 6 characters) is not 6 times as expensive as editing Christopher into Christophe.

See http://en.wikipedia.org/wiki/Needleman-Wunsch_algorithm

Author:
alanw@google.com (Alan Wendt)
  • Method Summary

    Modifier and Type Method Description
    static int getEditDistance​(String source, String target, boolean caseSensitive, int changeCost, int openGapCost, int continueGapCost)
    Returns the edit distance between two strings.
    static double getNormalizedEditDistance​(String source, String target, boolean caseSensitive, int changeCost, int openGapCost, int continueGapCost)
    Returns a normalized edit distance between 0 and 1.
    static int getWorstCaseEditDistance​(int sourceLength, int targetLength, int changeCost, int openGapCost, int continueGapCost)
    Return the worst case edit distance between strings of this length

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Method Details

    • getEditDistance

      public static int getEditDistance​(String source, String target, boolean caseSensitive, int changeCost, int openGapCost, int continueGapCost)
      Returns the edit distance between two strings. Levenshtein charges the same cost for each insertion or deletion. This algorithm is slightly more general in that it charges a sequence of adjacent insertions/deletions an up-front cost plus an incremental cost per insert/delete operation. The idea is that Christopher -> Chris should be less than 6 times as expensive as Christopher -> Christophe. The algorithm used to calculate this distance takes time and space proportional to the product of source.length() and target.length() to build the 3 arrays.
      Parameters:
      source - source string.
      target - target string
      caseSensitive - if true, case is used in comparisons and 'a' != 'A'.
      changeCost - cost of changing one character
      openGapCost - cost to open a gap to insert or delete some characters.
      continueGapCost - marginal cost to insert or delete next character.
      Returns:
      edit distance between the source and target strings.
    • getWorstCaseEditDistance

      public static int getWorstCaseEditDistance​(int sourceLength, int targetLength, int changeCost, int openGapCost, int continueGapCost)
      Return the worst case edit distance between strings of this length
    • getNormalizedEditDistance

      public static double getNormalizedEditDistance​(String source, String target, boolean caseSensitive, int changeCost, int openGapCost, int continueGapCost)
      Returns a normalized edit distance between 0 and 1. This is useful if you are comparing or aggregating distances of different pairs of strings