Fuzzy String Matching


Fuzzy string matching is an approximate string matching. When building chat bots the biggest challenge is to understand the user input. It is a fairly common complaint among the bot-masters that their bots have wonderful responses ready, but the input always comes just 1% off target... There are several ways for a program to compare input to the stored keyphrase value.  The most obvious being  a direct 100% match:  if ( input == key[i] )  i.e. input it "hello" and the database contains "hello" keyphrase.


Or if there is a partial match if( input.match( key[i] )  )  using regex or indexOf() methods, i.e. input is "hello there" and the database contains "hello" key-phrase -- there will be a match.  But what if input is "hallo" or "helloo" or something similar? Of course a bot program can have a pre -process module that can spell-correct all these forms to "hello", but it turns out there is a far more sexy way to match strings.  The demo below implements Levenshtein distance algorithm for approximate string matching.  It is not a neural net, although neural nets can also be used for this purpose.  It acts similar to a neural net, but it is less computationally intensive. 


The fields are pre-loaded with test strings, and the matching results are very impressive, although sometimes there may be a wild mismatch. The algorithm would try to match the input to the closest string in the database. You can delete, add and edit any sentences there.   It is often said by non-professionals that computers can understand only very precise instructions. This is not necessarily the case. Have fun.

Input to test


Database of keyphrases to test for closest match




The above program uses Levenshtein distance algorithm. Artificial Neural Net, ANN, can also be used for this task. ANNs are excellent pattern recognizers.  ANNs are, however computationally intensive, and what happens inside the ANN is a mystery to all, but those with PhD in mathematics.  Levenshtein distance algorithm, on the other hand is perfectly simple and it calculates how many letter moves and substitutions are required to turn one phrase to another.  The lower this number, the closer are the two compared phrases.

There are other similar fuzzy string comparison algorithms and variations of Levenshtein. My particular implementation gives extra points (subtracts them from the distance) for every matched whole word.



© GreenFusion Enterprises