[Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

Kevin Jacobs <jacobs@bioinformed.com> bioinformed at gmail.com
Wed Jul 7 03:47:41 CEST 2010


On Tue, Jul 6, 2010 at 7:18 PM, Terry Reedy <tjreedy at udel.edu> wrote:

[Also posted to http://bugs.python.org/issue2986]

A much faster way to find the first mismatch would be

    i = 0
    while first[i] == second[i]:
        i+=1

The match ratio, based on the initial matching prefix only, is spuriously low.

I don't have much experience with the Python sequence matcher, but many classical edit distance and alignment algorithms benefit from stripping any common prefix and suffix before engaging in the heavy lifting. This is trivially optimal for Hamming-like distances and easily shown to be optimal for Levenshtein and Damerau-type distances.
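A minimal sketch of that idea, assuming plain indexable Python sequences. The names strip_common_affixes, levenshtein and levenshtein_stripped are hypothetical helpers written for illustration; they are not part of difflib, and this is not how SequenceMatcher works internally. Unlike the bare while loop quoted above, the prefix scan is bounded by the shorter length, so it cannot run off the end when one sequence is a prefix of the other.

```python
def strip_common_affixes(a, b):
    """Return (a_core, b_core) with the shared prefix and suffix removed."""
    limit = min(len(a), len(b))

    # Length of the common prefix, bounded by the shorter sequence.
    start = 0
    while start < limit and a[start] == b[start]:
        start += 1

    # Length of the common suffix, never overlapping the stripped prefix.
    end = 0
    while end < limit - start and a[len(a) - 1 - end] == b[len(b) - 1 - end]:
        end += 1

    return a[start:len(a) - end], b[start:len(b) - end]


def levenshtein(a, b):
    """Textbook two-row dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_stripped(a, b):
    """Same distance, computed only on the differing middle part."""
    return levenshtein(*strip_common_affixes(a, b))
```

For inputs that differ only in a small region, the quadratic step then runs on just that region; for example, strip_common_affixes("abcXdef", "abcYYdef") leaves only "X" and "YY" to compare.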

-Kevin