[Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken
Kevin Jacobs jacobs@bioinformed.com bioinformed at gmail.com
Wed Jul 7 03:47:41 CEST 2010
On Tue, Jul 6, 2010 at 7:18 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> [Also posted to http://bugs.python.org/issue2986
>
> A much faster way to find the first mismatch would be
>     i = 0
>     while first[i] == second[i]:
>         i+=1
> The match ratio, based on the initial matching prefix only, is spuriously low.
I don't have much experience with the Python sequence matcher, but many classical edit-distance and alignment algorithms benefit from stripping any common prefix and suffix before engaging in heavy lifting. This is trivially optimal for Hamming-like distances and easily shown to be optimal for Levenshtein and Damerau-type distances.
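To make the idea concrete, here is a minimal sketch of prefix/suffix stripping in front of a textbook Levenshtein distance. It is not difflib code and says nothing about how SequenceMatcher works internally; the names common_prefix_len, trim_common_ends, levenshtein and trimmed_levenshtein are made up for illustration. The prefix scan is the same loop as the one quoted above, with a bounds check added so that it stops when one sequence is a prefix of the other.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of a and b (bounds-checked)."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def trim_common_ends(a, b):
    """Strip the shared prefix and suffix; return the differing middles."""
    p = common_prefix_len(a, b)
    a, b = a[p:], b[p:]
    # Measure the suffix on the remainders so it cannot overlap the prefix.
    s = common_prefix_len(a[::-1], b[::-1])
    if s:
        a, b = a[:-s], b[:-s]
    return a, b


def levenshtein(a, b):
    """Plain O(len(a) * len(b)) dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,              # delete x
                cur[j - 1] + 1,           # insert y
                prev[j - 1] + (x != y),   # substitute, or free match
            ))
        prev = cur
    return prev[-1]


def trimmed_levenshtein(a, b):
    """Same distance as levenshtein(), computed on the trimmed middles only."""
    return levenshtein(*trim_common_ends(a, b))
```

The quadratic step then runs only on the parts that actually differ, so two long inputs with a small edit in the middle cost roughly a linear scan plus a tiny table.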
-Kevin