Issue 4622: SequenceMatcher bug with long sequences (original) (raw)
Issue4622
Created on 2008-12-10 17:20 by eli.bendersky, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (5) | ||
---|---|---|
msg77559 - (view) | Author: Eli Bendersky (eli.bendersky) * ![]() |
Date: 2008-12-10 17:20 |
Here's a reproduction of the error: Python 2.5.2 (r252:60911, Oct 20 2008, 09:11:31) [GCC 3.4.6 20060404 (Red Hat 3.4.6-10)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import difflib >>> >>> difflib.SequenceMatcher(None, [4] + [5] * 200, [5] * 200).ratio() 0.0 ratio() should be returning close to 1.0 here, not 0. This is only a problem for sequences longer than 200. The analogous run for 100: >>> difflib.SequenceMatcher(None, [4] + [5] * 100, [5] * 100).ratio() 0.99502487562189057 >>> I've managed to reproduce it on Linux, Windows (AS 2.5.2) and Try Python (http://try-python.mired.org/) | ||
msg77561 - (view) | Author: David W. Lambert (LambertDW) | Date: 2008-12-10 18:02 |
Python 3.0rc1+ similar. | ||
msg77565 - (view) | Author: Gabriel Genellina (ggenellina) | Date: 2008-12-10 19:38 |
Python 2.3.4 and later have this bug. But release 2.1.3 doesn't: Python 2.1.3 (#35, Apr 8 2002, 17:47:50) [MSC 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. >>> import difflib >>> difflib.SequenceMatcher(None, [4] + [5] * 500, [5] * 500).ratio() 0.99900099900099903 >>> difflib.SequenceMatcher(None, [4] + [5] * 200, [5] * 200).ratio() 0.99750623441396513 >>> difflib.SequenceMatcher(None, [4] + [5] * 100, [5] * 100).ratio() 0.99502487562189057 I don't have any 2.2 release to test right now. | ||
msg77566 - (view) | Author: Gabriel Genellina (ggenellina) | Date: 2008-12-10 19:42 |
#2986 may be a duplicate of this; #1528074 is relevant too. | ||
msg108635 - (view) | Author: Terry J. Reedy (terry.reedy) * ![]() |
Date: 2010-06-25 21:55 |
This appears to be one of at least three duplicate issues: #1528074, #2986, and #4622. I am closing two, leaving 2986 open, and merging the nearly disjoint nosy lists. (If no longer interested, you can delete yourself from 2986.) #1711800 appears to be slightly different (if not, it could be closed also.) Whether or not a new feature is ever added (earliest, now, 3.2), it appears that the docs need improvement to at least explain the current behavior. If someone who understands the issue could open a separate doc issue (for 2.6/7/3.1/2) with a suggested addition, that would be great. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:42 | admin | set | github: 48872 |
2010-06-25 21:55:55 | terry.reedy | set | status: open -> closednosy: + terry.reedymessages: + superseder: difflib.SequenceMatcher not matching long sequencesresolution: duplicate |
2008-12-10 19:42:21 | ggenellina | set | messages: + |
2008-12-10 19:38:27 | ggenellina | set | nosy: + ggenellinamessages: + |
2008-12-10 18:02:57 | LambertDW | set | nosy: + LambertDWmessages: + versions: + Python 3.0 |
2008-12-10 17:20:54 | eli.bendersky | create |