Issue 26150: SequenceMatcher's algorithm is not correct

For the strings 'aaaaaa' and 'aabaaa', SequenceMatcher's algorithm finds only the common substring 'aaa', while the well-known classic LCS (longest common subsequence) algorithm (http://www.geeksforgeeks.org/printing-longest-common-subsequence/) finds both 'aa' and 'aaa'.
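The report can be reproduced with the standard library's difflib. In this sketch, `lcs_length` is a hypothetical helper (not part of difflib) that computes the textbook dynamic-programming LCS length, included only for comparison. The expected outputs in the comments follow from SequenceMatcher's documented tie-breaking for `find_longest_match()` (earliest match in a, then earliest in b).

```python
from difflib import SequenceMatcher


def lcs_length(a: str, b: str) -> int:
    """Hypothetical helper: textbook O(len(a) * len(b)) LCS length."""
    # prev[j] holds the LCS length of the previous prefix of a and b[:j]
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0] * (len(b) + 1)
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


a, b = "aaaaaa", "aabaaa"
sm = SequenceMatcher(None, a, b)
blocks = sm.get_matching_blocks()
matched = [a[blk.a:blk.a + blk.size] for blk in blocks if blk.size]

print(blocks)            # [Match(a=0, b=3, size=3), Match(a=6, b=6, size=0)]
print(matched)           # ['aaa']
print(lcs_length(a, b))  # 5, i.e. 'aa' + 'aaa'
```

SequenceMatcher first anchors on the longest contiguous match, here a[0:3] against b[3:6], and then recurses only into the unmatched regions on either side of it. The left region (a[0:0] against b[0:3]) and the right region (a[3:6] against b[6:6]) are each empty in one of the two strings, so the leading 'aa' of b is never paired with anything, whereas the LCS has length 5.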

Is this the price of the "best case time is linear" property mentioned in difflib's documentation? Are there any other reasons, such as memory limits, not to implement the classic LCS algorithm? If not, maybe it would be useful to create a subclass, StrictSequenceMatcher?
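To make the proposal concrete, here is a minimal sketch of what such a subclass might look like. `StrictSequenceMatcher` is only the name suggested above, not something difflib provides. The sketch assumes that overriding `get_matching_blocks()` is enough, since in CPython's implementation `ratio()` and `get_opcodes()` are both computed from it. It ignores `isjunk`/autojunk entirely, and it shows the memory cost asked about: the traceback needs a full (len(a)+1) x (len(b)+1) table, while SequenceMatcher's `find_longest_match()` keeps only one row of run lengths at a time.

```python
from difflib import Match, SequenceMatcher


class StrictSequenceMatcher(SequenceMatcher):
    """Hypothetical subclass: matching blocks come from a classic LCS table.

    Ignores isjunk/autojunk and uses O(len(a) * len(b)) time and memory.
    """

    def get_matching_blocks(self):
        # Reuse the base class's cache attribute (reset by set_seq1/set_seq2).
        if self.matching_blocks is not None:
            return self.matching_blocks
        a, b = self.a, self.b
        n, m = len(a), len(b)
        # L[i][j] = LCS length of the suffixes a[i:] and b[j:]
        L = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n - 1, -1, -1):
            for j in range(m - 1, -1, -1):
                if a[i] == b[j]:
                    L[i][j] = L[i + 1][j + 1] + 1
                else:
                    L[i][j] = max(L[i + 1][j], L[i][j + 1])
        # Walk forward through the table, recording the matched pairs of one LCS.
        pairs = []
        i = j = 0
        while i < n and j < m:
            if a[i] == b[j]:
                pairs.append((i, j))
                i += 1
                j += 1
            elif L[i + 1][j] >= L[i][j + 1]:
                i += 1
            else:
                j += 1
        # Merge diagonal runs of pairs into (a, b, size) blocks, as difflib expects.
        blocks = []
        for i, j in pairs:
            if blocks and blocks[-1][0] + blocks[-1][2] == i \
                    and blocks[-1][1] + blocks[-1][2] == j:
                blocks[-1][2] += 1
            else:
                blocks.append([i, j, 1])
        # difflib's convention: the list ends with a dummy (len(a), len(b), 0) block.
        self.matching_blocks = [Match(*blk) for blk in blocks] + [Match(n, m, 0)]
        return self.matching_blocks


sm = StrictSequenceMatcher(None, "aaaaaa", "aabaaa")
print(sm.get_matching_blocks())
# [Match(a=0, b=0, size=2), Match(a=3, b=3, size=3), Match(a=6, b=6, size=0)]
print(sm.ratio())  # 0.8333333333333334
```

For the strings in this report the sketch pairs both 'aa' and 'aaa', and `get_opcodes()` then reports a single one-character 'replace' between them. The trade-off is the quadratic table, which is presumably why a general-purpose class would want a different algorithm for long inputs.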

Please read the responses to this older report:

http://bugs.python.org/issue25391

As they say, it's functioning as designed and documented, so this isn't "a bug". For that reason I'm closing this as "not a bug".

As they also say, there are many other possible algorithms (LCS isn't the only other one in use). Opening an enhancement request instead (to implement additional algorithms) could make sense, but won't get anywhere unless someone volunteers to do the work.