[Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

Tim Peters tim.peters at gmail.com
Thu Jul 8 15:39:08 CEST 2010


[Antoine Pitrou]

> I don't think 2.7 should get any change at all here. Only 3.2 should be modified. As Tim said, difflib works ok for its intended use (regular text diffs).

That was the use case that drove the implementation, but it's going too far to say that was the only "intended" case. I believe (but can't prove) that remains the most common use (& overwhelmingly so), but it was indeed intended to work for any sequences of hashable elements.
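
A minimal sketch of that generality, assuming Python 3's `difflib`: `SequenceMatcher` accepts lists of tuples just as it accepts strings, because it only needs to hash and compare elements. The `(name, age)` records below are made up for illustration.

```python
from difflib import SequenceMatcher

# Any hashable elements will do; these (name, age) tuples are made-up records.
old = [("alice", 30), ("bob", 25), ("carol", 41), ("dave", 19)]
new = [("alice", 30), ("carol", 41), ("dave", 19), ("erin", 52)]

sm = SequenceMatcher(None, old, new)
# Each opcode describes how to turn old[i1:i2] into new[j1:j2].
for tag, i1, i2, j1, j2 in sm.get_opcodes():
    print(f"{tag:7} {old[i1:i2]} -> {new[j1:j2]}")
```

This reports `("bob", 25)` as deleted and `("erin", 52)` as inserted, with the remaining records matched as `equal`.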

And it always did, and it still does, in the sense that it computes a diff that transforms the first sequence into the second sequence. The problem is that, with the primary use case in mind, I introduced a heuristic speedup that turned out to vastly damage the quality of the results for some other uses (a correct diff isn't necessarily a useful diff - for example, "delete the entire sequence you started with, then insert the entire new sequence" is a correct diff for any pair of input sequences, but not a useful diff for most purposes).
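
The heuristic at issue is the "popular element" rule in `SequenceMatcher`: when the second sequence `b` has at least 200 elements, any element occurring more than `len(b) // 100 + 1` times is not used to anchor matches. Later releases (Python 2.7.1 and 3.2) added an `autojunk` parameter to switch it off. The sketch below assumes Python 3.2 or later and uses made-up input strings; it shows the heuristic producing exactly the correct-but-useless diff described above, a single `replace` of the whole sequence.

```python
from difflib import SequenceMatcher

# Two made-up 201-element strings that differ only in their first character.
a = "a" + "x" * 200
b = "b" + "x" * 200

# Default (autojunk=True): since len(b) >= 200, any element of b occurring
# more than len(b) // 100 + 1 times (here, more than 3 times) is "popular"
# and is never used to anchor a match.
fast = SequenceMatcher(None, a, b)
print(fast.bpopular)           # {'x'}
print(fast.get_opcodes())      # [('replace', 0, 201, 0, 201)]
print(fast.ratio())            # 0.0

# autojunk=False disables the popularity heuristic entirely.
full = SequenceMatcher(None, a, b, autojunk=False)
print(full.get_opcodes())      # [('replace', 0, 1, 0, 1), ('equal', 1, 201, 1, 201)]
print(round(full.ratio(), 3))  # 0.995
```

With the heuristic on, the only diff found is "replace everything"; with it off, the 200 shared characters are matched as `equal`.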

> Making it work for other uses is a new feature, not a bugfix.

Definitely not a new feature. These other cases used to deliver much better diffs, before I introduced the heuristic in question. People with these other cases are asking for a way to get the results they used to get - and we know that's so because a few figured out they get what they want just by (in effect) reverting the checkin (made about 8 years ago) that introduced the heuristic. So they're looking for a way to restore older behavior, not to introduce new behavior. Of course this is obscured by the fact that the change happened so long ago that I bet most of them don't know at first that it was the old behavior.