[Python-Dev] Issue 2986: difflib.SequenceMatcher is partly broken

Antoine Pitrou solipsis at pitrou.net
Wed Jul 7 20:40:08 CEST 2010


On Wed, 7 Jul 2010 19:44:31 +0200 Eli Bendersky <eliben at gmail.com> wrote:

> For what it's worth, my benchmarking showed that modifying the heuristic to only kick in when there are more than 100 kinds of elements (Terry's option A) didn't affect the runtime of matching at all, even when the heuristic does kick in. All it adds, really, is the overhead of a single 'if' statement. So it wouldn't be right to assume that modifying the heuristic, or allowing it to be turned off, will negatively affect performance in the special case Tim originally optimized for.

Just because it doesn't affect performance in your tests doesn't mean it won't do so in the general case. Consider a case where Tim's junk optimization kicked in and improved performance a lot, but where there are still fewer than 100 alphabet symbols. The new heuristic would no longer kick in there, so it would ruin this use case.

That's why I'm advocating a dedicated flag instead.
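
To make the idea concrete, here is a rough sketch of what a dedicated flag looks like from the caller's side. It assumes Python 3.2 or newer, where SequenceMatcher accepts an `autojunk` keyword argument; that keyword did not exist when this message was written. The inputs and variable names are made up for illustration. The alphabet has only four symbols, well under 100, so an alphabet-size threshold would switch the heuristic off here, while a flag leaves the decision to the caller.

```python
from difflib import SequenceMatcher

# Made-up inputs: the second sequence is longer than 200 items and its
# symbols 'a' and 'b' each occur far more often than 1% of its length,
# so the popularity heuristic treats them as junk when it is enabled.
a = "x" + "ab" * 150
b = "y" + "ab" * 150

# Heuristic enabled (the default): popular symbols are ignored when
# looking for matches, which keeps the work small for such inputs.
with_heuristic = SequenceMatcher(None, a, b, autojunk=True).ratio()

# Heuristic disabled through the flag: every symbol takes part in
# matching, at the cost of more work per comparison.
without_heuristic = SequenceMatcher(None, a, b, autojunk=False).ratio()

print(with_heuristic, without_heuristic)
```

With these inputs the default setting finds no match at all, whereas passing `autojunk=False` recovers the long common run. Callers who relied on the speedup keep it by leaving the flag alone.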


