[Python-Dev] Why not using the hash when comparing strings?

Brett Cannon brett at python.org
Fri Oct 19 15:56:15 CEST 2012


On Fri, Oct 19, 2012 at 8:36 AM, Victor Stinner <victor.stinner at gmail.com> wrote:

> 2012/10/19 Benjamin Peterson <benjamin at python.org>:
> > It would be interesting to see how common it is for strings which have
> > their hash computed to be compared.
>
> I implemented a quick hack. When running "./python -m test test_os",
> Python calls PyUnicode_RichCompare() 15206 times with the Py_EQ or Py_NE
> operator. In 41.4% of them (6295 calls), the hashes of both operands are
> already known. In 41.2% (6262 of 15206 calls), the hashes of both operands
> are known and are different!
>
> The hit rate may depend on how long the process has been running. For
> example, in a fresh interpreter the hit rate is only 7% (189 hits out of
> 2703 calls). When running the test suite, the hit rate is around 80% (the
> hashes are known in 90% of calls) after running 70 tests.
>
> Meanwhile, the average string length is 4.1 characters and almost all
> strings are pure ASCII.
>
> I created http://bugs.python.org/issue16286 to discuss this optimization.

If you want to measure the performance impact compared to a clean build, you can use the unladen benchmarks, as they now contain several Python 3-compatible benchmarks.
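The shortcut discussed in this thread rests on one fact: if two strings already have cached hashes and those hashes differ, the strings cannot be equal, so the character-by-character comparison can be skipped. The sketch below models that idea at the Python level only. It is not CPython's C code in PyUnicode_RichCompare(), and the names `HashedStr`, `str_equal`, `stats` and `hit_rate` are hypothetical, chosen for illustration. The counters roughly mirror the statistics quoted above (total calls, both hashes known, both known and different).

```python
from collections import Counter
from typing import Optional


class HashedStr:
    """Hypothetical str wrapper that caches its hash after the first hash() call.

    CPython's str objects keep a cached hash internally; this class only
    models that behaviour at the Python level.
    """

    __slots__ = ("value", "cached_hash")

    def __init__(self, value: str) -> None:
        self.value = value
        self.cached_hash: Optional[int] = None  # None: hash not computed yet

    def __hash__(self) -> int:
        if self.cached_hash is None:
            self.cached_hash = hash(self.value)
        return self.cached_hash


# Hypothetical counters: total calls, both hashes known, both known and different.
stats: Counter = Counter()


def str_equal(a: HashedStr, b: HashedStr) -> bool:
    """Equality test with the proposed shortcut: two strings whose cached
    hashes differ cannot be equal, so the full comparison is skipped."""
    stats["calls"] += 1
    if a.cached_hash is not None and b.cached_hash is not None:
        stats["both_known"] += 1
        if a.cached_hash != b.cached_hash:
            stats["known_and_different"] += 1
            return False  # decided by the hashes alone
    # Hashes unknown, or equal (equal hashes do not prove equality):
    # fall back to the ordinary comparison.
    return a.value == b.value


def hit_rate() -> float:
    """Fraction of comparisons that the hash shortcut answered on its own."""
    return stats["known_and_different"] / stats["calls"] if stats["calls"] else 0.0


if __name__ == "__main__":
    a, b, c = HashedStr("spam"), HashedStr("eggs"), HashedStr("spam")
    print(str_equal(a, b))  # False: no hash cached yet, full comparison
    for s in (a, b, c):
        hash(s)  # cache the hashes, as a dict lookup would
    print(str_equal(a, b))  # False: cached hashes differ, shortcut taken
    print(str_equal(a, c))  # True: equal hashes, full comparison confirms it
    print(dict(stats), hit_rate())
```

Equal hashes prove nothing, so the shortcut only ever saves work when the strings differ, which corresponds to the "known and different" case counted above. Whether that saving outweighs the extra check on very short strings (4.1 characters on average here) is the kind of question a benchmark run would have to answer.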


More information about the Python-Dev mailing list