[Python-Dev] 3.3 str timings
Victor Stinner victor.stinner at gmail.com
Tue Aug 21 15:04:03 CEST 2012
2012/8/18 Terry Reedy <tjreedy at udel.edu>:
> The issue came up in python-list about string operations being slower in 3.3. (The categorical claim is false as some things are actually faster.)
Yes, some operations are slower, but others are faster :-) A lot of effort went into limiting the overhead of PEP 393 (when the branch was merged, most operations were slower), and I tried to fix all the performance regressions. If you find cases where Python 3.3 is slower, I can investigate and try to optimize them (in Python 3.4), or at least explain why they are slower :-)
As Antoine said, use the stringbench tool if you would like a first overview of string performance.
> Some things I understand, this one I do not.
>
> Win7-64, 3.3.0b2 versus 3.2.3
> print(timeit("c in a", "c = '…'; a = 'a'*1000+c")) # ord(c) = 8230
> # .6 in 3.2, 1.2 in 3.3
On Linux with a narrow build (UTF-16), I get:
$ python3.2 -m timeit -s "c=chr(8230); a='a'*1000+c" "c in a"
100000 loops, best of 3: 4.25 usec per loop
$ python3.3 -m timeit -s "c=chr(8230); a='a'*1000+c" "c in a"
100000 loops, best of 3: 3.21 usec per loop

Linux-2.6.30.10-105.2.23.fc11.i586-i686-with-fedora-11-Leonidas
Python 3.2.2+ (3.2:1453d2fe05bf, Aug 21 2012, 14:21:05)
Python 3.3.0b2+ (default:b36ce0a3a844, Aug 21 2012, 14:05:23)
I'm not sure that I read your benchmark correctly: you write c='...' and then ord(c)=8230. The algorithm used to find a substring differs depending on whether the substring is a single character or longer. For a single character, Antoine Pitrou modified the code to use memchr() and memrchr(), even if the string is not UCS1 (in this benchmark, the string uses UCS2 storage): in that case the byte search may find false positives.
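To illustrate what a "false positive" means here, below is a minimal Python sketch of a byte-level search over 2-byte storage. It is not the CPython implementation: the function name find_ucs2_char is made up for this example, bytes.find() stands in for memchr(), the storage is assumed to be UTF-16-LE without surrogates, and searching for the low byte is an assumption about how such a search could be organized.

```python
def find_ucs2_char(units: bytes, ch: str) -> int:
    """Return the index of ch (in 2-byte code units) or -1.

    units is assumed to be little-endian 2-byte storage, e.g. the
    result of text.encode('utf-16-le') for a BMP-only string.
    """
    code = ord(ch)
    if code > 0xFFFF:
        raise ValueError("only BMP characters fit in one 2-byte unit")
    low = bytes([code & 0xFF])   # byte handed to the memchr()-like search
    high = code >> 8
    pos = 0
    while True:
        pos = units.find(low, pos)   # plays the role of memchr()
        if pos < 0:
            return -1
        # A hit only counts if it is aligned on a code unit and the
        # other byte of that unit matches too.
        if pos % 2 == 0 and units[pos + 1] == high:
            return pos // 2
        pos += 1   # false positive: keep scanning


# '&' (U+0026) shares the low byte of U+2026, and U+2620 contains
# 0x26 as its *high* byte: both are false positives for the byte search.
text = "&\u2620a\u2026"
print(find_ucs2_char(text.encode("utf-16-le"), "\u2026"))
```

The more false positives the data contains, the more often the fast byte search has to be restarted, which is one way such a search can end up slower than expected on some inputs.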
> Why is searching for a two-byte char in a two-bytes per char string so much faster in 3.2?
Can you reproduce your benchmark on other Windows platforms? Do you run the benchmark more than once? I always run a benchmark 3 times.
I don't like the timeit module for micro benchmarks: it is really unstable (the default settings are not designed for micro benchmarks). Example of 4 runs on the same platform:
$ ./python -m timeit -s "a='a'*1000" "a.encode()"
100000 loops, best of 3: 2.79 usec per loop
$ ./python -m timeit -s "a='a'*1000" "a.encode()"
100000 loops, best of 3: 2.61 usec per loop
$ ./python -m timeit -s "a='a'*1000" "a.encode()"
100000 loops, best of 3: 3.16 usec per loop
$ ./python -m timeit -s "a='a'*1000" "a.encode()"
100000 loops, best of 3: 2.76 usec per loop
I wrote my own benchmark tool, based on timeit, to have more stable results on micro benchmarks: https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py
Example of 4 runs:
3.18 us: c=chr(8230); a='a'*1000+c; c in a
3.18 us: c=chr(8230); a='a'*1000+c; c in a
3.21 us: c=chr(8230); a='a'*1000+c; c in a
3.18 us: c=chr(8230); a='a'*1000+c; c in a
My benchmark.py script automatically calibrates the number of loops so that one run takes at least 100 ms, and then repeats the test for at least 1.0 second.
Using a time budget instead of a fixed number of loops is more reliable because the test is less dependent on system activity.
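The calibrate-then-repeat idea can be sketched on top of timeit as follows. This is not the code of benchmark.py: the function name bench, its parameters, the "multiply by 10" calibration step and the choice of keeping the minimum are all assumptions made for the example; only the 100 ms and 1.0 second thresholds come from the description above.

```python
import time
import timeit


def bench(stmt, setup="pass", min_run=0.1, min_total=1.0):
    """Return (best seconds per loop, calibrated number of loops)."""
    timer = timeit.Timer(stmt, setup)

    # Calibration: grow the loop count until one run lasts >= min_run.
    loops = 1
    while True:
        elapsed = timer.timeit(loops)
        if elapsed >= min_run:
            break
        loops *= 10

    # Measurement: repeat runs until the time budget is spent and
    # keep the best (least disturbed) run.
    best = elapsed / loops
    deadline = time.perf_counter() + min_total
    while time.perf_counter() < deadline:
        best = min(best, timer.timeit(loops) / loops)
    return best, loops


if __name__ == "__main__":
    best, loops = bench("c in a", "c=chr(8230); a='a'*1000+c")
    print("%.2f us (%d loops per run)" % (best * 1e6, loops))
```

With a time budget, a fast statement gets many loops per run and a slow one gets few, so each measurement covers a comparable amount of wall-clock time.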
> print(timeit("a.encode()", "a = 'a'*1000")) # 1.5 in 3.2, .26 in 3.3
> print(timeit("a.encode(encoding='utf-8')", "a = 'a'*1000")) # 1.7 in 3.2, .51 in 3.3
This test doesn't compare the performance of the UTF-8 encoders: "encoding" an ASCII string to UTF-8 in Python 3.3 is a no-op, it just duplicates the memory (ASCII is compatible with UTF-8)...
So your benchmark just measures the performance of PyArg_ParseTupleAndKeywords()... Also try str.encode('utf-8').
If you want to benchmark the UTF-8 encoder, use at least one non-ASCII character, like "\x80".
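A small sketch of such a comparison, assuming plain timeit is good enough for a first look: it times the three call forms discussed above on a pure ASCII string and on a string ending with "\x80". The helper name time_encode and the loop counts are choices made for this example, and no timings are shown because they depend on the machine and Python version.

```python
import timeit

SETUPS = {
    "ascii": "a = 'a' * 1000",
    # One non-ASCII character forces the real UTF-8 encoder to run.
    "non-ascii": "a = 'a' * 999 + '\\x80'",
}

STATEMENTS = (
    "a.encode()",
    "a.encode('utf-8')",            # positional argument
    "a.encode(encoding='utf-8')",   # keyword argument
)


def time_encode(number=100000, repeat=3):
    """Return {(setup name, statement): best seconds per call}."""
    results = {}
    for name, setup in SETUPS.items():
        for stmt in STATEMENTS:
            runs = timeit.repeat(stmt, setup, repeat=repeat, number=number)
            results[name, stmt] = min(runs) / number
    return results


if __name__ == "__main__":
    for (name, stmt), best in sorted(time_encode().items()):
        print("%-10s %-28s %.3f us" % (name, stmt, best * 1e6))
```

Comparing the positional and keyword forms on the same input isolates the argument-parsing cost, while comparing the two inputs isolates the cost of the encoder itself.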
At least, your benchmark shows that Python 3.3 is much faster than Python 3.2 at "encoding" pure ASCII strings to UTF-8 :-)
Victor