[Python-Dev] PEP 393 decode() oddity
Victor Stinner victor.stinner at gmail.com
Mon Mar 26 00:28:33 CEST 2012
Cool, Python 3.3 is much faster at decoding pure ASCII :-)
> encoding  string            2.7   3.2   3.3
> ascii     " " * 1000        5.4   5.3   1.2

4.5x faster than Python 2 here.

> utf-8     " " * 1000        6.7   2.4   2.1

3.2x faster.
It's cool because, in practice, a lot of strings are pure ASCII (as Martin showed in his Django benchmark).
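For reference, timings like these can be reproduced with a small timeit script along the following lines (a sketch, not necessarily the exact benchmark script used; the loop count and the printed units are my own choices):

import timeit

for encoding, text in [
    ("ascii", " " * 1000),
    ("utf-8", " " * 1000),
    ("latin1", "\u0080" * 1000),
]:
    data = text.encode(encoding)
    # best of 3 runs of 100000 decode() calls, converted to
    # microseconds per call
    best = min(timeit.repeat(lambda: data.decode(encoding),
                             repeat=3, number=100000))
    print("%-7s %.2f usec per decode" % (encoding, best * 10))

The numbers will of course vary with the machine.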
latin1 " " * 1000 1.8 1.7 1.3 latin1 "\u0080" * 1000 1.7 1.6 1.0 ... The first oddity in that the characters from the second half of the Latin1 table decoded faster than the characters from the first half.
According to your benchmark, the Latin-1 decoder of Python 3.3 is faster than the decoders of Python 2.7 and 3.2, so I don't see any issue here :-) Martin explained why it is slower for pure ASCII input.
> I think that characters from the first half of the table should be
> decoded just as quickly.
The Latin-1 decoder is already heavily optimized; I don't see how to make it faster.
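To see why there is little left to gain: Latin-1 maps byte 0xNN to code point U+00NN, so under PEP 393 decoding is essentially a copy of the input bytes into a 1-byte-per-character string. A quick pure-Python illustration of the mapping (nothing like the C code, just the idea):

import sys

# Latin-1 is an identity mapping from bytes to the first 256 code
# points, so decoding needs no per-byte computation beyond a copy.
data = bytes(range(256))
assert data.decode("latin-1") == "".join(chr(b) for b in data)

# Under PEP 393 the result fits the 1-byte-per-character (UCS1)
# representation, the same size class as the input buffer.
print(sys.getsizeof("\u00ff" * 1000))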
> The second, sad oddity is that UTF-16 decoding in 3.3 is much slower
> than even in 2.7. Compared with 3.2, decoding is 2-3 times slower.
> This is a considerable regression. UTF-32 decoding is also slowed
> down by a factor of 1.5-2.
Only the ASCII, Latin-1 and UTF-8 decoders are heavily optimized. We can do better for UTF-16 and UTF-32.
I'm just less motivated because UTF-16/32 are less common than ASCII/latin1/UTF-8.
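For anyone who wants to try: the extra work comes from surrogate pairs and byte order. A naive pure-Python UTF-16-LE decoder looks like this (an illustration only, not how the C decoder works):

import struct

def decode_utf16_le(data):
    # Each code unit is 2 bytes, little-endian; surrogate pairs
    # encode code points above U+FFFF.
    if len(data) % 2:
        raise ValueError("truncated UTF-16 data")
    units = struct.unpack("<%dH" % (len(data) // 2), data)
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:          # high surrogate
            if i + 1 >= len(units):
                raise ValueError("unpaired high surrogate")
            lo = units[i + 1]
            if not 0xDC00 <= lo <= 0xDFFF:
                raise ValueError("expected low surrogate")
            out.append(chr(0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00)))
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:        # lone low surrogate
            raise ValueError("unpaired low surrogate")
        else:
            out.append(chr(u))
            i += 1
    return "".join(out)

assert decode_utf16_le("hi\U00010000".encode("utf-16-le")) == "hi\U00010000"

Because of surrogate pairs, the width of the result (and hence maxchar) is not known until the input has been scanned, which is exactly what makes this codec harder to optimize than Latin-1.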
> How serious a problem is this for the Python 3.3 release? I could
> work on the optimization, if no one is working on this already.
I'm interested in any patch optimizing any Python codec. I'm not working on optimizing Python Unicode anymore; various benchmarks showed me that Python 3.3 is as fast as or faster than Python 3.2. That's enough for me.
When Python 3.3 is slower than Python 3.2, it's because Python 3.3 must compute the maximum character of the result, and I fail to see how to optimize away this requirement. I already introduced many fast paths where possible, such as creating a substring of an ASCII string (the result is ASCII, so there is no need to scan the substring).
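The effect of that requirement is visible from pure Python: the maximum code point in the result decides whether the string is stored with 1, 2 or 4 bytes per character. A short demo:

import sys

# PEP 393 chooses the storage width from the maximum code point in
# the string, so a decoder must know maxchar before allocating the
# final representation.
for s in ("a" * 1000,            # ASCII         -> 1 byte/char
          "\u00e9" * 1000,       # Latin-1 range -> 1 byte/char
          "\u0100" * 1000,       # BMP           -> 2 bytes/char
          "\U00010000" * 1000):  # non-BMP       -> 4 bytes/char
    print("maxchar U+%04X -> %d bytes total" % (max(map(ord, s)),
                                                sys.getsizeof(s)))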
That doesn't mean it is no longer possible to optimize Python Unicode ;-)
Victor