[Python-Dev] PEP 393 decode() oddity

Serhiy Storchaka storchaka at gmail.com
Tue Mar 27 00:04:05 CEST 2012


On 26.03.12 01:28, Victor Stinner wrote:

> Cool, Python 3.3 is much faster to decode pure ASCII :-)

It is even faster on large data. 1000 characters is not enough to fully amortize the constant cost of the function calls. Python 3.3 is really cool.

>> encoding  string        2.7   3.2   3.3
>>
>> ascii     " " * 1000    5.4   5.3   1.2
>
> 4.5x faster than Python 2 here.

And it can be accelerated further (issue #14419).

>> utf-8     " " * 1000    6.7   2.4   2.1
>
> 3.2x faster

In theory, the speed should match that of latin1, and in the limit, for large data, it does. For medium-sized data the startup overhead is visible and utf-8 is a bit slower than it could be.
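One way to check this on your own machine is to measure the decode cost per byte for several input sizes and codecs. The following is only a rough sketch: the helper name decode_cost_per_byte and the chosen sizes are my own assumptions, not something taken from the benchmark in the table above, and the numbers will differ between builds and machines.

```python
import timeit

def decode_cost_per_byte(codec, size, repeat=5, number=2000):
    """Return the best observed decode time per byte, in nanoseconds.

    Hypothetical helper: the input is pure ASCII (spaces), like the
    strings in the table, so all three codecs can decode it.
    """
    data = b" " * size
    best = min(timeit.repeat(lambda: data.decode(codec),
                             repeat=repeat, number=number))
    return best / number / size * 1e9

if __name__ == "__main__":
    # Small inputs are dominated by the fixed call overhead; large
    # inputs show the per-byte speed of the codec itself.
    for size in (10, 1000, 100000):
        row = ", ".join(
            "%s %.3f ns/byte" % (codec, decode_cost_per_byte(codec, size, repeat=3, number=200))
            for codec in ("ascii", "latin1", "utf-8")
        )
        print("%7d bytes: %s" % (size, row))
```

If the reasoning above holds, the per-byte figures for utf-8 and latin1 should converge as the size grows, while the smallest size mostly shows call overhead.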

> It's cool because in practice, a lot of strings are pure ASCII (as Martin showed in his Django benchmark).

But there is also a lot of non-ASCII text. With mostly-ASCII text that contains at least one non-ASCII character (for example, Martin's full name), the utf-8 decoder copes much worse, and worse than in Python 3.2.
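To reproduce this case, one could time utf-8 decoding of inputs that are pure ASCII against inputs that contain a single non-ASCII character at different positions. This is a minimal sketch: make_samples and time_decode are hypothetical helpers, and "\u00f6" is just an example character of the kind found in such names.

```python
import timeit

def make_samples(size=1000, odd_char="\u00f6"):
    """Build UTF-8 inputs of similar length: pure ASCII, and ASCII with
    one non-ASCII character at the start, middle or end."""
    filler = "a" * (size - 1)
    half = size // 2
    return {
        "pure ascii": ("a" * size).encode("utf-8"),
        "non-ascii first": (odd_char + filler).encode("utf-8"),
        "non-ascii middle": (filler[:half] + odd_char + filler[half:]).encode("utf-8"),
        "non-ascii last": (filler + odd_char).encode("utf-8"),
    }

def time_decode(data, repeat=5, number=2000):
    """Best observed time of one utf-8 decode, in microseconds."""
    best = min(timeit.repeat(lambda: data.decode("utf-8"),
                             repeat=repeat, number=number))
    return best / number * 1e6

if __name__ == "__main__":
    for name, data in make_samples().items():
        print("%-18s %.3f us" % (name, time_decode(data, repeat=3, number=500)))
```

Running the same script under 3.2 and 3.3 would show whether the single non-ASCII character costs more than the scanning alone can explain.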

The decoder should be slower only by a small amount, the cost of scanning. I believe there is still room for optimization.

> I'm interested in any patch optimizing any Python codecs. I'm not working on optimizing Python Unicode anymore, various benchmarks showed me that Python 3.3 is as good as or faster than Python 3.2. That's enough for me.

Then would you accept the patch I proposed in issue 14249? It will not make up the whole gap, but it is very simple and should not raise objections. The optimization I am developing now speeds up the decoder even more, but so far it is ugly spaghetti code.

> When Python 3.3 is slower than Python 3.2, it's because Python 3.3 must compute the maximum character of the result, and I fail to see how to optimize this requirement.
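For readers less familiar with PEP 393, the requirement can be modelled in a few lines: the maximum code point of the result decides how many bytes per character the string storage uses. This is an illustration in pure Python, not the C implementation, and pep393_kind is a hypothetical name.

```python
def pep393_kind(text):
    """Return the bytes per character PEP 393 storage needs for text.

    The whole string must be scanned (or the maximum tracked while
    decoding) before the width is known.
    """
    max_char = max(map(ord, text), default=0)
    if max_char < 0x100:    # ASCII and Latin-1 range
        return 1
    if max_char < 0x10000:  # rest of the Basic Multilingual Plane
        return 2
    return 4                # supplementary planes

if __name__ == "__main__":
    for sample in ("abc", "L\u00f6wis", "\u20ac", "\U0001f600"):
        print(ascii(sample), pep393_kind(sample))
```

Note that a name such as "L\u00f6wis" still fits in 1 byte per character, but it is no longer pure ASCII, so the decoder cannot stay on the ASCII-only path.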

A significant slowdown was caused by the use of PyUnicode_WRITE with a variable kind inside a loop. In some cases it would be useful to expand the loop into a cascade of independent loops that fall back onto each other (as you have already done in utf8_scanner).
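The shape of such a cascade can be sketched in Python. This is only a model of the idea and makes no use of the C API: fill_cascade is a hypothetical name, the input is assumed to be already-decoded code points, and array("I") is assumed to have 4-byte items, which holds on common platforms but is not guaranteed.

```python
from array import array

def fill_cascade(code_points):
    """Store code points in the narrowest array, one tight loop per width.

    Returns (kind, array), where kind is the intended bytes per item:
    1, 2 or 4. Each loop handles a single fixed width and hands over to
    the next, wider loop only when a value no longer fits.
    """
    it = iter(code_points)

    # Loop 1: everything fits in one byte.
    out = array("B")
    for cp in it:
        if cp > 0xFF:
            break
        out.append(cp)
    else:
        return 1, out

    # Loop 2: widen once, then continue with 2-byte items.
    out = array("H", out)
    while cp <= 0xFFFF:
        out.append(cp)
        cp = next(it, None)
        if cp is None:
            return 2, out

    # Loop 3: widen again; no further checks are needed.
    out = array("I", out)
    out.append(cp)
    out.extend(it)
    return 4, out

if __name__ == "__main__":
    print(fill_cascade(map(ord, "abc")))
    print(fill_cascade(map(ord, "a\u20acb")))
```

The point of the structure is that the width test happens once per fallback instead of once per written character, which is what a write with a variable kind has to do on every iteration.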
