[Python-Dev] PEP 393 Summer of Code Project

Antoine Pitrou solipsis at pitrou.net
Tue Aug 23 14:15:45 CEST 2011


On Tuesday, 23 August 2011 at 13:51 +0200, "Martin v. Löwis" wrote:

> This optimization was done when trying to improve the speed of text I/O.

So what speedup did it achieve, for the kind of data you talked about?

Since I don't have the number anymore, I've just saved the contents of https://linuxfr.org/news/le-noyau-linux-est-disponible-en-version%C2%A030 as a "linuxfr.html" file and then did:

$ ./python -m timeit "with open('linuxfr.html', encoding='utf8') as f: f.read()"
1000 loops, best of 3: 859 usec per loop

After disabling the fast path, I ran the micro-benchmark again:

$ ./python -m timeit "with open('linuxfr.html', encoding='utf8') as f: f.read()"
1000 loops, best of 3: 1.09 msec per loop

so the fast path gives roughly a 20% speedup on this file (859 usec vs. 1.09 msec per loop).
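For reference, here is a minimal sketch of the same measurement driven from the
timeit module instead of the command line (the file name is the one saved above;
absolute numbers will of course vary with the machine and the build):

    import timeit

    stmt = "with open('linuxfr.html', encoding='utf8') as f: f.read()"
    # best of 3 runs of 1000 iterations, reported per iteration
    best = min(timeit.repeat(stmt, repeat=3, number=1000)) / 1000
    print("best of 3: %.0f usec per loop" % (best * 1e6))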

> Do you have three copies of the UTF-8 decoder already, or do you use
> a stringlib-like approach?

It's a single implementation - see for yourself.

So why would you need three separate implementations of the unrolled loop? You already have a macro named WRITE_FLEXIBLE_OR_WSTR.

Even leaving the unrolled loop aside, I wonder, by the way, how much slower UTF-8 decoding becomes with that approach. Instead of testing the "kind" variable at each loop iteration, a stringlib-like approach may be a better deal IMO.
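To make the structural difference concrete, here is a purely illustrative
pure-Python toy (the real code is C, and these helpers are invented for the
sketch, not taken from the PEP 393 branch): in (a) the kind of the target
buffer is re-tested on every character written, while in (b) the inner loop
is specialized once per representation, stringlib-style, so it contains no
branch on the kind.

    # (a) one generic loop, testing the kind on every iteration
    def write_checking_kind(chars, kind, out):
        for ch in chars:
            if kind == 1:
                out.append(ch & 0xFF)
            elif kind == 2:
                out.append(ch & 0xFFFF)
            else:
                out.append(ch)

    # (b) stringlib-like: pick a specialized loop once, outside the hot path
    def make_specialized_writer(mask):
        def write(chars, out):
            for ch in chars:
                out.append(ch & mask)   # no per-iteration branch on the kind
        return write

    write_1byte = make_specialized_writer(0xFF)
    write_2byte = make_specialized_writer(0xFFFF)
    write_4byte = make_specialized_writer(0xFFFFFFFF)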

Of course we would first need to have various benchmark numbers once the current PEP 393 implementation is complete.

Regards

Antoine.


