[Python-Dev] Unicode charmap decoders slow

M.-A. Lemburg mal at egenix.com
Wed Oct 5 17:52:54 CEST 2005


Hye-Shik Chang wrote:

On 10/5/05, M.-A. Lemburg <mal at egenix.com> wrote:

Of course, a C version could use the same approach as the unicodedatabase module: that of compressed lookup tables...

http://aggregate.org/TechPub/lcpc2002.pdf genccodec.py anyone ?
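As a rough illustration of the compressed lookup table idea, here is a minimal sketch in present-day Python 3 rather than C. It is not taken from either patch in this thread or from the linked paper: the names build_tables and encode_char, the 256-entry block size, and the choice of iso8859-15 in the usage note are assumptions made for the example. The table is split into fixed-size blocks, identical blocks are stored once, and a small first-level index points at them.

```python
# Illustrative sketch only: a two-level, block-deduplicated lookup table for
# the encoding direction (code point -> byte) of a single-byte charmap codec.
# All names and the block size are assumptions, not part of any patch.

BLOCK_SHIFT = 8                 # assumed block size: 2**8 = 256 code points
BLOCK_SIZE = 1 << BLOCK_SHIFT
MISSING = -1                    # marker: no byte maps to this code point


def build_tables(encoding):
    """Return (index, blocks) for the given single-byte encoding."""
    # Collect the code point -> byte mapping by decoding every byte value.
    mapping = {}
    for byte in range(256):
        try:
            char = bytes([byte]).decode(encoding)
        except UnicodeDecodeError:
            continue            # byte is undefined in this charset
        mapping[ord(char)] = byte

    index = []                  # first level: block number per high part
    blocks = []                 # second level: unique blocks only
    seen = {}                   # block contents -> position in `blocks`
    for high in range((max(mapping) >> BLOCK_SHIFT) + 1):
        base = high << BLOCK_SHIFT
        block = tuple(mapping.get(base + low, MISSING)
                      for low in range(BLOCK_SIZE))
        if block not in seen:   # store each distinct block once
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks


def encode_char(char, index, blocks):
    """Look up the byte for one character, or MISSING if unmappable."""
    cp = ord(char)
    high = cp >> BLOCK_SHIFT
    if high >= len(index):
        return MISSING
    return blocks[index[high]][cp & (BLOCK_SIZE - 1)]
```

For example, after index, blocks = build_tables("iso8859-15"), calling encode_char("a", index, blocks) returns 0x61. Most of the high code point ranges are empty and therefore share a single stored block, which is where the compression comes from.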

I had written a test codec for single byte character sets to evaluate algorithms to use in CJKCodecs once before (it's not a direct implementation of what you've mentioned, though). I just ported it to unicodeobject (as attached).

Thanks. Please upload the patch to SF.

Looks like we now have two competing patches: yours and the one written by Walter.

So far you've only compared decoding strings into Unicode and they seem to be similar in performance. Do they differ in encoding performance ?

It showed relatively good results compared to the charmap codecs:

% python ./Lib/timeit.py -s "s='a'*1024*1024; u=unicode(s)" "s.decode('iso8859-1')"
10 loops, best of 3: 96.7 msec per loop
% ./python ./Lib/timeit.py -s "s='a'*1024*1024; u=unicode(s)" "s.decode('iso885910fc')"
10 loops, best of 3: 22.7 msec per loop
% ./python ./Lib/timeit.py -s "s='a'*1024*1024; u=unicode(s)" "s.decode('utf-8')"
100 loops, best of 3: 18.9 msec per loop

(Note that it doesn't contain any documentation or good error handling yet. :-)
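For anyone wanting to repeat this kind of measurement from a script, a minimal sketch using the standard timeit module in present-day Python 3 might look like the following. The helper name time_decode and the loop counts are assumptions for the example, and iso8859-15 stands in for the experimental codec above, which is not part of any released Python. Absolute numbers will not be comparable to the 2005 figures.

```python
# Illustrative sketch only: time bytes.decode() for a few codecs on 1 MiB
# of ASCII data, reporting the best per-loop time as timeit's CLI does.
import timeit

data = b"a" * 1024 * 1024       # same input size as the commands above


def time_decode(encoding, number=5, repeat=3):
    """Return the best per-loop time in seconds for data.decode(encoding)."""
    timings = timeit.repeat(lambda: data.decode(encoding),
                            number=number, repeat=repeat)
    return min(timings) / number


# iso8859-15 is an assumed stand-in for a charmap-based codec.
results = {enc: time_decode(enc)
           for enc in ("iso8859-1", "iso8859-15", "utf-8")}

for enc, seconds in results.items():
    print(f"{enc}: {seconds * 1000:.2f} msec per loop")
```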

Hye-Shik

-- Marc-Andre Lemburg eGenix.com



