[Python-Dev] Unicode charmap decoders slow (original) (raw)
jepler@unpythonic.net jepler at unpythonic.net
Tue Oct 4 04:25:48 CEST 2005
- Previous message: [Python-Dev] Unicode charmap decoders slow
- Next message: [Python-Dev] Unicode charmap decoders slow
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
As the OP suggests, decoding with a codec like mac-roman or iso8859-1 is very slow compared to encoding or decoding with utf-8. Here I'm working with 53k of data instead of 53 megs. (Note: this is a laptop, so it's possible that thermal or battery management features affected these numbers a bit, but by a factor of 3 at most)
$ timeit.py -s "s='a'531024; u=unicode(s)" "u.encode('utf-8')" 1000 loops, best of 3: 591 usec per loop $ timeit.py -s "s='a'531024; u=unicode(s)" "s.decode('utf-8')" 1000 loops, best of 3: 1.25 msec per loop $ timeit.py -s "s='a'531024; u=unicode(s)" "s.decode('mac-roman')" 100 loops, best of 3: 13.5 msec per loop $ timeit.py -s "s='a'531024; u=unicode(s)" "s.decode('iso8859-1')" 100 loops, best of 3: 13.6 msec per loop
With utf-8 encoding as the baseline, we have decode('utf-8') 2.1x as long decode('mac-roman') 22.8x as long decode('iso8859-1') 23.0x as long
Perhaps this is an area that is ripe for optimization.
Jeff -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://mail.python.org/pipermail/python-dev/attachments/20051003/28a511e5/attachment.pgp
- Previous message: [Python-Dev] Unicode charmap decoders slow
- Next message: [Python-Dev] Unicode charmap decoders slow
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]