[Python-Dev] Unicode charmap decoders slow
Walter Dörwald walter at livinglogic.de
Fri Oct 14 18:26:37 CEST 2005
Martin v. Löwis wrote:
Tony Nelson wrote:
I have written my fastcharmap decoder and encoder. It's not meant to be better than the patch and other changes to come in a future version of Python, but it does work now with the current codecs. It's an interesting solution.
I like the fact that encoding doesn't need a special data structure.
To use, hook each codec to be sped up:
    import fastcharmap
    help(fastcharmap)
    fastcharmap.hook('nameofcodec')
    u = unicode('some text', 'nameofcodec')
    s = u.encode('nameofcodec')
No codecs were rewritten. It took me a while to learn enough to do this (Pyrex, more Python, some Python C API), and there were some surprises. Hooking in is grosser than I would have liked. I've only used it on Python 2.3 on FC3.
Indeed, and I would claim that you did not completely achieve your "no changes necessary" goal: you still have to install the hooks explicitly. I also think overriding codecs.charmap_{encode,decode} is really ugly.
Even if this could be simplified if you would modify the existing codecs, I still don't think supporting changes to the encoding dict is worthwhile. People will probably want to update the codecs in-place, but I don't think we need to make a guarantee that such an approach works independent of the Python version. People would be much better off writing their own codecs if they think the distributed ones are incorrect.
Exactly. If you need another codec, write your own instead of patching an existing one on the fly!
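To illustrate what "write your own" can look like, here is a minimal sketch in present-day Python 3 (not the Python 2.3 discussed in this thread). The codec name latin1_euro_demo and the choice to remap byte 0xA4 to the euro sign are made up for the example; only the codecs functions themselves are real.

```python
import codecs

# Hypothetical codec for illustration: Latin-1, except that byte 0xA4
# decodes to the euro sign instead of the generic currency sign.
_decoding_table = "".join(
    "\u20ac" if byte == 0xA4 else chr(byte) for byte in range(256)
)
# Derive the reverse (character -> byte) table from the decoding table.
_encoding_table = codecs.charmap_build(_decoding_table)


def _encode(text, errors="strict"):
    return codecs.charmap_encode(text, errors, _encoding_table)


def _decode(data, errors="strict"):
    return codecs.charmap_decode(data, errors, _decoding_table)


def _search(name):
    # Search function called by the codec registry with the normalized name.
    if name == "latin1_euro_demo":
        return codecs.CodecInfo(_encode, _decode, name="latin1_euro_demo")
    return None


codecs.register(_search)

encoded = "\u20ac10".encode("latin1_euro_demo")
decoded = b"\xa410".decode("latin1_euro_demo")
```

The distributed codecs stay untouched; the new name simply resolves to a separate pair of tables.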
Of course we can't accept Pyrex code in the Python core, so it would be great to rewrite the encoder as a patch to PyUnicode_EncodeCharmap(). This version must be able to cope with encoding tables that are random strings without crashing.
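As a rough picture of what "cope with random tables" means, the following Python 3 sketch validates every table lookup before using it. It is not the C implementation and not the fastcharmap code; encode_with_table and the codec label demo-charmap are hypothetical names, and treating a non-subscriptable table as "unmapped" is a choice made for this example.

```python
def encode_with_table(text, table):
    """Encode text to bytes using an untrusted code point -> byte table."""
    out = bytearray()
    for pos, ch in enumerate(text):
        try:
            value = table[ord(ch)]
        except (LookupError, TypeError):
            # Missing key, index out of range, or a table object that
            # cannot be subscripted at all: treat the character as unmapped.
            value = None
        if value is None:
            raise UnicodeEncodeError(
                "demo-charmap", text, pos, pos + 1,
                "character maps to <undefined>",
            )
        # Reject entries that are not plain integers in range(256).
        if isinstance(value, bool) or not isinstance(value, int) \
                or not 0 <= value <= 255:
            raise TypeError(
                "table entry for U+%04X is not a byte value: %r"
                % (ord(ch), value)
            )
        out.append(value)
    return bytes(out)


good = encode_with_table("ab", {ord("a"): 1, ord("b"): 2})
```

A garbage table such as an arbitrary string then produces an ordinary exception (TypeError or UnicodeEncodeError) instead of undefined behaviour.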
We've already taken care of decoding. What we still need is a new gencodec.py and regenerated codecs.
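For readers who want to see the shape of the data involved, here is a small Python 3 sketch of turning a dict-style decoding map into a 256-character decoding table, which codecs.charmap_decode can index directly. The helper build_decoding_table and the demo map are hypothetical; the real generator script may differ in its details.

```python
import codecs

# U+FFFE in a decoding table string marks "no mapping for this byte".
UNDEFINED = "\ufffe"


def build_decoding_table(decoding_map):
    """Flatten a {byte: code point} dict into a 256-character string."""
    chars = []
    for byte in range(256):
        code_point = decoding_map.get(byte)
        chars.append(UNDEFINED if code_point is None else chr(code_point))
    return "".join(chars)


# Hypothetical map: identity for ASCII, 0x80 -> EURO SIGN, rest undefined.
demo_map = {byte: byte for byte in range(128)}
demo_map[0x80] = 0x20AC
demo_table = build_decoding_table(demo_map)

text, consumed = codecs.charmap_decode(b"A\x80", "strict", demo_table)
```

With a string table the decoder needs only an index operation per byte rather than a dictionary lookup.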
Bye, Walter Dörwald