[Python-Dev] Unicode charmap decoders slow
"Martin v. Löwis" martin at v.loewis.de
Tue Oct 4 21:50:04 CEST 2005
- Previous message: [Python-Dev] Unicode charmap decoders slow
- Next message: [Python-Dev] Unicode charmap decoders slow
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Walter Dörwald wrote:
For charmap decoding we might be able to use an array (e.g. a tuple, or an array.array?) of code points instead of a dictionary.
This array would have to be sparse, of course. Using an array.array would be more efficient, I guess, but we would need a C API for arrays (to validate the type code, and to get ob_item).
Or we could implement this array as a C array (i.e. gencodec.py would generate C code).
For decoding, we would not get any better than array.array, except for startup cost.
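To illustrate the idea, here is a minimal sketch of charmap decoding driven by an array.array lookup table instead of a dictionary. The table contents here are a hypothetical identity mapping, just a stand-in for whatever gencodec.py would emit; a real codec would mark undefined byte values with a sentinel:

```python
import array

# Hypothetical decoding table: index = byte value, value = Unicode code point.
# An identity mapping stands in for a real single-byte charmap here.
DECODING_TABLE = array.array('I', range(256))

def charmap_decode(data: bytes) -> str:
    # One O(1) indexed lookup per input byte, instead of a dict lookup per byte.
    return ''.join(chr(DECODING_TABLE[b]) for b in data)
```

The startup cost mentioned above is just building the 256-entry array once, at codec import time.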
For encoding, having a C trie might give considerable speedup. _codecs could offer an API to convert the current dictionaries into lookup-efficient structures, and the conversion would be done when importing the codec.
For the trie, two levels (high and low byte) would probably be sufficient: I believe most encodings use only two "rows" (256-code-point blocks), and very few use more than three.
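A pure-Python sketch of that two-level structure, built once from a codec's decoding table when the codec is imported (the real proposal is a C trie; the function names here are illustrative, not an actual _codecs API):

```python
def build_encoding_trie(decoding_table):
    # decoding_table maps byte value -> code point, as in a charmap codec.
    # Row index = high byte of the code point; leaf index = low byte.
    rows = {}
    for byte_value, code_point in enumerate(decoding_table):
        row = rows.setdefault(code_point >> 8, [None] * 256)
        row[code_point & 0xFF] = byte_value
    return rows

def charmap_encode(text, rows):
    out = bytearray()
    for pos, ch in enumerate(text):
        cp = ord(ch)
        row = rows.get(cp >> 8)
        if row is None or row[cp & 0xFF] is None:
            raise UnicodeEncodeError('charmap', text, pos, pos + 1,
                                     'character maps to <undefined>')
        out.append(row[cp & 0xFF])
    return bytes(out)
```

With only two or three rows populated, each character costs one dict probe plus one indexed lookup, rather than a full dictionary lookup per character.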
Regards, Martin