[Python-Dev] Unicode charmap decoders slow
M.-A. Lemburg mal at egenix.com
Thu Oct 6 11:09:51 CEST 2005
Walter Dörwald wrote:
> Martin v. Löwis wrote:
>> Hye-Shik Chang wrote:
>>> If the encoding optimization can be easily done in Walter's
>>> approach, the fastmap codec would be too expensive a way to achieve
>>> the objective, because we would have to maintain not only the
>>> fastmap but also the charmap for backward compatibility.
>>
>> IMO, whether a new function is added or whether the existing function
>> becomes polymorphic (depending on the type of table being passed) is
>> a minor issue. Clearly, the charmap API needs to stay for backwards
>> compatibility; in terms of code size or maintenance, I would actually
>> prefer separate functions.
>
> OK, I can update the patch accordingly. Any suggestions for the name?
> PyUnicode_DecodeCharmapString?
No, you can factor this part out into a separate C function - there's
no need to add a completely new entry point just for this optimization.
Later on, we can then also add support for compressed tables to the
codec in the same way.
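To make the intent concrete, here's a rough pure-Python model of
decoding against a string table (the function name, the error handling
details and the u'\ufffe' "undefined" marker are illustrative; the real
work would happen in a C helper operating on the raw buffer):

    def charmap_decode_py(data, table, errors='strict'):
        # data: a byte string; table: a 256-character unicode string
        # indexed by byte value; u'\ufffe' marks undefined positions.
        chars = []
        for pos, byte in enumerate(data):
            ch = table[ord(byte)]
            if ch == u'\ufffe':
                if errors == 'ignore':
                    continue
                elif errors == 'replace':
                    ch = u'\ufffd'
                else:
                    raise UnicodeDecodeError('charmap', data, pos, pos + 1,
                                             'character maps to <undefined>')
            chars.append(ch)
        return u''.join(chars), len(data)

The win comes from table[ord(byte)]: indexing a flat string replaces
one dictionary lookup per input byte.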
>> One issue apparently is people tweaking the existing dictionaries
>> with additional entries they think belong there. I don't think we
>> need to preserve compatibility with that approach in 2.5, but I also
>> think that breakage should be obvious: the dictionary should either
>> go away completely at run-time, or be stored under a different name,
>> so that any attempt to modify the dictionary gives an exception
>> instead of having no interesting effect.
> IMHO it should be stored under a different name, because there are
> codecs (cp037, koi8_r, iso8859_11) that reuse existing dictionaries.
Only koi8_u reuses the dictionary from koi8_r - and it's easy to recreate the codec from a standard mapping file.
> Or we could have a function that recreates the dictionary from the
> string.
Actually, I'd prefer that these operations be done by the codec
generator script, so that we don't have additional startup time. The
dictionaries should then no longer be generated; the decoding string
should be used instead. I'd like the comments to stay, though. This can
be done like this (using string concatenation applied by the compiler):
    decoding_charmap = (
        u'x'    # 0x0000 -> 0x0078 LATIN SMALL LETTER X
        u'y'    # 0x0001 -> 0x0079 LATIN SMALL LETTER Y
        ...
    )
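Such a string table would also make the dictionary cheap to provide on
demand for any code that still wants it. A sketch (the helper name is
made up, and the u'\ufffe' convention for undefined slots is an
assumption):

    # Assuming charmap_decode() accepts the string table (the point of
    # the patch under discussion), decoding would look like:
    #
    #   import codecs
    #   codecs.charmap_decode('\x00\x01', 'strict', decoding_charmap)
    #   -> (u'xy', 2)

    def make_decoding_map(decoding_charmap):
        # Rebuild the old {byte value: Unicode ordinal} dictionary from
        # the string table; None marks undefined positions.
        decoding_map = {}
        for i, ch in enumerate(decoding_charmap):
            if ch == u'\ufffe':
                decoding_map[i] = None
            else:
                decoding_map[i] = ord(ch)
        return decoding_map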
> Either way, monkey patching the codec won't work anymore. Doesn't
> really matter, though, as this was never officially supported.
We've always told people to write their own codecs if they need to
modify an existing one, and then hook the new codec into the system
either by registering a new codec search function or by adding an
appropriate alias.
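A minimal sketch of that route, assuming the codecs.CodecInfo-based
registry (the codec name here is made up):

    import codecs

    def search(name):
        # Hypothetical codec name; return None for everything else.
        if name != 'my-latin1-variant':
            return None
        base = codecs.lookup('latin-1')
        # Reuse the base codec's functions, or substitute modified ones
        # built from your own mapping table.
        return codecs.CodecInfo(encode=base.encode,
                                decode=base.decode,
                                name='my-latin1-variant')

    codecs.register(search)
    assert 'abc'.decode('my-latin1-variant') == u'abc'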
--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Oct 06 2005)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows, Linux, Solaris, FreeBSD for free! ::::