[Python-Dev] Unicode charmap decoders slow

Walter Dörwald walter at livinglogic.de
Thu Oct 6 10:51:47 CEST 2005


Martin v. Löwis wrote:

Hye-Shik Chang wrote:

If the encoding optimization can be easily done in Walter's approach, the fastmap codec would be too expensive a way to reach the objective, because we must maintain not only the fastmap but also the charmap for backward compatibility.

IMO, whether a new function is added or whether the existing function becomes polymorphic (depending on the type of table being passed) is a minor issue. Clearly, the charmap API needs to stay for backwards compatibility; in terms of code size and maintenance, I would actually prefer separate functions.

OK, I can update the patch accordingly. Any suggestions for the name? PyUnicode_DecodeCharmapString?
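To make the comparison concrete, here is a rough pure-Python model of the two strategies; the real work happens in C, and these helper names are made up for illustration:

    # Illustrative only: a pure-Python model of the two decoding
    # strategies; the actual implementation is C code, and these
    # function names are hypothetical.

    def decode_with_dict(data, decoding_map):
        # Classic charmap decoding: one dictionary lookup per byte.
        # decoding_map maps a byte value to a Unicode ordinal, or to
        # None for undefined bytes.
        result = []
        for i, byte in enumerate(data):
            code = decoding_map.get(byte)
            if code is None:
                raise UnicodeDecodeError('charmap', data, i, i + 1,
                                         'character maps to <undefined>')
            result.append(chr(code))
        return ''.join(result)

    def decode_with_table(data, decoding_table):
        # Proposed fast path: the table is a 256-character string, so
        # each byte is decoded by plain indexing instead of a dict
        # lookup.
        return ''.join(decoding_table[byte] for byte in data)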

One issue, apparently, is people tweaking the existing dictionaries with additional entries they think belong there. I don't think we need to preserve compatibility with that approach in 2.5, but I also think that breakage should be obvious: the dictionary should either go away completely at run-time, or be stored under a different name, so that any attempt at modifying the dictionary gives an exception instead of having no interesting effect.

IMHO it should be stored under a different name, because there are codecs (cp037, koi8_r, iso8859_11) that reuse existing dictionaries.
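For example, a derived codec might build its table along these lines (an illustrative sketch of the reuse pattern in the current 2.4-era codec layout, not the actual source of any stdlib module):

    # Sketch of the reuse pattern: a derived codec copies a base
    # codec's decoding dictionary and overrides a few entries.
    from encodings import koi8_r

    decoding_map = koi8_r.decoding_map.copy()
    decoding_map.update({
        0x00A4: 0x0454,  # KOI8-U: CYRILLIC SMALL LETTER UKRAINIAN IE
    })

Renaming or dropping decoding_map in the base module would break such derived codecs unless they are updated at the same time.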

Or we could have a function that recreates the dictionary from the string.
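Such a helper could look roughly like this (a hedged sketch; the name and the use of U+FFFE as the marker for undefined slots are assumptions):

    # Hedged sketch: rebuild the old-style decoding dictionary from a
    # 256-character table string. The function name and the U+FFFE
    # convention for undefined bytes are assumptions, not a real API.
    def make_decoding_dict(decoding_table):
        return {
            byte: (None if char == '\ufffe' else ord(char))
            for byte, char in enumerate(decoding_table)
        }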

I envision a layout of the codec files like this:

    decoding_dict = ...
    decoding_map, encoding_map = codecs.make_lookup_tables(decoding_dict)

Apart from the names (and the fact that encoding_map is still a dictionary), that's what my patch does.

I think it should be possible to build efficient tables in a single pass over the dictionary, so startup time should be fairly small (given that the dictionaries are currently built incrementally anyway, due to the way dictionary literals work).
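A single-pass construction along those lines might look like this (a hedged sketch; make_lookup_tables is the hypothetical name from above, not an existing codecs function):

    # Hedged sketch of the single-pass construction: one iteration over
    # the byte range produces both tables. make_lookup_tables is a
    # hypothetical name, not part of the actual codecs module.
    def make_lookup_tables(decoding_dict):
        chars = []
        encoding_map = {}
        for byte in range(256):
            code = decoding_dict.get(byte)
            if code is None:
                # Undefined byte: mark with U+FFFE, which cannot occur
                # in legitimately decoded text.
                chars.append('\ufffe')
            else:
                chars.append(chr(code))
                encoding_map[code] = byte
        # decoding_map becomes a 256-character string indexed by byte
        # value; encoding_map is the inverse (still a dictionary).
        return ''.join(chars), encoding_map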

Bye, Walter Dörwald

