[Python-Dev] PEP 393: Special-casing ASCII-only strings

Nick Coghlan ncoghlan at gmail.com
Fri Sep 16 00:42:25 CEST 2011


On Fri, Sep 16, 2011 at 7:39 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:

> Thinking about this, the following may work:
> - ASCIIObject: state, length, hash, wstr*, data follow
> - SingleBlockUnicode: ASCIIObject, wstrlen, utf8*, utf8len, data follow
> - UnicodeObject: SingleBlockUnicode, data pointer, no data follow
>
> This is essentially your proposal, except that the wstrlen is dropped for ASCII strings, and that it uses nested structs. The single-block variants would always be "ready", the full unicode object is ready only if the data pointer is set.

In your "UnicodeObject" here, is the 'data pointer' the any/latin1/ucs2/ucs4 union from the original structure definition?

Also, what are the constraints on the "SingleBlockUnicode"? Does it only hold strings that can be represented in latin1? Or can the size of the individual elements be more than 1 byte?
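
To make the three-level layout under discussion concrete, here is a rough C sketch of the nested structs as I read them. All the Sketch* and sketch_* names are made up for illustration, the PyObject header is left out, ptrdiff_t stands in for Py_ssize_t, and the type of "state" is a guess. The union inside the full object assumes the data pointer is the any/latin1/ucs2/ucs4 union asked about above, which is exactly the point still to be confirmed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical names throughout; a sketch of the quoted proposal,
 * not the layout CPython actually uses. PyObject header omitted. */

/* ASCII-only string: the character data follows the struct directly. */
typedef struct {
    unsigned int state;   /* kind/flags; the exact type is an assumption */
    ptrdiff_t length;     /* number of code points */
    ptrdiff_t hash;
    wchar_t *wstr;        /* wchar_t representation, or NULL */
} SketchASCIIObject;

/* Single-block string: adds wstrlen and the UTF-8 fields; data still follows. */
typedef struct {
    SketchASCIIObject base;
    ptrdiff_t wstrlen;
    char *utf8;
    ptrdiff_t utf8len;
} SketchSingleBlockUnicode;

/* Full object: no data follows; it is reached through a pointer instead. */
typedef struct {
    SketchSingleBlockUnicode base;
    union {               /* assumed to be the union from the original struct */
        void *any;
        uint8_t *latin1;
        uint16_t *ucs2;
        uint32_t *ucs4;
    } data;
} SketchUnicodeObject;

/* "Ready only if the data pointer is set." */
static int sketch_is_ready(const SketchUnicodeObject *u)
{
    return u->data.any != NULL;
}

/* "Data follow": for the ASCII variant the characters would start
 * immediately after the struct itself. */
static void *sketch_ascii_data(SketchASCIIObject *a)
{
    assert(a != NULL);
    return (void *)(a + 1);
}
```

Because each struct embeds the previous one as its first member, a pointer to the full object can also be read as a pointer to either smaller variant, which is what makes the nesting attractive. The sketch says nothing about which element sizes the single-block variant may hold.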

Cheers, Nick.

-- Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


