[Python-Dev] PEP 393: Special-casing ASCII-only strings
"Martin v. Löwis" martin at v.loewis.de
Fri Sep 16 07:41:21 CEST 2011
On 16.09.11 00:42, Nick Coghlan wrote:
On Fri, Sep 16, 2011 at 7:39 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
Thinking about this, the following may work:
- ASCIIObject: state, length, hash, wstr*, data follow
- SingleBlockUnicode: ASCIIObject, wstrlen, utf8*, utf8len, data follow
- UnicodeObject: SingleBlockUnicode, data pointer, no data follow
This is essentially your proposal, except that the wstrlen is dropped for ASCII strings, and that it uses nested structs. The single-block variants would always be "ready", the full unicode object is ready only if the data pointer is set.
In your "UnicodeObject" here, is the 'data pointer' the any/latin1/ucs2/ucs4 union from the original structure definition?
Yes, it is. I'm considering dropping the union again, since you'll have to cast the data pointer anyway in the compact cases.
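To make the nesting and the cast concrete, here is a minimal sketch in C. All names (SketchASCII, SketchSingleBlock, SketchUnicode, sketch_new, sketch_data, sketch_read) are hypothetical and only mirror the layout listed above; the separate kind/ascii/compact fields stand in for "state", and none of this is the actual CPython definition. It assumes a plain void* data pointer instead of the union, so every access casts according to the character width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the three nested layouts discussed above. */
typedef struct {
    unsigned int kind;     /* bytes per character: 1, 2 or 4 (part of "state") */
    unsigned int ascii;    /* 1 if every character is < 128 */
    unsigned int compact;  /* 1 if the character data follows the header */
    size_t length;
    long hash;
    wchar_t *wstr;
} SketchASCII;             /* data follows directly */

typedef struct {
    SketchASCII base;
    size_t wstr_length;
    char *utf8;
    size_t utf8_length;
} SketchSingleBlock;       /* data follows directly */

typedef struct {
    SketchSingleBlock base;
    void *data;            /* plain pointer instead of the any/latin1/ucs2/ucs4 union */
} SketchUnicode;           /* no data follows; "ready" only once data is set */

/* Allocate a single-block string whose maximum character is known up front. */
static void *sketch_new(size_t length, uint32_t maxchar)
{
    unsigned int kind = maxchar < 256 ? 1 : (maxchar < 65536 ? 2 : 4);
    unsigned int ascii = maxchar < 128;
    size_t header = ascii ? sizeof(SketchASCII) : sizeof(SketchSingleBlock);
    SketchASCII *a = calloc(1, header + (length + 1) * kind);
    if (a == NULL)
        return NULL;
    a->kind = kind;
    a->ascii = ascii;
    a->compact = 1;
    a->length = length;
    a->hash = -1;
    return a;
}

/* Locate the character data for any of the three layouts. */
static void *sketch_data(void *op)
{
    SketchASCII *a = op;
    if (a->compact) {
        if (a->ascii)
            return (SketchASCII *)op + 1;
        return (SketchSingleBlock *)op + 1;
    }
    return ((SketchUnicode *)op)->data;
}

/* Read one character; the cast depends on the width, union or not. */
static uint32_t sketch_read(void *op, size_t i)
{
    SketchASCII *a = op;
    void *data = sketch_data(op);
    assert(data != NULL);  /* a NULL data pointer means "not ready" */
    assert(i < a->length);
    switch (a->kind) {
    case 1:  return ((uint8_t *)data)[i];
    case 2:  return ((uint16_t *)data)[i];
    default: return ((uint32_t *)data)[i];
    }
}
```

In this sketch the ASCII case pays only for the smallest header, and the switch in sketch_read shows why the union buys little: the caller has to pick the element type from the width either way.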
Also, what are the constraints on the "SingleBlockUnicode"? Does it only hold strings that can be represented in latin1? Or can the size of the individual elements be more than 1 byte?
Any size. What matters is whether the maximum character is known at creation time, i.e. whether you've used PyUnicode_New(size, maxchar) or PyUnicode_FromUnicode(NULL, size). In the latter case, a Py_UNICODE block is allocated in wstr, and the data pointer is left NULL. Then, when PyUnicode_Ready is called, the maximum character is determined from the Py_UNICODE block, and a new data block is allocated. That has to be a second memory block; the Py_UNICODE block is then dropped in _Ready.
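The two-step path can be sketched as follows. This is a self-contained model, not CPython code: LegacySketch, legacy_new and legacy_ready are hypothetical names, uint32_t stands in for Py_UNICODE (whose real width depends on the build), and surrogate pairs are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of a string created without a known maximum character. */
typedef struct {
    size_t length;
    unsigned int kind;  /* 0 until the object is ready, then 1, 2 or 4 */
    uint32_t *wstr;     /* stands in for the Py_UNICODE block */
    void *data;         /* NULL until the ready step has run */
} LegacySketch;

/* Models PyUnicode_FromUnicode(NULL, size): only the wide block exists. */
static LegacySketch *legacy_new(size_t length)
{
    LegacySketch *s = calloc(1, sizeof(*s));
    if (s == NULL)
        return NULL;
    s->wstr = calloc(length + 1, sizeof(uint32_t));
    if (s->wstr == NULL) {
        free(s);
        return NULL;
    }
    s->length = length;
    return s;
}

/* Models the ready step: scan for the maximum character, allocate a
 * second block of the narrowest sufficient width, drop the wide block. */
static int legacy_ready(LegacySketch *s)
{
    if (s->data != NULL)
        return 0;  /* already ready */

    uint32_t maxchar = 0;
    for (size_t i = 0; i < s->length; i++) {
        if (s->wstr[i] > maxchar)
            maxchar = s->wstr[i];
    }
    unsigned int kind = maxchar < 256 ? 1 : (maxchar < 65536 ? 2 : 4);

    void *data = calloc(s->length + 1, kind);
    if (data == NULL)
        return -1;
    for (size_t i = 0; i < s->length; i++) {
        switch (kind) {
        case 1:  ((uint8_t *)data)[i] = (uint8_t)s->wstr[i]; break;
        case 2:  ((uint16_t *)data)[i] = (uint16_t)s->wstr[i]; break;
        default: ((uint32_t *)data)[i] = s->wstr[i]; break;
        }
    }

    free(s->wstr);  /* the wide block is dropped once the data block exists */
    s->wstr = NULL;
    s->data = data;
    s->kind = kind;
    return 0;
}
```

The point of the sketch is that the width cannot be chosen until the caller has filled in the wide block, so the final data cannot share the object's original allocation.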
Regards, Martin