[Python-Dev] PEP 393 Summer of Code Project
Stefan Behnel stefan_ml at behnel.de
Tue Aug 23 14:14:39 CEST 2011
Torsten Becker, 22.08.2011 20:58:
> I have implemented an initial version of PEP 393 -- "Flexible String Representation" as part of my Google Summer of Code project. My patch is hosted as a repository on bitbucket [1] and I created a related issue on the bug tracker [2]. I posted documentation for the current state of the development in the wiki [3].
One thing that occurred to me regarding the object struct:
    typedef struct {
        PyObject_HEAD
        Py_ssize_t length;      /* Number of code points in the string */
        void *str;              /* Canonical, smallest-form Unicode buffer */
        Py_hash_t hash;         /* Hash value; -1 if not set */
        int state;              /* != 0 if interned. In this case the two
                                 * references from the dictionary to this
                                 * object are not counted in ob_refcnt.
                                 * See SSTATE_KIND_* for other bits */
        Py_ssize_t utf8_length; /* Number of bytes in utf8, excluding the
                                 * terminating \0. */
        char *utf8;             /* UTF-8 representation (null-terminated) */
        Py_ssize_t wstr_length; /* Number of code points in wstr, possible
                                 * surrogates count as two code points. */
        wchar_t *wstr;          /* wchar_t representation (null-terminated) */
    } PyUnicodeObject;
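To make the comparison concrete, here is a minimal sketch of reading one code point from the untyped `void *str` buffer with a cast at each use. It is not the patch's code: `SketchUnicode`, `read_char_cast`, the `KIND_*` constants and the separate `kind` field are hypothetical (the comment on `state` suggests the draft keeps the kind in its bits instead), and the `Py_UCS2`/`Py_UCS4`/`Py_ssize_t` typedefs are stand-ins so the sketch compiles without Python.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CPython typedefs. */
typedef uint16_t Py_UCS2;
typedef uint32_t Py_UCS4;
typedef ptrdiff_t Py_ssize_t;

/* Hypothetical kind constants: bytes per code point in the buffer. */
enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

/* Simplified object with only the fields needed to read characters.
 * Assumption: the kind is a separate field here, not bits in `state`. */
typedef struct {
    Py_ssize_t length;  /* number of code points */
    int kind;           /* 1, 2 or 4 bytes per code point */
    void *str;          /* canonical buffer; element type depends on kind */
} SketchUnicode;

/* Read code point i, casting the untyped buffer at every use. */
static Py_UCS4 read_char_cast(const SketchUnicode *u, Py_ssize_t i)
{
    assert(i >= 0 && i < u->length);
    switch (u->kind) {
    case KIND_1BYTE: return ((const unsigned char *)u->str)[i];
    case KIND_2BYTE: return ((const Py_UCS2 *)u->str)[i];
    default:         return ((const Py_UCS4 *)u->str)[i];
    }
}
```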
Wouldn't the "normal" approach be to use a union for the str field? I.e.
    union {
        unsigned char* latin1;
        Py_UCS2* ucs2;
        Py_UCS4* ucs4;
    } str;
Given that they're all pointers, all fields have the same size, but I find it more readable to write
u.str.latin1
than
((const unsigned char*)u.str)
Plus, the three types would be declared once in the struct, rather than implied by a cast at every use.
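A corresponding sketch of the union variant, under the same assumptions (hypothetical `SketchUnicode`, `read_char_union`, `KIND_*` constants and `kind` field, stand-in typedefs), shows the buffer types declared once in the struct and selected by member name instead of by cast:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the CPython typedefs. */
typedef uint16_t Py_UCS2;
typedef uint32_t Py_UCS4;
typedef ptrdiff_t Py_ssize_t;

/* Hypothetical kind constants: bytes per code point in the buffer. */
enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

typedef struct {
    Py_ssize_t length;  /* number of code points */
    int kind;           /* says which union member is active */
    union {
        unsigned char *latin1;
        Py_UCS2 *ucs2;
        Py_UCS4 *ucs4;
    } str;              /* all members are pointers, so the union is pointer-sized */
} SketchUnicode;

/* Read code point i through the union member that matches `kind`. */
static Py_UCS4 read_char_union(const SketchUnicode *u, Py_ssize_t i)
{
    assert(i >= 0 && i < u->length);
    switch (u->kind) {
    case KIND_1BYTE: return u->str.latin1[i];
    case KIND_2BYTE: return u->str.ucs2[i];
    default:         return u->str.ucs4[i];
    }
}
```

The union moves the type information into the declaration, but the reader still has to consult `kind` to pick the right member, just as the `void *` version has to pick the right cast.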
Has this been considered before? Was there a reason to decide against it?
Stefan