[Python-Dev] PEP 393: Special-casing ASCII-only strings

Terry Reedy tjreedy at udel.edu
Thu Sep 15 20:46:11 CEST 2011


On 9/15/2011 11:50 AM, "Martin v. Löwis" wrote:

To comply with the C aliasing rules, the structures would look like this:

typedef struct {
    PyObject_HEAD
    Py_ssize_t length;
    union {
        void *any;
        Py_UCS1 *latin1;
        Py_UCS2 *ucs2;
        Py_UCS4 *ucs4;
    } data;
    Py_hash_t hash;
    int state;      /* may include SSTATE_SHORT_ASCII flag */
    wchar_t *wstr;
} PyASCIIObject;

typedef struct {
    PyASCIIObject base;
    Py_ssize_t utf8_length;
    char *utf8;
    Py_ssize_t wstr_length;
} PyUnicodeObject;

Code that directly accesses the structures would become more complex; code that uses the accessor macros wouldn't notice.
...
What do you think?
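To make the accessor point concrete, here is a minimal C sketch using simplified stand-ins for the CPython types. ASCIIObject, UnicodeObject, FLAG_SHORT_ASCII, init_ascii and the get_* helpers are all hypothetical names for illustration, not CPython's API. Because the full struct embeds the ASCII struct as its first member, a pointer to either can be handled as a pointer to the base struct, and an accessor checks the flag before touching the extended fields. For an ASCII-only string the UTF-8 form is the data itself and the wchar_t length equals the character count, which is why those fields can be left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: the real structs start with PyObject_HEAD and use
   Py_ssize_t, Py_hash_t and Py_UCS1/2/4 from CPython's headers. */
typedef ptrdiff_t ssize_type;

#define FLAG_SHORT_ASCII 0x1  /* hypothetical stand-in for SSTATE_SHORT_ASCII */

typedef struct {
    ssize_type length;        /* number of code points */
    union {
        void *any;
        unsigned char *latin1;
        unsigned short *ucs2;
        unsigned int *ucs4;
    } data;
    long hash;
    int state;                /* FLAG_SHORT_ASCII set for ASCII-only strings */
    wchar_t *wstr;
} ASCIIObject;

typedef struct {
    ASCIIObject base;         /* first member: a pointer to the whole struct
                                 converts safely to ASCIIObject * and back */
    ssize_type utf8_length;
    char *utf8;
    ssize_type wstr_length;
} UnicodeObject;

/* Hypothetical helper: wrap a NUL-terminated ASCII C string. */
void init_ascii(ASCIIObject *o, char *text)
{
    size_t n = strlen(text);
    for (size_t i = 0; i < n; i++)
        assert((unsigned char)text[i] < 128);  /* ASCII only */
    o->length = (ssize_type)n;
    o->data.latin1 = (unsigned char *)text;
    o->hash = -1;             /* -1 marks "not yet computed", as in CPython */
    o->state = FLAG_SHORT_ASCII;
    o->wstr = NULL;
}

/* ASCII data is already valid UTF-8, so no separate buffer is needed. */
const char *get_utf8(const ASCIIObject *s)
{
    if (s->state & FLAG_SHORT_ASCII)
        return (const char *)s->data.latin1;
    return ((const UnicodeObject *)s)->utf8;
}

ssize_type get_utf8_length(const ASCIIObject *s)
{
    if (s->state & FLAG_SHORT_ASCII)
        return s->length;     /* one byte per ASCII character */
    return ((const UnicodeObject *)s)->utf8_length;
}

/* Each ASCII character fits in one wchar_t, so wstr_length == length. */
ssize_type get_wstr_length(const ASCIIObject *s)
{
    if (s->state & FLAG_SHORT_ASCII)
        return s->length;
    return ((const UnicodeObject *)s)->wstr_length;
}
```

Callers that go through get_utf8() never need to know which layout they hold, whereas code that read the utf8 field directly would read past the end of a short-ASCII object; that is the trade-off the message describes.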

I think nearly all code outside CPython itself should treat the unicode types in particular as opaque, and access instances only through functions and macros: the 'public' interfaces. We need to be free to change internal implementation details as experience suggests.

P.S. There are similar reductions that could be applied to wstr_length in general: on systems with a 32-bit wchar_t, it could always be dropped; on systems with a 16-bit wchar_t, it could be dropped for UCS-2 strings. However, I'm not proposing these, as I think the increase in complexity is not worth the savings.
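The P.S. rests on a simple count, sketched here in C under the assumption that a string is given as an array of code points; wstr_units is a hypothetical helper, not part of CPython. A 32-bit wchar_t holds any code point in one unit, while a 16-bit wchar_t needs a surrogate pair for each code point above U+FFFF, so the wchar_t length differs from the character count only for non-UCS-2 strings on 16-bit platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of wchar_t units needed for the given code
   points. wchar_bits is 32 (e.g. glibc) or 16 (e.g. Windows, UTF-16). */
ptrdiff_t wstr_units(const uint32_t *cps, ptrdiff_t length, int wchar_bits)
{
    assert(wchar_bits == 16 || wchar_bits == 32);
    if (wchar_bits == 32)
        return length;        /* one unit per code point: always == length */
    ptrdiff_t units = length;
    for (ptrdiff_t i = 0; i < length; i++)
        if (cps[i] > 0xFFFF)
            units++;          /* non-BMP code point takes a surrogate pair */
    return units;             /* == length whenever the string is UCS-2 */
}
```

Whenever this count equals the character count, a stored wstr_length is redundant; the message's point is that exploiting this in the struct layout costs more complexity than the bytes it saves.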

I would certainly make just this one change now and see how it goes. If experience changes your mind, I think you should feel free to make further reductions like those above.

-- Terry Jan Reedy


