[Python-Dev] import screwiness

Tim Peters tim.peters at gmail.com
Thu Jul 6 06:53:54 CEST 2006


[Neal Norwitz]

> In import.c starting around line 1210 (I removed a bunch of code that doesn't matter for the problem):

>     if (PyUnicode_Check(v)) {
>         copy = PyUnicode_Encode(PyUnicode_AS_UNICODE(v),
>                                 PyUnicode_GET_SIZE(v),
>                                 Py_FileSystemDefaultEncoding, NULL);
>         v = copy;
>     }
>     len = PyString_GET_SIZE(v);
>     if (len + 2 + namelen + MAXSUFFIXSIZE >= buflen) {
>         Py_XDECREF(copy);
>         continue; /* Too long */
>     }
>     strcpy(buf, PyString_AS_STRING(v));
>
> *** So if v is originally unicode, then copy is unicode from the second line, right?

No. An encoded unicode string is of type str, and PyUnicode_Encode() returns an encoded string. Like so:

>>> u"\u1122".encode('utf-8')
'\xe1\x84\xa2'
>>> type(_)
<type 'str'>

> Then we assign v to copy, so v is still unicode.

Almost ;-)

> Then later on we do PyString_GET_SIZE and PyString_AS_STRING. That doesn't work, does it? What am I missing?

The conceptual type of the object returned by PyUnicode_Encode().
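
For anyone reading this on a current interpreter, the same point can be sketched in Python 3. This assumes the Python 3 naming, where the old `unicode` type is called `str` and the old 8-bit `str` is called `bytes`; the variable names are only illustrative. Encoding a text string gives back a byte string, not another text string:

```python
# Python 3 analogue of the Python 2 session above (assumption: Python 3,
# where the old `unicode` type is `str` and the old `str` is `bytes`).
text = "\u1122"                  # a single code point, held as text
encoded = text.encode("utf-8")   # encoding returns a byte string

print(type(text).__name__)       # str
print(type(encoded).__name__)    # bytes
print(encoded)                   # b'\xe1\x84\xa2'
print(len(text), len(encoded))   # 1 3
```

The byte values match the ones in the Python 2 session; only the type names differ.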


