[Python-Dev] PEP 393 review

Stefan Behnel stefan_ml at behnel.de
Fri Aug 26 07:21:11 CEST 2011


Stefan Behnel, 25.08.2011 23:30:

Stefan Behnel, 25.08.2011 20:47:

"Martin v. Löwis", 24.08.2011 20:15:

- issues to be considered (unclarities, bugs, limitations, ...)

A problem of the current implementation is the need for calling PyUnicode_(FAST_)READY(), and the fact that it can fail (e.g. due to insufficient memory). Basically, this means that even something as trivial as trying to get the length of a Unicode string can now result in an error.

Oh, and the same applies to PyUnicode_AS_UNICODE() now. I doubt that there is any code out there that expects this macro to ever return NULL. This means that the current implementation has actually broken the old API. Just allocate an "80% of your memory" long string using the new API and then call PyUnicode_AS_UNICODE() on it to see what I mean.

Sadly, a quick look at a couple of recent commits in the pep-393 branch suggested that it is not even always obvious to you as the authors which macros can be called safely and which cannot. I immediately spotted a bug in one of the updated core functions (unicode_repr, IIRC) where PyUnicode_GET_LENGTH() is called without a previous call to PyUnicode_FAST_READY().

I find it anything but obvious that calling PyUnicode_DATA() and PyUnicode_KIND() is safe as long as the return value is being checked for errors, but calling PyUnicode_GET_LENGTH() is not safe unless there was a previous call to PyUnicode_Ready().
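To make the hazard concrete, here is a minimal sketch in plain C. It is a toy model only, not the code from the pep-393 branch: the `ToyString` type, the `toy_*` functions and the `toy_fail_alloc` switch are all hypothetical names invented for illustration. It assumes a string created through a legacy API holds only a wchar_t buffer until a "ready" step allocates the compact form, and that this allocation can fail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Toy stand-in for a legacy-created string (hypothetical, not CPython's
   layout): only the wchar_t buffer exists until toy_ready() succeeds. */
typedef struct {
    const wchar_t *wstr;   /* legacy buffer, always present */
    size_t wstr_length;    /* number of wchar_t units in wstr */
    unsigned char *data;   /* compact buffer, NULL until ready */
    size_t length;         /* only meaningful once data != NULL */
} ToyString;

/* Illustration hook: when nonzero, allocations are reported as failed. */
static int toy_fail_alloc = 0;

static void *toy_malloc(size_t n)
{
    if (toy_fail_alloc)
        return NULL;
    return malloc(n ? n : 1);
}

/* Build the compact form on demand.
   Returns 0 on success, -1 if the allocation failed. */
static int toy_ready(ToyString *s)
{
    assert(s != NULL);
    if (s->data != NULL)
        return 0;                       /* already converted */
    unsigned char *buf = toy_malloc(s->wstr_length);
    if (buf == NULL)
        return -1;                      /* object stays in legacy state */
    for (size_t i = 0; i < s->wstr_length; i++)
        buf[i] = (unsigned char)s->wstr[i];  /* assumes Latin-1 range */
    s->data = buf;
    s->length = s->wstr_length;
    return 0;
}

/* Unchecked accessor: reads a field that is only valid after toy_ready(). */
static size_t toy_length_unchecked(const ToyString *s)
{
    return s->length;
}

/* Checked accessor: even "get the length" can now report an error (-1). */
static long toy_length_checked(ToyString *s)
{
    if (toy_ready(s) < 0)
        return -1;
    return (long)s->length;
}
```

In this model, a caller that uses `toy_length_unchecked()` on a fresh legacy object silently reads a length of 0, while a caller that uses `toy_length_checked()` has to handle a failure return from what used to be an infallible operation. Both outcomes are the kind of surprise described above.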

And, adding to my own mail yet another time, the current header file states this:

"""
/* String contains only wstr byte characters. This is only possible
   when the string was created with a legacy API and PyUnicode_Ready()
   has not been called yet. Note that PyUnicode_KIND() calls
   PyUnicode_FAST_READY() so PyUnicode_WCHAR_KIND is only possible as a
   intialized value not as a result of PyUnicode_KIND(). */
#define PyUnicode_WCHAR_KIND 0
"""

From my understanding, this is incorrect. When I call PyUnicode_KIND() on an old-style object and it fails to allocate the string buffer, I would expect to actually get PyUnicode_WCHAR_KIND back as the result, because the SSTATE_KIND_* value in the "state" field has not been initialised yet at that point.
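The scenario can be sketched with another small toy model in plain C. This is an assumption about the shape of the logic, not the actual macro from the header: `ToyObj`, `toy_obj_ready()`, `toy_obj_kind()`, the `TOY_*` constants and `toy_alloc_fails` are hypothetical. It assumes the kind accessor first tries to make the object ready and then simply reports the kind field, whatever it holds.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical kind values; 0 plays the role of the "wchar" kind. */
enum { TOY_WCHAR_KIND = 0, TOY_1BYTE_KIND = 1 };

/* Toy object (not CPython's layout): kind stays TOY_WCHAR_KIND until the
   compact buffer has been allocated successfully. */
typedef struct {
    size_t wstr_length;
    void *data;
    int kind;
} ToyObj;

/* Illustration hook: when nonzero, the allocation is reported as failed. */
static int toy_alloc_fails = 0;

/* Returns 0 on success, -1 on allocation failure. */
static int toy_obj_ready(ToyObj *o)
{
    assert(o != NULL);
    if (o->kind != TOY_WCHAR_KIND)
        return 0;                        /* already converted */
    void *buf = toy_alloc_fails ? NULL : malloc(o->wstr_length + 1);
    if (buf == NULL)
        return -1;                       /* kind field is left untouched */
    o->data = buf;
    o->kind = TOY_1BYTE_KIND;
    return 0;
}

/* Assumed accessor shape: try to get ready, then report the kind field. */
static int toy_obj_kind(ToyObj *o)
{
    (void)toy_obj_ready(o);              /* failure is not visible here */
    return o->kind;
}
```

Under these assumptions, a failed allocation leaves the kind field at its initial value, so `toy_obj_kind()` returns `TOY_WCHAR_KIND` as a result, which is exactly the case the header comment says cannot happen.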

Stefan


