[Python-Dev] Timing for removing legacy Unicode APIs deprecated by PEP 393

INADA Naoki songofacandy at gmail.com
Fri Apr 13 09:27:16 EDT 2018


Hi,

PEP 393 [1] deprecates some Unicode APIs related to Py_UNICODE. The PEP doesn't provide a schedule for removing them, but the APIs are marked "will be removed in 4.0" in the documentation. Once they are removed, we can drop the wchar_t * member of the unicode object, which takes 8 bytes on 64-bit platforms.

[1]: "Flexible String Representation" https://www.python.org/dev/peps/pep-0393/

I thought Python 4.0 would be the version after 3.9, but Guido has a different idea. He said the following on the Zulip chat (we're trying it out for now):

> No, 4.0 is not just what comes after 3.9 -- the major number change would indicate some kind of major change somewhere (like possibly the Gilectomy, which changes a lot of the C APIs). If we have more than 10 3.x versions, we'll just live with 3.10, 3.11 etc.

And he said this about these APIs:

> > Unicode objects has some "Deprecated since version 3.3, will be removed in version 4.0" APIs (pep-393). When removing them, we can reduce PyUnicode size about 8~12byte.
>
> We should be able to deprecate these sooner by updating the docs.

So I want to reschedule the removal of these APIs. Can we remove them in 3.8? 3.9? Or 3.10? I would prefer to do it as soon as possible.


Slightly off topic: there is a 4-byte alignment gap in the unicode object on 64-bit platforms.

    typedef struct {
        ....
        struct {
            unsigned int interned:2;
            unsigned int kind:3;
            unsigned int compact:1;
            unsigned int ascii:1;
            unsigned int ready:1;
            unsigned int :24;
        } state;  // 4 bytes

        // implicit 4 bytes gap here.

        wchar_t *wstr;  // 8 bytes
    } PyASCIIObject;

So I think we can save 12 bytes instead of 8 when removing wstr. Or we can save 4 bytes sooner by moving wstr before state.

Of course, that requires siphash to support 4-byte-aligned data instead of 8-byte-aligned data.
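
The layout arithmetic above can be checked with a small C sketch. It is not the real CPython definition: head_t is a hypothetical stand-in for the fields elided as "....", and the three layout types and the END_OF macro are names invented for this illustration. It compares where the last member ends when wstr follows state, when wstr is moved before state, and when wstr is removed.

```c
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical stand-in for the fields elided as "...." in the struct
   above; the real CPython header may differ. */
typedef struct {
    ptrdiff_t refcnt;   /* assumed reference count */
    void *type;         /* assumed type pointer */
    ptrdiff_t length;   /* assumed string length */
    ptrdiff_t hash;     /* assumed cached hash */
} head_t;

/* The bit-field block from the struct above: 32 bits in total. */
typedef struct {
    unsigned int interned:2;
    unsigned int kind:3;
    unsigned int compact:1;
    unsigned int ascii:1;
    unsigned int ready:1;
    unsigned int :24;
} state_t;

/* Layout as quoted: state, then wstr. */
typedef struct { head_t head; state_t state; wchar_t *wstr; } current_layout;
/* wstr moved before state. */
typedef struct { head_t head; wchar_t *wstr; state_t state; } moved_layout;
/* wstr removed entirely. */
typedef struct { head_t head; state_t state; } removed_layout;

/* Offset of the first byte after a member, ignoring tail padding. */
#define END_OF(type, member) \
    (offsetof(type, member) + sizeof(((type *)0)->member))

size_t current_end(void) { return END_OF(current_layout, wstr); }
size_t moved_end(void)   { return END_OF(moved_layout, state); }
size_t removed_end(void) { return END_OF(removed_layout, state); }

/* Print the end offset and sizeof for each layout on this platform. */
void print_layout_report(void)
{
    printf("current: end=%zu sizeof=%zu\n",
           current_end(), sizeof(current_layout));
    printf("moved:   end=%zu sizeof=%zu\n",
           moved_end(), sizeof(moved_layout));
    printf("removed: end=%zu sizeof=%zu\n",
           removed_end(), sizeof(removed_layout));
}
```

Calling print_layout_report() from main shows the numbers for the local ABI. On a typical 64-bit platform one would expect the last member to end at 48, 44, and 36 bytes respectively, which matches the 4-byte and 12-byte savings described above. sizeof will likely still round the smaller layouts up to a multiple of 8, so the saving only materializes if the data that follows the header may start at a 4-byte-aligned offset, which is presumably why the hash function's alignment requirement matters.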

Regards,

INADA Naoki <songofacandy at gmail.com>


