Message 93594 - Python tracker

Amaury Forgeot d'Arc wrote:

Amaury Forgeot d'Arc <amauryfa@gmail.com> added the comment:

we should make sure that it's not possible to load an extension compiled with 3.1 in 3.2 to prevent segfaults and buffer overruns.

This is the case with this patch: today all these functions (_PyUnicode_IsAlpha, _PyUnicode_ToLowercase) are actually #defines to _PyUnicodeUCS2_* or _PyUnicodeUCS4_*. The patch removes the #defines: 3.1 modules that call _PyUnicodeUCS4_IsAlpha wouldn't load into a 3.2 interpreter.
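To illustrate the mechanism being described, here is a minimal sketch of how a #define can give a function a build-specific symbol name. The macro MANGLE_WIDE_BUILD and all mangle_* names are hypothetical stand-ins, not the real CPython identifiers.

```c
/* Hypothetical build switch standing in for the narrow/wide choice. */
#define MANGLE_WIDE_BUILD 0

#if MANGLE_WIDE_BUILD
#  define mangle_isalpha mangle_ucs4_isalpha
#else
#  define mangle_isalpha mangle_ucs2_isalpha
#endif

/* Because of the #define above, this definition is compiled under the
   build-specific name (mangle_ucs2_isalpha here). Toy logic: ASCII only. */
int mangle_isalpha(unsigned int ch)
{
    return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
}
```

A caller compiled against the other setting would reference mangle_ucs4_isalpha, a symbol this build does not export, so the mismatch shows up at load time rather than as a crash later.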

True, but we can do better. For narrow builds, the API currently exposes the UCS2 APIs. We'd need to expose the UCS4 APIs in addition to those APIs and have the UCS2 APIs redirect to the UCS4 ones.
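A rough sketch of that redirection, under the assumption that the UCS2 entry point simply widens its argument and forwards it. The typedefs and fwd_* function names are made up for illustration and are not the actual API.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a 16-bit code unit and a full code point. */
typedef uint16_t fwd_ucs2;
typedef uint32_t fwd_ucs4;

/* UCS4 entry point: the single real implementation (toy logic, ASCII only). */
int fwd_ucs4_isalpha(fwd_ucs4 ch)
{
    return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
}

/* UCS2 entry point kept for existing callers: widen and forward. */
int fwd_ucs2_isalpha(fwd_ucs2 ch)
{
    return fwd_ucs4_isalpha((fwd_ucs4)ch);
}
```

With this shape, both symbols exist on a narrow build, and there is only one lookup implementation to maintain.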

For wide builds, we don't need to change anything.

The change affects the Unicode type database which is implemented in unicodectype.c, not the Unicode database, which already uses UCS4.

Are you referring to the _PyUnicode_TypeRecord structure? The first three fields only contain values up to 65535, so they could use "unsigned short" even for UCS4 builds.
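For comparison, a sketch of the two layouts under discussion. The field names and the trailing fields are assumptions made for illustration; only the "first three fields" come from the comment above.

```c
#include <stdint.h>

/* Hypothetical record with 32-bit case-mapping fields. */
typedef struct {
    uint32_t upper;
    uint32_t lower;
    uint32_t title;
    unsigned char decimal;
    unsigned char digit;
    unsigned short flags;
} rec_wide;

/* Same record with the first three fields narrowed to unsigned short. */
typedef struct {
    unsigned short upper;
    unsigned short lower;
    unsigned short title;
    unsigned char decimal;
    unsigned char digit;
    unsigned short flags;
} rec_short;
```

The narrowed record is smaller, which matters for a table with many entries, but it can only hold values up to 65535 in those fields.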

I haven't checked, but it's certainly possible to have a code point use a non-BMP lower/upper/title case mapping, so this should be made possible as well, if we're going to make changes to the type database.
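As a concrete case, U+10400 (DESERET CAPITAL LETTER LONG I) lowercases to U+10428, which lies outside the BMP. The sketch below shows one way a 16-bit field could still cover such a mapping, by storing a signed delta instead of the target. Whether the type database should store deltas or targets is not settled here; the cm_* names and the delta representation are assumptions for illustration only.

```c
#include <stdint.h>

/* Hypothetical record holding a signed 16-bit delta, not an absolute target. */
typedef struct {
    int16_t lower_delta;
} cm_record;

/* Apply the delta in 32-bit arithmetic so non-BMP code points survive. */
uint32_t cm_tolower(uint32_t ch, const cm_record *rec)
{
    return (uint32_t)((int32_t)ch + rec->lower_delta);
}

/* What a 16-bit field would keep if it stored the absolute target. */
uint16_t cm_truncate(uint32_t target)
{
    return (uint16_t)(target & 0xFFFFu);
}
```

With a delta of 40, cm_tolower maps 0x10400 to 0x10428, whereas a 16-bit absolute target would be cut down to 0x0428.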