[Python-Dev] Interning string subtype instances

Hrvoje Nikšić hrvoje.niksic at avl.com
Mon Feb 12 18:21:48 CET 2007


I propose modifying PyString_InternInPlace to better cope with string subtype instances.

The current implementation of PyString_InternInPlace returns without doing anything when passed an instance of a subtype of PyString_Type. This is a problem for applications that need to support string subtypes but must also intern the strings for faster equivalence testing. Such an application, upon receiving a string subtype instance, will silently fail to work, because the string it assumes to be interned is not.

There is a good reason for PyString_InternInPlace not to accept string subtypes: since a subtype can have modified behavior, interning it can cause problems for other users of the interned string. I agree with the reasoning, but propose a different solution: when asked to intern an instance of a string subtype, PyString_InternInPlace could simply intern a copy that is an exact PyString.

This should be a fully backward compatible change because: 1) code that passes PyString instances (99.99% of cases) will work as before, and 2) code that passes something else silently failed to intern the string anyway. Speed should be exactly the same as before, with the added benefit that interning PyString subtype instances now does something, but without the problems that interning the actual instance can produce.

The patch could look like this. If there is interest in this, I can produce a complete patch.

@@ -5,10 +5,6 @@
        PyObject *t;
        if (s == NULL || !PyString_Check(s))
                Py_FatalError("PyString_InternInPlace: strings only please!");
-       /* If it's a string subclass, we don't really know what putting
-          it in the interned dict might do. */
-       if (!PyString_CheckExact(s))
-               return;
        if (PyString_CHECK_INTERNED(s))
                return;
        if (interned == NULL) {
@@ -25,6 +21,18 @@
                *p = t;
                return;
        }
+       /* Make sure we don't intern a string subclass, since we don't


