[Python-Dev] Interning string subtype instances

"Martin v. Löwis" martin at v.loewis.de
Tue Feb 13 11:18:32 CET 2007


Hrvoje Nikšić wrote:

Another reason is that Python's interning mechanism is much better than such a simple implementation: it stores the interned state directly in the PyStringObject structure, so you can find out that a string is already interned without looking it up in the dictionary. This information can be (and is) used both by the Python core and by C extensions. Another advantage is that, as of recently, interned strings can be garbage collected, which is typically not true of simple replacements (it could probably be emulated using weak references, but that's not trivial).

OTOH, in an application that needs unique strings, you normally know what the scope is (i.e. where the strings come from, and when they aren't used anymore).

For example, in XML parsing, pyexpat supports an interning dictionary. It puts all element and attribute names into it (but not element content, which typically isn't likely to be repeated). It starts with a fresh dictionary before parsing starts, and releases the dictionary when parsing is done.
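
A minimal sketch of that pattern in present-day Python, assuming the `intern` keyword argument that CPython's pyexpat `ParserCreate` accepts; the helper name `parse_names` and the sample document are made up for illustration:

```python
import xml.parsers.expat


def parse_names(document):
    """Parse `document` and return (element names seen, interning dict)."""
    # Fresh dictionary for this parse only; it goes away with the parser,
    # so the interned names are released once nothing else refers to them.
    interned = {}
    parser = xml.parsers.expat.ParserCreate(intern=interned)

    names = []

    def start_element(name, attrs):
        # `name` and the keys of `attrs` come out of the interning dict;
        # attribute values and character data do not.
        names.append(name)

    parser.StartElementHandler = start_element
    parser.Parse(document, True)
    return names, interned


names, interned = parse_names(
    '<root><item a="1">text</item><item a="2">text</item></root>'
)

# Both <item> start tags hand back the very same string object.
print(names[1] is names[2])
print(sorted(interned))
```

Here the scope of uniqueness is a single call to `parse_names`: the dictionary is created before parsing and dropped afterwards, so no global intern table is involved.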

Regards, Martin


