[Python-Dev] sys.intern should work on bytes

Jesus Cea jcea at jcea.es
Fri Sep 20 15:33:05 CEST 2013



On 20/09/13 14:15, Antoine Pitrou wrote:

> From http://docs.python.org/3.3/library/sys.html#sys.intern
>
> """sys.intern(string) Enter string in the table of “interned” strings and return the interned string [...]"""
>
> In Python 3 context, "string" means "str".

I read that, Antoine. In fact, I read the manual and thought it was a mistake carried over from the 2.x documentation. I tried it, just in case, before reporting the "documentation mistake", and I was surprised to find it was actually true :-).
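For reference, the behaviour is easy to check. This is only a small sketch; the helper name try_intern is made up for illustration:

```python
import sys

def try_intern(value):
    """Return (True, interned value) or (False, exception type name)."""
    try:
        return True, sys.intern(value)
    except TypeError as exc:
        # In Python 3, sys.intern only accepts str objects.
        return False, type(exc).__name__

ok_str, _ = try_intern("spam")
ok_bytes, err = try_intern(b"spam")
print(ok_str, ok_bytes, err)
```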

I know that intern is used internally by the interpreter for performance reasons, but I am thinking about memory usage optimizations. For instance, I have a pickle that is 14MB in size; after "interning" the strings in it (there is a lot of redundancy), the new size is only 3MB and it loads faster. I can do this because most of the data in the pickle is strings. I could NOT do it if I used bytes.
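The effect can be reproduced on toy data. The sketch below is not my real pickle: the record strings are invented, and it only assumes that pickle writes an object once and then emits short back-references when it meets the same object again, so equal strings that are separate objects get written out in full each time.

```python
import pickle
import sys

# Hypothetical data: 1000 records drawn from only 10 distinct values.
# Building each string at runtime yields equal but separate objects.
records = ["department-of-redundancy-%d" % (i % 10) for i in range(1000)]

# The same data with every equal string collapsed to one shared object.
shared = [sys.intern(s) for s in records]

plain_size = len(pickle.dumps(records))
shared_size = len(pickle.dumps(shared))
print(plain_size, shared_size)

# After loading, equal entries are again a single object in memory.
loaded = pickle.loads(pickle.dumps(shared))
same_object = loaded[0] is loaded[10]
```

The exact sizes depend on the pickle protocol and the data, so no figures are claimed here beyond "the shared version is smaller".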

I could do a manual "intern" for hashable objects using an "object:object" dictionary (that would work for integers too), but I wonder if extending the builtin "sys.intern" would be something to consider.

Anyway, this pattern is easy enough:

Instead of

object = sys.intern(object)

I could do

interned = dict()
...
object = interned.setdefault(object, object)
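Wrapped in a function, the same pattern might look like this. It is a sketch only; the names intern_any and _interned are hypothetical and not part of any library:

```python
# Table mapping each hashable value to its first-seen instance.
_interned = {}

def intern_any(obj):
    """Return the canonical instance equal to obj, registering obj if new."""
    return _interned.setdefault(obj, obj)

# Two equal bytes objects built at runtime are separate objects.
a = bytes([115, 112, 97, 109])
b = bytes([115, 112, 97, 109])
distinct_before = a is not b

# After interning, both names refer to the first instance seen.
a = intern_any(a)
b = intern_any(b)
shared_after = a is b
```

Two caveats with this approach: the dictionary holds strong references, so entries stay alive until it is cleared, and keys that compare equal across types (1, 1.0 and True, for example) all map to whichever one was stored first.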


Jesús Cea Avión
jcea at jcea.es - http://www.jcea.es/
Twitter: @jcea
jabber / xmpp:jcea at jabber.org
"Things are not so easy"
"My name is Dump, Core Dump"
"Love is placing your happiness in the happiness of another" - Leibniz


