[Python-Dev] Rethinking intern() and its data structure (original) (raw)

"Martin v. Löwis" martin at v.loewis.de
Thu Apr 9 21:06:40 CEST 2009


So I guess some of it comes down to whether "loweis" would also reject this change on the basis that mathematically a "set is not a dict".

I'd like to point out that this was not the reason to reject it. Instead, this (or, the opposite of it) was given as a reason why this patch should be accepted (in msg50482). I found that a weak rationale for making that change, in particular because I think the rationale is incorrect.

I like your rationale (save memory) much more, and was asking in the tracker for specific numbers, which weren't forthcoming.

Though given that his claim "nobody else is speaking in favor of the patch", while at least Colin Winter has expressed some interest at this point.

Again, at that point in the tracker, none of the other committers had spoken in favor of the patch. Since I wasn't convinced of its correctness, and nobody else (whom I trust) had reviewed it as correct, I rejected it.

Now that you brought up a specific numbers, I tried to verify them, and found them correct (although a bit unfortunate), please see my test script below. Up to 21800 interned strings, the dict takes (only) 384kiB. It then grows, requiring 1536kiB. Whether or not having 22k interned strings is "typical", I still don't know.

Wrt. your proposed change, I would be worried about maintainability, in particular if it would copy parts of the set implementation.

Regards, Martin

import gc, sys def find_interned_dict(): cand = None for o in gc.get_objects(): if not isinstance(o, dict): continue if "find_interned_dict" not in o: continue for k,v in o.iteritems(): if k is not v: break else: assert not cand cand = o return cand

d = find_interned_dict() print len(d), sys.getsizeof(d)

l = [] for i in range(20000): if i%100==0: print len(d), sys.getsizeof(d)

l.append(intern(repr(i)))


More information about the Python-Dev mailing list