Issue 5084: unpickling does not intern attribute names (original) (raw)

Created on 2009-01-27 21:52 by jakemcguire, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
pickle.py.diff jakemcguire,2009-01-27 21:52 diff to pickle to intern attribute names
cPickle.c.diff jakemcguire,2009-02-25 21:17 take three.
pickletester.diff jakemcguire,2009-05-02 02:38 add a unit test to verify attribute names are interned
Messages (16)
msg80670 - (view) Author: Jake McGuire (jakemcguire) Date: 2009-01-27 21:52
Instance attribute names are normally interned - this is done in PyObject_SetAttr (among other places). Unpickling (in pickle and cPickle) directly updates __dict__ on the instance object. This bypasses the interning so you end up with many copies of the strings representing your attribute names, which wastes a lot of space, both in RAM and in pickles of sequences of objects created from pickles. Note that the native python memcached client uses pickle to serialize objects. >>> import pickle >>> class C(object): ... def __init__(self, x): ... self.long_attribute_name = x ... >>> len(pickle.dumps([pickle.loads(pickle.dumps(C(None), pickle.HIGHEST_PROTOCOL)) for i in range(100)], pickle.HIGHEST_PROTOCOL)) 3658 >>> len(pickle.dumps([C(None) for i in range(100)], pickle.HIGHEST_PROTOCOL)) 1441 >>> Interning the strings on unpickling makes the pickles smaller, and at least for cPickle actually makes unpickling sequences of many objects slightly faster. I have included proposed patches to cPickle.c and pickle.py, and would appreciate any feedback.
msg80678 - (view) Author: Gabriel Genellina (ggenellina) Date: 2009-01-27 23:30
Either my browser got crazy, or you uploaded the same patch (.py) twice.
msg80688 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2009-01-28 02:54
The patch for cPickle doesn't do the same thing as the pickle one. In the cPickle one, you are only interning slot attributes, which, I believe, is not what you intended. :-)
msg80689 - (view) Author: Jake McGuire (jakemcguire) Date: 2009-01-28 03:00
Are you sure? I may not have enough context in my diff, which I should have done against an anonymous svn checkout, but it looks like the slot attributes get set several lines after my diff. "while (PyDict_Next(slotstate, ...))" as opposed to the "while (PyDict_Next(state, ...))" in my change... -jake On Jan 27, 2009, at 6:54 PM, Alexandre Vassalotti wrote: > > Alexandre Vassalotti <alexandre@peadrop.com> added the comment: > > The patch for cPickle doesn't do the same thing as the pickle one. In > the cPickle one, you are only interning slot attributes, which, I > believe, is not what you intended. :-) > > ---------- > assignee: -> alexandre.vassalotti > nosy: +alexandre.vassalotti > > _______________________________________ > Python tracker <report@bugs.python.org> > <http://bugs.python.org/issue5084> > _______________________________________
msg80690 - (view) Author: Alexandre Vassalotti (alexandre.vassalotti) * (Python committer) Date: 2009-01-28 03:03
Oh, you are right. I was looking at py3k's version of pickle, which uses PyDict_Update instead of a while loop; that is why I got confused.
msg80918 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-02-01 20:53
Why do you call PyString_AsString() followed by PyString_FromString()? Strings are immutable so you shouldn't neek to take a copy. Besides, it would be nice to add a test.
msg82686 - (view) Author: Jake McGuire (jakemcguire) Date: 2009-02-24 23:32
The fromstring/asstring dance was due to my incomplete understanding of refcounting. PyDict_Next returns a borrowed reference but PyString_InternInPlace expects an owned reference. Thanks to Kirk McDonald, I have a new patch that does the refcounting correctly. What sort of test did you have in mind?
msg82690 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-02-25 02:03
The test should check that the unpickled strings have been interned.
msg82691 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-02-25 02:07
In your patch, I'm not sure where the `name` variable is coming from. Have you checked it works?
msg82692 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-02-25 02:11
To give an example of what the test could check: >>> class C(object): ... def __init__(self): ... self.some_long_attribute_name = 5 ... >>> c = C() >>> c.__dict__ {'some_long_attribute_name': 5} >>> sorted(map(id, c.__dict__)) [140371243499696] >>> import pickle >>> d = pickle.loads(pickle.dumps(c, -1)) >>> d <__main__.C object at 0x7faaba1b0390> >>> d.__dict__ {'some_long_attribute_name': 5} >>> sorted(map(id, d.__dict__)) [140371243501232] The `sorted(map(id, d.__dict__))` should have been the same before and after pickling.
msg82720 - (view) Author: Jake McGuire (jakemcguire) Date: 2009-02-25 21:17
Ugh. Clearly I didn't check to see if it worked or not, as it didn't even compile. A new diff, tested and verified to work, is attached. I'll work on creating a test.
msg86919 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-01 21:54
Jake, are you string working on a test?
msg86920 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-01 21:55
(s/string/still/, of course...)
msg86930 - (view) Author: Jake McGuire (jakemcguire) Date: 2009-05-02 02:38
This fell through the cracks. But the following unit test seems like it does the right thing (fails with the old module, works with the new ones).
msg86939 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-02 11:41
Thanks, I'll take a look very soon.
msg86982 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-05-02 21:42
Committed in r72223, r72224. Thanks!
History
Date User Action Args
2022-04-11 14:56:44 admin set github: 49334
2009-05-02 21:42:49 pitrou set status: open -> closedresolution: fixedmessages: +
2009-05-02 11:41:39 pitrou set assignee: alexandre.vassalotti -> pitroumessages: +
2009-05-02 11:40:40 pitrou set stage: needs patch -> patch review
2009-05-02 02:38:05 jakemcguire set files: + pickletester.diffmessages: +
2009-05-01 21:55:32 pitrou set messages: +
2009-05-01 21:54:58 pitrou set priority: normalstage: needs patchmessages: + versions: + Python 3.1, Python 2.7
2009-02-25 21:19:13 jakemcguire set files: - cPickle.c.diff
2009-02-25 21:19:08 jakemcguire set files: - cPickle.c.diff
2009-02-25 21:17:28 jakemcguire set files: + cPickle.c.diffmessages: +
2009-02-25 02:11:36 pitrou set messages: +
2009-02-25 02:07:01 pitrou set messages: +
2009-02-25 02:03:09 pitrou set messages: +
2009-02-24 23:32:52 jakemcguire set files: + cPickle.c.diffmessages: +
2009-02-03 02:20:50 collinwinter set nosy: + collinwinter, jyasskin
2009-02-01 20:53:38 pitrou set nosy: + pitroumessages: +
2009-01-29 10:56:14 jcea set nosy: + jcea
2009-01-28 03:03:10 alexandre.vassalotti set messages: +
2009-01-28 03:00:02 jakemcguire set messages: +
2009-01-28 02:54:53 alexandre.vassalotti set assignee: alexandre.vassalottimessages: + nosy: + alexandre.vassalotti
2009-01-27 23:39:02 jakemcguire set files: + cPickle.c.diff
2009-01-27 23:38:46 jakemcguire set files: - cPickle.c.diff
2009-01-27 23:30:05 ggenellina set nosy: + ggenellinamessages: +
2009-01-27 21:52:58 jakemcguire set files: + pickle.py.diff
2009-01-27 21:52:19 jakemcguire create