msg49504 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2006-02-18 00:19 |
See the thread starting at http://mail.python.org/pipermail/python-dev/2006-February/061261.html This still needs unit tests and docs. |
|
|
msg49505 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2006-02-18 10:39 |
Logged In: YES user_id=1188172 Observations: * Doesn't the PyMemberDef need a sentinel? * Is PyObject_CallObject() faster than PyEval_CallFunction() with no arguments present? * The current patch gives me a segfault at interpreter exit because there is a dict object whose ma_default_factory was not initialized to NULL. |
|
|
msg49506 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2006-02-19 15:28 |
Logged In: YES user_id=6380 Are you sure you did a "make clean"? Because the dictobject struct lay-out is changed you may have to do that. If the problem persists, try adding setting ma_default_factory to NULL explicitly in dict_new (like it's already done in PyDict_New) and see if that makes a difference. BTW the change to PyDict_GetItem must be removed -- it's not a good idea. |
|
|
msg49507 - (view) |
Author: Georg Brandl (georg.brandl) *  |
Date: 2006-02-19 15:39 |
Logged In: YES user_id=1188172 Okay. I configured with "--with-pydebug" all the time, and then comes the segfault. Without "--with-pydebug", everything seems fine. |
|
|
msg49508 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2006-02-19 18:57 |
Logged In: YES user_id=6380 Aha. I'll have to try that. In the mean time, here's a new patch: - PyDict_GetItem is no longer involved - added {NULL} to PyMemberDef array - set ma_default_factory to NULL in both constructors Still no unit tests or docs. I have some misgivings about the API -- perhaps this should be a subclass. |
|
|
msg49509 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2006-02-19 19:00 |
Logged In: YES user_id=6380 Sorry, forgot the upload. Here it is. |
|
|
msg49510 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2006-02-19 23:45 |
Logged In: YES user_id=6380 Here's a version that doesn't crash in debug mode. Neal Norwitz is standing next to me and pointed out that I was attempting to decref something after tp_free had already wiped out the object. D'oh! |
|
|
msg49511 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2006-02-21 01:53 |
Logged In: YES user_id=6380 Here's a completely new version after another round of python-dev. - The built-in dict type still defines and calls on_missing(), but the default on_missing() implementation just raises KeyError(key). It no longer has a default_factory attribute. - You can subclass dict and override on_missing() to do whatever you want. - A useful subclass is collections.defaultdict; it defines an attribute default_factory and its on_missing() implementation calls that and inserts the resulting value in the dict (previous versions of the patch had this semantics in the built-in dict class, which was frowned upon). - Now with unit tests. - No docs yet, though. Assigning to Raymond Hettinger for review. Raymond, please assign it back to me for checkin if you're okay with this (or for revision if you're not :-). Because of Google's lawyers I must check this in myself. |
|
|
msg49512 - (view) |
Author: Neal Norwitz (nnorwitz) *  |
Date: 2006-02-21 07:49 |
Logged In: YES user_id=33168 In test_print() how about os.unlink(tfn)? Why do you play games with stdout, can't you do print >>f, ... ? (If you don't need stdout, it looks like you don't need sys.) In defdict_on_missing() why do you check if the factory is None? It's the only place None is checked for, everywhere else, it's just NULL. defdict_repr should be static. Type for this line in defdict_init() should be Py_ssize_t, not int: int n = PyTuple_GET_SIZE(args); |
|
|
msg49513 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2006-02-21 13:27 |
Logged In: YES user_id=80475 Neal, the defdict_on_missing check for Py_None is necessary because the user can assign None to the factory and expect it to have the same effect as NULL. The other checks for NULL all automatically handle the Py_None case the same as if the attribute were filled with a real factory. Code nits: ---------- Pre-parse: PyEval_CallMethod((PyObject *)mp, "on_missing", "(O)", key); into: PyObject_CallMethodObjArgs((PyObject *)mp, "on_missing", key, NULL); The first four lines in defdict_dealloc() can be simplified to: Py_Clear(dd->default_factory); Likewise, the first five active lines of defdict_traverse shrink to: Py_VISIT(dd->default_factory); The OFF() macro is used only once. Better to in-line it. Other ----- If you don't get a chance to add the docs, just apply the patch and I'll write the docs later. Likewise, I'll update UserDict and DictMixin to keep the APIs in-sync. |
|
|
msg49514 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2006-02-21 14:29 |
Logged In: YES user_id=80475 One other thought: The dict.copy method needs to be overriden so that default dicts can duplicate themselves. Also consider adding a __reduce__ method to support deepcopying and pickling (the latter is less useful because functions get pickled by name only). |
|
|
msg49515 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2006-02-21 15:17 |
Logged In: YES user_id=6380 Thanks both! Here's a new patch: - fixed most of Neal's nits - fixed most of Raymond's first set of nits - updated UserDict Notes: - I can't use PyObject_CallMethodObjArgs without also converting "on_missing" to an object. Right now I don't care enough to bother (especially since it is only called for proper subclasses). - I can't find Py_Clear(). Raymond, what did you mean? - It's not clear how (or even whether) to update DictMixin. - Haven't gotten to copy() yet -- good point though! - Still no docs. - In addition to Raymond's explanation, it's also the case that NULL appears as None to the user through the magic of the T_OBJECT attribute machinery. |
|
|
msg49516 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2006-02-21 15:36 |
Logged In: YES user_id=6380 Here's an update that adds: - defaultdict.copy() implemented - docstring tweaks |
|
|
msg49517 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2006-02-21 15:41 |
Logged In: YES user_id=80475 Py_CLEAR() is a macro defined in object.h. See example uses in enumobject.c, genobject.c, and itertobject.c. DictMixin should add the on_missing() method, leaving the responsibility for calling it to core object's __getitem__ method. That will likely never happen, but it does allow the mapping API to be emulated as fully as possible. Add the method right at the very end (after __len__). |
|
|
msg49518 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2006-02-21 16:24 |
Logged In: YES user_id=6380 OK. Py_CLEAR() found; DictMixin.on_missing() added. Still no docs. Do UserDict and DictMixin need unit tests additions? There are grumblings in python-dev on the name; but I still like defaultdict best. |
|
|
msg49519 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2006-02-21 21:03 |
Logged In: YES user_id=80475 I just had one other thought. If the subclass implemented __getitem__ and on_missing directly, then there would be no need to make any changes to dictobject.c and we could leave the mapping API alone (for a less disruptive implementation): class defaultdict(dict): default_factory = None def __getitem__(self, key): try: return dict.__getitem__(self, key) except KeyError: return self.on_missing(self, key) def on_missing(self, key): if self.default_factory is None: raise KeyError(key) value = self[key] = self.default_factory(key) return value Anyone needing to hook the on_missing method can do it through a subclass of defaultdict rather than dict. There are two side benefit to this approach. One, we keep the changes all in one object making it easier to understand than putting framework methods in the parent. Two, this also allows the on_missing call to be done directly instead of through PyEval_CallMethod(). That will provide a nice speed-up for the common cases where defaultdict hasn't been subclassed. |
|
|
msg49520 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2006-02-22 01:34 |
Logged In: YES user_id=6380 Here's a new version. This adds: - fix for copying functions in copy.py (see python-dev post) - more unit tests - latex docs - docstring tweaks I haven't got latex installed on my laptop so I can't test whether the docs are any good. Can anyone? I did this on the plane so your proposal below isn't incorporated yet. My comment is that I actually *like* having the functionality in the dict base class since someone else implementing on_missing() might not need the default_factory instance variable; subclassing defaultdict would be kind of wrong for that purpose. I'm not sure I understand your comment about optimizing away PyEval_CallMethod() -- don't we still need to use that so that a subclass can override it? Or do you propose to do another check for the exact defaultdict type and if so, inline it or at least shortcut the call? I'm not sure I see the value in such a micro-optimization at this point -- after all it's only called for missing keys which shouldn't be all that frequent. |
|
|
msg49521 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2006-02-22 02:00 |
Logged In: YES user_id=80475 Maybe after sleeping on the idea, you'll like it better. Otherwise, we're adding on_missing() to the dict API where it cannot possibly be called, only a subclass can do that. The essence of the patch is meeting the known regular default_factory use cases (with list, int, or set). No one has yet proposed use cases for overriding on_missing() with other functionality. So if you code it all in default_dict, you make the patch cleaner, leave the dict API unmolested, make it easier to understand the call sequence (no parent to child to parent steps), and still leave open the possibility that someone could override default_dict.on_missing() if they discover a need for defaults computed from a key. Also, I suspect that from a beginner point-of-view that having on_missing() defined in dicts will create confusion (don't call this method, just override it in a subclass, it is only triggered by getitem, there is no relation to get() or to setdefault(), ...) |
|
|
msg49522 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2006-02-22 02:12 |
Logged In: YES user_id=80475 Without LaTeX on your system, you can still do a pretty good check by running the document through: python Tools/scripts/texcheck.py collections.tex |
|
|
msg49523 - (view) |
Author: Neal Norwitz (nnorwitz) *  |
Date: 2006-02-22 06:05 |
Logged In: YES user_id=33168 LaTex is clean. |
|
|
msg49524 - (view) |
Author: Jim Jewett (jimjjewett) |
Date: 2006-02-23 01:03 |
Logged In: YES user_id=764593 Two use cases for overriding on_missing, but not needing a factory: (1) Stacked configuration objects. If it isn't in the first, check the second, etc. Override on_missing with the __getitem__ of the next configuration object. (Or perhaps a for loop over several fallbacks -- but do it in the dictionary, instead of at every lookup site.) (2) Caching. One of the problems with setdefault is that it always computes the default, which may be expensive. Note that in both of these cases, the conceptual model is not "all new entries get the same initial value", but rather "A normal dictionary, which just happens to be so expensive to create that I don't want to prepopulate." |
|
|
msg49525 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2006-02-23 21:18 |
Logged In: YES user_id=6380 Michael Chermside had a good point -- the hook should be called __missing__ (he suggested __on_missing__ but I think __missing__ is fine). Here's a new patch, which also implements and documents and tests the change that I earlier agreed to on python-dev: dict doesn't define on_missing (__missing__) but just checks for it -- if a subclass doesn't define it, the default behavior applies. __missing__ must be implemented as a method -- it can't be an instance variable. |
|
|
msg49526 - (view) |
Author: Jim Jewett (jimjjewett) |
Date: 2006-02-23 23:35 |
Logged In: YES user_id=764593 v9 updates name to __missing__, but not yet for UserDict. |
|
|
msg49527 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2006-02-24 15:55 |
Logged In: YES user_id=6380 OK, here's a new one that fixes UserDict. With unit tests. |
|
|
msg49528 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2006-02-25 22:39 |
Logged In: YES user_id=6380 Committed revision 42573. |
|
|