[Python-Dev] PySet API (original) (raw)
Raymond Hettinger raymond.hettinger at verizon.net
Wed Mar 22 03:31:13 CET 2006
- Previous message: [Python-Dev] PySet API
- Next message: [Python-Dev] PySet API
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
[Barry]
Is it your intent to push for more use of the abstract API instead of the concrete APIs for all of Python's C data structures? Current API aside, are you advocating this approach for all new built-in types? Would you argue that Python 3.0's C API be stripped of everything but the abstract API and the bare essentials of the concrete API?
It's not up to me. Perhaps someone else can chime-in about the philosophy of how the C API is supposed to balance abstract and concrete APIs.
For concrete APIs, it does make sense to have methods for fine-grained access to containers (for the O(1) ops that tend to appear inside loops).
I know that the more one uses the abstract API, the more likely the code is going to be able to accept duck typed inputs. Also, most things that have tp_slots have a corresponding abstract method instead of tons a concrete access points; hence, I would be supportive if you proposed a PyObject_Clear(o) function (for calling tp_clear slots when they exist and returning an error code when they don't).
For setobject.c, if I still have a say in the matter, my strong preference is to keep the API minimal, expose fine-grained functions for efficiency, use PyNumber methods for direct access to operator style set operations, and use the abstract API for everything else.
Though you have a different world view about fat vs thin APIs and on whether the C API should track the Python API, perhaps you can agree with my reasoning in this case. There is a semantic difference between code like s+=t and s.update(t). The former only works when t is a set and the latter works for any iterable. When the C code corresponds to the Python code, that knowledge is kept intact and there is no confusion between PyNumber_InPlaceAdd(s,t) vs PyObject_CallMethod(s, "update", "(O)", t). I did not want to fill-up the C API with methods corresponding to every named method in the Python API -- besides making the API unnecessarily fat, IMO, it introduces ambiguity about whether PySet_Update would be the set-only operation or the generator iteratable version.
With respect to clear(), I do not not want to further enshrine that method. IMO, most uses of it are misguided and the code would be better-off allowing the existing set to be decreffed away and using PySet_New() instead. I realize that is a matter of taste, but it is a basic of API design that the module author provide clues about how the module is intended to be used.
With respect to PySet_Next(), I do agree that it is both a bit faster and easier to use than the iterator API. That is more of a reflection of issues with the iterator API than it is an indication that PySet_Next() would be worthwhile.
That being said, I stand by my rejection of that method. For dictionaries, the problems with the approach were some offset by the advantages of getting a simultaneous key/value lookup and by direct access to the memory where the value was stored. Sets, of course, do not have these offseting benefits.
The problems with the approach however do apply to sets. I do not want to be passing around pointers to the private internal hash table -- programs have no business altering that memory directly. You have to be careful with even simple cases such as using PyString_AS_STRING which returns a pointer and then the underlying object can get decreffed away leaving a pointer to an invalid memory location (that happened to me recently but I caught it during code review).
Also, there is the safety issue of having the table mutate during iteration. While your note downplayed the issue noting that the risks are fully disclosed, I speak from experience in saying that it is all too easy to accidentally allow the table to mutate during iteration -- I learned this the hard-way and had to undo a number of my uses of set_next() which had been used for speed and for being simpler than the iteator api. A single decref or call to a key's PyObject_Hash() is enough to trigger arbitrary Python code running during the middle of iteration, possibly resulting in a resize or value mutation. Take a look at the code for set_clear() to see the dance required to avoid this risk.
IOW, the safety considerations alone preclude PySet_Next(). Instead, use PyObject_GetIter() and enjoy the benefits of safety, duck typing, and re-usable code.
Because of the safety issues and passing internal pointers, I prefer that the _Next() api NOT get replicated throughout the language. It was needed for dicts, but shouldn't start sprouting up everywhere.
All that being said, I understand that some of it comes down to taste and philosophical differences. As the module author, the existing API of course reflects my tastes and world-view. I have given you simple, one line alternatives to each proposal and listed the reasons for each choice, so it can't be argued that the API is somehow crippling your work.
I'm sympathethic to your reluctance to use PyObject_CallMethod(). But, do understand that it is simply an aversion. It works just fine and makes no speed difference on coarse-grained, O(n) methods. I like its clarity and direct correspondence to the pure Python api for sets. Because the methods in question would all be METH_O or METH_NOARGS, the only static type checking you've lost is verifying the number of arguments. I would suggest that if you have the wrong number of arguments for s.update(t) or s.clear(), then you have problems a C API can't solve ;-)
Cheers,
Raymond
P.S. One other thought: I don't want to crystalize the API in a way that precludes future development of the module. One possibility for the future is for updates to take multiple arguments such as s.update(t,u,v) causing three updates to be folded-in at once.
- Previous message: [Python-Dev] PySet API
- Next message: [Python-Dev] PySet API
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]