[Python-Dev] PySet API

Barry Warsaw barry at python.org
Wed Mar 22 01:24:39 CET 2006


Is it your intent to push for more use of the abstract API instead of the concrete APIs for all of Python's C data structures? Current API aside, are you advocating this approach for all new built-in types? Would you argue that Python 3.0's C API be stripped of everything but the abstract API and the bare essentials of the concrete API?

If so, then I think this is extremely misguided. C is not Python, and while the abstract API is useful for some things, so is the concrete API.

In fact, the Python C API's clarity, utility, completeness, and discoverability have made Python one of the nicest languages to embed and extend, and I see no reason to deviate from that for the sake of blind TOOWTDI worship. We have a rich tradition of providing both concrete and abstract APIs at the C layer, and I think that's a good thing that we should continue here.

On Mon, 2006-03-20 at 03:44 -0500, Raymond Hettinger wrote:

> PySet_Clear()
> -------------
> Use PyObject_CallMethod(s, "clear", NULL).
>
> Or if you need to save a millisecond on an O(n) operation, use PyNumber_InPlaceSubtract(s,s) as shown in the docs. If the name bugs you, it only takes a one-line macro to define a wrapper. The set API should not be cluttered with unnecessary and redundant functions.

This is a great example of what I'm talking about. You lose some static C compiler checks when you use either of these alternatives. C is not Python and we shouldn't try to make it so.

The documentation is much less concise too, and if macros are encouraged, then every extension will invent their own name, further reducing readability, or use the obvious choice of PySet_Clear() and then question why Python doesn't provide this itself.

This also has a detrimental effect on debugging. Macros suck for debugging and going through all the abstract API layers sucks. A nice, clean, direct call is so much more embedder-friendly.

In addition, you essentially have all the pieces for PySet_Clear() right there in front of you, so why not expose them to embedders and make their lives easier? Forcing them to go through the abstract API or use obscure alternatives does not improve the API. It seems a false economy to not include concrete API calls just to end up back in setobject.c after layers of indirection.

> PySet_Next()
> ------------
> This is also redundant. The preferred way to iterate over a set should be PyObject_GetIter(s). The iter api is generic and works for all containers. It ought to be the one-way-to-do-it.

For the C API, I disagree for the reasons stated above. In this specific case, using the iterator API actually imposes more pain on embedders because there are more things you have to keep track of and that can go wrong. PyDict_Next() is a very nice and direct API, where you often don't have to worry about reference counting (borrowed refs in this case are the right thing to return). You also don't have to worry about error conditions, and both of these things reduce bugs because it usually means less code. PySet_Next() would provide the same benefits.

I don't buy the safety argument against PyDict_Next()/PySet_Next() because they are clearly documented as requiring no modification during iteration. Again, this is what I mean by useful concrete vs. abstract APIs. When you /know/ you have a set and you /know/ you won't be modifying it, PySet_Next() is the perfect interface. If you will be modifying the set, or don't know what kind of sequence you have, then the abstract API is the right thing to use.

> Further, it doesn't make sense to model this after the dictionary API where the next function is needed to avoid double lookups by returning pointers to both the key and value fields at the same time (allowing for modification of the value field). In contrast, for sets, there is no value field to look-up or mutate (the key should not be touched). So, we shouldn't be passing around pointers to the internal structure. I want to keep the internal structure of sets much more private than they were for dictionaries -- all access should be through the provided C API functions -- that keeps the implementation flexible and allows for future improvements without worrying that we've broken code for someone who has touched the internal structure directly.

The implementation of PySet_Next() would not return setentrys, it would return PyObjects. Yes, those would be borrowed refs to setentry.keys, but you still avoid direct access to internal structures.

> Also, the Next() api is not as safe as the GetIter api which checks for mutation during iteration. The safety should not be tossed aside without good reason.

> PySet_Update()
> ---------------
> Use PyObject_CallMethod(s, "update", "O", iterable). That is the preferred way to access all of the high volume methods.

Again, I disagree, but I don't think I need to restate my reasons.

> Only the fine grained methods (like contains, add, pop, or discard) have a need for a direct call. Adding unnecessary functions for the many-at-once methods gains you nothing -- perhaps saving a tiny O(1) look-up step in an O(n) operation.
>
> FWIW, the same reasoning also applies to why the list API defines PyList_Append() but not PyList_Extend().

Personally, I think that's a bug in the PyList C API. I haven't complained because I've rarely needed it, but it /is/ a deficiency.

> PySet_AsList()
> ---------------
> There is already a function expressly for this purpose, PySequence_List(s).

I'll grant you this one. ;) Forget PySet_AsList().

I'll try to answer the rest of your message without repeating myself too much. ;)

> As it stands now, it is possible to use sets in C programs and access them in a way that has a direct correspondence to pure Python code -- using PyObject_CallMethod() for things we would usually access by name, using the PyNumber API for things we would access using operators, using other parts of the abstract API exactly as we would in Python (PyObject_Repr, PyObject_GetIter, PySequence_List, PyObject_Print, etc.), and using a handful of direct access functions for the fine grained methods like (add, pop, contains, etc.). IOW, I like the way the C code looks now and I like the minimal, yet complete API. Let's don't muck it up.

This is where you and I disagree. Again, C is not Python. I actually greatly dislike having to use things like PyObject_Call() for concrete objects. First, the C code does not look like Python at all, and is actually /less/ readable because now you have to look in two places to understand what the code does. Second, it imposes much more pain when debugging because of all the extra layers you have to step through.

But of course, with a rich concrete and abstract API, as most Python types have, we both get to appease our aesthetic demons, and choose the right tool for the job.

> FWIW, the C implementation in Py2.5 already provides nice speed-ups for many operations. Likewise, its memory requirements have been reduced by a third. Try to enjoy the improvements without gilding the lily.

Let's embrace C and continue to make life easier for the C coder. You can't argue that going through all the rigamarole of the iterator API would be faster than PySet_Next(), and it certainly won't be more readable or easier to debug. A foolish consistency, and all that...

Cheers, -Barry



