[Python-Dev] __index__ clipping

Guido van Rossum guido at python.org
Thu Aug 10 16:26:54 CEST 2006


On 8/10/06, Nick Coghlan <ncoghlan at gmail.com> wrote:

> Guido van Rossum wrote:
> >> It seems like Nick's recent patches solved the problems that were
> >> identified.
> >
> > Nick, can you summarize how your patches differ from my proposal?
>
> nb_index and __index__ are essentially exactly as you propose.

Then I don't understand why Travis is objecting against my proposal!

I'll review the rest later (right now I'm just doing email triage :-).

--Guido

To make an object implemented in C usable as an index you would take either the nb_int slot or the nb_long slot and put the same function pointer into the nb_index slot. For a Python object, you would write either '__index__ = __int__' or '__index__ = __long__' as part of the class definition.
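A minimal sketch of the Python-level spelling, written for present-day Python 3 as an assumption (there is no `__long__` there, so only the `__int__` form is shown). The class name `Offset` is hypothetical and exists only for illustration.

```python
class Offset:
    """Hypothetical integer-like wrapper, used only for illustration."""

    def __init__(self, value):
        self.value = value

    def __int__(self):
        return self.value

    # Reuse the same function for the index protocol.
    __index__ = __int__


items = ["a", "b", "c"]
second = items[Offset(1)]   # list indexing calls Offset.__index__
tail = items[Offset(1):]    # slice endpoints go through __index__ as well
```

Assigning the existing method to `__index__` inside the class body mirrors putting the same function pointer into both slots on the C side.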

operator.index is provided to support writing __getitem__, __setitem__ and __delitem__ methods - it raises IndexError on overflow so you don't have to catch and reraise to convert an OverflowError to an IndexError.

On the C API side, the 3 functions you suggest are all present (although the version returning a Python object is accessed via PyObject_CallMethod), and there's a 4th variant that raises IndexError instead of OverflowError (this version is convenient when writing mp_subscript and mp_ass_subscript functions).

Avoiding Py_ssize_t -> PyInt -> Py_ssize_t conversions for all integer types implemented in C would be nice, but I don't think it's practical (the latest version of the patch does at least avoid it for the builtin integer types).

Cheers,
Nick.
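The `__getitem__` use case mentioned above can be sketched roughly as follows in present-day Python 3. One assumption to flag: in released Python versions `operator.index` returns an integer of any size rather than raising IndexError on overflow, so this sketch relies on an explicit bounds check instead. `FixedRow` is a hypothetical class.

```python
import operator


class FixedRow:
    """Hypothetical read-only sequence used to illustrate index handling."""

    def __init__(self, values):
        self._values = tuple(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, key):
        # operator.index raises TypeError if key does not provide __index__.
        i = operator.index(key)
        if i < 0:
            i += len(self._values)
        # Python ints are unbounded, so this one bounds check also covers
        # values too large for Py_ssize_t; they surface as IndexError.
        if not 0 <= i < len(self._values):
            raise IndexError("FixedRow index out of range")
        return self._values[i]


row = FixedRow([10, 20, 30])
last = row[-1]
try:
    row[2**100]
except IndexError as exc:
    huge_error = type(exc).__name__
```

The point of the sketch is that the container never has to catch an OverflowError and re-raise it as an IndexError.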

P.S. Here's the detailed rationale for the form the patch has evolved to [1]:

In addition to allowing (2**100).__index__() == 2**100, having nb_index return a Python object resulted in a decent reduction in code duplication - previously the coercion logic to get a Python integer or long value down to a Py_ssize_t was present in 3 places (long_index, instance_index, slot_nb_index), and would also have needed to be duplicated by any other C implemented index type whose value could exceed the range of a Py_ssize_t. With the patch, that logic appears only inside abstract.c and extension types can just return a PyLong value and let the interpreter figure out how to handle overflow.

The biggest benefit of this approach is that a single slot (nb_index) can be used to implement four different overflow behaviours in the core (return PyLong, raise OverflowError, raise IndexError, clip to Py_ssize_t), as well as providing a hook to allow extension module authors to define their own overflow handling.

If the nb_index slot does not return a true Python integer or long, TypeError gets raised. Subclasses are not accepted in order to rule out Armin's favourite set of recursion problems :)

The C level API is based on the use cases in the standard library, with one of the functions generalised a bit to allow extension modules to easily handle type errors and overflow differently if they want to.
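Two of the points above can be checked from Python. This is a small sketch against present-day Python 3 (an assumption, since the thread concerns the 2.5 patch); `BadIndex` is a hypothetical class.

```python
import operator

big = 2**100
same = big.__index__() == big   # the value is not truncated at Python level


class BadIndex:
    """Hypothetical type whose __index__ breaks the protocol."""

    def __index__(self):
        return "7"  # a string, not an integer


try:
    operator.index(BadIndex())
except TypeError:
    rejected = True
else:
    rejected = False
```

Here `same` ends up true, and `rejected` ends up true because a non-integer return value from `__index__` is refused with TypeError.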
The three different use cases for nb_index in the standard library are:
  - concrete sequence indices (want IndexError on overflow)
  - 'true integer' retrieval (want OverflowError on overflow)
  - slice endpoints (want to clip to Py_ssize_t max/min values)

The proposed fix (Travis & Neal provided some useful comments on earlier versions) includes a C API function for each of these different use cases:

  PyNumber_Index(PyObject *obj, int *type_err)
  PyNumber_AsSsize_t(PyObject *obj, int *type_err)
  PyNumber_AsClippedSsize_t(PyObject *obj, int *type_err, int *clipped)

type_err is an output variable to say "obj does not provide nb_index" in order to get rid of boilerplate dealing with PyErr_Occurred() in mp_subscript and mp_ass_subscript implementations (those methods generally didn't want a TypeError raised at this point - they wanted to go on and check if the object was a slice object instead). It's also useful if you want to provide a specific error message for TypeErrors (sequence repetition takes advantage of this).

You can also leave the pointer as NULL and the functions will raise a fairly generic TypeError for you. PyObject_GetItem and friends use the functions that way.

Avoiding repeated code is also why there are two non-clipping variants, one raising IndexError and one raising OverflowError. Raising OverflowError in PyNumber_Index broke half a dozen unit tests, while raising IndexError for things like sequence repetition turned out to break different unit tests.

The clipping variant is for slice indices. The interpreter core doesn't actually care whether or not the result gets clipped in this case (it sets the last parameter to NULL), but I kept the output variable in the signature for the benefit of extension authors.

All 3 of the C API methods return Py_ssize_t.
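The three overflow behaviours can be modelled in Python as a rough sketch. The function names `as_index`, `as_ssize_t` and `as_clipped_ssize_t` are hypothetical stand-ins, not the C API itself. The type_err output variable is not modelled: TypeError simply propagates, which corresponds to passing NULL. The sketch assumes present-day Python 3, where `sys.maxsize` is the largest Py_ssize_t value.

```python
import operator
import sys

SSIZE_MAX = sys.maxsize         # largest Py_ssize_t value
SSIZE_MIN = -sys.maxsize - 1    # smallest Py_ssize_t value


def _fits(value):
    """Return True if value is representable as a Py_ssize_t."""
    return SSIZE_MIN <= value <= SSIZE_MAX


def as_index(obj):
    """Concrete sequence index: IndexError on overflow."""
    value = operator.index(obj)
    if not _fits(value):
        raise IndexError("index does not fit in Py_ssize_t")
    return value


def as_ssize_t(obj):
    """'True integer' retrieval: OverflowError on overflow."""
    value = operator.index(obj)
    if not _fits(value):
        raise OverflowError("value does not fit in Py_ssize_t")
    return value


def as_clipped_ssize_t(obj):
    """Slice endpoint: clip to the range and report whether clipping happened."""
    value = operator.index(obj)
    clipped_value = max(SSIZE_MIN, min(SSIZE_MAX, value))
    return clipped_value, clipped_value != value
```

The same split is visible from Python in current CPython: `"abc"[2**100]` raises IndexError, `"abc" * 2**100` raises OverflowError, and `"abc"[:2**100]` quietly clips and returns `"abc"`.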
The "give me a Python object" case isn't actually needed anywhere in the core, but is available to extension modules via:

  PyObject_CallMethod(obj, "__index__", NULL)

As Travis notes, indexing with something other than a builtin integer will be slightly slower due to the temporary object created by calling the nb_index slot (version 4 of the patch avoids this overhead for ints, version 5 avoids it for longs as well). I don't think this is avoidable - a non-PyObject return value really doesn't provide the necessary flexibility to detect and handle overflow correctly.

[1] http://www.python.org/sf/1530738

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


