Issue 1530738: Fix index() clipping really big numbers (original) (raw)

Created on 2006-07-29 03:47 by ncoghlan, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
index_overflow.diff	ncoghlan,2006-07-29 03:47	Patch to raise OverflowError for big numbers passed to __index__()
pep357-fixed.diff	ncoghlan,2006-07-29 13:19	Much better solution that revises the PEP 357 C API
pep357-fix-v.2.diff	ncoghlan,2006-07-29 16:43	Further reduction in code duplication
pep357-fix-v.3.diff	ncoghlan,2006-08-01 15:28	Update to address comments from Travis on python-dev
pep357-fix-v.4.diff	ncoghlan,2006-08-05 15:10	Update to address comments from Neal
pep357-fix-v.5.diff	ncoghlan,2006-08-10 14:10	Fast track builtin longs as well as ints

Messages (11)
msg50782 - (view)	Author: Alyssa Coghlan (ncoghlan) *	Date: 2006-07-29 03:47
Patch attached (index_overflow.diff) that causes __index__() to raise OverflowError for really big numbers instead of silently clipping them. The approach in the patch is a "minimal fix" that is as ugly as hell (3 different error return codes!), so I'm going to try for a cleaner version that changes nb_index to return a PyObject* (then the client code can decide whether to convert to Py_ssize_t or not, and whether to clip or raise OverflowError when doing so).
msg50783 - (view)	Author: Tim Peters (tim.peters) *	Date: 2006-07-29 04:05
Logged In: YES user_id=31435 Since I don't think this is a sane approach, I'm not going to spend time reviewing it :-) Suggest working out what's /wanted/ on python-dev first, including beefing up PEP 357 so it spells out the revised intents.
msg50784 - (view)	Author: Alyssa Coghlan (ncoghlan) *	Date: 2006-07-29 13:19
Logged In: YES user_id=1038590 Attaching the patch that approaches the problem from the ground up by redesigning the PEP 357 C API to meet the needs of the actual use cases in the standard library.
msg50785 - (view)	Author: Alyssa Coghlan (ncoghlan) *	Date: 2006-07-29 16:43
Logged In: YES user_id=1038590 Revised the patch to further reduce code duplication in the implementation, and to reduce the divergence from PEP 357. The functions in the C API now have the names PyNumber_Index (raises IndexError), PyNumber_AsSsize_t (raises OverflowError) and PyNumber_AsClippedSsize_t (clamps to PY_SSIZE_T_MIN/MAX), and are no longer exposed through the operator module. Python code should either use __index__() directly (if not constrained to concrete containers) or else use operator.index().
msg50786 - (view)	Author: Neal Norwitz (nnorwitz) *	Date: 2006-07-29 18:20
Logged In: YES user_id=33168 The description is no longer valid, this is not a minimal patch. I haven't thought about the actual design of the patch, but I have a bunch of issues. In no particular, order: * tests should not use the assert statement, but self.assert* * There are a bunch of formatting changes (one place with a couple of extra blank lines, other places where tabs became spaces or vica versa) * I think there might be behaviour changes wrt checking for sq_ass_item and sq_get_item and raising type_error()s. I didn't think about this hard, but just from a quick review there was a red flag. * PyNumber_Index(o, is_index) is a bogus naming. Either the API should change or is_index should change. It doesn't make sense to me that you would call an API to get the index that is not an index. * int_index became int_int, same with long, but not in other places. I don't understand the name change. I need to think more about the structure of this patch. There's something I don't like, but I'm not sure what it is without further review. More eyes would definitely help.
msg50787 - (view)	Author: Alyssa Coghlan (ncoghlan) *	Date: 2006-08-01 15:28
Logged In: YES user_id=1038590 The minimal fix is still attached (the index_overflow one). That's the patch Tim didn't like. The latter half of the description covers the pep357-fix patch versions. The API's in the more comprehensive patches (_Index, _AsSsize_t, _AsClippedSsize_t) were based on minimising the boilerplate associated with actually using those functions to implement the existing operations in the standard library. Without the is_index flag, the mp_subscript implementations all need to perform their own check for whether or not the object provides __index__ or not, since they don't want to raise a TypeError unless the object is also not a slice. . However, I'm open to inverting the sense of is_index and changing the name of the output variable to typeerror (an early version of the code actually had it that way around). The only impact on the patch is a bunch of name changes to different variables. In terms of error messages, the patch definitely changes some of the exact wording of raised exceptions. It doesn't change the types, though. The specific change to PyObject_GetItem and friends is that the error message becomes "TypeError: 'foo' object is unsubscriptable" instead of the old "TypeError: 'foo' object is unindexable". The old error message can be restored if you prefer by moving the type check back after the check for whether or not the key is a valid index. However, it really didn't make a lot of sense to me to check the key before we'd even confirmed that the object actually supported subscripting. For the unit tests, I eliminated the use of assert statements in v3 of the patch (as well as updating the tests to cover a bunch of problems with the v2 patch that Travis noted, and to avoid running the same tests multiple times). int_index and long_index went away because the signature change to the nb_index slot (returning PyObject *) meant that the existing int_int and long_long functions were sufficient - there was no need to write new functions specifically for the nb_index slot. This actually applies to any extension type that implements nb_index - it should be populated with the same value as either their nb_int slot or their nb_long slot (depending on whether or not the value will always fit in a Python int) In terms of style (and what may be bothering you), I'm not aware of any existing public CPython API that supports the use of an output flag to indicate errors instead of requiring that callers check PyErr_Occurred. The straightforward approach of providing a PyNumber_IndexCheck API is certainly feasible (and has an identical line count in the mp_subscript functions), but comes at the cost of doing the check twice in every mp_subscript function (once directly and once inside PyNumber_Index). The old code didn't have this problem with doing the same check in two different places because it was merrily calling the nb_index slot directly all over the place instead of going through the abstract API.
msg50788 - (view)	Author: Alyssa Coghlan (ncoghlan) *	Date: 2006-08-05 15:10
Logged In: YES user_id=1038590 Added version 4 of the patch. Changes: - is_index is now type_err, with the logical sense inverted and corresponding changes made to other parts of the patch (I didn't use type_error because that would have caused a name clash inside abstract.c) - for consistency in terminology, the internal function used for long conversion is now _PyLong_AsClippedSsize_t to match the PyNumber function, and the output variable used to report clipping is named clipped - an error message from _PyEval_SliceIndex had changed gratuitously, so I restored the original message.
msg50789 - (view)	Author: Alyssa Coghlan (ncoghlan) *	Date: 2006-08-06 00:21
Logged In: YES user_id=1038590 Put the priority back to 9 (I accidentally changed it back to 7 when I attached the new patch)
msg50790 - (view)	Author: Alyssa Coghlan (ncoghlan) *	Date: 2006-08-10 14:10
Logged In: YES user_id=1038590 Version 5 fast tracks builtin longs as well as ints
msg50791 - (view)	Author: Travis Oliphant (teoliphant) *	Date: 2006-08-11 10:44
Logged In: YES user_id=5296 I'm attaching a patch that fixes the problem using a more traditional C-API. Their are still 3 new C-API calls --- 1 is actually a macro * PyIndex_Check(obj) * PyNumber_Index(obj) --- is to nb_index as PyNumber_Int is to nb_int * PyNumber_AsSsize_t(obj, err) --- the second argument allows specifying the error when conversion to Pyssize_t raises an Overflow. If err == NULL, the value is clipped instead of raising any error. OverflowError and IndexError are the two errors used in Python itself and the standard library. * I kept the same unit-tests from Nick except that subclasses for integers are allowed as slice indices as this does not cause recursion.
msg50792 - (view)	Author: Alyssa Coghlan (ncoghlan) *	Date: 2006-08-11 11:42
Logged In: YES user_id=1038590 I'm withdrawing this in favour of Travis's patch #1538606 - it's much more in line with the idioms used in other parts of the C API. It isn't worth messing up the API just to avoid checking for the existing of the nb_index slot twice. With Travis's patch, it's still straightforward for extension modules to do their own overflow handling, too - call PyNumber_AsSsize_t, and if an overflow error gets triggered, call PyNumber_Index to find out whether it was positive or negative overflow.

History
Date	User	Action	Args
2022-04-11 14:56:19	admin	set	github: 43742
2006-07-29 03:47:49	ncoghlan	create