Issue 22939: integer overflow in iterator object (original) (raw)

Created on 2014-11-25 10:56 by hakril, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
issue22939.patch hakril,2014-12-06 20:08 First try for the patch review
issue22939v2.patch hakril,2014-12-07 21:26 Second try for the patch review
Messages (24)
msg231651 - (view) Author: Clement Rouault (hakril) * Date: 2014-11-25 10:56
I found an interger overflow in the standard iterator object (Objects/iterobject.c) The `it_index` attribute used by the iterator is a `Py_ssize_t` but overflow is never checked. So after the index `PY_SSIZE_T_MAX`, the iterator object will ask for the index `PY_SSIZE_T_MIN`. Here is an example: import sys class Seq: def __getitem__(self, item): print("[-] Asked for item at <{0}>".format(item)) return 0 s = Seq() i = iter(s) # Manually set `it_index` to PY_SSIZE_T_MAX without a loop i.__setstate__(sys.maxsize) next(i) [-] Asked for item at <9223372036854775807> next(i) [-] Asked for item at <-9223372036854775808> I would be really interested in writing a patch but first I wanted to discuss the expected behaviour and fix. The iterator could stop after `PY_SSIZE_T_MAX` but it seems strange as other iterator (like `enumobject`) handle values bigger than `PY_SSIZE_T_MAX`. Or the same technique used in `enumobject` could be used: adding a field `PyObject* en_longindex` (a python long) which would be used for values bigger than `PY_SSIZE_T_MAX`
msg231659 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2014-11-25 13:51
For possibly relevant background information, see issue 21444 and the issues linked from it, and issue 14794.
msg231660 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-11-25 14:23
> The `it_index` attribute used by the iterator is a `Py_ssize_t` but overflow is never checked. Yes it is a bug. iter_iternext() must raises an OverflowError if it->it_index is equal to PY_SSIZE_T_MAX.
msg231664 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-25 15:22
I think OverflowError is good for maintained releases, but for 3.5 Clement's idea with long index looks attractive to me. In any case an exception should be raised for negative argument in __setstate__(). Let split this issue on two parts. First fix the bug by raising exceptions, and then add long index (if anyone will need it).
msg231669 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-11-25 16:40
> I think OverflowError is good for maintained releases, but for 3.5 Clement's idea with long index looks attractive to me. What is your use case?
msg231676 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-11-25 18:21
> What is your use case? Something like range(). I agree that it is very unlike to encounter this problem in real work, and we can live with implementation-related limitation (for which OverflowError exists).
msg232250 - (view) Author: Clement Rouault (hakril) * Date: 2014-12-06 20:08
Here is a first try for a patch. There are two points I am not sure about: 1) The message for the OverflowError: is that explicit enough ? 2) The behaviour of the iterator after the raise of OverflowError. With this patch every call to `next(it)` where `it` have overflowed will raise `OverflowError` again. Does this behaviour seems correct our should it raise StopIteration after the first OverflowError ?
msg232267 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-12-07 08:44
You should not rely on undefined behaviour: check if the index is greater or equal than PY_SSIZET_MAX - 1. __setstate__ must implement the same check. You must always raise an error, not hide it in a second call to next (raise StopIteration). Your unit test must check this behaviour.
msg232269 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-07 09:20
> check if the index is greater or equal than PY_SSIZET_MAX - 1. Just PY_SSIZET_MAX. Added other comments on Rietveld (look the email in the Spam folder).
msg232282 - (view) Author: Clement Rouault (hakril) * Date: 2014-12-07 21:26
Thanks for the comments. I corrected the patch and updated the test. I also added a test that check the behavior of setstate with negative indice. Just one question: > __setstate__ must implement the same check. Why should __setstate__ check for PY_SSIZE_T_MAX (throwing OverflowError when unpickling) if the check will be done when calling next on the resulting iterator ?
msg232284 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-07 21:31
__setstate__ should check that an index is not negative. All values from 0 to PY_SSIZE_T_MAX are acceptable.
msg232285 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-07 21:39
Ah, __setstate__ already handles negative indices. Then the patch LGTM.
msg232362 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-12-09 10:32
I would think that the PY_SSIZE_T_MAX check belongs inside the: if (result != NULL) { it->it_index++; return result; } just before the increment which could cause the overflow. Also, PY_SSIZE_T_MAX is a valid value to pass to PySequence_GetItem(), so it shouldn't be blocked unless necessary.
msg232365 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-12-09 10:41
This doesn't matter because next() will raise an exception if it_index == PY_SSIZE_T_MAX in any case. The code should be changed much more to allow yielding an item with index PY_SSIZE_T_MAX, use other (negative) signal value and change the behavior of __setstate__ for negative values.
msg232367 - (view) Author: Clement Rouault (hakril) * Date: 2014-12-09 10:46
> Also, PY_SSIZE_T_MAX is a valid value to pass to PySequence_GetItem(), so it shouldn't be blocked unless necessary. I agree with you, that's why my first path was checking at the next call if it->it_index had overflowed. But then it relies on undefined behaviour. > I would think that the PY_SSIZE_T_MAX check belongs inside the: > > if (result != NULL) { > it->it_index++; > return result; > } If we raise the OverflowError when it->it_index really overflow (just after the getitem PY_SSIZE_T_MAX). Is it really necessary to do the overflow check after the GetItem ? because the value returned by `PySequence_GetItem(seq, PY_SSIZE_T_MAX);` will be never used.
msg232368 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2014-12-09 11:22
I prefer to raise an OverflowError *before* calling PySequence_GetItem(). The call to PySequence_GetItem() may be expensive, and we have to drop the result if an OverflowError is raised after the call. At the end, the behaviour is the same: an OverflowError is raised.
msg232385 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2014-12-09 16:03
> The call to PySequence_GetItem() may be expensive, and we have to drop > the result if an OverflowError is raised after the call. You do realize that this error will be very rare and therefore inconsequential.
msg233021 - (view) Author: Clement Rouault (hakril) * Date: 2014-12-22 15:36
> > The call to PySequence_GetItem() may be expensive, and we have to drop > > the result if an OverflowError is raised after the call. > You do realize that this error will be very rare and therefore inconsequential. The real question is: why would you call the iterator for a new value if it will be discarded anyway ? I think it could be very misleading to see _getitem__ being called and have an OverflowError being raised afterward.
msg233203 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2014-12-30 11:50
Is it necessary to raise when it_index is PY_SSIZE_T_MAX? An alternative is to set it_index to -1 when there would be overflow and raise an exception on the next call to next(). That way a virtual sequence with PY_SSIZE_T_MAX-1 items would still work (instead of failing unexpectedly).
msg234233 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-01-18 09:56
> That way a virtual sequence with PY_SSIZE_T_MAX-1 items would still work (instead of failing unexpectedly). Actually with PY_SSIZE_T_MAX+1 items (indices from 0 to PY_SSIZE_T_MAX inclusive). If Raymond insists I can write more complicated patch, but I don't think that we should complicate the code for this pretty hypotetical case. I'm for committing issue22939v2.patch.
msg242960 - (view) Author: Clement Rouault (hakril) * Date: 2015-05-12 11:02
After a few months, I am back to you on this issue. What should be the next step of the process ?
msg243568 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-19 09:17
If Raymond will not stop me, I'll commit the patch tomorrow.
msg243769 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2015-05-21 17:55
New changeset d6179accca20 by Serhiy Storchaka in branch '2.7': Fixed issue number for issue #22939. https://hg.python.org/cpython/rev/d6179accca20 New changeset 7fa2f4afcf5a by Serhiy Storchaka in branch '3.4': Fixed issue number for issue #22939. https://hg.python.org/cpython/rev/7fa2f4afcf5a New changeset f23533fa6afa by Serhiy Storchaka in branch 'default': Fixed issue number for issue #22939. https://hg.python.org/cpython/rev/f23533fa6afa
msg243771 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-05-21 18:00
Thank you for your contribution Clement.
History
Date User Action Args
2022-04-11 14:58:10 admin set github: 67128
2015-05-21 18:00:05 serhiy.storchaka set status: open -> closedresolution: fixedmessages: + stage: commit review -> resolved
2015-05-21 17:55:28 python-dev set nosy: + python-devmessages: +
2015-05-19 09:17:32 serhiy.storchaka set assignee: serhiy.storchakamessages: +
2015-05-16 15:28:30 serhiy.storchaka set versions: + Python 2.7, Python 3.4
2015-05-12 11:02:38 hakril set status: pending -> openmessages: +
2015-01-18 09:56:24 serhiy.storchaka set status: open -> pendingmessages: +
2014-12-30 11:50:39 ronaldoussoren set nosy: + ronaldoussorenmessages: +
2014-12-22 15:36:13 hakril set messages: +
2014-12-09 16:03:24 rhettinger set messages: +
2014-12-09 11:22:02 vstinner set messages: +
2014-12-09 10:46:39 hakril set messages: +
2014-12-09 10:41:19 serhiy.storchaka set messages: +
2014-12-09 10:32:32 rhettinger set messages: +
2014-12-07 21:39:01 serhiy.storchaka set messages: + stage: needs patch -> commit review
2014-12-07 21:31:54 serhiy.storchaka set messages: +
2014-12-07 21:26:59 hakril set files: + issue22939v2.patchmessages: +
2014-12-07 09:20:21 serhiy.storchaka set messages: +
2014-12-07 08:44:12 vstinner set messages: +
2014-12-06 20:08:16 hakril set files: + issue22939.patchkeywords: + patchmessages: +
2014-11-25 18:21:29 serhiy.storchaka set messages: +
2014-11-25 16:40:32 vstinner set messages: +
2014-11-25 15:22:08 serhiy.storchaka set nosy: + mark.dickinsonmessages: +
2014-11-25 14:23:33 vstinner set nosy: + vstinnermessages: +
2014-11-25 13:51:23 r.david.murray set nosy: + r.david.murraymessages: +
2014-11-25 13:14:40 serhiy.storchaka set nosy: + rhettinger, serhiy.storchakastage: needs patch
2014-11-25 10:56:33 hakril create