Issue 29730: unoptimal calls to PyNumber_Check (original) (raw)

Created on 2017-03-05 23:11 by Oren Milman, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
patchVer2.diff Oren Milman,2017-03-12 16:29 patch2 diff file
Pull Requests
URL Status Linked Edit
PR 622 closed Oren Milman,2017-03-11 22:23
PR 650 merged Oren Milman,2017-03-12 22:00
Messages (12)
msg289048 - (view) Author: Oren Milman (Oren Milman) * Date: 2017-03-05 23:11
------------ current state ------------ if (PyNumber_Check(obj)) { someVar = PyNumber_AsSsize_t(obj, SomeError); if (someVar == -1 && PyErr_Occurred()) { return errVal; } } else { PyErr_Format(PyExc_TypeError, "integer argument expected, got '%.200s'", Py_TYPE(obj)->tp_name); return errVal; } Something similar to this happens in: - Modules/mmapmodule.c in mmap_convert_ssize_t - Modules/_io/_iomodule.c in _PyIO_ConvertSsize_t - Modules/_io/stringio.c in: * _io_StringIO_read_impl * _io_StringIO_readline_impl * _io_StringIO_truncate_impl (Moreover, in: - Objects/bytes_methods.c in parse_args_finds_byte - Objects/exceptions.c in oserror_init PyNumber_AsSsize_t is called only if PyNumber_Check returns true.) Note that: - PyNumber_Check checks whether nb_int != NULL or nb_float != NULL. - PyNumber_AsSsize_t calls PyNumber_Index, which, before calling nb_index, raises a TypeError (with a similar error message) in case nb_index == NULL. - The docs say '... when __index__() is defined __int__() should also be defined ...'. So the behavior with and without the call to PyNumber_Check is quite the same. The only potential advantage of calling PyNumber_Check is skipping the call to PyNumber_AsSsize_t. But PyNumber_AsSsize_t would be called also in case nb_index == NULL and (nb_int != NULL or nb_float != NULL). Thus, the only case in which the call to PyNumber_Check might be useful, is when nb_int == nb_float == nb_index == NULL. ------------ proposed changes ------------ Either remove each of these calls to PyNumber_Check, or at least replace it with a call to PyIndex_Check, which checks whether nb_index != NULL, and thus would be more useful than PyNumber_Check. Note that such a change shouldn't affect the behavior, except for a slightly different wording of the error message in case a TypeError is raised.
msg289051 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2017-03-05 23:29
+1 for removing each of these calls to PyNumber_Check.
msg289458 - (view) Author: Oren Milman (Oren Milman) * Date: 2017-03-11 22:11
after some closer examination, ISTM that in Objects/exceptions.c, we can't remove PyNumber_Check to optimize or simplify the code, as the argument 'filename' can be either an integer type (only in case the error is a BlockingIOError), or quite anything else, except for None. We could replace PyNumber_Check(filename) with PyIndex_Check(filename), but this would change the behavior of BlockingIOError. IMHO, this isn't that important, and thus we should leave the code in Objects/exceptions.c as is. anyway, I would soon create a pull request to remove all other aforementioned calls to PyNumber_Check.
msg289483 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-12 07:41
I think it would be better to replace PyNumber_Check() with PyIndex_Check() and change error messages to more specific. In IO related functions the accepted values are integers and None. In bytes/bytearray methods -- integers and bytes-like objects.
msg289488 - (view) Author: Oren Milman (Oren Milman) * Date: 2017-03-12 09:17
"In bytes/bytearray methods -- integers and bytes-like objects." (I guess you refer to parse_args_finds_byte (in Objects/bytes_methods.c)) so you suggest that in case (!PyIndex_Check(tmp_subobj)), we also check whether (PyByteArray_Check(tmp_subobj) or PyBytes_Check(tmp_subobj)), and raise an error if needed? anyway, you and Raymond suggest different things. how do we proceed?
msg289489 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-12 09:34
The call of PyNumber_Check() is redundant if we don't bother about error message. But if we want to have accurate error message we should check types before converting. In parse_args_finds_byte we should check rather PyObject_CheckBuffer() (and maybe PyBytes_Check() for fast path if this makes sense). Perhaps some methods need to check also PyTuple_Check().
msg289502 - (view) Author: Oren Milman (Oren Milman) * Date: 2017-03-12 16:29
"Perhaps some methods need to check also PyTuple_Check()." which methods? anyway, a new patch diff file is attached, according to your other seggustions, Serhiy. (I ran the test module, and some tests of my own, and it seems that the patch doesn't break anything.)
msg289506 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-12 20:00
Please open a PR for review. Checking PyTuple_Check() is needed if this code is used in startswith() and endswith(). "integer argument or None expected" -- I'm not sure this wording is correct enough. What wording is used in other similar code? I'm sure there other functions that accept int or None.
msg289514 - (view) Author: Oren Milman (Oren Milman) * Date: 2017-03-12 22:01
I am sorry, but I guess I am missing something about startswith() and endswith(). ISTM that PyTuple_Check() is already called by unicode_startswith, unicode_endswith and _Py_bytes_tailmatch, which already raise a TypeError with an appropriate error message (e.g. "%s first arg must be bytes or a tuple of bytes, not %s"). I searched the codebase, and found that in most places, if PyIndex_Check(obj) is true, then obj is described as 'integer' in error messages. rarely, obj would be described as 'int'. I found only two relevant TypeError messages of functions that accept an integer type or None (BTW, I took the term 'integer type' from https://docs.python.org/3.7/reference/atamodel.html?highlight=__index__#object.__index__.): - in Modules/posixmodule.c in dir_fd_converter(): PyErr_Format(PyExc_TypeError, "argument should be integer or None, not %.200s", Py_TYPE(o)->tp_name); - in Modules/_functoolsmodule.c in lru_cache_new(): PyErr_SetString(PyExc_TypeError, "maxsize should be integer or None"); so I changed the error messages in my patch to the form of 'argument should be ...' (which is, IMHO, much clearer). also, now that the new PR was created, should I close the old one?
msg289516 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-12 22:38
LGTM. Thank you Oren.
msg289532 - (view) Author: Oren Milman (Oren Milman) * Date: 2017-03-13 06:10
thanks for the reviews :)
msg290191 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-03-24 22:21
New changeset 004251059b9c5e48d47cb30b94bcb92cb44d3adf by Serhiy Storchaka (Oren Milman) in branch 'master': bpo-29730: replace some calls to PyNumber_Check and improve some error messages (#650) https://github.com/python/cpython/commit/004251059b9c5e48d47cb30b94bcb92cb44d3adf
History
Date User Action Args
2022-04-11 14:58:43 admin set github: 73916
2017-03-24 22:21:13 serhiy.storchaka set messages: +
2017-03-13 06:10:05 Oren Milman set messages: +
2017-03-12 22:39:00 serhiy.storchaka set status: open -> closedmessages: + assignee: serhiy.storchakaresolution: fixedstage: resolved
2017-03-12 22:01:23 Oren Milman set messages: +
2017-03-12 22:00:47 Oren Milman set pull_requests: + <pull%5Frequest537>
2017-03-12 20:00:20 serhiy.storchaka set messages: +
2017-03-12 16:29:57 Oren Milman set files: + patchVer2.diffkeywords: + patchmessages: +
2017-03-12 09:34:48 serhiy.storchaka set messages: +
2017-03-12 09:17:33 Oren Milman set messages: +
2017-03-12 07:41:51 serhiy.storchaka set messages: +
2017-03-11 22:23:01 Oren Milman set pull_requests: + <pull%5Frequest510>
2017-03-11 22:11:53 Oren Milman set messages: +
2017-03-06 06:05:41 serhiy.storchaka set nosy: + mark.dickinson, serhiy.storchaka
2017-03-05 23:29:25 rhettinger set nosy: + rhettingermessages: +
2017-03-05 23:11:18 Oren Milman create