Issue 5793: Rationalize isdigit / isalpha / tolower / ... uses throughout Python source
Created on 2009-04-19 12:57 by mark.dickinson, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (5) | ||
---|---|---|
msg86170 | Author: Mark Dickinson (mark.dickinson) |
Date: 2009-04-19 12:57 |
Problem: the standard C character handling functions from ctype.h (isalpha, isdigit, isxdigit, isspace, toupper, tolower, etc.) are locale aware, but for almost all uses CPython needs locale-unaware versions of these.

There are various solutions in the current source:

- there's a file Include/bytes_methods.h which provides suitable ISDIGIT/ISALPHA/... macros, but also undefines the standard functions. As it is, it can't be included in Python.h since that would break 3rd party code that includes Python.h and also uses isdigit.
- some files have their own solution: Python/pystrtod.c defines its own (probably inefficient) ISDIGIT and ISSPACE macros.
- in some places the standard C functions are just used directly (and possibly incorrectly). A gotcha here is that one has to remember to use Py_CHARMASK to avoid errors on some platforms. (See issue 3633 for an example.)

It would be nice to clean all this up, and have one central, efficient, easy-to-use set of Py_ISDIGIT/Py_ISALPHA ... locale-independent macros (or functions) that could be used safely throughout the Python source. | ||
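To make the Py_CHARMASK gotcha concrete, here is a minimal sketch, not code from the CPython tree. `SKETCH_CHARMASK`, `sketch_isdigit` and `masked_isdigit` are hypothetical names; the mask is modeled on Py_CHARMASK, and the sketch assumes an ASCII-compatible execution character set.

```c
#include <ctype.h>

/* Hypothetical stand-in for Py_CHARMASK: force the value into 0..255. */
#define SKETCH_CHARMASK(c) ((unsigned char)((c) & 0xff))

/* Locale-independent digit test: only ASCII '0'..'9' qualify,
 * whatever the current locale is. */
static int sketch_isdigit(char c)
{
    unsigned char u = SKETCH_CHARMASK(c);
    return u >= '0' && u <= '9';
}

/* Calling the locale-aware <ctype.h> function safely: mask first, so a
 * char with the high bit set is never passed as a negative int, which
 * is undefined behavior where plain char is signed. The result can
 * still vary with the locale. */
static int masked_isdigit(char c)
{
    return isdigit(SKETCH_CHARMASK(c)) != 0;
}
```

With these definitions, `sketch_isdigit('7')` is true and `sketch_isdigit((char)0xE9)` is false in every locale, while `masked_isdigit` only removes the undefined behavior and leaves the locale dependence in place.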
msg86173 | Author: Eric V. Smith (eric.smith) |
Date: 2009-04-19 15:26 |
I concur. I've also been bitten by forgetting Py_CHARMASK, so a single version that took this into account (and was locale-unaware) would be welcome. In private mail I'd mentioned that if these are functions, they should take int. But I now think that's incorrect, and they should take char or unsigned char. I think the standard C functions take int because they also allow EOF. I think the Py_ versions should allow only characters and not allow EOF. Py_CHARMASK already enforces this, anyway, with likely undefined results. | ||
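The signature question can be illustrated with a small sketch. `sketch_isspace` is a hypothetical name, not the function that was eventually checked in, and the range test assumes ASCII, where '\t' through '\r' are the contiguous codes 9 to 13.

```c
/* Hypothetical helper taking unsigned char rather than int. Every
 * argument is in 0..UCHAR_MAX, so there is no distinct EOF value to
 * handle: a caller passing EOF gets it converted to an ordinary
 * character value before the function ever sees it. */
static int sketch_isspace(unsigned char c)
{
    /* ASCII whitespace: space, \t, \n, \v, \f, \r. */
    return c == ' ' || (c >= '\t' && c <= '\r');
}
```

The design trade-off is the one described in the message: the standard functions accept int so that EOF can be passed through, while a character-only parameter type rules that case out at the interface.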
msg86293 | Author: Eric V. Smith (eric.smith) |
Date: 2009-04-22 12:33 |
Also, see _toupper/_tolower in Objects/stringlib/stringdef.h and Objects/stringobject.c. Those should be rationalized as well. | ||
msg86668 | Author: Eric V. Smith (eric.smith) |
Date: 2009-04-27 14:50 |
I'll implement this by adding a pyctype.h and pyctype.c, mimicking <ctype.h>. I'll essentially copy and rename the methods in bytes_methods.[ch], then change bytes_methods.h to refer to the new versions, for backward compatibility. | ||
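One common way to build a locale-independent `<ctype.h>` look-alike is a 256-entry table of class flags indexed by the masked character. The sketch below shows that technique under stated assumptions: every `sk_`/`SK_` name is hypothetical, the table is filled lazily at run time to keep the example short (a static initializer would be the more usual choice), and the letter ranges assume ASCII. It is not the contents of pyctype.h or pyctype.c.

```c
/* Hypothetical flag bits, one per character class. */
#define SK_LOWER 0x01u
#define SK_UPPER 0x02u
#define SK_DIGIT 0x04u
#define SK_SPACE 0x08u
#define SK_ALPHA (SK_LOWER | SK_UPPER)

static unsigned int sk_table[256];  /* zero-initialized: no class set */
static int sk_ready = 0;

/* Fill the table once. Assumes ASCII, where the letter and digit
 * ranges are contiguous. */
static void sk_init(void)
{
    int c;
    const char *sp = " \t\n\v\f\r";

    for (c = 'a'; c <= 'z'; c++) sk_table[c] |= SK_LOWER;
    for (c = 'A'; c <= 'Z'; c++) sk_table[c] |= SK_UPPER;
    for (c = '0'; c <= '9'; c++) sk_table[c] |= SK_DIGIT;
    for (; *sp != '\0'; sp++) sk_table[(unsigned char)*sp] |= SK_SPACE;
    sk_ready = 1;
}

/* Look up the flags for c; the mask keeps the index within 0..255
 * even when c comes from a signed char. */
static unsigned int sk_flags(int c)
{
    if (!sk_ready)
        sk_init();
    return sk_table[(unsigned char)(c & 0xff)];
}

#define SK_ISLOWER(c) ((sk_flags(c) & SK_LOWER) != 0)
#define SK_ISUPPER(c) ((sk_flags(c) & SK_UPPER) != 0)
#define SK_ISALPHA(c) ((sk_flags(c) & SK_ALPHA) != 0)
#define SK_ISDIGIT(c) ((sk_flags(c) & SK_DIGIT) != 0)
#define SK_ISSPACE(c) ((sk_flags(c) & SK_SPACE) != 0)

/* Locale-independent lowercase conversion for ASCII letters only;
 * every other value is returned masked but otherwise unchanged. */
static int sk_tolower(int c)
{
    int u = c & 0xff;
    return SK_ISUPPER(u) ? u + ('a' - 'A') : u;
}
```

Because the mask is applied inside `sk_flags`, callers cannot forget it, which addresses the Py_CHARMASK complaint raised earlier in the thread.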
msg86698 | Author: Eric V. Smith (eric.smith) |
Date: 2009-04-27 21:13 |
Checked in to trunk (r72040) and py3k (r72044). Windows buildbots look okay, closing. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:56:47 | admin | set | github: 50043 |
2009-04-27 21:13:25 | eric.smith | set | status: open -> closed; resolution: accepted; messages: + |
2009-04-27 14:50:02 | eric.smith | set | assignee: eric.smith; messages: + |
2009-04-22 12:33:38 | eric.smith | set | messages: + |
2009-04-19 15:26:38 | eric.smith | set | messages: + |
2009-04-19 12:57:45 | mark.dickinson | create |