Issue 13821: [doc] misleading return from isidentifier

Created on 2012-01-19 00:54 by Jim.Jewett, last changed 2022-04-11 14:57 by admin.

Messages (10)
msg151597 - Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2012-01-19 00:54
Python identifiers are in NFKC form; string method .isidentifier() returns true on strings that are not in that form. In some contexts, these non-canonical strings will be replaced with their NFKC equivalent, but in other contexts (such as the builtins hasattr, getattr, delattr) they will not.

>>> cha=chr(170)
>>> cha
'ª'
>>> cha.isidentifier()
True
>>> uc.normalize("NFKC", cha)
'a'
>>> obj.ª = 5
>>> hasattr(obj, "ª")
False
>>> obj.a
5
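The session above leaves two names undefined: `uc` is presumably `unicodedata` imported under an alias, and `obj` is presumably an instance of some ordinary class. Assuming both, a self-contained script reproducing the report might look like the sketch below (the class name `Namespace` is made up for illustration). It also shows that string-based access can create a second attribute alongside the normalized one.

```python
import unicodedata


class Namespace:
    """Hypothetical empty class, used only to hold attributes for this demo."""


cha = chr(170)  # 'ª', U+00AA FEMININE ORDINAL INDICATOR

# isidentifier() accepts the raw string even though it is not in NFKC form.
print(cha.isidentifier())                  # True
print(unicodedata.normalize("NFKC", cha))  # a

obj = Namespace()
obj.ª = 5  # the parser NFKC-normalizes this name, so it really sets obj.a

# hasattr/getattr/setattr use the string as given and do not normalize it.
print(hasattr(obj, "ª"))  # False
print(obj.a)              # 5

# String-based access can even create a second, distinct attribute.
setattr(obj, cha, 6)
print(sorted(vars(obj)))  # ['a', 'ª']
```

The last line is the practical hazard: code that builds attribute names from strings and only checks them with isidentifier() can end up with both `a` and `ª` on the same object.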
msg151599 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2012-01-19 00:56
I don't see why that's invalid. "str.isidentifier()" returning True means Python will accept it as an identifier.
msg151600 - Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2012-01-19 01:05
My preference would be for non_NFKC.isidentifier() to return False, but that may be a problem for backwards compatibility. It *may* be worth adding an asidentifier() method that returns either False or the canonicalized string that should be used instead. At a minimum, the documentation (including docstring) should warn that the method doesn't check for NFKC form, and that if the input is not ASCII, the caller should first ensure this by calling str1=unicodedata.normalize("NFKC", str1)
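The asidentifier() idea could be prototyped outside the str type. The sketch below is a hypothetical helper (`as_identifier` is not part of Python) that follows the proposal literally: it returns False for non-identifiers and otherwise the NFKC form that the parser would actually use.

```python
import unicodedata


def as_identifier(name: str):
    """Hypothetical helper, not a real str method.

    Return False if *name* is not a valid identifier; otherwise return its
    NFKC-normalized form, which is the name the parser would actually bind.
    """
    if not name.isidentifier():
        return False
    return unicodedata.normalize("NFKC", name)


print(as_identifier("\u00aa"))  # a
print(as_identifier("spam"))    # spam
print(as_identifier("1abc"))    # False
```

Returning False or a string mirrors the wording of the proposal; a new API would more likely return None for the failure case, which avoids mixing bool and str return types.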
msg151601 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2012-01-19 01:06
2012/1/18 Jim Jewett <report@bugs.python.org>:
>
> Jim Jewett <jimjjewett@gmail.com> added the comment:
>
> My preference would be for non_NFKC.isidentifier() to return False

It *is* an identifier, though. Python will happily accept it.

>
> It *may* be worth adding an asidentifier() method that returns either False or the canonicalized string that should be used instead.
>
> At a minimum, the documentation (including docstring) should warn that the method doesn't check for NFKC form, and that if the input is not ASCII, the caller should first ensure this by calling str1=unicodedata.normalize("NFKC", str1)

Sounds fine to me.
msg151602 - Author: Jim Jewett (Jim.Jewett) * (Python triager) Date: 2012-01-19 01:07
@Benjamin -- the catch is, if it isn't already in NFKC form, then python won't really accept it as an identifier. Sometimes it will silently canonicalize it for you so that it seems to work, but other times it won't. And a program calling isidentifier is likely to be a program that uses the strings directly for access, instead of always routing them through the parser.
msg151603 - Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2012-01-19 01:10
2012/1/18 Jim Jewett <report@bugs.python.org>:
>
> Jim Jewett <jimjjewett@gmail.com> added the comment:
>
> @Benjamin -- the catch is, if it isn't already in NFKC form, then python won't really accept it as an identifier. Sometimes it will silently canonicalize it for you so that it seems to work, but other times it won't. And a program calling isidentifier is likely to be a program that uses the strings directly for access, instead of always routing them through the parser.

AFAIK, the only time it will "silently" canonicalize it for you is parsing. Even if it wasn't, you can't say it's not an identifier, it's just not normalized.
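The parse-time canonicalization Benjamin describes can be observed directly. The sketch below assumes a reasonably recent CPython, where identifiers in source code are NFKC-normalized while parsing (per PEP 3131); both the AST and the namespace populated by exec() see the normalized name.

```python
import ast

source = "\u00aa = 1"  # assignment to the identifier 'ª'

# The AST already holds the NFKC form of the target name.
tree = ast.parse(source)
target = tree.body[0].targets[0]
print(target.id)  # a

# Compiling and executing the same source binds the normalized name.
namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)
print("a" in namespace, "\u00aa" in namespace)  # True False
```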
msg296754 - Author: Matthias Bussonnier (mbussonn) * Date: 2017-06-24 05:48
I have been bitten by that as well. I think the doc should tell users to verify that the given string is normalized, not just say that it **should** be normalized. Agreed that if isidentifier could also grow an `allow_non_nfkc=True` default parameter, which would allow deactivating the internal normalisation and returning False (or raising) on non-NFKC input, that would be great. I'm also interested in having an option on ast.parse or compile to not normalize, so as to at least be able to lint whether users are using non-NFKC forms, but that's another issue. I'll see if I can come up with at least a documentation patch.
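Until something like that exists, a caller can combine the two checks itself. The sketch below is a hypothetical helper (`is_nfkc_identifier` is not a real function); it relies on `unicodedata.is_normalized()`, which is available from Python 3.8.

```python
import unicodedata


def is_nfkc_identifier(name: str) -> bool:
    """Hypothetical check: True only if *name* is a valid identifier that is
    already in NFKC form, i.e. one the parser would not rewrite."""
    # unicodedata.is_normalized() requires Python 3.8 or later.
    return name.isidentifier() and unicodedata.is_normalized("NFKC", name)


print(is_nfkc_identifier("spam"))    # True
print(is_nfkc_identifier("\u00aa"))  # False: valid identifier, not NFKC
print(is_nfkc_identifier("1abc"))    # False: not an identifier at all
```

On older versions, `unicodedata.normalize("NFKC", name) == name` gives the same answer, at the cost of always building the normalized copy.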
msg297270 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-06-29 14:21
IMO allow_non_nfkc=True that just returns False would be a bad idea, since as Benjamin points out it *is* a valid identifier, it's just not normalized (yet). Raising might work, since that way you could tell the difference, but that would be a weird API for such a check function. Regardless, we should probably keep this issue to a doc patch, and open a new issue for any proposed enhancement request. And you probably want to discuss it on python-ideas first, since the underlying issue is a bit complex and the solution non-obvious, with possible knock-on effects. (Or maybe I'm wrong and the consensus will be that returning False with that flag would be fine.)
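To make the "raising lets you tell the difference" point concrete, here is a hypothetical sketch (`check_identifier` is invented for illustration and is not a proposed or existing API). It separates the three cases the thread distinguishes: not an identifier, a normalized identifier, and a valid but non-normalized identifier.

```python
import unicodedata


def check_identifier(name: str) -> bool:
    """Hypothetical check, not a real Python function.

    Returns False if *name* is not an identifier at all, True if it is an
    identifier already in NFKC form, and raises ValueError if it is a valid
    identifier that the parser would rewrite to a different name.
    """
    if not name.isidentifier():
        return False
    normalized = unicodedata.normalize("NFKC", name)
    if normalized != name:
        raise ValueError(
            f"{name!r} is a valid identifier but not NFKC-normalized; "
            f"the parser would use {normalized!r}"
        )
    return True


try:
    check_identifier("\u00aa")
except ValueError as exc:
    print(exc)
```

As noted above, a check function that can raise is an unusual shape for a predicate, which is one reason a separate helper or a documentation note may be preferable.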
msg297403 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-06-30 13:44
See also issue 30772 about the deeper problem.
msg414112 - Author: Stanley (slateny) * Date: 2022-02-26 18:12
For clarification then, would it be accurate to add a sentence like this in the documentation? "Note that isidentifier() still returns True even if the string may not be normalized."
History
Date User Action Args
2022-04-11 14:57:25 admin set github: 58029
2022-02-26 18:12:37 slateny set nosy: + slateny; messages: +
2021-12-05 23:01:22 iritkatriel set keywords: + easy; title: misleading return from isidentifier -> [doc] misleading return from isidentifier; versions: + Python 3.9, Python 3.10, Python 3.11, - Python 3.4, Python 3.5, Python 3.6
2017-06-30 13:44:23 r.david.murray set messages: +
2017-06-29 14:38:49 vstinner set nosy: - vstinner
2017-06-29 14:21:06 r.david.murray set nosy: + r.david.murray; messages: +
2017-06-24 05:48:27 mbussonn set nosy: + mbussonn; messages: +
2015-11-16 15:30:02 rhettinger set nosy: + pitrou
2015-11-16 14:29:08 serhiy.storchaka set nosy: + serhiy.storchaka, docs@python, vstinner; versions: + Python 3.4, Python 3.5, Python 3.6; assignee: docs@python; components: + Documentation; type: behavior; stage: needs patch
2012-01-19 01:10:03 benjamin.peterson set messages: +
2012-01-19 01:07:54 Jim.Jewett set messages: +
2012-01-19 01:06:30 benjamin.peterson set messages: +
2012-01-19 01:05:12 Jim.Jewett set messages: +
2012-01-19 00:56:33 benjamin.peterson set nosy: + benjamin.peterson; messages: +
2012-01-19 00:54:19 Jim.Jewett create