msg151597
Author: Jim Jewett (Jim.Jewett)
Date: 2012-01-19 00:54
Python identifiers are in NFKC form; the string method .isidentifier() returns True on strings that are not in that form. In some contexts these non-canonical strings will be replaced with their NFKC equivalent, but in other contexts (such as the builtins hasattr, getattr and delattr) they will not.

>>> import unicodedata as uc
>>> class Obj: pass
...
>>> obj = Obj()
>>> cha = chr(170)
>>> cha
'ª'
>>> cha.isidentifier()
True
>>> uc.normalize("NFKC", cha)
'a'
>>> obj.ª = 5
>>> hasattr(obj, "ª")
False
>>> obj.a
5
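A self-contained script version of the session in the report, as a sketch: it runs the attribute assignment through exec() so that the parser's NFKC normalization applies, then probes the resulting object with the unnormalized string. The class name C and the namespace dict are illustrative only.

```python
import unicodedata

raw_name = "\u00aa"  # 'ª' (chr(170)), which is not in NFKC form
normalized = unicodedata.normalize("NFKC", raw_name)

# Source code goes through the parser, which normalizes identifiers to NFKC,
# so the assignment below actually creates an attribute named 'a'.
source = f"class C:\n    pass\n\nobj = C()\nobj.{raw_name} = 5\n"
namespace = {}
exec(source, namespace)
obj = namespace["obj"]

# Runtime attribute functions take the string exactly as given.
print(raw_name.isidentifier())   # True
print(normalized)                # a
print(hasattr(obj, raw_name))    # False
print(getattr(obj, normalized))  # 5
print(vars(obj))                 # {'a': 5}
```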
|
|
msg151599
Author: Benjamin Peterson (benjamin.peterson)
Date: 2012-01-19 00:56
I don't see why that's invalid. "str.isidentifier()" returning True means Python will accept it as an identifier.
|
|
msg151600
Author: Jim Jewett (Jim.Jewett)
Date: 2012-01-19 01:05
My preference would be for non_NFKC.isidentifier() to return False, but that may be a problem for backwards compatibility.

It *may* be worth adding an asidentifier() method that returns either False or the canonicalized string that should be used instead.

At a minimum, the documentation (including the docstring) should warn that the method doesn't check for NFKC form, and that if the input is not ASCII, the caller should first ensure this by calling str1 = unicodedata.normalize("NFKC", str1).
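The proposed asidentifier() does not exist on str. A minimal sketch of the behaviour described, written as a plain function with the hypothetical name asidentifier, might look like this. It assumes the raw string should be validated with isidentifier() as written, and that the NFKC form is the spelling to use afterwards.

```python
import unicodedata
from typing import Literal, Union


def asidentifier(name: str) -> Union[str, Literal[False]]:
    """Hypothetical helper, not a real str method.

    Returns False if ``name`` is not a valid identifier as written;
    otherwise returns its NFKC form, which is the spelling the parser
    would actually bind (and the one getattr() needs).
    """
    if not name.isidentifier():
        return False
    return unicodedata.normalize("NFKC", name)


print(asidentifier("\u00aa"))  # a
print(asidentifier("spam"))    # spam
print(asidentifier("1spam"))   # False
```

Returning False rather than raising mirrors the suggestion as worded; a caller would need `is False` (not plain truthiness) only if an empty string could ever be returned, which cannot happen here because "" is not an identifier.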
|
|
msg151601
Author: Benjamin Peterson (benjamin.peterson)
Date: 2012-01-19 01:06
2012/1/18 Jim Jewett <report@bugs.python.org>:
> Jim Jewett <jimjjewett@gmail.com> added the comment:
>
> My preference would be for non_NFKC.isidentifier() to return False

It *is* an identifier, though. Python will happily accept it.

> It *may* be worth adding an asidentifier() method that returns either False or the canonicalized string that should be used instead.
>
> At a minimum, the documentation (including docstring) should warn that the method doesn't check for NFKC form, and that if the input is not ASCII, the caller should first ensure this by calling str1=unicodedata.normalize("NFKC", str1)

Sounds fine to me.
|
|
msg151602
Author: Jim Jewett (Jim.Jewett)
Date: 2012-01-19 01:07
@Benjamin -- the catch is, if it isn't already in NFKC form, then Python won't really accept it as an identifier. Sometimes it will silently canonicalize it for you so that it seems to work, but other times it won't. And a program calling isidentifier() is likely to be a program that uses the strings directly for access, instead of always routing them through the parser.
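To illustrate the point about programs that use the strings directly: a caller that validates a name with isidentifier() and then hands it to getattr() has to do the normalization itself. A minimal sketch, with the hypothetical helper names getattr_nfkc and hasattr_nfkc and an illustrative Config class:

```python
import unicodedata

_MISSING = object()  # sentinel so that None can be a legitimate default


def getattr_nfkc(obj, name, default=_MISSING):
    """Hypothetical wrapper: normalize ``name`` to NFKC (as the parser
    does for identifiers in source code) before calling getattr()."""
    key = unicodedata.normalize("NFKC", name)
    if default is _MISSING:
        return getattr(obj, key)
    return getattr(obj, key, default)


def hasattr_nfkc(obj, name):
    """Hypothetical wrapper around hasattr() with the same normalization."""
    return hasattr(obj, unicodedata.normalize("NFKC", name))


class Config:
    pass


cfg = Config()
cfg.a = 5  # the attribute that source code like `cfg.ª = 5` would create

user_name = "\u00aa"  # 'ª': passes isidentifier() but is not NFKC
print(user_name.isidentifier())            # True
print(hasattr(cfg, user_name))             # False
print(hasattr_nfkc(cfg, user_name))        # True
print(getattr_nfkc(cfg, user_name))        # 5
print(getattr_nfkc(cfg, "missing", None))  # None
```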
|
|
msg151603
Author: Benjamin Peterson (benjamin.peterson)
Date: 2012-01-19 01:10
2012/1/18 Jim Jewett <report@bugs.python.org>:
> Jim Jewett <jimjjewett@gmail.com> added the comment:
>
> @Benjamin -- the catch is, if it isn't already in NFKC form, then python won't really accept it as an identifier. Sometimes it will silently canonicalize it for you so that it seems to work, but other times it won't. And program calling isidentifier is likely to be a program that uses the strings directly for access, instead of always routing them through the parser.

AFAIK, the only time it will "silently" canonicalize it for you is during parsing. Even if that weren't the case, you can't say it's not an identifier; it's just not normalized.
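The claim that canonicalization happens during parsing can be checked with the ast module: the Name node produced for a non-NFKC identifier already carries the normalized spelling, and the binding created by exec() uses it too. A small sketch (the variable names are illustrative):

```python
import ast

raw = "\u00aa"  # 'ª', not in NFKC form

# The parser stores the NFKC spelling in the AST.
tree = ast.parse(f"{raw} = 1")
target_id = tree.body[0].targets[0].id
print(target_id)  # a

# Executing the same source binds the normalized name, not the raw one.
namespace = {}
exec(f"{raw} = 1", namespace)
print("a" in namespace)  # True
print(raw in namespace)  # False
```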
|
|
msg296754
Author: Matthias Bussonnier (mbussonn)
Date: 2017-06-24 05:48
I have been bitten by this as well. I think the docs should tell readers to verify that the given string is normalized, not merely state that it **should** be normalized. Agreed; it would also be great if isidentifier could grow an `allow_non_nfkc=True` default parameter that could be switched off to deactivate internal normalisation and return False (or raise) on non-NFKC input.

I'm also interested in having an option on ast.parse or compile to not normalize, so as to at least be able to lint whether users are using non-NFKC forms, but that's another issue. I'll see if I can come up with, at least, a documentation patch.
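Until ast.parse or compile offers such an option, one possible stand-in for the lint described is to scan tokens rather than the AST, since the tokenize module reports NAME tokens as they are spelled in the source. This is a different technique from the option proposed here, and find_non_nfkc_names is a hypothetical name:

```python
import io
import tokenize
import unicodedata


def find_non_nfkc_names(source: str):
    """Hypothetical lint helper: return (line, column, name) for every NAME
    token whose spelling differs from its NFKC form."""
    hits = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.NAME:
            if unicodedata.normalize("NFKC", tok.string) != tok.string:
                line, col = tok.start
                hits.append((line, col, tok.string))
    return hits


sample = "x = 1\n\u00aa = 2\nprint(\u00aa)\n"
for line, col, name in find_non_nfkc_names(sample):
    print(f"{line}:{col}: {name!r} is not NFKC "
          f"(parser reads it as {unicodedata.normalize('NFKC', name)!r})")
```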
|
|
msg297270
Author: R. David Murray (r.david.murray)
Date: 2017-06-29 14:21
IMO allow_non_nfkc=True that just returns False would be a bad idea, since, as Benjamin points out, it *is* a valid identifier; it's just not normalized (yet). Raising might work, since that way you could tell the difference, but that would be a weird API for such a check function. Regardless, we should probably keep this issue to a doc patch and open a new issue for any proposed enhancement request. You probably also want to discuss it on python-ideas first, since the underlying issue is a bit complex and the solution non-obvious, with possible knock-on effects. (Or maybe I'm wrong and the consensus will be that returning False with that flag would be fine.)
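As one alternative to either returning False or raising, a check could report three outcomes, so callers can still tell "not an identifier" apart from "identifier, but not normalized". A sketch using unicodedata.is_normalized() (Python 3.8+); identifier_status and IdentifierStatus are hypothetical names, not a proposed stdlib API:

```python
import enum
import unicodedata


class IdentifierStatus(enum.Enum):
    INVALID = "not an identifier"
    NOT_NORMALIZED = "identifier, but not in NFKC form"
    NORMALIZED = "identifier in NFKC form"


def identifier_status(name: str) -> IdentifierStatus:
    """Hypothetical check that separates the two cases a plain
    True/False result would conflate."""
    if not name.isidentifier():
        return IdentifierStatus.INVALID
    if not unicodedata.is_normalized("NFKC", name):
        return IdentifierStatus.NOT_NORMALIZED
    return IdentifierStatus.NORMALIZED


for candidate in ["spam", "\u00aa", "1spam"]:
    print(repr(candidate), identifier_status(candidate).value)
```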
|
|
msg297403
Author: R. David Murray (r.david.murray)
Date: 2017-06-30 13:44
See also issue 30772 about the deeper problem.
|
|
msg414112
Author: Stanley (slateny)
Date: 2022-02-26 18:12
For clarification then, would it be accurate to add a sentence like this in the documentation? "Note that isidentifier() still returns True even if the string may not be normalized."
|
|