Issue 32982: Parse out invisible Unicode characters?

Created on 2018-03-02 06:04 by leewz, last changed 2022-04-11 14:58 by admin.

Messages (4)
msg313127 - Author: Franklin? Lee (leewz) Date: 2018-03-02 06:04
The following line should have a character that trips up the compiler.

‎indices = range(5)

The character is \u200e, and was inserted by Google Keep. (I've already reported the issue to Google as a regression.)

Here's the error message:

"""
  File "", line 3
    ‎indices = range(5)
    ^
SyntaxError: invalid character in identifier
"""

Depending on the terminal or editor, it may not be possible to tell the problem just from looking. Without knowledge/experience of Unicode, it may not be possible to figure out the problem at all.

Since Python source now uses Unicode by default, should certain invisible characters be stripped out during compilation?
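Since the offending character cannot be seen in most editors, a small scan of the source text can locate it. The following is only a sketch of one way to do that, not anything proposed in this issue: the helper name `find_format_chars` is hypothetical, and treating every Unicode category "Cf" (format) character as suspicious is an assumption that may flag characters some code legitimately uses inside string literals.

```python
import unicodedata


def find_format_chars(source):
    """Yield (line, column, char) for each Unicode 'Cf' (format) character.

    Hypothetical helper: line and column are both 1-based.
    """
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                yield line_no, col, ch


# Sample text with U+200E in front of "indices", as in the report above.
sample = "x = 1\n\u200eindices = range(5)\n"

for line_no, col, ch in find_format_chars(sample):
    print(f"line {line_no}, column {col}: U+{ord(ch):04X}")
# prints: line 2, column 1: U+200E
```

Reporting the position rather than silently deleting the character leaves the decision about what to do with it to the author.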
msg313155 - Author: Matthew Barnett (mrabarnett) * (Python triager) Date: 2018-03-02 19:10
For the record, '\u200e' is '\N{LEFT-TO-RIGHT MARK}'.
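The name can be checked with the standard `unicodedata` module, and the failure can be reproduced with `compile()`. This is a minimal sketch: the filename `"<example>"` is a placeholder, and the exact wording of the SyntaxError message differs between Python versions, so the code only records that the source is rejected.

```python
import unicodedata

mark = "\u200e"

print(unicodedata.name(mark))      # LEFT-TO-RIGHT MARK
print(unicodedata.category(mark))  # Cf (a "format" character)
print(mark.isprintable())          # False

# Reproduce the failure: the mark sits where an identifier should start.
source = mark + "indices = range(5)\n"
try:
    compile(source, "<example>", "exec")
except SyntaxError as exc:
    rejected = True
    # The message text depends on the Python version, so it is only printed.
    print("SyntaxError:", exc.msg)
else:
    rejected = False
```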
msg313159 - Author: Glenn Linderman (v+python) * Date: 2018-03-02 19:46
Characters should not be stripped during compilation. But I can see that it would be helpful if the error message included the code point of the character, plus its printed form in case it is printable, and if the ^ pointer pointed at it.
msg313629 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2018-03-12 00:42
I think it sounds like a good idea to put the printed representation as a repr'ed string, followed by the code point representation in parentheses, in that message after "invalid character".
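To make the suggested format concrete, here is a rough sketch of how such a message could be built in Python. The function names are hypothetical and the `U+XXXX` spelling of the code point is an assumption; this is not the text CPython actually emits.

```python
def describe_char(ch):
    """Return the repr of ch followed by its code point in parentheses."""
    # repr() escapes non-printable characters, so an invisible character
    # such as U+200E becomes visible as '\u200e'.
    return f"{ch!r} (U+{ord(ch):04X})"


def invalid_character_message(ch):
    """Hypothetical message text in the format suggested above."""
    return f"invalid character {describe_char(ch)}"


print(invalid_character_message("\u200e"))
# prints: invalid character '\u200e' (U+200E)
print(invalid_character_message("€"))
# prints: invalid character '€' (U+20AC)
```

Using `repr()` covers both cases mentioned in the thread: printable characters appear as themselves, while invisible ones are shown as escapes.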
History
Date                 User            Action  Args
2022-04-11 14:58:58  admin           set     github: 77163
2018-03-12 00:42:44  r.david.murray  set     nosy: + r.david.murray; messages: +
2018-03-02 19:46:14  v+python        set     nosy: + v+python; messages: +
2018-03-02 19:10:57  mrabarnett      set     nosy: + mrabarnett; messages: +
2018-03-02 06:04:50  leewz           create