I am getting "unknown parsing error" when trying to run a script with the following first line:

    #@+leo-encoding=cp1251.

If I add a couple of empty lines or # -*- coding: cp1251 -*- then everything is OK. I am using ActiveState Python 2.3.3 on a Win2K server.

---------- Python ----------
error=22
  File "test.py", line 1
SyntaxError: unknown parsing error

Output completed (0 sec consumed) - Normal Termination
------------------------------

#@+leo-encoding=cp1251.
#@+node:0::@file test.py
#@+body
for i in range(5):
    print i
#@-body
#@-node:0::@file test.py
#@-leo

Logged In: YES
user_id=33168

Martin, I hope you don't mind me assigning this to you. I think you implemented the coding spec. I briefly read the PEP, and while the code does what the PEP states (i.e., use a regex), the behaviour doesn't match the examples. It also seems like it could be error-prone to allow r'#.*coding[:=]'.

I think there are two issues:

1) in pythonrun.c, in E_DECODE, there is a missing break
2) the check for # -*- coding is not strict enough

The attached patch addresses both issues, although I'm not sure you will agree that #2 is a problem. For #2, the patch makes the check r'# (-\*-)? coding[:=]'. Feel free to check in, assign back to me, or whatever.

I'm not sure what the error message in pythonrun should be; right now it's "unknown decode error." Perhaps that should be "invalid encoding" or something?

Logged In: YES
user_id=21627

The patch is wrong. The PEP deliberately allows arbitrary occurrences of the substring "coding", in particular inside "encoding". This was done so that other editors, like vi or LEO, can continue to use their own encoding declarations and Python will still recognize them.

Unfortunately, LEO decided to add a full stop at the end of the line, so Python looks for an encoding named "cp1251.". We agree with the LEO author that this is a problem in LEO and that it will be fixed. Alternatively, we could amend the PEP and declare that trailing dots are not part of the encoding name.

The other part of the patch (the missing break in E_DECODE) is correct; I have applied it as pythonrun.c 2.195.6.6 and 2.207. It would be even better if we could display the actual cause of the problem, but that is currently not supported in the parser.
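
For readers following along, here is a minimal sketch of why the LEO line ends up naming a codec called "cp1251.". It assumes the pattern given in the original text of PEP 263, coding[:=]\s*([-\w.]+), and is written for a current Python rather than 2.3. The helper names (declared_encoding, is_known_codec) are made up for this illustration and are not part of the interpreter's tokenizer.

```python
import codecs
import re

# Assumption: the pattern from the original text of PEP 263.
# The character class includes '.', so a trailing full stop is captured too.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(line):
    """Return the encoding name declared on a source line, or None."""
    match = CODING_RE.search(line)
    return match.group(1) if match else None


def is_known_codec(name):
    """Return True if the codec registry can resolve the given name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


leo_line = "#@+leo-encoding=cp1251."
emacs_line = "# -*- coding: cp1251 -*-"

for line in (leo_line, emacs_line):
    name = declared_encoding(line)
    print(repr(line), "->", repr(name), "known codec:", is_known_codec(name))

# The alternative mentioned above (ignoring trailing dots) would amount to:
print(repr(declared_encoding(leo_line).rstrip(".")))
```

The "coding=" inside "leo-encoding=" is enough to trigger the match, which is the deliberate leniency described above; the lookup fails only because the captured name carries the trailing dot.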