msg75627 - (view) |
Author: Takafumi SHIDO (shidot) |
Date: 2008-11-08 02:49 |
The profile module of Python 3 doesn't understand the character set of the script. When profile is run (e.g. $ python -m profile -o prof.dat foo.py) on a script (say foo.py) that declares its character set on the second line (e.g. #coding:utf-8), it crashes with an error message like: "SyntaxError: unknown encoding: utf-8" |
|
|
msg75676 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-11-10 00:40 |
exec() doesn't work if the argument is a unicode string. Here is a workaround for the profile module (open the file in binary mode), but it doesn't fix the exec() problem. |
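A minimal sketch of the binary-mode idea, not the attached patch itself: the script is read as raw bytes and handed to compile(), which applies the coding declaration on its own. The helper name run_script_from_bytes and the temporary x.py file are assumptions made for illustration only.

```python
import os
import tempfile


def run_script_from_bytes(path):
    """Read a script as raw bytes, execute it, and return its globals.

    Hypothetical helper: passing bytes lets compile() honour the
    '#coding:' declaration instead of guessing a charset up front.
    """
    with open(path, "rb") as fp:
        source = fp.read()
    code = compile(source, path, "exec")
    namespace = {"__name__": "__main__", "__file__": path}
    exec(code, namespace)
    return namespace


# Demo: an ISO-8859-1 encoded script with a coding declaration.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "x.py")
    with open(script, "wb") as fp:
        fp.write('#coding: ISO-8859-1\ntext = "h\xe9 h\xe9"\n'.encode("iso-8859-1"))
    result = run_script_from_bytes(script)
```

The demo assigns to a variable rather than printing, so the decoded text can be inspected in result["text"] regardless of the console encoding.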
|
|
msg75677 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2008-11-10 01:03 |
Example of the problem:

    exec('#header\n# encoding: ISO-8859-1\nprint("h\xe9 h\xe9")\n')

exec(unicode) calls source_as_string(), which converts unicode to bytes using _PyUnicode_AsDefaultEncodedString() (UTF-8 charset). Then PyRun_StringFlags() is called with the UTF-8 byte string and the PyCF_SOURCE_IS_UTF8 flag. But in the parser, get_coding_spec() recognizes the "#coding:" header and converts bytes to unicode using the specified charset (which may be different from UTF-8). The problem is in the function PyAST_FromNode(): the flag is not used in the tokenizer but only in the AST parser. I also see:

    if (flags && flags->cf_flags & PyCF_SOURCE_IS_UTF8) {
        c.c_encoding = "utf-8";
        if (TYPE(n) == encoding_decl) {
    #if 0
            ast_error(n, "encoding declaration in Unicode string");
            goto error;
    #endif
            n = CHILD(n, 0);
        }
    } else if (TYPE(n) == encoding_decl) {
        c.c_encoding = STR(n);
        n = CHILD(n, 0);
    } else {
        /* PEP 3120 */
        c.c_encoding = "utf-8";
    }

The ast_error() may be uncommented. |
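To make the str versus bytes distinction concrete, here is a small sketch that compiles the same source both ways. It assumes a current CPython, where a coding declaration inside a str is ignored (the text is already decoded) and is only applied to bytes input. The run() helper and the "<demo>" filename are hypothetical names for illustration.

```python
# Same program as in the example above, but assigning instead of printing
# so the result can be compared.
source = '#header\n# encoding: ISO-8859-1\ntext = "h\xe9 h\xe9"\n'


def run(src):
    """Compile and execute src (str or bytes) and return its 'text' global."""
    namespace = {}
    exec(compile(src, "<demo>", "exec"), namespace)
    return namespace["text"]


# str input: already decoded, so the declaration should have no effect.
from_str = run(source)

# bytes input: the declaration tells the tokenizer how to decode the bytes.
from_bytes = run(source.encode("iso-8859-1"))
```

If the declaration were applied to the str case after a UTF-8 round trip, from_str would come back as mojibake rather than matching from_bytes.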
|
|
msg83842 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2009-03-20 01:25 |
This bug was a duplicate of #4626 which was fixed by r70113 ;-) |
|
|
msg83843 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2009-03-20 01:30 |
Oops, I misread this issue (wrong title!). #4626 is related, but this issue is about the profile module. The problem is that profile opens the source code as text (with the default charset: UTF-8). Attached patch fixes the problem.

Example:

    --- x.py (ISO-8859-1 text file) ---
    #coding: ISO-8859-1
    print("hé hé")
    -----------------------------------

Run: python -m profile x.py

Current result:

    (...)
      File ".../py3k/Lib/profile.py", line 614, in main
        script = fp.read()
      File ".../Lib/codecs.py", line 300, in decode
        (result, consumed) = self._buffer_decode(data, self.errors, final)
    UnicodeDecodeError: 'utf8' codec can't decode bytes (...)

With my patch, it works as expected. |
|
|
msg83844 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2009-03-20 01:44 |
Oops, Benjamin noticed that it doesn't work with Windows line endings (\r\n). New patch reads the file encoding instead of reading file content as bytes. |
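The "read the encoding first" approach can be sketched with the standard tokenize module. This is an illustration under stated assumptions, not the patch attached to this issue: read_source is a hypothetical helper, and the demo file uses \r\n line endings on purpose.

```python
import os
import tempfile
import tokenize


def read_source(path):
    """Return the script text, decoded with its declared encoding.

    Hypothetical helper: detect_encoding() inspects the BOM and the first
    two lines for a coding declaration; the file is then reopened in text
    mode, so \r\n line endings are translated to \n as usual.
    """
    with open(path, "rb") as fp:
        encoding, _ = tokenize.detect_encoding(fp.readline)
    with open(path, "r", encoding=encoding) as fp:
        return fp.read()


# Demo: ISO-8859-1 script with Windows line endings.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "x.py")
    raw = '#coding: ISO-8859-1\r\nprint("h\xe9 h\xe9")\r\n'.encode("iso-8859-1")
    with open(script, "wb") as fp:
        fp.write(raw)
    text = read_source(script)
```

The resulting str can be passed to compile() or exec() directly, since it no longer depends on the default charset.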
|
|
msg83846 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2009-03-20 01:56 |
This regression was introduced by the removal of execfile() in Python 3. The proposed replacement for execfile() is wrong. I propose a generic fix in issue #5524. |
|
|
msg83933 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2009-03-21 10:51 |
After some discussions, I think that my first patch (profile_encoding.patch) was correct but we also have to fix #4628. |
|
|
msg101477 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-03-22 02:00 |
Fixed by r79271 (py3k), r79272 (3.1). |
|
|