[Python-Dev] Reading Python source file

Serhiy Storchaka storchaka at gmail.com
Tue Nov 17 11:06:17 EST 2015

On 17.11.15 05:05, MRAB wrote:

> As I understand it, *nix expects the shebang to be b'#!', which means that the first line should be ASCII-compatible (it's possible that the UTF-8 BOM might be present). This kind of suggests that encodings like UTF-16 would cause a problem on such systems.

> The encoding line also needs to be ASCII-compatible. I believe that the recent thread "Support of UTF-16 and UTF-32 source encodings" also concluded that UTF-16 and UTF-32 shouldn't be supported. This means that you could treat the first 2 lines as though they were some kind of extended ASCII (Latin-1?), the line ending being '\n' or '\r' or '\r\n'. Once you'd identified the encoding, you could decode everything (including the shebang line) using that encoding.

Yes, that is what I was going to implement (and I'm already halfway there). My question is whether it is worth complicating the code further to preserve line-by-line reading. In any case, after reading a first line that contains neither a coding cookie nor any non-comment tokens, we need to wait for the second line.
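For pure-Python tooling, the standard library already implements this two-line lookahead as `tokenize.detect_encoding(readline)`, which calls `readline` at most twice and returns the encoding together with the raw lines it consumed. A minimal sketch of a line-preserving reader built on it might look like the following; the generator name `decoded_lines` is hypothetical, and this is an illustration in Python, not the C tokenizer code under discussion.

```python
import codecs
import io
import tokenize


def decoded_lines(readline):
    """Hypothetical helper: yield source lines as str while reading bytes line by line.

    tokenize.detect_encoding() reads a second line only when the first one
    is blank or a comment without a coding cookie.
    """
    encoding, consumed = tokenize.detect_encoding(readline)
    # An incremental decoder carries any partial multi-byte state between lines.
    decoder = codecs.getincrementaldecoder(encoding)()
    for raw in consumed:  # the one or two lines buffered during detection
        yield decoder.decode(raw)
    for raw in iter(readline, b''):  # then continue one line at a time
        yield decoder.decode(raw)
    tail = decoder.decode(b'', final=True)
    if tail:
        yield tail


src = b'#!/usr/bin/env python\n# -*- coding: latin-1 -*-\ns = "\xe9"\n'
for line in decoded_lines(io.BytesIO(src).readline):
    print(repr(line))
```

With this design the only buffering is the one or two lines consumed while detecting the encoding; every later line is decoded and handed on as soon as it is read.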

> (What should happen if the encoding line then decodes differently, i.e. encodingline.decode(encoding) != encodingline.decode('latin-1')?)

The parser should get the line decoded with the specified encoding.
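A whole-buffer version of the approach discussed in this thread can be sketched as follows: the first two lines are probed as Latin-1 (which maps every byte to one code point, so the probe never fails), and then the entire file, including the shebang and cookie lines, is decoded with the declared encoding, so the parser sees that text rather than the Latin-1 view. For an ASCII-compatible encoding the cookie itself decodes identically either way; only non-ASCII bytes elsewhere on the line can differ. The function name `decode_source` is hypothetical, the regular expression is the PEP 263 cookie pattern, and for brevity the sketch splits the whole buffer and does not check that the cookie line is pure ASCII.

```python
import codecs
import re

# PEP 263 cookie pattern; re.ASCII restricts \w to [A-Za-z0-9_].
COOKIE_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)', re.ASCII)


def decode_source(data: bytes) -> str:
    """Hypothetical helper: find the cookie in the first two lines, then decode everything."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    if has_bom:
        data = data[len(codecs.BOM_UTF8):]
    encoding = 'utf-8'  # default source encoding when no cookie is present
    # bytes.splitlines() recognises b'\n', b'\r' and b'\r\n'.
    for line in data.splitlines(keepends=True)[:2]:
        probe = line.decode('latin-1')  # used only to locate the cookie
        match = COOKIE_RE.match(probe)
        if match:
            encoding = match.group(1)
            break
        # The second line is consulted only if the first is blank or a comment.
        stripped = probe.strip(' \t\f\r\n')
        if stripped and not stripped.startswith('#'):
            break
    if has_bom and codecs.lookup(encoding).name != 'utf-8':
        raise SyntaxError(f'UTF-8 BOM conflicts with coding cookie {encoding!r}')
    # The parser receives text decoded with the declared encoding,
    # not the Latin-1 probe.
    return data.decode(encoding)


src = b'#!/usr/bin/env python\n# coding: utf-8 -- caf\xc3\xa9\nx = 1\n'
print(repr(decode_source(src)))
```

Here the cookie line decodes to `'# coding: utf-8 -- café\n'`, while its Latin-1 view ends in `'cafÃ©'`; the UTF-8 text is what would be passed on.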
