[Python-Dev] eval and triple quoted strings

Guido van Rossum guido at python.org
Tue Jun 18 02:06:03 CEST 2013


On Mon, Jun 17, 2013 at 5:02 PM, Benjamin Peterson <benjamin at python.org> wrote:
> 2013/6/17 Guido van Rossum <guido at python.org>:
>> On Mon, Jun 17, 2013 at 4:40 PM, Benjamin Peterson <benjamin at python.org> wrote:
>>> 2013/6/17 Greg Ewing <greg.ewing at canterbury.ac.nz>:
>>>> Guido van Rossum wrote:
>>>>> No. Executing a file containing those exact characters produces
>>>>> a string containing only '\n' and exec/eval is meant to behave
>>>>> the same way. The string may not have originated from a file, so
>>>>> the universal newlines behavior of the io module is irrelevant
>>>>> here -- the parser must implement its own equivalent processing,
>>>>> and it does.
>>>>
>>>> I'm still not convinced that this is necessary or desirable
>>>> behaviour. I can understand the parser doing this as a workaround
>>>> before we had universal newlines, but now that we do, I'd expect
>>>> any Python string to already have newlines converted to their
>>>> canonical representation, and that any CRs it contains are meant
>>>> to be there. The parser shouldn't need to do newline translation
>>>> a second time.
>>>
>>> It used to be that way until 2.7. People like to do things like
>>>
>>>     with open("myfile.py", "rb") as fp:
>>>         exec fp.read() in ns
>>>
>>> which used to fail with CRLF newlines because binary mode doesn't
>>> translate them. I think this is actually the correct way to execute
>>> Python sources because the parser then handles the somewhat
>>> complicated process of decoding Python source for you.
>>
>> What exactly does the parser handle better than the io module? Is it
>> just the coding cookies? I suppose that works as long as the file is
>> encoded using an ASCII superset like the Latin-N variants or UTF-8.
>> It would fail pretty badly if it was UTF-16 (and yes, that's an
>> abominable encoding for other reasons :-).
>
> The coding cookie is the main one. In fact, if you can't parse that,
> you don't really know what encoding to open the file with at all.
> There are also small things like BOM handling (you have to use the
> utf-8-sig encoding with TextIO to get it removed) and defaulting to
> UTF-8 (which the io module doesn't do), which are better left to the
> parser.
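For readers following along, the behaviour under discussion can be reproduced with a small sketch. It uses Python 3 syntax (exec as a function) rather than the 2.x statement form quoted above, and the names source, cookie_source, ns and ns_bytes are made up for illustration.

```python
# A str whose triple-quoted literal spans a CRLF line ending.
source = 's = """a\r\nb"""\r\n'

ns = {}
exec(source, ns)  # the parser does its own newline translation

# The CR is gone from the resulting string object.
print(repr(ns["s"]))

# Bytes read in binary mode can be handed to exec directly; the parser
# then honours the coding cookie and copes with the CRLF line endings.
cookie_source = b"# -*- coding: latin-1 -*-\r\nname = '\xe9'\r\n"

ns_bytes = {}
exec(cookie_source, ns_bytes)
print(repr(ns_bytes["name"]))
```

Passing bytes is the modern equivalent of the open(..., "rb") idiom in the quoted message: decoding is left entirely to the parser.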

Maybe there are some lessons here that the TextIO module could learn?
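As a point of comparison, the standard library already exposes the parser's encoding rules outside the parser through tokenize.detect_encoding (and tokenize.open, which opens a file in text mode with the detected encoding). The following is only a sketch of how those rules look from Python code; the helper name sniff_encoding and the sample byte strings are hypothetical.

```python
import io
import tokenize


def sniff_encoding(source: bytes) -> str:
    """Return the encoding tokenize would pick for these source bytes."""
    readline = io.BytesIO(source).readline
    encoding, _lines_read = tokenize.detect_encoding(readline)
    return encoding


# A coding cookie on the first line selects the encoding.
from_cookie = sniff_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n")

# A UTF-8 BOM is recognised and reported as utf-8-sig.
from_bom = sniff_encoding(b"\xef\xbb\xbfx = 1\n")

# With neither cookie nor BOM, the default is UTF-8.
from_default = sniff_encoding(b"x = 1\n")

print(from_cookie, from_bom, from_default)
```

Plain open() in text mode, by contrast, falls back to a locale-dependent default unless an encoding is passed explicitly.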

-- 
--Guido van Rossum (python.org/~guido)


