msg318617 |
Author: Isaac Elliott (Isaac Elliott) |
Date: 2018-06-04 04:05 |
    echo 'print("a");print("b")' > test.py

This program is grammatically incorrect according to the specification (https://docs.python.org/3.8/reference/grammar.html). But Python 3 runs it without issue.

It's this production here

    simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE

which says 'simple_stmt's must be terminated by a newline. However, the program I wrote doesn't contain any newlines.

I think the grammar spec is missing some information, but I'm not quite sure what. Does anyone have an idea? |
|
|
msg318620 |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-06-04 04:31 |
NEWLINE is not a newline. It is the NEWLINE token. And it is generated at the end of file.

    $ echo 'print("a");print("b")' | ./python -m tokenize
    1,0-1,5:            NAME           'print'
    1,5-1,6:            OP             '('
    1,6-1,9:            STRING         '"a"'
    1,9-1,10:           OP             ')'
    1,10-1,11:          OP             ';'
    1,11-1,16:          NAME           'print'
    1,16-1,17:          OP             '('
    1,17-1,20:          STRING         '"b"'
    1,20-1,21:          OP             ')'
    1,21-1,22:          NEWLINE        '\n'
    2,0-2,0:            ENDMARKER      '' |
|
|
msg318622 |
Author: Isaac Elliott (Isaac Elliott) |
Date: 2018-06-04 04:37 |
Thanks for the clarification. Is there a reference to this in the documentation? |
|
|
msg318624 |
Author: Ammar Askar (ammar2) *  |
Date: 2018-06-04 04:42 |
https://docs.python.org/3.8/reference/lexical_analysis.html |
|
|
msg318625 |
Author: Isaac Elliott (Isaac Elliott) |
Date: 2018-06-04 04:47 |
I went through that document before I created this issue. I can't find anything which describes this behavior - could you be more specific please? |
|
|
msg318627 |
Author: Ammar Askar (ammar2) *  |
Date: 2018-06-04 04:51 |
Actually, echo implicitly puts a newline at the end. If you run with echo -n, this is the output:

    $ echo -n 'print("a");print("b")' | python3 -m tokenize
    1,0-1,5:            NAME           'print'
    1,5-1,6:            OP             '('
    1,6-1,9:            STRING         '"a"'
    1,9-1,10:           OP             ')'
    1,10-1,11:          OP             ';'
    1,11-1,16:          NAME           'print'
    1,16-1,17:          OP             '('
    1,17-1,20:          STRING         '"b"'
    1,20-1,21:          OP             ')'
    2,0-2,0:            ENDMARKER      ''

No newline token present. |
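For reference, the same comparison can be run through the tokenize module's Python API instead of the shell, which avoids any doubt about what echo appends. This is only a minimal sketch: the helper name token_names is hypothetical, and whether a NEWLINE token shows up for input without a trailing newline may depend on the Python version in use, so no output is claimed for that case.

```python
import io
import tokenize


def token_names(source):
    """Return the token type names the tokenize module produces for source."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]


# Same program, with and without a trailing newline character.
with_newline = token_names('print("a");print("b")\n')
without_newline = token_names('print("a");print("b")')

print("NEWLINE" in with_newline)
# The next result may differ between Python versions (assumption, not verified here).
print("NEWLINE" in without_newline)
```

Passing the source as a string literal makes the presence or absence of the final newline explicit, which the echo-based test did not.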
|
|
msg318629 |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-06-04 04:55 |
Good point Ammar.

Seems there is also a missing corner case in the definition of a physical line:

https://docs.python.org/3.8/reference/lexical_analysis.html#physical-lines

"""
A physical line is a sequence of characters terminated by an end-of-line sequence. In source files, any of the standard platform line termination sequences can be used - the Unix form using ASCII LF (linefeed), the Windows form using the ASCII sequence CR LF (return followed by linefeed), or the old Macintosh form using the ASCII CR (return) character. All of these forms can be used equally, regardless of platform.
"""

It misses a case when a physical line is terminated by the end of file. |
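The corner case can be illustrated roughly with str.splitlines. This is only an approximation and not how the interpreter splits source: str.splitlines recognizes more line boundaries than the three sequences quoted above, but for input that only uses LF it shows a final line that ends at the end of the string rather than at an end-of-line sequence.

```python
# Two physical lines; the second one has no end-of-line sequence.
source = 'print("a")\nprint("b")'

# keepends=True keeps each line's terminator so its absence is visible.
lines = source.splitlines(keepends=True)
print(lines)

# The last physical line is terminated by the end of the input only.
last_line_has_terminator = lines[-1].endswith(("\n", "\r"))
print(last_line_has_terminator)
```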
|
|
msg318631 |
Author: Ammar Askar (ammar2) *  |
Date: 2018-06-04 05:01 |
Relevant bit of the parser that emits a fake newline at the end of the file if not present: https://github.com/python/cpython/blob/master/Parser/tokenizer.c#L1059-L1069 |
|
|
msg318633 |
Author: Isaac Elliott (Isaac Elliott) |
Date: 2018-06-04 05:29 |
Cool, thanks for the help. Should I submit a PR with the updated documentation? |
|
|
msg318634 |
Author: Ammar Askar (ammar2) *  |
Date: 2018-06-04 05:37 |
Sorry, I was already working on the patch by the time you posted the comment. As the output above shows, it seems like the tokenize module doesn't correctly mirror the behavior of the C tokenizer. Do you want to try fixing that as a bug? That would involve making a new bpo ticket and submitting a PR there. |
|
|
msg318668 |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2018-06-04 16:18 |
I am fine with adding this to the docs. But the irony of the case is that the echo command adds a newline, so the original premise (that test.py contains an invalid program) is incorrect. ;-) |
|
|
msg319091 |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2018-06-08 18:39 |
A few years ago, there was a particular case in which compile failed without a trailing newline. We fixed it so that it would work anyway. Unless we are willing for a conforming Python interpreter to fail

    >>> exec('print("hello")')
    hello

The Reference Manual should be clear that EOF and EOS (end-of-string) are treated as NEWLINE. |
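The same point can be checked against the program from the original report. A small sketch, assuming only the standard compile and exec builtins: the source string has no trailing newline, yet it compiles and runs, which is consistent with end of string being treated as NEWLINE.

```python
import contextlib
import io

# The program from the first message, deliberately without a trailing newline.
source = 'print("a");print("b")'

# compile() accepts it even though no newline character ends the statement.
code = compile(source, "<string>", "exec")

# Capture stdout so the result can be inspected as a string.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(code)

output = buffer.getvalue()
print(output, end="")
```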
|
|
msg319186 |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2018-06-09 23:49 |
New changeset 0aa17ee6a76df0946d42e7657a501f1862065a22 by Terry Jan Reedy (Ammar Askar) in branch 'master':
bpo-33766: Document that end of file or string is a newline (GH-7383)
https://github.com/python/cpython/commit/0aa17ee6a76df0946d42e7657a501f1862065a22 |
|
|
msg319187 |
Author: miss-islington (miss-islington) |
Date: 2018-06-09 23:55 |
New changeset f01b951a0e70f36ca2a3caa043f89a5277bb0bb0 by Miss Islington (bot) in branch '2.7':
bpo-33766: Document that end of file or string is a newline (GH-7383)
https://github.com/python/cpython/commit/f01b951a0e70f36ca2a3caa043f89a5277bb0bb0 |
|
|