Issue 11343: Make errors due to full parser stack identifiable (original) (raw)
Created on 2011-02-26 22:39 by Trundle, last changed 2022-04-11 14:57 by admin.
Messages (29)
Author: Andreas Stührk (Trundle) *
Date: 2011-02-26 22:39
Currently, if the parser's internal stack is full (as it happens when
the parser tries to parse a deeply nested structure), the parser
writes an error message to stderr and a bare MemoryError is
raised. That way, it is really hard for REPLs do handle input that
causes the parser to exhaust its stack correctly (see e.g. python -c "print '(' * 100 + ')' * 100" | python -c "import code; code.interact()"
).
In my opinion, a SyntaxError would be great because that way one can
easily see where in the source code the error is caused. One reason
against that (stated by MvL in ) is that it's really
syntactically correct Python and only the parser can't parse it. But
then, the same is also true when there are too many indentation levels
in which case an IndentationError
is raised (which is a subclass of
SyntaxError
).
I have attached two patches: The first still raises a MemoryError
(for the sake of backward compatibility) but adds a message ("too many
opening parens") to it. The second raises a SyntaxError
instead.
Author: Terry J. Reedy (terry.reedy) *
Date: 2011-03-04 21:30
I agree that compile-time stack exhaustion is different from runtime object-heap exhaustion and could/should have a different error.
I agree with Martin (from 2000) that SyntaxError is not right either. Perhaps a new ParseError subclass thereof. I believe IndentationError was added since 2000. Its doc is "Base class for syntax errors related to incorrect indentation." and it has TabError as a subclass, so its use for too many indents (not really 'incorrect') is not obvious. But having the position marked (if it would be) would be a plus.
I presume REPL == read-eval-print-loop (from Google). Would a new error help such programs (like code.interact, or IDLE)?
Author: Andreas Stührk (Trundle) *
Date: 2011-03-25 01:05
On Fri, Mar 4, 2011 at 9:30 PM, Terry J. Reedy <report@bugs.python.org> wrote:
I agree with Martin (from 2000) that SyntaxError is not right either. Perhaps a new ParseError subclass thereof.
I added a new ParserError
that inherits from SyntaxError
.
I presume REPL == read-eval-print-loop (from Google). Would a new error help such programs (like code.interact, or IDLE)?
Yes, I meant a read-eval-print-loop. A new error would help and if it's a subclass of SyntaxError, they most likely need no changes at all to handle that new error, as they have to deal with SyntaxErrors anyway.
Author: Andreas Stührk (Trundle) *
Date: 2011-04-29 16:04
FWIW, this also affects ast.literal_eval()
.
Author: Antoine Pitrou (pitrou) *
Date: 2011-07-15 18:53
This looks like a rather good idea, although I'm not sure it deserves a new global exception type.
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2011-07-15 19:47
Just make it a SyntaxError.
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2011-07-15 19:48
Oh, I see Martin disagrees. SyntaxError is raised for anything which Python won't compile. For example, too many arguments gets a SyntaxError.
Author: Alyssa Coghlan (ncoghlan) *
Date: 2011-07-16 07:56
+1 for a new exception type to indicate "this may technically be legal Python, but this Python implementation cannot handle it correctly"
Whatever exception type we add, it would be nice to also be able to use it if someone eventually fixes the compiler recursion crasher, so I'd like to paint this particular bikeshed as the more general "SyntaxLimitError".
Possible docs phrasing:
exception:: SyntaxLimitError
Raised when the compilation and code generation process encounters an error due to internal limitations of the current implementation (e.g. the CPython parser has an upper limit on the number of nested sets of parentheses it can handle in a single expression, but no such limitation exists in the Python language reference).
This is a subclass of :exc:SyntaxError
.
Author: Georg Brandl (georg.brandl) *
Date: 2011-07-16 08:24
I like SyntaxLimitError much better than ParserError.
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2011-07-16 14:43
2011/7/16 Nick Coghlan <report@bugs.python.org>:
Nick Coghlan <ncoghlan@gmail.com> added the comment:
+1 for a new exception type to indicate "this may technically be legal Python, but this Python implementation cannot handle it correctly"
Whatever exception type we add, it would be nice to also be able to use it if someone eventually fixes the compiler recursion crasher, so I'd like to paint this particular bikeshed as the more general "SyntaxLimitError".
What is the point of a new exception? When would you ever want to catch it as opposed to a regular SyntaxError? You're going to have to change the code either way.
Moreover, all Python implementations are going to have to place some limits on stack depth and recursion, so it's not really an implementation detail.
The Python language doesn't make an mention of limited memory, but no one is going to suggest a MemoryLimitError, which is an "implementation detail".
Author: Alyssa Coghlan (ncoghlan) *
Date: 2011-07-16 15:10
It's important to remember that other implementations treat CPython as the "gold standard" for compatibility purposes. If we declare something to be an ordinary SyntaxError, then that carries strong implications for what other implementations should do.
Some kinds of errors are inherently implementation specific (MemoryError and SystemError spring to mind). No sane implementor is going to try to match CPython like-for-like when it comes to those. SyntaxError, however, is typically defined by the language definition, not the CPython implementation of it. By introducing a new exception type, we're explicitly telling other implementations "Look, this is our problem that we don't plan to fix as it doesn't typically arise in real code, but please don't deliberately cripple your own implementation just to match this behaviour". I'm strongly with MvL on this one - SyntaxError itself should never be raised for legal Python code just because the CPython bytecode generation toolchain isn't able to handle it properly.
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2011-07-16 15:16
2011/7/16 Nick Coghlan <report@bugs.python.org>:
Nick Coghlan <ncoghlan@gmail.com> added the comment:
It's important to remember that other implementations treat CPython as the "gold standard" for compatibility purposes. If we declare something to be an ordinary SyntaxError, then that carries strong implications for what other implementations should do.
Some kinds of errors are inherently implementation specific (MemoryError and SystemError spring to mind). No sane implementor is going to try to match CPython like-for-like when it comes to those. SyntaxError, however, is typically defined by the language definition, not the CPython implementation of it. By introducing a new exception type, we're explicitly telling other implementations "Look, this is our problem that we don't plan to fix as it doesn't typically arise in real code, but please don't deliberately cripple your own implementation just to match this behaviour". I'm strongly with MvL on this one - SyntaxError itself should never be raised for legal Python code just because the CPython bytecode generation toolchain isn't able to handle it properly.
Implementations are quite good at figuring this stuff out themselves. We don't need a whole new exception (new docs, one more thing in the builtin namespace) to "send a message" to other implementations. If some implementation is not part of the language, then its test just just be marked cpython_only. It shouldn't be brought into public API and the language.
Author: Alyssa Coghlan (ncoghlan) *
Date: 2011-07-16 15:22
It also makes it clear to users whether they've just run up against a limitation of the implementation they're using or whether what they've written is genuinely illegal code. They are NOT the same thing. Attempting to conflate them into one exception for the mere sake of not adding a new exception type is a false economy. Take it to python-dev if you want, but I will strongly oppose this check going in with it raising an ordinary SyntaxError instead of a new subclass.
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2011-07-16 15:33
2011/7/16 Nick Coghlan <report@bugs.python.org>:
Nick Coghlan <ncoghlan@gmail.com> added the comment:
It also makes it clear to users whether they've just run up against a limitation of the implementation they're using or whether what they've written is genuinely illegal code. They are NOT the same thing. Attempting to conflate them into one exception for the mere sake of not adding a new exception type is a false economy. Take it to python-dev if you want, but I will strongly oppose this check going in with it raising an ordinary SyntaxError instead of a new subclass.
Why would it matter to users what variety of limitation they're running up against? As I said, they're still going to have to change their code.
Author: Alyssa Coghlan (ncoghlan) *
Date: 2011-07-16 15:53
It matters, because Python "users" are programmers, and most programmers want to know why they're being told something is wrong. Raising MemoryError is technically incorrect at this point, but at least gives the right flavour of the user not doing anything specifically wrong, the program just ran out of resources trying to do as they asked. I see SyntaxError as significantly worse, however, as instead of telling the developer "sorry, that's a reasonable request, but I can't handle it", the interpreter is now telling them "you did that wrong, change it". The semantics of that are all wrong. There's nothing wrong with the syntax of the user's code, it's just that the compiler can't handle it.
Similarly, the eventual fix to the recursive crash in the compiler itself is likely to be an arbitrary limit on expression nesting. Again, nothing wrong with their syntax, but they'll need to adjust their (legal!) code to work around implementation limitations.
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2011-07-16 16:50
2011/7/16 Nick Coghlan <report@bugs.python.org>:
Nick Coghlan <ncoghlan@gmail.com> added the comment:
It matters, because Python "users" are programmers, and most programmers want to know why they're being told something is wrong. Raising MemoryError is technically incorrect at this point, but at least gives the right flavour of the user not doing anything specifically wrong, the program just ran out of resources trying to do as they asked. I see SyntaxError as significantly worse, however, as instead of telling the developer "sorry, that's a reasonable request, but I can't handle it", the interpreter is now telling them "you did that wrong, change it". The semantics of that are all wrong. There's nothing wrong with the syntax of the user's code, it's just that the compiler can't handle it.
I content that in normal code, it is so extremely rare as to be unheard of, to get exceptions about the parser stack overflowing or segfault the compiler by too deep nesting. People who are doing this (generally to prove the point about limitations of the compiler) are smart enough to know exactly what "parser stack overflowed" means and separate that from Python as the language. Adding a new exception is solving a non-existent problem.
Author: Antoine Pitrou (pitrou) *
Date: 2011-07-16 22:28
I content that in normal code, it is so extremely rare as to be unheard of, to get exceptions about the parser stack overflowing or segfault the compiler by too deep nesting. People who are doing this (generally to prove the point about limitations of the compiler) are smart enough to know exactly what "parser stack overflowed" means and separate that from Python as the language. Adding a new exception is solving a non-existent problem.
Agreed with Benjamin. Also, "why something is wrong" can simply be told in the exception message.
Author: R. David Murray (r.david.murray) *
Date: 2011-07-16 23:13
This is a purity versus practicality argument. I agree with both sides :)
Author: Alyssa Coghlan (ncoghlan) *
Date: 2011-07-17 06:32
After further reflection, I (eventually) came to the same conclusion as Benjamin and Antoine - using SyntaxError for this is fine.
However, the exception message should make it clear that this is an implementation limit rather than a language limit.
Author: Senthil Kumaran (orsenthil) *
Date: 2011-07-18 23:28
Oh, I thought we never rely on exception "message" for anything important. However this seems to be an exception for that exception. :-)
Author: Antoine Pitrou (pitrou) *
Date: 2011-07-18 23:34
Oh, I thought we never rely on exception "message" for anything important. However this seems to be an exception for that exception. :-)
I think you're missing the point. People usually don't catch SyntaxError and run fallback code in this case. And even if they do, they are unlikely to care about whether the ErreurDeSyntaxe is due to a genuinely faulty piece of code, or a parser limitation ;)
Author: Alyssa Coghlan (ncoghlan) *
Date: 2011-07-19 02:11
Yeah, at the level of code the origin doesn't really matter, so re-using the exception type is actually OK. However, for people seeing the stack trace, it may be useful to know what's genuinely illegal syntax and what's a limitation of the current implementation. The exception message is a good place for that additional detail.
Author: Irit Katriel (iritkatriel) *
Date: 2021-07-02 23:10
The patch relates to the old parser.
With the new parser the 100*(+100*) example works. If we go to 1000 instead of 100 we get "SyntaxError: too many nested parentheses".
From the discussion it seems that the idea of a new exception type for implementation-limits vs "real" SyntaxErrors was rejected as not useful.
Is there anything left to do on this issue?
Author: Terry J. Reedy (terry.reedy) *
Date: 2021-07-08 03:41
I retract my original comment as I now agree with SyntaxError + distinct message.
Given Irit's report, I think the remaining question is whether
A. The implication that 'too many nested parentheses' is compiler specific is strong enough, and any thing further should be left to 3rd parties to explain, and this issue should be closed as 'out of date'.
B. We should be explicit and add something like 'for this compiler'.
Since Pablo has been working on improving error messages, I would give his opinion high weight.
To find the actual limit, I did the following with Win 10, 3.10.0b3, IDLE:
def f(n): ... print(eval('('*n + ')'*n")) ... And after some trials, found f(200) () f(201) Traceback (most recent call last): File "<pyshell#56>", line 1, in f(201) File "<pyshell#42>", line 2, in f print(eval('('*n + ')'*n)) File "", line 1 ((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((())))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))) ^ SyntaxError: too many nested parentheses
The error is detected in 'string' line 1, 1-based offset 201, at the final '('. There are 200 spaces before the caret but they are not obvious in this box, which does not wrap spaces properly. The end_lineno and end_offset are None, None rather than the 1,202 I expected.
Author: Andre Roberge (aroberge) *
Date: 2021-07-08 09:45
For information: I created an actual .py file from the last example with 200 parens and got the following error message:
python example.py s_push: parser stack overflow MemoryError
For consistency and greater clarity, I think that having messages such as
- SyntaxError: parser stack overflow (too many nested parenthesis)
- SyntaxError: parser stack overflow (too many statistically nested blocks)
and possibly others (too many arguments?), would be preferable, with the possibility of simply having
- SyntaxError: parser stack overflow
if the exact cause cannot be identified at the parsing stage.
Author: Pablo Galindo Salgado (pablogsal) *
Date: 2021-07-08 16:38
python example.py s_push: parser stack overflow MemoryError
That error can only happen with the old parser, so you must be using Python3.8 or 3.9 with the old parser. Can you confirm?
Author: Andre Roberge (aroberge) *
Date: 2021-07-08 16:43
@pablo: Sincere apologies ... I tested this with the wrong virtual environment active locally. With 3.10, it is indeed SyntaxError: too many nested parentheses
Author: Pablo Galindo Salgado (pablogsal) *
Date: 2021-07-08 16:43
As a piece of extra information, this limit has been there for a while:
the problem is that the old parser died before reaching it in many situations.
Author: Pablo Galindo Salgado (pablogsal) *
Date: 2021-07-08 16:46
Regarding what to do, what I would love to optimize is that the message is clear. One of the challenges people may have with too specific messages is that they may not know what "a compiler" or "a parser" means (the first may be especially confusing since many people think Python is not "compiled".
On the other hand, is more informative to say "too many nested parentheses" than "parser stack overflow" since many people don't know what that means, especially in Python.
I
History
Date
User
Action
Args
2022-04-11 14:57:13
admin
set
github: 55552
2021-07-08 16:46:57
pablogsal
set
messages: +
2021-07-08 16:43:38
pablogsal
set
messages: +
2021-07-08 16:43:04
aroberge
set
messages: +
2021-07-08 16:38:49
pablogsal
set
messages: +
2021-07-08 09:45:22
aroberge
set
messages: +
2021-07-08 03:41:02
terry.reedy
set
versions: + Python 3.11, - Python 3.4
nosy: + aroberge, pablogsal
messages: +
stage: patch review -> needs patch
2021-07-02 23:10:59
iritkatriel
set
nosy: + iritkatriel
messages: +
2013-06-30 05:58:25
terry.reedy
set
versions: + Python 3.4, - Python 3.3
2011-07-19 02:11:21
ncoghlan
set
messages: +
2011-07-18 23:34:38
pitrou
set
messages: +
2011-07-18 23:28:02
orsenthil
set
nosy: + orsenthil
messages: +
2011-07-17 06:32:24
ncoghlan
set
messages: +
2011-07-16 23:13:20
r.david.murray
set
nosy: + r.david.murray
messages: +
2011-07-16 22:28:55
pitrou
set
messages: +
2011-07-16 16:50:34
benjamin.peterson
set
messages: +
2011-07-16 15:53:44
ncoghlan
set
messages: +
2011-07-16 15:52:27
Arfrever
set
nosy: + Arfrever
2011-07-16 15:33:25
benjamin.peterson
set
messages: +
2011-07-16 15:22:45
ncoghlan
set
messages: +
2011-07-16 15:16:04
benjamin.peterson
set
messages: +
2011-07-16 15:10:41
ncoghlan
set
messages: +
2011-07-16 14:43:34
benjamin.peterson
set
messages: +
2011-07-16 08:24:36
georg.brandl
set
nosy: + georg.brandl
messages: +
2011-07-16 07:56:41
ncoghlan
set
messages: +
2011-07-15 20:38:39
eric.snow
set
nosy: + eric.snow
2011-07-15 19:48:27
benjamin.peterson
set
messages: +
2011-07-15 19:47:05
benjamin.peterson
set
messages: +
2011-07-15 18:53:11
pitrou
set
nosy: + pitrou, benjamin.peterson, ncoghlan
messages: +
stage: patch review
2011-04-29 16:04:17
Trundle
set
messages: +
2011-03-25 01:07:09
Trundle
set
files: - parser_nested_MemoryError.patch
2011-03-25 01:07:03
Trundle
set
files: - parser_nested_SyntaxError.patch
2011-03-25 01:05:27
Trundle
set
files: + issue11343.patch
messages: +
2011-03-04 21:30:10
terry.reedy
set
nosy: + loewis, terry.reedy
messages: +
2011-02-26 22:40:29
Trundle
set
files: + parser_nested_SyntaxError.patch
nosy:marienz, Trundle
2011-02-26 22:39:50
Trundle
create