msg55758 - (view) |
Author: Viktor Ferenczi (complex) |
Date: 2007-09-09 00:35 |
Read the WARNING below, then run the attached script with Python3.0a2. It will eat all of your memory. WARNING: keep a process-killing tool or an extra command line at your fingertips, since this script could render your machine unusable in about 10-20 seconds depending on your memory and CPU speed!!! YOU ARE WARNED!

OS: Ubuntu Feisty, up-to-date
Python: Python3.0a1, built from sources, configured with: --prefix=/usr/local |
|
|
msg55759 - (view) |
Author: Viktor Ferenczi (complex) |
Date: 2007-09-09 00:45 |
Confirmed on Windows:

OS: Windows XP SP2 ENG
Python: Python3.0a1 MSI installer, default installation |
|
|
msg55760 - (view) |
Author: Viktor Ferenczi (complex) |
Date: 2007-09-09 00:50 |
Works fine (does nothing) with Python 2.4.4 and Python 2.5.1 under Windows, so this bug must be caused by some new code in Python3.0a1. The bug depends on the contents of the docstring. There's another strange behavior if you write the word "this" somewhere in the docstring. The docstring seems to be parsed as source code somehow, which confuses the new parser. |
|
|
msg55761 - (view) |
Author: Viktor Ferenczi (complex) |
Date: 2007-09-09 00:57 |
Errata: in the first line of my original post I meant Python3.0a1, not 3.0a2, of course. |
|
|
msg55768 - (view) |
Author: Alan McIntyre (alanmcintyre) *  |
Date: 2007-09-09 21:56 |
Confirmed that this happens on Mac OS X with a fresh build of py3k from svn. |
|
|
msg55773 - (view) |
Author: Stefan Sonnenberg-Carstens (pythonmeister) |
Date: 2007-09-10 03:05 |
Same under Linux with Python 3.0a1. Eats all CPU + memory. |
|
|
msg55932 - (view) |
Author: Alexey Suda-Chen (alexeychen) |
Date: 2007-09-15 22:22 |
--- tokenizer.c	(revision 58161)
+++ tokenizer.c	(working copy)
@@ -402,6 +402,8 @@
 	if (allocated) {
 		Py_DECREF(bufobj);
 	}
+	Py_XDECREF(tok->decoding_buffer);
+	tok->decoding_buffer = 0;
 	return s; |
|
|
msg55934 - (view) |
Author: Brett Cannon (brett.cannon) *  |
Date: 2007-09-16 00:17 |
Note the patch is inlined in a message. |
|
|
msg55943 - (view) |
Author: Alexey Suda-Chen (alexeychen) |
Date: 2007-09-16 15:11 |
Oops, I see there are two bugs. Previously I had fixed only multiline strings. I think it will be:

Index: tokenizer.c
===================================================================
--- tokenizer.c	(revision 58161)
+++ tokenizer.c	(working copy)
@@ -395,6 +395,7 @@
 			goto error;
 		buflen = size;
 	}
+	memcpy(s, buf, buflen);
 	s[buflen] = '\0';
 	if (buflen == 0) /* EOF */
@@ -402,6 +403,12 @@
 	if (allocated) {
 		Py_DECREF(bufobj);
 	}
+
+	if (bufobj == tok->decoding_buffer) {
+		Py_XDECREF(tok->decoding_buffer);
+		tok->decoding_buffer = 0;
+	}
+
 	return s;
 error: |
|
|
msg55995 - (view) |
Author: Sean Reifschneider (jafo) *  |
Date: 2007-09-18 13:11 |
Confirmed problem (used 4.5GB before I killed it), and that the second patch resolved the problem. I'm uploading the inline patch as an attachment, with the directory name in it as well (from svn diff). Bumping the priority to high because the side effect can cause all sorts of problems on a system including other processes being killed. |
|
|
msg56083 - (view) |
Author: Neil Schemenauer (nas) |
Date: 2007-09-21 21:32 |
It looks to me like fp_readl is no longer working as designed and the patch is not really the right fix. The use of "decoding_buffer" is tricky and I think the conversion to bytes screwed it up. It might be clearer to have a separate "decoding_overflow" struct member that's used for overflow rather than overloading "decoding_buffer". |
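Neil's suggested refactoring can be illustrated with a small Python sketch (hypothetical names, not the actual C code): a line reader that parks the bytes which did not fit the caller's buffer in a dedicated overflow slot, instead of overloading the main decoding buffer for both roles.

```python
class LineReader:
    """Sketch of the separate-overflow scheme suggested above.

    When a decoded line is longer than the caller's buffer, the
    remainder goes into a dedicated overflow slot, so the meaning
    of each field stays unambiguous.
    """

    def __init__(self, lines):
        self._lines = iter(lines)   # stands in for the decoded stream
        self._overflow = b""        # the proposed "decoding_overflow"

    def readl(self, size):
        # Serve any pending overflow first.
        if self._overflow:
            chunk, self._overflow = self._overflow[:size], self._overflow[size:]
            return chunk
        # Otherwise decode a fresh line and stash what doesn't fit.
        line = next(self._lines, "").encode("utf-8")
        chunk, self._overflow = line[:size], line[size:]
        return chunk

reader = LineReader(["hello world\n"])
print(reader.readl(5))    # b'hello'
print(reader.readl(100))  # b' world\n'
print(reader.readl(100))  # b'' (EOF)
```

With the overflow held separately, the "fully consumed" condition is simply an empty overflow slot, which is the ambiguity the overloaded decoding_buffer loses.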
|
|
msg57475 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2007-11-14 00:40 |
The issue isn't fixed yet. The script is still eating precious memory. |
|
|
msg57477 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2007-11-14 00:54 |
Amaury, can you have a look at this? I think it's a bug in tok_nextc() in tokenizer.c. |
|
|
msg57478 - (view) |
Author: Viktor Ferenczi (complex) |
Date: 2007-11-14 02:36 |
This bug prevents me and many others from doing preliminary testing on Py3k, which slows down its development. This bug _really_ hurts. I have a completely developed new module for Py3k that cannot be released due to this bug, since its unit tests are affected and would crash the user's machine. Sadly, I don't have enough free time and readily available in-depth knowledge to fix this, especially after the first attempt was not perfect, which shows that it may be a bug that cannot be fixed by correcting a typo somewhere... :-) |
|
|
msg57480 - (view) |
Author: Christian Heimes (christian.heimes) *  |
Date: 2007-11-14 03:14 |
I've already raised the priority to draw more attention to this bug. So far I'm not able to solve the bug, but I've nailed the issue down to a short test case:

HANGS:
# -*- coding: ascii -*-
"""
"""

The problem manifests itself only in the combination of the ascii encoding and triple quotes across two or more lines. Neither a different encoding nor a string across a single line has the same problem.

WORKS:
# -*- coding: ascii -*-
""" """

WORKS:
# -*- coding: latin.1 -*-
"""
"""

WORKS:
# -*- coding: ascii -*-
""" """

DOESN'T COMPILE:
# -*- coding: ascii -*-
"\
"

File "hungry_script2.py", line 5
SyntaxError: EOL while scanning single-quoted string

The latest example does compile with Python 2.5. Please note also the wrong line number: the file has only three (!) lines.

During my debugging session I saw an infinite loop at tokenizer.c:1429:

letter_quote:
	/* String */
	if (c == '\'' || c == '"') {
		...
		for (;;) { INFINITE LOOP } |
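Since the hanging input is known, the tokenizer can be probed safely from a child process with a timeout, so a runaway loop cannot take the machine down. A minimal sketch (the temporary-file approach and the 10-second timeout are arbitrary choices, not from this report):

```python
import os
import subprocess
import sys
import tempfile

# The minimal trigger described above: an ascii coding cookie plus a
# triple-quoted string spanning more than one line.
source = '# -*- coding: ascii -*-\n"""\n"""\n'

with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(source)
    path = f.name

try:
    # Compile in a separate interpreter so a hung tokenizer can be
    # killed by the timeout instead of eating this process's memory.
    proc = subprocess.run(
        [sys.executable, "-c",
         f"compile(open({path!r}).read(), {path!r}, 'exec')"],
        timeout=10,
    )
    hung = False
except subprocess.TimeoutExpired:
    hung = True
finally:
    os.unlink(path)

print("tokenizer hung" if hung else "compiled fine")
```

On a fixed interpreter this prints "compiled fine"; on an affected build the child is killed after the timeout instead of exhausting memory.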
|
msg57483 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2007-11-14 05:04 |
Is this also broken in the 3.0a1 release? If not, it might be useful to try to find the most recent rev where it's not broken. |
|
|
msg57486 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2007-11-14 10:27 |
fp_readl is indeed broken in several ways:

- decoding_buffer should be reset to NULL when all data has been read (buflen <= size).
- the (buflen > size) case will cause an error on the next pass, since the function cannot handle PyBytesObject.

IOW, the function is always wrong ;-) I have a correction ready (jafo's patch already addresses the first case), but cannot access svn here. I will try to provide a patch + test cases later tonight. |
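The two failure modes listed above can be sketched in Python (a loose, hypothetical rendering of the control flow, not the actual C implementation): the pending buffer must be cleared once it is fully consumed, and any remainder must be carried over as a bytes object for the next pass.

```python
# Hypothetical pure-Python rendering of the two fixes described above;
# names mirror the C code loosely.
def fp_readl(state, size):
    buf = state.get("decoding_buffer")
    if buf is None:
        # Decode a fresh line (stands in for the C decoding step).
        buf = state["readline"]().encode("utf-8")
    if len(buf) <= size:
        state["decoding_buffer"] = None  # fix 1: reset when fully consumed
        return buf
    # fix 2: the leftover is a bytes object and must be handled as such
    # on the next call, not re-decoded.
    state["decoding_buffer"] = buf[size:]
    return buf[:size]

lines = iter(['a really long line\n', ''])
state = {"readline": lambda: next(lines), "decoding_buffer": None}
print(fp_readl(state, 8))  # b'a really'
print(fp_readl(state, 8))  # b' long li'
print(fp_readl(state, 8))  # b'ne\n'
```

Without fix 1 the stale buffer would be served forever (the reported hang); without fix 2 the second call would choke on the bytes remainder.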
|
|
msg57487 - (view) |
Author: Viktor Ferenczi (complex) |
Date: 2007-11-14 12:40 |
In response to Guido: according to pythonmeister's post (2007-09-10), "Same under Linux with Python 3.0a1. Eats all cpu + memory". I found the bug with this version:

fviktor@rigel:~$ python3.0 --version
Python 3.0a1

AFAIK it is the latest alpha released. I did not try the SVN trunk, but it is probably still buggy, since this issue has not been closed yet.

Viktor (complex) |
|
|
msg57574 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2007-11-15 23:21 |
Corrected in revision 59001, with a modified patch. |
|
|