msg71697 - (view) |
Author: Mark Hammond (mhammond) *  |
Date: 2008-08-21 23:21 |
[from python-dev] I've found a recursion related crash in test_cpickle on 64bit builds. Specifically, the 'cPickleDeepRecursive' is causing a stack overflow, and the debugger tells me the actual recursion depth was 629 at the crash. The reason the 64bit build doesn't see more crashes is apparently due to another regression in 2.6 - http://bugs.python.org/issue3373. It appears that in some cases, the recursion counter is actually incremented *twice* for each entry, thereby causing the "maximum recursion depth exceeded" exception to appear at a true recursion limit of 500. However, test_cpickle takes a different path and doesn't see this doubling of the count - therefore dieing at the depth of 629 that I can see. My solution to this was to simply double the stack size for the executables in 64bit builds, from 2MB to 4MB (2.1 and 4.2 for debug builds.) Is this an appropriate fix? |
|
|
msg71732 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-08-22 09:18 |
In my opinion this is a bug in cPickle. It is completely wrong to consume more than 3KB of stack space for each recursion level, and it should be fixed. |
|
|
msg71758 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2008-08-22 16:56 |
The problem comes from this local variable in batch_list() and batch_dict: PyObject *slice[BATCHSIZE]; and BATCHSIZE=1000... |
|
|
msg71760 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2008-08-22 17:39 |
Attached patch removes the big array, items are saved when they are fetched from the container. I preserved the the special case when there is only one item. The patch does not merge cleanly into py3k, but the functions batch_list and batch_dict look very similar and suffer the same problem. |
|
|
msg71764 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-08-22 18:13 |
Here is an additional patch which adds a cpickle test to Misc/find_recursion_limit.py. It helps show that Amaury's patch does fix the issue. +1 for applying it. |
|
|
msg71799 - (view) |
Author: Mark Hammond (mhammond) *  |
Date: 2008-08-23 01:26 |
Sorry for the initial noise - your analysis is correct, mine was flawed :) Simple recursion to a depth of 1000 does work fine on a 64bit build. cpickle.patch does make test_cpickle pass for me. FWIW, find_recursionlimit.py now causes a segfault on 32 and 64bit Windows (the comments imply a MemoryError is expected there), and the 32bit version dies at a depth of 5900, while the 64bit version dies at 3800 (both doing an 'add') - so it does seem there is still a discrepancy - but not one we need to care about now. I'm +1 in general on this, but I'm not sure that I'm +1 on this patch for 1.6 without more careful review than I am able to offer at the moment; it seems fairly unlikely to be hit "in the wild", but I'll obviously defer that to others. OTOH, having the test suite segfault isn't a good look, so we probably need to do *something* for 1.6 |
|
|
msg71804 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-08-23 09:18 |
Le samedi 23 août 2008 à 01:26 +0000, Mark Hammond a écrit : > cpickle.patch does make test_cpickle pass for me. FWIW, > find_recursionlimit.py now causes a segfault on 32 and 64bit Windows > (the comments imply a MemoryError is expected there), and the 32bit > version dies at a depth of 5900, while the 64bit version dies at 3800 > (both doing an 'add') - so it does seem there is still a discrepancy - > but not one we need to care about now. Well, the discrepancy is expected, since pointers are bigger in 64-bit mode than in 32-bit mode, and most function calls inside the interpreter involve pushing a bunch of PyObject pointers on the stack. |
|
|
msg71805 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2008-08-23 12:08 |
Adapted the patch to python 3.0 |
|
|
msg72990 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2008-09-10 22:20 |
Can someone have a look at this patch? It is not 64bit nor Windows specific; its goal is to reduce the size of local variables in batch_list(), which is easy to verify. The point is to check the correctness of the code, specially around errors checks and reference counting. |
|
|
msg73008 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-09-11 07:19 |
The patch is fine, please apply. My only question is why the 3.0 patch integrates the find_recursionlimit change, whereas the 2.6 patch does not. |
|
|
msg73014 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2008-09-11 11:05 |
- cpickle.patch and find_recursion_limit.patch are for 2.6 - cpickle.patch-3.0.patch groups the two previous patches, adapted to 3.0. |
|
|
msg73015 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2008-09-11 12:10 |
Ah, ok. It's all fine, then. |
|
|
msg73024 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-09-11 13:36 |
By the way, this isn't caused by the present issue, but you'll notice that in the latest trunk, some tests in find_recursion_limit.py fail with an AttributeError instead of RuntimeError. This is likely caused by a PyDict_GetItem() call discarding the RuntimeError somewhere and returning NULL, which in turn raises an AttributeError (as I explained on the mailing-list). A quick way to fix it would be to use "except (AttributeError, RuntimeError)" in the testing func of find_recursion_limit.py. |
|
|
msg73058 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2008-09-11 21:05 |
Committed as r66390 and r66391. |
|
|
msg73132 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-09-12 20:07 |
I've opened #3850 for the find_recursion_limit problem. |
|
|
msg75974 - (view) |
Author: Tal Einat (taleinat) *  |
Date: 2008-11-17 16:55 |
Will this be back-ported to 2.5.3? (please say yes...) |
|
|