Issue 7332: python script segment fault at PyMarshal_ReadLastObjectFromFile in import_submodule

Created on 2009-11-16 07:50 by liang, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (28)

msg95325 - (view)

Author: liang (liang)

Date: 2009-11-16 07:50

In our testbed, we have seen several segmentation faults in our Python script. The environment is:
linux=2.6.29.6-0.6.smp.gcc4.1.x86_64
python=2.4.4-41.4-1
GCC = GCC 4.1.2 20070626 (rPath Inc.)] on linux2

Below is the detailed call stack:

(gdb) bt
#0 PyMarshal_ReadLastObjectFromFile (fp=0x73a550) at Python/marshal.c:748
#1 0x000000000047bbf9 in read_compiled_module (cpathname=0x7fff184ba600 "/usr/lib64/python2.4/sre_constants.pyc", fp=0x73a550) at Python/import.c:728
#2 0x000000000047da2c in load_source_module (name=0x7fff184bc740 "sre_constants", pathname=0x7fff184bb680 "/usr/lib64/python2.4/sre_constants.py", fp=0x737df0) at Python/import.c:896
#3 0x000000000047e7bd in import_submodule (mod=0x6ea570, subname=0x7fff184bc740 "sre_constants", fullname=0x7fff184bc740 "sre_constants") at Python/import.c:2276
#4 0x000000000047ec3c in load_next (mod=0x6ea570, altmod=0x6ea570, p_name=<value optimized out>, buf=0x7fff184bc740 "sre_constants", p_buflen=0x7fff184bc73c) at Python/import.c:2096
#5 0x000000000047ee47 in PyImport_ImportModuleEx (name=0x7fff18bac298 "\001", globals=0x7fff18bac2bc, locals=<value optimized out>, fromlist=0x7fff18c90990) at Python/import.c:1931
#6 0x000000000045f963 in builtin___import__ (self=<value optimized out>, args=<value optimized out>) at Python/bltinmodule.c:45
#7 0x00000000004148e0 in PyObject_Call (func=0x73a550, arg=0x73a550, kw=0x46e829e3) at Objects/abstract.c:1795
#8 0x00000000004628fd in PyEval_CallObjectWithKeywords (func=0x7fff18ca5440, arg=0x7fff18c944c8, kw=0x0) at Python/ceval.c:3435
#9 0x000000000046461a in PyEval_EvalFrame (f=0x744650) at Python/ceval.c:2020
#10 0x0000000000468ce0 in PyEval_EvalCodeEx (co=0x7fff18c95ab0, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2741
#11 0x0000000000468d92 in PyEval_EvalCode (co=0x73a550, globals=0x73a550, locals=0x46e829e3) at Python/ceval.c:484
#12 0x000000000047d29a in PyImport_ExecCodeModuleEx (name=0x7fff184bfce0 "sre_compile", co=0x7fff18c95ab0, pathname=0x7fff184bdba0 "/usr/lib64/python2.4/sre_compile.pyc") at Python/import.c:636
#13 0x000000000047d7d0 in load_source_module (name=0x7fff184bfce0 "sre_compile", pathname=0x7fff184bdba0 "/usr/lib64/python2.4/sre_compile.pyc", fp=<value optimized out>) at Python/import.c:915
#14 0x000000000047e7bd in import_submodule (mod=0x6ea570, subname=0x7fff184bfce0 "sre_compile", fullname=0x7fff184bfce0 "sre_compile") at Python/import.c:2276
#15 0x000000000047ec3c in load_next (mod=0x6ea570, altmod=0x6ea570, p_name=<value optimized out>, buf=0x7fff184bfce0 "sre_compile", p_buflen=0x7fff184bfcdc) at Python/import.c:2096
#16 0x000000000047ee47 in PyImport_ImportModuleEx (name=0x7fff18c8fbd0 "\001", globals=0x7fff18c8fbf4, locals=<value optimized out>, fromlist=0x6ea570) at Python/import.c:1931
#17 0x000000000045f963 in builtin___import__ (self=<value optimized out>, args=<value optimized out>) at Python/bltinmodule.c:45
#18 0x00000000004148e0 in PyObject_Call (func=0x73a550, arg=0x73a550, kw=0x46e829e3) at Objects/abstract.c:1795
#19 0x00000000004628fd in PyEval_CallObjectWithKeywords (func=0x7fff18ca5440, arg=0x7fff18c94208, kw=0x0) at Python/ceval.c:3435
#20 0x000000000046461a in PyEval_EvalFrame (f=0x7b6680) at Python/ceval.c:2020
#21 0x0000000000468ce0 in PyEval_EvalCodeEx (co=0x7fff18c95500, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2741
#22 0x0000000000468d92 in PyEval_EvalCode (co=0x73a550, globals=0x73a550, locals=0x46e829e3) at Python/ceval.c:484
#23 0x000000000047d29a in PyImport_ExecCodeModuleEx (name=0x7fff184c3280 "sre", co=0x7fff18c95500, pathname=0x7fff184c1140 "/usr/lib64/python2.4/sre.pyc") at Python/import.c:636
#24 0x000000000047d7d0 in load_source_module (name=0x7fff184c3280 "sre", pathname=0x7fff184c1140 "/usr/lib64/python2.4/sre.pyc", fp=<value optimized out>) at Python/import.c:915
#25 0x000000000047e7bd in import_submodule (mod=0x6ea570, subname=0x7fff184c3280 "sre", fullname=0x7fff184c3280 "sre") at Python/import.c:2276
#26 0x000000000047ec3c in load_next (mod=0x6ea570, altmod=0x6ea570, p_name=<value optimized out>, buf=0x7fff184c3280 "sre", p_buflen=0x7fff184c327c) at Python/import.c:2096
#27 0x000000000047ee47 in PyImport_ImportModuleEx (name=0x7fff18c8cc90 "\001", globals=0x7fff18c8ccb4, locals=<value optimized out>, fromlist=0x7fff18c90450) at Python/import.c:1931
#28 0x000000000045f963 in builtin___import__ (self=<value optimized out>, args=<value optimized out>) at Python/bltinmodule.c:45
#29 0x00000000004148e0 in PyObject_Call (func=0x73a550, arg=0x73a550, kw=0x46e829e3) at Objects/abstract.c:1795
#30 0x00000000004628fd in PyEval_CallObjectWithKeywords (func=0x7fff18ca5440, arg=0x7fff18c83788, kw=0x0) at Python/ceval.c:3435
#31 0x000000000046461a in PyEval_EvalFrame (f=0x753bb0) at Python/ceval.c:2020
#32 0x0000000000468ce0 in PyEval_EvalCodeEx (co=0x7fff18c8a7a0, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2741
#33 0x0000000000468d92 in PyEval_EvalCode (co=0x73a550, globals=0x73a550, locals=0x46e829e3) at Python/ceval.c:484
#34 0x000000000047d29a in PyImport_ExecCodeModuleEx (name=0x7fff184c6820 "re", co=0x7fff18c8a7a0, pathname=0x7fff184c46e0 "/usr/lib64/python2.4/re.pyc") at Python/import.c:636
#35 0x000000000047d7d0 in load_source_module (name=0x7fff184c6820 "re", pathname=0x7fff184c46e0 "/usr/lib64/python2.4/re.pyc", fp=<value optimized out>) at Python/import.c:915
#36 0x000000000047e7bd in import_submodule (mod=0x6ea570, subname=0x7fff184c6820 "re", fullname=0x7fff184c6820 "re") at Python/import.c:2276
#37 0x000000000047ec3c in load_next (mod=0x6ea570, altmod=0x6ea570, p_name=<value optimized out>, buf=0x7fff184c6820 "re", p_buflen=0x7fff184c681c) at Python/import.c:2096
#38 0x000000000047ee47 in PyImport_ImportModuleEx (name=0x7fff18c8ca50 "\032", globals=0x7fff18c8ca74, locals=<value optimized out>, fromlist=0x6ea570) at Python/import.c:1931
#39 0x000000000045f963 in builtin___import__ (self=<value optimized out>, args=<value optimized out>) at Python/bltinmodule.c:45
#40 0x00000000004148e0 in PyObject_Call (func=0x73a550, arg=0x73a550, kw=0x46e829e3) at Objects/abstract.c:1795
#41 0x00000000004628fd in PyEval_CallObjectWithKeywords (func=0x7fff18ca5440, arg=0x7fff18c83680, kw=0x0) at Python/ceval.c:3435
#42 0x000000000046461a in PyEval_EvalFrame (f=0x7932d0) at Python/ceval.c:2020
#43 0x0000000000468ce0 in PyEval_EvalCodeEx (co=0x7fff18c8a730, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2741
#44 0x0000000000468d92 in PyEval_EvalCode (co=0x73a550, globals=0x73a550, locals=0x46e829e3) at Python/ceval.c:484
#45 0x000000000047d29a in PyImport_ExecCodeModuleEx (name=0x7fff184c9dc0 "difflib", co=0x7fff18c8a730, pathname=0x7fff184c7c80 "/usr/lib64/python2.4/difflib.pyc") at Python/import.c:636
#46 0x000000000047d7d0 in load_source_module (name=0x7fff184c9dc0 "difflib", pathname=0x7fff184c7c80 "/usr/lib64/python2.4/difflib.pyc", fp=<value optimized out>) at Python/import.c:915
#47 0x000000000047e7bd in import_submodule (mod=0x6ea570, subname=0x7fff184c9dc0 "difflib", fullname=0x7fff184c9dc0 "difflib") at Python/import.c:2276
#48 0x000000000047ec3c in load_next (mod=0x6ea570, altmod=0x6ea570, p_name=<value optimized out>, buf=0x7fff184c9dc0 "difflib", p_buflen=0x7fff184c9dbc) at Python/import.c:2096
#49 0x000000000047ee47 in PyImport_ImportModuleEx (name=0x7fff18cb9300 "\001", globals=0x7fff18cb9324, locals=<value optimized out>, fromlist=0x6ea570) at Python/import.c:1931
#50 0x000000000045f963 in builtin___import__ (self=<value optimized out>, args=<value optimized out>) at Python/bltinmodule.c:45
#51 0x00000000004148e0 in PyObject_Call (func=0x73a550, arg=0x73a550, kw=0x46e829e3) at Objects/abstract.c:1795
#52 0x00000000004628fd in PyEval_CallObjectWithKeywords (func=0x7fff18ca5440, arg=0x7fff18c810a8, kw=0x0) at Python/ceval.c:3435
#53 0x000000000046461a in PyEval_EvalFrame (f=0x7921c0) at Python/ceval.c:2020
#54 0x0000000000468ce0 in PyEval_EvalCodeEx (co=0x7fff18623490, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2741
#55 0x0000000000468d92 in PyEval_EvalCode (co=0x73a550, globals=0x73a550, locals=0x46e829e3) at Python/ceval.c:484
#56 0x00000000004853d9 in run_node (n=<value optimized out>, filename=<value optimized out>, globals=0x718650, locals=0x718650, flags=<value optimized out>) at Python/pythonrun.c:1285
#57 0x00000000004868b8 in PyRun_SimpleFileExFlags (fp=<value optimized out>, filename=0x7fff184ccbcc "/usr/local/maui/ganglia/lib/ganglia/python_modules/maui_svc.py", closeit=1, flags=0x7fff184cb350) at Python/pythonrun.c:869
#58 0x000000000041168d in Py_Main (argc=<value optimized out>, argv=0x7fff184cb478) at Modules/main.c:493
#59 0x00007fff177f48a4 in __libc_start_main () from /lib64/libc.so.6
#60 0x0000000000410a59 in _start ()

The segmentation fault occurs when it tries to load sre_constants.pyc.

Another stack:

#0 PyMarshal_ReadLastObjectFromFile (fp=0x7f33f0) at Python/marshal.c:748
#1 0x000000000047bbf9 in read_compiled_module (cpathname=0x7fff069fe830 "/usr/lib64/python2.4/inspect.pyc", fp=0x7f33f0) at Python/import.c:728
#2 0x000000000047da2c in load_source_module (name=0x7fff06a00970 "inspect", pathname=0x7fff069ff8b0 "/usr/lib64/python2.4/inspect.py", fp=0x7d97d0) at Python/import.c:896
#3 0x000000000047e7bd in import_submodule (mod=0x6ea570, subname=0x7fff06a00970 "inspect", fullname=0x7fff06a00970 "inspect") at Python/import.c:2276
#4 0x000000000047ec3c in load_next (mod=0x6ea570, altmod=0x6ea570, p_name=<value optimized out>, buf=0x7fff06a00970 "inspect", p_buflen=0x7fff06a0096c) at Python/import.c:2096

The segmentation fault occurs when it tries to load inspect.pyc.

Another core at:

(gdb) bt
#0 PyMarshal_ReadLastObjectFromFile (fp=0x7dd190) at Python/marshal.c:748
#1 0x000000000047bbf9 in read_compiled_module (cpathname=0x7fff1bc03de0 "/usr/lib64/python2.4/string.pyc", fp=0x7dd190) at Python/import.c:728
#2 0x000000000047da2c in load_source_module (name=0x7fff1bc05f20 "string", pathname=0x7fff1bc04e60 "/usr/lib64/python2.4/string.py", fp=0x7dc6f0) at Python/import.c:896
#3 0x000000000047e7bd in import_submodule (mod=0x6ea570, subname=0x7fff1bc05f20 "string", fullname=0x7fff1bc05f20 "string") at Python/import.c:2276
#4 0x000000000047ec3c in load_next (mod=0x6ea570, altmod=0x6ea570, p_name=<value optimized out>, buf=0x7fff1bc05f20 "string", p_buflen=0x7fff1bc05f1c) at Python/import.c:2096
#5 0x000000000047ee47 in PyImport_ImportModuleEx (name=0x7fff1c6694b0 "\001", globals=0x7fff1c6694d4, locals=<value optimized out>, fromlist=0x6ea570) at Python/import.c:1931
#6 0x000000000045f963 in builtin___import__ (self=<value optimized out>, args=<value optimized out>) at Python/bltinmodule.c:45
#7 0x00000000004148e0 in PyObject_Call (func=0x7dd190, arg=0x7dd190, kw=0x46e829e3) at Objects/abstract.c:1795
#8 0x00000000004628fd in PyEval_CallObjectWithKeywords (func=0x7fff1c741440, arg=0x7fff1c663890, kw=0x0) at Python/ceval.c:3435
#9 0x000000000046461a in PyEval_EvalFrame (f=0x744650) at Python/ceval.c:2020
#10 0x0000000000468ce0 in PyEval_EvalCodeEx (co=0x7fff1c66a8f0, globals=<value optimized out>, locals=<value optimized out>, args=0x0, argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:2741
#11 0x0000000000468d92 in PyEval_EvalCode (co=0x7dd190, globals=0x7dd190, locals=0x46e829e3) at Python/ceval.c:484
#12 0x000000000047d29a in PyImport_ExecCodeModuleEx (name=0x7fff1bc094c0 "inspect", co=0x7fff1c66a8f0, pathname=0x7fff1bc07380 "/usr/lib64/python2.4/inspect.pyc") at Python/import.c:636
#13 0x000000000047d7d0 in load_source_module (name=0x7fff1bc094c0 "inspect", pathname=0x7fff1bc07380 "/usr/lib64/python2.4/inspect.pyc", fp=<value optimized out>) at Python/import.c:915
#14 0x000000000047e7bd in import_submodule (mod=0x6ea570, subname=0x7fff1bc094c0 "inspect", fullname=0x7fff1bc094c0 "inspect") at Python/import.c:2276
#15 0x000000000047ec3c in load_next (mod=0x6ea570, altmod=0x6ea570, p_name=<value optimized out>, buf=0x7fff1bc094c0 "inspect", p_buflen=0x7fff1bc094bc) at Python/import.c:2096
#16 0x000000000047ee47 in PyImport_ImportModuleEx (name=0x7fff1c65dba0 "\002", globals=0x7fff1c65dbc4, locals=<value optimized out>, fromlist=0x6ea570) at Python/import.c:1931

The segmentation fault occurs when it tries to load string.pyc.

We have seen it several times. However, the script is long-running, and we are not sure how it happened or how to reproduce it.

Does anyone have any ideas on this?

msg99428 - (view)

Author: resc (Thomas.Smith)

Date: 2010-02-16 18:03

I'm also getting segfaults in PyMarshal_ReadLastObjectFromFile in Python 2.6.2 (on Ubuntu Jaunty). It's very sporadic; I've been reproducing it by running a minimal script 100,000 times and getting a few core dumps. There are several Ubuntu bug reports in various packages that use Python:

https://bugs.launchpad.net/ubuntu/+source/apport/+bug/393022
https://bugs.launchpad.net/ubuntu/+source/gnome-python/+bug/432546
https://bugs.launchpad.net/ubuntu/+source/streamtuner/+bug/336331

I've attached a zip file with my test scripts and some gdb backtraces. I am happy to spend time on this bug, although I only have a rudimentary knowledge of C, so I'd mainly be useful for testing.

The computer I'm having trouble on is a Dell PowerEdge T410, with a Xeon E5502, and it had another sporadic segfault problem in a should-be-reliable program, ImageMagick. Switching to GraphicsMagick fixed that one, somehow. If it's a hardware-specific bug, Python is the only program that's tickling it right now...

msg103370 - (view)

Author: Charles-François Natali (neologix) * (Python committer)

Date: 2010-04-16 21:58

It's definitely a stack overflow. Most of the backtraces show a large number of frames. The last frame is this:
#0 PyMarshal_ReadLastObjectFromFile (fp=0x13e8200) at ../Python/marshal.c:1026
filesize = <value optimized out>

and a disassembly shows us that the segfault is generated on a callq:
0x4bd4d6 <PyMarshal_ReadLastObjectFromFile+54>: callq 0x4168e8 <fileno@plt>

And if you look at the code, it's obvious what's happening:

PyObject *
PyMarshal_ReadLastObjectFromFile(FILE *fp)
{
/* 75% of 2.1's .pyc files can exploit SMALL_FILE_LIMIT. [...] */
#define SMALL_FILE_LIMIT (1L << 14)
#define REASONABLE_FILE_LIMIT (1L << 18)
#ifdef HAVE_FSTAT
    off_t filesize;
#endif
#ifdef HAVE_FSTAT
    filesize = getfilesize(fp);
    if (filesize > 0) {
        char buf[SMALL_FILE_LIMIT];
        char* pBuf = NULL;
        if (filesize <= SMALL_FILE_LIMIT)
            pBuf = buf;
        else if (filesize <= REASONABLE_FILE_LIMIT)
            pBuf = (char *)PyMem_MALLOC(filesize);
        if (pBuf != NULL) {
            [...]
        }

SMALL_FILE_LIMIT is 1 << 14, which is 16K (not that reasonable :-). So when we enter PyMarshal_ReadLastObjectFromFile and allocate buf, we push around 16K on the stack, which is a lot. That's why we segfault soon after, when we call a function (callq): there's no space left on the stack. So there are several solutions:

Peers ?
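To put the 16K buffer in perspective, the following minimal sketch compares it with the stack limit of the current process. It assumes a Unix-like system where the standard `resource` module is available; the helper names `stack_limit_bytes` and `buffer_share` are made up for illustration and are not part of CPython's import machinery.

```python
import resource

SMALL_FILE_LIMIT = 1 << 14  # 16384 bytes, the size of the on-stack buffer


def stack_limit_bytes():
    """Return the soft RLIMIT_STACK in bytes, or None if it is unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return None if soft == resource.RLIM_INFINITY else soft


def buffer_share(limit):
    """Fraction of a stack of `limit` bytes taken by one SMALL_FILE_LIMIT buffer."""
    return SMALL_FILE_LIMIT / limit


if __name__ == "__main__":
    limit = stack_limit_bytes()
    if limit is None:
        print("stack size is unlimited")
    else:
        print(f"soft stack limit: {limit} bytes, "
              f"one buffer uses {buffer_share(limit):.2%}")
```

On a process with a large limit the share is tiny, so a single buffer only matters when the available stack is already small or nearly exhausted.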

msg103405 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-04-17 14:50

I agree that we can consider dropping the static buffer and always using PyMem_MALLOC(). It looks a bit strange for this bug to happen, though. Does Ubuntu use a small stack size?

msg103406 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-04-17 15:25

Oh, and the record of the original patch conversation (when this optimization was added) can be found here: http://mail.python.org/pipermail/patches/2001-January/003500.html

msg103407 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-04-17 15:44

A small benchmark shows no difference in startup time when disabling the stack buffer. (this is on Linux: of course, the problem might be that the glibc is heavily optimized)

The benchmark was simply:
$ time ./python -E -c "import logging, pydoc, xmlrpclib, urllib, urllib2, unittest, doctest, profile, smtplib, httplib, fractions, decimal, codecs, difflib, argparse, distutils, email, imaplib, idlelib, json, _pyio, poplib, ftplib"
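A startup benchmark of this kind can also be scripted. The sketch below is only an approximation of the shell command: `time_imports` is a hypothetical helper, it measures wall-clock time of a child interpreter, and the module list must be adapted to the Python version under test (several names in the original command, such as xmlrpclib, urllib2 and httplib, exist only in Python 2).

```python
import subprocess
import sys
import time


def time_imports(modules, runs=3, python=sys.executable):
    """Return the best wall-clock time (seconds) of `python -E -c "import ..."`."""
    command = [python, "-E", "-c", "import " + ", ".join(modules)]
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True)  # raises if the child fails
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print(f"{time_imports(['difflib', 'decimal', 'json']):.3f}s")
```

Taking the best of several runs reduces noise from disk caching, which matters when the differences being measured are a few milliseconds.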

msg103408 - (view)

Author: Charles-François Natali (neologix) * (Python committer)

Date: 2010-04-17 16:35

> It looks a bit strange for this bug to happen, though. Does Ubuntu use a small stack size?

There are other possible reasons:

Since I don't have an Ubuntu box, it would be nice if one of the reporters could:

> A small benchmark shows no difference in startup time when disabling the stack buffer. (this is on Linux: of course, the problem might be that the glibc is heavily optimized)

Yep, there are some crappy systems out there (no name :-), that's why it would be nice to have some feedback and small benchmarks on various platforms. Anyway, even if compiled files are small most of the time, I'm not sure that this "let's copy the file to the stack/heap" approach is optimal, and maybe mmap would be worth considering if we find that the overhead is not negligible (I haven't looked at the code in detail, so maybe it's not possible to use in this case).

msg103417 - (view)

Author: Charles-François Natali (neologix) * (Python committer)

Date: 2010-04-17 18:11

OK, I've also done some trivial benchmarking on my Linux box, and I get this:

right now:
$ time ./python /tmp/test_import.py
real 0m1.258s
user 0m1.111s
sys 0m0.101s

with mmap:
$ time ./python /tmp/test_import.py
real 0m1.262s
user 0m1.170s
sys 0m0.090s

with malloc only:
$ time ./python /tmp/test_import.py
real 0m1.213s
user 0m1.111s
sys 0m0.099s

The test script just imports every module available. So I'd agree with Antoine, and think we should just use malloc. The attached patch marshal_stack.diff just does that.

msg103702 - (view)

Author: Matthias Klose (doko) * (Python committer)

Date: 2010-04-20 12:58

> Does Ubuntu use a small stack size?

it's 8192 on all architectures.

msg103703 - (view)

Author: Matthias Klose (doko) * (Python committer)

Date: 2010-04-20 13:05

I'm told it's 10240 on Fedora 12, x86 and x86_64

msg103704 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2010-04-20 13:11

Allocating more than 16 bytes on the stack is never a good idea. E.g. Linux never resizes the stack automatically, and the only way to catch an "allocation failed" error is to handle the SIGSEGV signal...

Replacing the buf allocated on the stack with a buffer allocated on the heap is definitely a good idea :-)

msg103705 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-04-20 13:14

A 16KB stack buffer is tiny compared to an 8MB stack. I'm not sure removing that buffer would really fix the problems. Perhaps other threads get a smaller stack?
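Whether threads get a smaller stack can be explored from Python itself with `threading.stack_size()`, which reads or sets the stack size used for threads created afterwards (a value of 0 means the platform default). This is a minimal sketch; `run_with_stack_size` is a hypothetical helper, and the accepted sizes are platform-dependent (the documented minimum is 32 KiB).

```python
import threading


def run_with_stack_size(func, size_bytes):
    """Run func() in a new thread created with the given stack size; return its result."""
    result = []
    previous = threading.stack_size()   # remember the current setting
    threading.stack_size(size_bytes)    # applies to threads created afterwards
    try:
        worker = threading.Thread(target=lambda: result.append(func()))
        worker.start()
        worker.join()
    finally:
        threading.stack_size(previous)  # restore the earlier setting
    return result[0]


if __name__ == "__main__":
    # Import a module inside a thread that only has a 512 KiB stack.
    print(run_with_stack_size(lambda: __import__("difflib").__name__, 512 * 1024))
```

Threads created by C libraries or by an embedding application do not go through this setting, so their stack size has to be checked in the code that creates them.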

msg103707 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-04-20 13:43

What's the value of MAXPATHLEN and PATH_MAX on those systems?

msg103708 - (view)

Author: Charles-François Natali (neologix) * (Python committer)

Date: 2010-04-20 13:46

The problem is highlighted with recursive imports: a module which imports another module, which imports another module, etc. PyMarshal_ReadLastObjectFromFile is not the only function to use stack-allocated buffers; there are also load_source_module, load_package, and import_module_level, which use char buf[MAXPATHLEN+1]. With MAXPATHLEN at 1024, you lose 2 or 3K every time you do a recursive import. And, as has been said, it might very well happen that new threads get a reduced stack size.

msg103710 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-04-20 14:00

> The problem is highlighted with recursive imports: a module which imports another module, which imports another module, etc. PyMarshal_ReadLastObjectFromFile is not the only function to use stack-allocated buffers; there are also load_source_module, load_package, and import_module_level, which use char buf[MAXPATHLEN+1]. With MAXPATHLEN at 1024, you lose 2 or 3K every time you do a recursive import.

Let's assume we lose ten times 1024 bytes, that's still only 10KB. The stack is 8MB. We are arguing about less than 1% of the total stack size.

I just went through all of the functions highlighted in one of these stack traces (*). The only big consumers of stack space seem to be the stack buffer in PyMarshal_ReadLastObjectFromFile, and the various file path buffers using MAXPATHLEN.

(*) https://bugs.launchpad.net/ubuntu/+source/python2.6/+bug/432546

And that report shows only a single thread, so I have to assume that the 8MB figure applies there.

Nevertheless, we can remove the stack buffer since it's probably useless. It just seems unlikely to me to be the root cause of the stack overflow.

msg103716 - (view)

Author: resc (Thomas.Smith)

Date: 2010-04-20 14:25

Hi, I'm working on reproducing this again, but it's always been a very sporadic bug, and I haven't gotten a bingo yet.

I wish I had a test case that would trigger the bug more reliably... -Thomas

msg103719 - (view)

Author: Matthias Klose (doko) * (Python committer)

Date: 2010-04-20 14:28

PATH_MAX/MAXPATHLEN is 4096

msg103812 - (view)

Author: Charles-François Natali (neologix) * (Python committer)

Date: 2010-04-21 10:19

> And that report shows only a single thread, so I have to assume that the 8MB figure applies there.

> Nevertheless, we can remove the stack buffer since it's probably useless. It just seems unlikely to me to be the root cause of the stack overflow.

If we really have an 8MB stack, yes, it's unlikely. But max stack size is inherited by child processes, and see for example streamtuner (one of the reports): http://bugs.gentoo.org/274056

--- src/streamtuner/st-thread.c
+++ src/streamtuner/st-thread.c
@@ -108,1 +108,1 @@
-    0x18000, /* 96k, big enough for libcurl */
+    0x40000, /* change from 96k to 256k */

So if we start with this stack size, we can run out of stack space really easily: I counted around 20 buf allocations in some backtraces, and with MAXPATHLEN at 4K, that's 20 * 4 + 16 = 96K used.
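The estimate can be checked with a few lines of arithmetic. This is a rough sketch that counts only the buffers mentioned in the thread: it ignores the extra byte in `char buf[MAXPATHLEN+1]` and every other local variable, and the function name `estimated_buffer_bytes` is made up for illustration.

```python
MAXPATHLEN = 4096            # PATH_MAX/MAXPATHLEN reported earlier in the thread
SMALL_FILE_LIMIT = 1 << 14   # marshal's on-stack buffer, 16 KB
THREAD_STACK = 0x18000       # streamtuner's original thread stack size, 96 KB


def estimated_buffer_bytes(path_buffers, marshal_buffers=1):
    """Stack bytes taken by path buffers plus marshal buffers, ignoring everything else."""
    return path_buffers * MAXPATHLEN + marshal_buffers * SMALL_FILE_LIMIT


used = estimated_buffer_bytes(20)
print(used, used // 1024, used >= THREAD_STACK)  # prints: 98304 96 True
```

Under these assumptions the buffers alone account for the whole 96K thread stack, whereas against an 8MB main-thread stack the same 98304 bytes would be a little over 1%.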

There might be another reason. I think that Ubuntu's using gcc SSP feature by default, to prevent buffer overflows and friends, so maybe there's something going on with this. That would explain why it's only reported on Ubuntu (well, they also have more users, but let's assume there's really something specific on Ubuntu).

> I'm also getting segfaults in PyMarshal_ReadLastObjectFromFile in Python 2.6.2 (on Ubuntu Jaunty). It's very sporadic, I've been reproducing it by running a minimal script 100,000 times, and getting a few core dumps.

I've had a look at your backtraces, and when it segfaults, the stack size is really far from 8M. So there's really something fishy going on here. Are you getting an error message printed besides the usual segmentation fault? Could you try to reproduce it with your test script, using a Python compiled with -fno-stack-protector and -U_FORTIFY_SOURCE?

msg103817 - (view)

Author: Matthias Klose (doko) * (Python committer)

Date: 2010-04-21 10:57

> That would explain why it's only reported on Ubuntu

the original report is from the rPath distribution.

msg103819 - (view)

Author: Charles-François Natali (neologix) * (Python committer)

Date: 2010-04-21 11:22

> the original report is from the rPath distribution.

Never heard of this one, but http://wiki.rpath.com/wiki/rPath_Linux:rPath_Linux_2 states:

> Compile with --fstack-protector and FORTIFY_SOURCE=2 (override in your recipes by modifying the securityflags Conary macro), link with GNU hash and -O1, and use -fPIE for some key executables.

msg103885 - (view)

Author: Kees Cook (keescook)

Date: 2010-04-21 18:44

The stack protector will add 8 (aligned, so possibly padded) bytes to each stack frame of functions with arrays of 8 or greater bytes. So if things are marginal, this could make the difference between Pythons compiled with/without -fstack-protector.

N.B. if rPath is compiled with -D_FORTIFY_SOURCE=2 and -O1, then -D_FORTIFY_SOURCE=2 has no effect (it is only activated at -O2 or higher).

Details on Ubuntu's compiler flag defaults: https://wiki.ubuntu.com/CompilerFlags

Putting MAXPATH on the stack certainly seems like a big waste of space, though. :)

msg103912 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2010-04-21 21:17

Here is a short shell script to reproduce the stack overflow:

The stack starts with 86016 bytes and it crashes at import depth 6.

I don't know if my script is realistic (128 KB stack), but at least it shows a crash.

I think that most programs crash with a small stack.
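The attached shell script is not reproduced in this thread. As a rough stand-in, the sketch below builds a chain of modules that import each other and runs the import in a child interpreter, optionally under a reduced stack limit. Everything here is an assumption made for illustration: the file names (`chain0.py` and so on), the helpers `write_import_chain` and `run_chain`, and the use of `resource.setrlimit` in the child. It is Unix-only, and whether a given limit actually crashes depends on the Python version and platform.

```python
import os
import resource
import subprocess
import sys
import tempfile


def write_import_chain(directory, depth):
    """Create chain0.py .. chain<depth>.py, each importing the next one."""
    for i in range(depth):
        with open(os.path.join(directory, f"chain{i}.py"), "w") as f:
            f.write(f"import chain{i + 1}\n")
    with open(os.path.join(directory, f"chain{depth}.py"), "w") as f:
        f.write("DEPTH_REACHED = True\n")


def run_chain(directory, stack_bytes=None):
    """Import chain0 in a child interpreter, optionally under a smaller stack limit.

    Returns the child's exit status; a negative value means it was killed by
    a signal (for example -11 for SIGSEGV).
    """
    def limit_stack():
        # Runs in the child just before exec: lower only the soft limit.
        _soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
        resource.setrlimit(resource.RLIMIT_STACK, (stack_bytes, hard))

    proc = subprocess.run(
        [sys.executable, "-c", "import chain0"],
        cwd=directory,
        env=dict(os.environ, PYTHONPATH=directory),
        preexec_fn=limit_stack if stack_bytes is not None else None,
    )
    return proc.returncode
```

A typical use is to call `write_import_chain(d, 6)` on a temporary directory and compare `run_chain(d)` with `run_chain(d, stack_bytes=128 * 1024)`.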

msg103916 - (view)

Author: Kees Cook (keescook)

Date: 2010-04-21 22:32

So, digging a little further, I think this is a now-fixed kernel bug with stack growth. There were known issues prior to Sep 2009 with 64bit stack growth with ASLR, which is enabled by default. Upstream fix:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=80938332d8cf652f6b16e0788cf0ca136befe0b5

This was fixed in stable releases of the Ubuntu kernels on Mar 16, 2010 (though the fix was included in Ubuntu 9.10 when it was released Oct 29, 2009).

The Launchpad bugs 432546 and 393022 were both filed prior to these kernel fixes, and show an un-maximized stack segment that has bumped up against the next-lower segment, which is how this kernel bug was manifesting. (See their attached ProcMaps.txt files.)
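The symptom described here, a stack segment that has not reached its maximum size but directly touches the mapping below it, can be looked for in any `/proc/<pid>/maps` style listing. The sketch below is a minimal parser for that text format; `stack_headroom` is a hypothetical helper, and the sample listing is invented for illustration rather than taken from the attached ProcMaps.txt files.

```python
def stack_headroom(maps_text):
    """Return (stack_size, gap_below) in bytes for the [stack] mapping.

    gap_below is the distance from the end of the next-lower mapping to the
    start of the stack; 0 means the stack cannot grow any further down.
    """
    regions = []
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        start, end = (int(x, 16) for x in fields[0].split("-"))
        name = fields[5] if len(fields) > 5 else ""
        regions.append((start, end, name))
    regions.sort()
    for index, (start, end, name) in enumerate(regions):
        if name == "[stack]":
            below_end = regions[index - 1][1] if index > 0 else 0
            return end - start, start - below_end
    raise ValueError("no [stack] mapping found")


# Invented sample: a 132 KB stack sitting directly on top of another mapping.
SAMPLE = """\
7fff0ff00000-7fff0ffdf000 r-xp 00000000 08:01 1234       /lib/libexample.so
7fff0ffdf000-7fff10000000 rw-p 00000000 00:00 0          [stack]
"""

print(stack_headroom(SAMPLE))  # prints: (135168, 0)
```

On Linux the same function can be applied to the contents of `/proc/self/maps` to inspect the running process.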

I don't believe this is a Python bug, and I think the issue is solved for any distro that contains the above kernel fix.

msg103920 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-04-21 22:48

Thank you Kees, this sounds quite likely. I will still commit the patch to remove the stack buffer, and then close this issue.

msg103921 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2010-04-21 23:01

Patch committed in trunk (r80325) and py3k (r80326). I won't backport it to 2.6/3.1 since it's not likely to fix anything in practice -- it's just a nice simplification. Thanks everyone for comments and patches.

msg103927 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2010-04-21 23:59

I tried to limit memory allocated on the stack while importing modules. Number of bytes allocated on the stack:

I guess that it will not fix the issue, only move the crash to another function.

I'm attaching the patch to this issue only to keep a copy of it. The patch is complex and there is no good reason to commit it since the problem doesn't come from Python.

The patch allocates filename buffers on the heap in import.c, zipimport.c and marshal.c.

msg103962 - (view)

Author: resc (Thomas.Smith)

Date: 2010-04-22 13:17

> This was fixed in stable releases of the Ubuntu kernels on Mar 16, 2010 (though the fix was included in Ubuntu 9.10 when it was released Oct 29, 2009).

msg103965 - (view)

Author: resc (Thomas.Smith)

Date: 2010-04-22 13:21

Argh, that e-mail didn't work. Anyway, I just wanted to say that the kernel explanation is consistent with my experience: I had a crash every week up until recently, when I upgraded, and in the past few days I haven't been able to reproduce it.

History

Date | User | Action | Args
2022-04-11 14:56:54 | admin | set | github: 51581
2010-04-29 17:50:49 | mark.dickinson | link | issue770280 superseder
2010-04-22 13:21:58 | Thomas.Smith | set | messages: +
2010-04-22 13:17:58 | Thomas.Smith | set | messages: +
2010-04-21 23:59:37 | vstinner | set | files: + import_nostack_alloc.patch; messages: +
2010-04-21 23:01:16 | pitrou | set | status: open -> closed; versions: - Python 2.6, Python 3.1; messages: +; resolution: works for me; stage: needs patch -> resolved
2010-04-21 22:48:18 | pitrou | set | messages: +
2010-04-21 22:32:10 | keescook | set | messages: +
2010-04-21 21:17:08 | vstinner | set | files: + import_stackoverflow.sh; messages: +
2010-04-21 18:44:04 | keescook | set | nosy: + keescook; messages: +
2010-04-21 11:22:47 | neologix | set | messages: +
2010-04-21 10:57:37 | doko | set | messages: +
2010-04-21 10:19:11 | neologix | set | messages: +
2010-04-20 14:28:43 | doko | set | messages: +
2010-04-20 14:25:09 | Thomas.Smith | set | messages: +
2010-04-20 14:00:08 | pitrou | set | messages: +
2010-04-20 13:46:59 | neologix | set | messages: +
2010-04-20 13:43:58 | pitrou | set | messages: +
2010-04-20 13:14:44 | pitrou | set | messages: +
2010-04-20 13:11:10 | vstinner | set | nosy: + vstinner; messages: +
2010-04-20 13:05:03 | doko | set | messages: +
2010-04-20 12:58:54 | doko | set | nosy: + doko; messages: +
2010-04-17 18:11:37 | neologix | set | files: - marshal_stack.diff
2010-04-17 18:11:23 | neologix | set | files: + marshal_stack.diff; messages: +
2010-04-17 16:35:28 | neologix | set | files: + marshal_stack.diff; keywords: + patch; messages: +
2010-04-17 15:57:54 | dmalcolm | set | nosy: + dmalcolm
2010-04-17 15:44:02 | pitrou | set | messages: +
2010-04-17 15:25:44 | pitrou | set | priority: normal -> high; messages: +; versions: + Python 3.1, Python 2.7, Python 3.2
2010-04-17 14:50:22 | pitrou | set | nosy: + pitrou, tim.peters; messages: +
2010-04-17 08:45:10 | neologix | set | nosy: + ezio.melotti
2010-04-16 21:59:01 | neologix | set | nosy: + neologix; messages: +
2010-02-16 23:44:48 | ezio.melotti | set | priority: normal; stage: needs patch; versions: - Python 2.4
2010-02-16 18:03:11 | Thomas.Smith | set | files: + traces.zip; versions: + Python 2.6; nosy: + Thomas.Smith; messages: +
2009-11-16 07:50:32 | liang | create |