Issue 32307: Bad assumption on thread stack size makes python crash with musl libc (original) (raw)

Created on 2017-12-13 17:46 by Natanael Copa, last changed 2022-04-11 14:58 by admin.

Pull Requests
URL Status Linked Edit
PR 26224 open python-dev,2021-05-18 23:52
Messages (8)
msg308226 - (view) Author: Natanael Copa (Natanael Copa) Date: 2017-12-13 17:46
Python assumes that the system default thread stack size is big enough for python, except for OSX and FreeBSD where stack size is explicitly set. With musl libc the system thread stack size is only 80k, which leads to hard crash before `sys.getrecursionlimit()` is hit. See: https://github.com/python/cpython/blob/master/Python/thread_pthread.h#L22 Testcase: import threading import sys def f(n=0): try: print(n) f(n+1) except Exception: print("we hit recursion limit") sys.exit(0) t = threading.Thread(target=f) t.start() This can be pasted into: docker run --rm -it python:2.7-alpine docker run --rm -it python:3.6-alpine
msg308228 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-12-13 18:12
Well, from our point of view it isn't a bad assumption, it's that muslc needs to be added to the list of exceptions. (I know almost nothing about this...I assume there is some reason we can't determine the stack size programatically?) Would you care to propose a PR?
msg308293 - (view) Author: Natanael Copa (Natanael Copa) Date: 2017-12-14 12:52
With "bad assumption" I mean that currently the code assumes that all unknown systems has big enough default thread stack size. This would not be a "bad assumption" if POSIX (or any other standard) would specify a requirement on the default stack size. To my understanding, there are no such requirement in the spec, and thus, applications can not really assume anything about the default thread stack size. musl libc does not provide any __MUSL__ macro by design, because they consider "it’s a bug to assume a certain implementation has particular properties rather than testing" so we cannot really make exception for musl. https://wiki.musl-libc.org/faq.html#Q:_Why_is_there_no_<code>__MUSL___macro? I think the options we have are: - at run time, use pthread_attr_getstacksize() to verify we have enough stack size. I don't know how to calculate the proper minimum, but some testing showed that we need 450k on musl x86_64 to reach sys.getrecursionlimit() so I'd say 1MB is a good limit - atleast for x86_64. - at build time, run pthread_attr_getstacksize() from configure script to detect default stack size. This will not work for cross compile. - reverse the current #ifdef behavior so that we check for known systems that has big enough system default (eg __GLIBC__ etc) and then fall back to something safe for everything else. - Do nothing. Systems with low stack needs to figure out to add -DTHREAD_STACK_SIZE=0x100000 to the CFLAGS (this is what Alpine does right now) I can of course make a check for musl in the configure script and special case it, but it still does not fix the issue: There is no guarantee that stack size is big enough for python's use for every existing and unknown future system.
msg308294 - (view) Author: Natanael Copa (Natanael Copa) Date: 2017-12-14 13:09
I ran a test of pthread_attr_getstacksize() on various systems. Here is what they return: Linux/glibc: 8388608 MacOS: 524288 FreeBSD 11.1: 0 (ulimit -s gives 524288) NetBSD 7.1: 4194304 OpenBSD 6.2: 524288 I could not see any #ifdef for __OpenBSD__ so I would expect to see segfaults on OpenBSD as well. Google gave me this: https://bugs.python.org/issue25342 https://github.com/alonho/pytrace/issues/8
msg308304 - (view) Author: Natanael Copa (Natanael Copa) Date: 2017-12-14 15:10
I suggest a runtime check like this: diff --git a/Python/thread_pthread.h b/Python/thread_pthread.h index b7463c0ca6..e9d0f638c9 100644 --- a/Python/thread_pthread.h +++ b/Python/thread_pthread.h @@ -34,6 +34,8 @@ #undef THREAD_STACK_SIZE #define THREAD_STACK_SIZE 0x400000 #endif +/* minimum size of the default thread stack size */ +#define THREAD_STACK_MIN_DEFAULT 0x100000 /* for safety, ensure a viable minimum stacksize */ #define THREAD_STACK_MIN 0x8000 /* 32 KiB */ #else /* !_POSIX_THREAD_ATTR_STACKSIZE */ @@ -187,6 +189,14 @@ PyThread_start_new_thread(void (*func)(void *), void *arg) PyThreadState *tstate = PyThreadState_GET(); size_t stacksize = tstate ? tstate->interp->pythread_stacksize : 0; tss = (stacksize != 0) ? stacksize : THREAD_STACK_SIZE; + if (tss == 0 && THREAD_STACK_SIZE == 0) { + if (pthread_attr_getstacksize(&attrs, &tss) != 0) { + pthread_attr_destroy(&attrs); + return PYTHREAD_INVALID_THREAD_ID; + } + if (tss != 0 && tss < THREAD_STACK_MIN_DEFAULT) + tss = THREAD_STACK_MIN_DEFAULT; + } if (tss != 0) { if (pthread_attr_setstacksize(&attrs, tss) != 0) { pthread_attr_destroy(&attrs); The reasoning here is: - if THREAD_STACK_SIZE is set to non-zero, then use that. This is so that you can at compile time set the default stack size to your liking. This is currently set for __APPLE__ and __FreeBSD__ and you can set it via CFLAGS, so this does not change anything for those. - if THREAD_STACK_SIZE is set to 0, then see if pthread_attr_getstacksize returns anything and if it does, make sure that we have at least 1MB stack size as default. Scripts that uses threading.stack_size() will still work as before, and you can still manually set the stack size down to 32k (THREAD_STACK_MIN) if you need that. This should keep existing behavior for known systems like glibc, OSX, FreeBSD, NetBSD but will raise the thread stack size for musl libc and OpenBSD to 1MB. What do you think?
msg308305 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2017-12-14 15:19
I think we need people who do a lot of work at the C level to evaluate this, so I've added a couple to the nosy list :)
msg308307 - (view) Author: Natanael Copa (Natanael Copa) Date: 2017-12-14 15:31
Exactly same as previous patch, but refactored to only have a single pthread_attr_destroy() call instead of 3. diff --git a/Python/thread_pthread.h b/Python/thread_pthread.h index b7463c0ca6..1006a168fa 100644 --- a/Python/thread_pthread.h +++ b/Python/thread_pthread.h @@ -34,6 +34,8 @@ #undef THREAD_STACK_SIZE #define THREAD_STACK_SIZE 0x400000 #endif +/* minimum size of the default thread stack size */ +#define THREAD_STACK_MIN_DEFAULT 0x100000 /* for safety, ensure a viable minimum stacksize */ #define THREAD_STACK_MIN 0x8000 /* 32 KiB */ #else /* !_POSIX_THREAD_ATTR_STACKSIZE */ @@ -167,7 +169,7 @@ unsigned long PyThread_start_new_thread(void (*func)(void *), void *arg) { pthread_t th; - int status; + int status = -1; #if defined(THREAD_STACK_SIZE) | defined(PTHREAD_SYSTEM_SCHED_SUPPORTED) pthread_attr_t attrs; #endif @@ -187,11 +189,15 @@ PyThread_start_new_thread(void (*func)(void *), void *arg) PyThreadState *tstate = PyThreadState_GET(); size_t stacksize = tstate ? tstate->interp->pythread_stacksize : 0; tss = (stacksize != 0) ? stacksize : THREAD_STACK_SIZE; + if (tss == 0 && THREAD_STACK_SIZE == 0) { + if (pthread_attr_getstacksize(&attrs, &tss) != 0) + goto thread_abort; + if (tss != 0 && tss < THREAD_STACK_MIN_DEFAULT) + tss = THREAD_STACK_MIN_DEFAULT; + } if (tss != 0) { - if (pthread_attr_setstacksize(&attrs, tss) != 0) { - pthread_attr_destroy(&attrs); - return PYTHREAD_INVALID_THREAD_ID; - } + if (pthread_attr_setstacksize(&attrs, tss) != 0) + goto thread_abort; } #endif #if defined(PTHREAD_SYSTEM_SCHED_SUPPORTED) @@ -209,6 +215,7 @@ PyThread_start_new_thread(void (*func)(void *), void *arg) ); #if defined(THREAD_STACK_SIZE)
msg403184 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-10-04 22:00
I left some comments on the PR. We can at least make it detect musl libc and improve its default behavior there. FWIW CPython's recursionlimit and the C stack have long been a thorn in the side of many things. Even when Python's thread_pthread.h is changed to set a better default when running under musl libc, there are still plenty of ways to run into problems. ex: Extension modules can create threads on their own that then call back into CPython. So can C/C++ code that embeds a Python interpreter. Those choose their own stack sizes. When they choose low values, surprising crashes ensue for people working on the Python side... As a feature (beyond just this issue): It'd be ideal to actually be able to introspect the C stack space remaining and issue a RecursionError whenever it falls below a threshold. Rather than just blindly using a runtime adjustable number that any Python code can tweak to allow crashes via `sys.setrecursionlimit()`. Other languages like Golang use a dynamic stack and allocate more on SIGSEGV. [meta/off-topic: Why does anyone use small thread stack sizes? This is often done to save virtual address space in applications that expect to have thousands of threads. It matters a lot less on 64-bit address space. But there are reasons...]
History
Date User Action Args
2022-04-11 14:58:55 admin set github: 76488
2021-10-04 22:00:28 gregory.p.smith set assignee: gregory.p.smithmessages: + versions: + Python 3.9, Python 3.10, Python 3.11, - Python 2.7, Python 3.6
2021-10-04 21:32:32 gregory.p.smith set nosy: + gregory.p.smith
2021-10-01 12:16:16 4-launchpad-kalvdans-no-ip-org set nosy: + 4-launchpad-kalvdans-no-ip-org
2021-05-18 23:52:28 python-dev set keywords: + patchnosy: + python-devpull_requests: + <pull%5Frequest24841>stage: patch review
2017-12-14 15:31:35 Natanael Copa set nosy: - vstinner, serhiy.storchakamessages: +
2017-12-14 15:19:03 r.david.murray set nosy: + vstinner, serhiy.storchakamessages: +
2017-12-14 15:10:55 Natanael Copa set messages: +
2017-12-14 13:09:50 Natanael Copa set messages: +
2017-12-14 12:52:39 Natanael Copa set messages: +
2017-12-13 18:12:56 r.david.murray set nosy: + r.david.murraymessages: +
2017-12-13 17:46:34 Natanael Copa create