test_register_chain fails on aarch64 due to a signal stack overflow when faulthandler_user re-raises the signal. The problem is that the signal stack can only hold a single signal frame, but faulthandler_user adds a second one. _PyFaulthandler_Init should allocate twice the amount of stack to cater for the two signal frames.

======================================================================
FAIL: test_register_chain (test.test_faulthandler.FaultHandlerTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/abuild/rpmbuild/BUILD/Python-3.4.1/Lib/test/test_faulthandler.py", line 592, in test_register_chain
    self.check_register(chain=True)
  File "/home/abuild/rpmbuild/BUILD/Python-3.4.1/Lib/test/test_faulthandler.py", line 576, in check_register
    self.assertEqual(exitcode, 0)
AssertionError: -11 != 0
----------------------------------------------------------------------
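To illustrate why two signal frames end up stacked, here is a minimal sketch of a chaining handler that restores the previously installed handler and re-raises the signal from inside itself. It is not the faulthandler.c code: the names previous_handler, chaining_handler and install_chain are hypothetical, and the sketch runs on the normal stack rather than an alternate one. It assumes the chained handler is registered with SA_NODEFER so that raise() delivers the signal immediately, nested inside the first handler.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>

/* Counters updated from the handlers (hypothetical names). */
static volatile sig_atomic_t previous_calls = 0;
static volatile sig_atomic_t chained_calls = 0;

/* The action that was installed before the chaining handler. */
static struct sigaction previous_action;

/* Stands in for whatever handler the application had installed first. */
static void previous_handler(int signum)
{
    (void)signum;
    previous_calls++;
}

/* Chaining handler: temporarily restore the previous handler and
   re-raise. Because the signal is not blocked (SA_NODEFER), the
   previous handler runs before raise() returns, so a second signal
   frame sits on top of this handler's frame. */
static void chaining_handler(int signum)
{
    struct sigaction self;

    chained_calls++;
    sigaction(signum, &previous_action, &self);  /* restore previous, save ours */
    raise(signum);                               /* nested delivery */
    sigaction(signum, &self, NULL);              /* re-install ourselves */
}

/* Install previous_handler, then chaining_handler on top of it.
   Returns 0 on success, -1 on failure. */
int install_chain(int signum)
{
    struct sigaction sa;

    sa.sa_handler = previous_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(signum, &sa, NULL) != 0)
        return -1;

    sa.sa_handler = chaining_handler;
    sa.sa_flags = SA_NODEFER;
    return sigaction(signum, &sa, &previous_action);
}
```

After install_chain(SIGUSR1), a single raise(SIGUSR1) runs both handlers, one nested in the other. When both run on an alternate stack, that stack has to hold both frames at once.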
_PyFaulthandler_Init() uses sigaltstack() with a stack of SIGSTKSZ bytes. On my Linux/x86_64, SIGSTKSZ is 8 KB. What is the value of SIGSTKSZ on aarch64? Is there a C define (#ifdef) to use a different size on this architecture? Does the test pass if you modify faulthandler.c to use "SIGSTKSZ * 2"?
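The "SIGSTKSZ * 2" experiment suggested above could look roughly like the following sketch. It is not the actual faulthandler.c code: install_alt_stack and current_alt_stack_size are hypothetical helpers, and the multiplier parameter is only there to make the doubling explicit.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: install an alternate signal stack of
   SIGSTKSZ * multiplier bytes for the calling thread.
   Returns 0 on success, -1 on failure. The buffer is deliberately
   never freed, since it must stay valid while the stack is installed. */
int install_alt_stack(size_t multiplier)
{
    stack_t st;

    assert(multiplier > 0);
    memset(&st, 0, sizeof(st));
    st.ss_flags = 0;
    st.ss_size = (size_t)SIGSTKSZ * multiplier;
    st.ss_sp = malloc(st.ss_size);
    if (st.ss_sp == NULL)
        return -1;

    if (sigaltstack(&st, NULL) != 0) {
        free(st.ss_sp);
        return -1;
    }
    return 0;
}

/* Hypothetical helper: report the size of the currently installed
   alternate stack, or 0 if none is installed or the query fails. */
size_t current_alt_stack_size(void)
{
    stack_t cur;

    if (sigaltstack(NULL, &cur) != 0)
        return 0;
    if (cur.ss_flags & SS_DISABLE)
        return 0;
    return cur.ss_size;
}
```

Calling install_alt_stack(2) requests room for two frames under the assumption that one frame fits in SIGSTKSZ. Handlers only use this stack if they are registered with SA_ONSTACK.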
There is an open bug about MINSIGSTKSZ being too small on aarch64 <https://sourceware.org/bugzilla/show_bug.cgi?id=16850>. How much SIGSTKSZ can guarantee about nested signals is unclear. POSIX does not appear to give any guidance. On aarch64 SIGSTKSZ is defined as 8192, which is the default for architectures that do not override it (both in glibc and in the kernel headers).
The bug ticket linked by @schwab was closed in 2015. Is this still an issue on aarch64? Other tickets with the same error on other platforms: Issue35484, Issue21131
Python 3 is built frequently on the Fedora infra on AArch64 and the test_faulthandler test doesn't fail there.

Recent example of a build: https://koji.fedoraproject.org/koji/buildinfo?buildID=1236594
Direct link to the AArch64 build logs (build.log): https://kojipkgs.fedoraproject.org//packages/python3/3.7.2/5.fc29/data/logs/aarch64/build.log

Extract:

0:02:37 load avg: 5.99 [177/414] test_faulthandler passed (41 sec 925 ms) -- running: test_concurrent_futures (1 min 30 sec)
...
0:03:10 load avg: 10.21 [190/414] test_faulthandler passed (1 min 2 sec) -- running: test_gdb (44 sec 52 ms), test_concurrent_futures (1 min 56 sec)

The test is run on a Python compiled in release mode, then on a Python compiled in debug mode. The test passes in both cases.

I am closing the issue. It seems the bug has been fixed indirectly since it was reported. Thanks for your bug report, Andreas Schwab :-)
History
Date                 User      Action  Args
2022-04-11 14:58:08  admin     set     github: 66693
2019-03-26 12:55:39  vstinner  set     status: open -> closed; resolution: fixed; messages: +; stage: resolved