Rust's main-stack thread guard is implemented incorrectly; breaks nearly all Rust programs on 64KB-pagesize post-stack-clash Linux

Rust implements its own userspace "stack guard", whose purpose is to print a nice error message about "stack overflow" rather than segfault. This is applied to each new thread, as well as the main thread:

https://github.com/rust-lang/rust/blob/master/src/libstd/rt.rs#L44
https://github.com/rust-lang/rust/blob/master/src/libstd/sys/unix/thread.rs#L248
https://github.com/rust-lang/rust/blob/master/src/libstd/sys/unix/thread.rs#L229

The start address of the stack is effectively calculated via pthread_attr_getstack (pthread_getattr_np (pthread_self)).

This is where the problems occur. pthread_getattr_np is not defined by POSIX and the manual page does not specifically define what the exact behaviour is when getting the attributes of the main thread. It works fine for non-main threads because they are usually created with a fixed-sized stack which does not automatically expand. So the start address is pre-defined on those threads.

However, on Linux (and other systems) the real size of the main stack is not determined in advance; it starts off small and then gets automatically expanded via the process described in great detail in recent articles on the stack-clash bug. In practice, with glibc the above series of function calls returns top-of-stack - stack-rlimit when called on the main thread. The stack rlimit is 8MB by default on most machines I've seen.

However, most of the space in between is not allocated at the start of the program. For example, a test "Hello World" Rust program has this (after init):

3fffff800000-3fffff810000 ---p 00000000 00:00 0 
3ffffffd0000-400000000000 rw-p 00000000 00:00 0                          [stack]

with ulimit -s unlimited it looks like this:

1000002d0000-1000002e0000 ---p 00000000 00:00 0 
3ffffffd0000-400000000000 rw-p 00000000 00:00 0                          [stack]
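To make the numbers in the first listing (default stack rlimit) concrete, here is a minimal sketch that parses the two lines and computes the distances between Rust's guard page and the stack. The helper name `parse_range` is made up for illustration and is not part of Rust's runtime.

```rust
/// Parse the "start-end" address range at the beginning of a
/// /proc/<pid>/maps line. `parse_range` is a hypothetical helper.
fn parse_range(line: &str) -> Option<(u64, u64)> {
    let range = line.split_whitespace().next()?;
    let (start, end) = range.split_once('-')?;
    Some((
        u64::from_str_radix(start, 16).ok()?,
        u64::from_str_radix(end, 16).ok()?,
    ))
}

fn main() {
    // The two lines from the default-rlimit listing.
    let guard = parse_range("3fffff800000-3fffff810000 ---p 00000000 00:00 0").unwrap();
    let stack = parse_range("3ffffffd0000-400000000000 rw-p 00000000 00:00 0 [stack]").unwrap();

    const KB: u64 = 1 << 10;
    const MB: u64 = 1 << 20;

    // Distance from the top of the stack down to the start of Rust's guard page.
    println!("guard start is {} MB below top of stack", (stack.1 - guard.0) / MB);
    // How much of the stack is actually mapped right now.
    println!("mapped stack size: {} KB", (stack.1 - stack.0) / KB);
    // Unmapped space between the end of the guard page and the lowest stack address.
    println!("unmapped space between guard and stack: {} KB", (stack.0 - guard.1) / KB);
}
```

For these two lines it reports the guard page starting 8 MB below the top of the stack, a mapped stack of 192 KB, and 7936 KB of unmapped space in between, which matches the top-of-stack - stack-rlimit behaviour described above.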

OTOH, the Linux stack guard is not a physical guard page but just extra logic that prevents the stack from growing too close to another mmap allocation. If I understand correctly: contrary to the get_stack_start function, it does not work based on the stack rlimit, because this could be unlimited. Instead, it works based on the real size of the existing allocated stack. The guard ensures that the nearest mapped page below the stack remains more than stack_guard_gap below the lowest stack address, and if it does not, the kernel triggers a segfault.

On ppc64el Debian and other systems (Fedora aarch64, Fedora ppc64be, etc) the page size is 64KB. Previously, stack_guard_gap was equal to PAGESIZE. Now, it is 256 * PAGESIZE = 16MB, compared to the default stack size limit of 8MB. So now when Linux tries to expand the stack, it sees that the stack is only (8MB - $existing-size) away from the nearest mapped page below it (Rust's own stack guard), which is less than stack_guard_gap (16MB), and so it segfaults.
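The arithmetic can be sketched as follows. This is a deliberately simplified model of the check as I understand it, not the kernel's actual code; the function names `stack_guard_gap` and `growth_allowed` are made up for illustration, and the addresses are the ones from the default-rlimit listing earlier in this report.

```rust
const KB: u64 = 1024;
const MB: u64 = 1024 * KB;

/// Size of the kernel's stack guard gap in bytes: `pages` pages of
/// `page_size` bytes. Per the description above, `pages` was 1 before
/// the stack-clash fix and is 256 after it.
fn stack_guard_gap(page_size: u64, pages: u64) -> u64 {
    page_size * pages
}

/// Simplified, hypothetical model of the kernel's check: growing the
/// stack down to `new_stack_start` is allowed only if the end of the
/// nearest mapping below the stack is at least `gap` bytes away.
fn growth_allowed(new_stack_start: u64, mapping_below_end: u64, gap: u64) -> bool {
    new_stack_start.saturating_sub(mapping_below_end) >= gap
}

fn main() {
    let page = 64 * KB;
    // Addresses taken from the default-rlimit maps listing.
    let rust_guard_end: u64 = 0x3fff_ff81_0000;
    let stack_start: u64 = 0x3fff_fffd_0000;
    // Assume the kernel is asked to grow the stack by one page.
    let new_start = stack_start - page;

    let old_gap = stack_guard_gap(page, 1);
    let new_gap = stack_guard_gap(page, 256);
    println!(
        "old gap: {} KB, growth allowed: {}",
        old_gap / KB,
        growth_allowed(new_start, rust_guard_end, old_gap)
    );
    println!(
        "new gap: {} MB, growth allowed: {}",
        new_gap / MB,
        growth_allowed(new_start, rust_guard_end, new_gap)
    );
}
```

Under this model the first stack expansion is allowed with the old one-page gap and refused with the new 16MB gap, because Rust's guard page sits less than 8MB below the stack.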

The logic only "didn't fail" before because the stack_guard_gap was much lower than the default stack rlimit. But even here, it would not have been able to perform its intended purpose of being able to detect a stack overflow, since the kernel's stack-guard logic would have caused a segfault before the real stack ever expanded into Rust's own stack guard.

In case my words aren't the best, here is a nice diagram instead:

-16MB-x     -16MB                           -8MB                x           top
|           |                               |                   |           |
--------------------------------------------[A]-----------------[<--   stack]
G                                                               S

[..] are mapped pages. Now, Linux's stack guard will segfault if there is anything between G and S. For Rust, its own stack guard page at A causes this. Previously, G-S was much smaller and A was lower than G.

AIUI, Linux developers are talking at the moment about the best way to "unbreak" programs that do this - they try not to break userspace. But it is nevertheless incorrect behaviour by Rust to do this anyways, and a better way of doing it should be found. Unfortunately I don't have any better ideas at the moment, since the very notion of "stack start address" for main threads is apparently not set in stone by POSIX or other standards, leading to this sort of misunderstanding between kernel vs userspace on where the "start" really is.

I'm not sure about the details of other systems, but if they implement stack guards the way Linux does (i.e. against the real allocated stack rather than against a stack rlimit), and the relevant numbers match up like they do above, then Rust's main stack guard would also cause problems there.

CC @arielb1 @bwhacks @cuviper

This causes rust-lang/cargo#4197