gh-97514: Authenticate the forkserver control socket. by gpshead · Pull Request #99309 · python/cpython
This adds authentication to the forkserver control socket. Previously, only filesystem permissions protected this socket from code injection into the forkserver process, by limiting access to processes running as the same UID. That protection did not exist when Linux abstract namespace sockets were used (see issue), meaning that any process in the same system network namespace could inject code. We've since stopped using abstract namespace sockets by default, but protecting our control sockets regardless of type seems desirable.
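To make the distinction concrete, here is a minimal sketch (not taken from the PR) contrasting a filesystem-bound Unix socket, whose reachability is governed by directory and file permissions, with a Linux abstract namespace socket, which has no filesystem entry to attach permissions to. The function names and socket names are hypothetical and exist only for illustration.

```python
import os
import socket
import sys
import tempfile


def bind_filesystem_socket():
    """Bind a Unix socket inside a private temporary directory."""
    # mkdtemp() creates a directory accessible only by the creating user,
    # so other UIDs cannot reach the socket path inside it.
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "control.sock")  # hypothetical name
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    return sock, path


def bind_abstract_socket(name):
    """Bind a Linux abstract namespace socket (address starts with NUL)."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # No file is created, so there are no permission bits to consult.
    sock.bind("\0" + name)
    return sock


if __name__ == "__main__":
    fs_sock, fs_path = bind_filesystem_socket()
    parent_mode = os.stat(os.path.dirname(fs_path)).st_mode & 0o777
    print("filesystem socket:", fs_path, "parent dir mode:", oct(parent_mode))
    fs_sock.close()
    os.unlink(fs_path)
    os.rmdir(os.path.dirname(fs_path))

    if sys.platform.startswith("linux"):
        abs_sock = bind_abstract_socket(f"demo-{os.getpid()}")
        print("abstract socket:", abs_sock.getsockname())
        abs_sock.close()
```

With the abstract variant, nothing on disk restricts who may connect, which is why an authentication step on the socket itself is the more robust protection.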
This reuses the HMAC-based shared key auth already used by `multiprocessing.connection` sockets for other purposes.
Doing this is useful so that filesystem permissions are not relied upon and trust isn't implied by default between all processes running as the same UID with access to the unix socket.
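For readers unfamiliar with that mechanism, the following sketch shows the existing challenge-response handshake from `multiprocessing.connection` in isolation. It uses the documented `deliver_challenge` and `answer_challenge` helpers over a `Pipe()`; the `handshake` function and the use of a thread are assumptions made for this illustration and do not reflect how forkserver itself is wired up in this PR.

```python
import os
import threading
from multiprocessing import Pipe
from multiprocessing.connection import (
    AuthenticationError,
    answer_challenge,
    deliver_challenge,
)


def handshake(server_key: bytes, client_key: bytes) -> bool:
    """Return True if the client proves it knows the server's key."""
    server_end, client_end = Pipe()

    def client():
        try:
            # Computes an HMAC over the server's random challenge.
            answer_challenge(client_end, client_key)
        except AuthenticationError:
            pass  # The server rejected our digest.

    thread = threading.Thread(target=client)
    thread.start()
    try:
        # Sends a random challenge and verifies the returned digest.
        deliver_challenge(server_end, server_key)
        accepted = True
    except AuthenticationError:
        accepted = False
    thread.join()
    server_end.close()
    client_end.close()
    return accepted


if __name__ == "__main__":
    key = os.urandom(32)
    print("matching key accepted:", handshake(key, key))
    print("wrong key accepted:", handshake(key, os.urandom(32)))
```

The key itself never crosses the connection; only a random challenge and its keyed digest do, so a process that can reach the socket but does not hold the key cannot complete the exchange.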
Tasks remaining
- clean up the file descriptor leak from the new tests.
- Microbenchmarking
- Q: Decide if this needs an off switch. A: Nope. If we find a reason to during the betas before 3.14, we can add it. Rationale: This change is in the noise for most uses. `multiprocessing.Pool` worker processes are long lived by default, so spawning new ones via the start method is infrequent compared to the number of tasks they are given. Only applications using `maxtasksperchild=` with a very low value might be able to notice, but even then the amount of work done in a worker process should far exceed any additional overhead this security measure adds to requesting forkserver to spawn new processes.
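The amortization argument in the rationale can be illustrated with a back-of-the-envelope model. This is a rough sketch under stated assumptions, not a measurement: the function name, the 0.19 ms added spawn cost, and the 10 ms of work per task are all illustrative inputs chosen for the example.

```python
def added_overhead_fraction(added_spawn_s, task_s, maxtasksperchild):
    """Fraction of a worker's lifetime attributable to the extra spawn cost.

    Assumes each worker runs `maxtasksperchild` tasks of `task_s` seconds
    each and pays `added_spawn_s` extra seconds once, when it is spawned.
    """
    work_s = task_s * maxtasksperchild
    return added_spawn_s / (work_s + added_spawn_s)


if __name__ == "__main__":
    # Illustrative inputs only (assumed, not measured).
    added_spawn_s = 0.00019
    task_s = 0.010
    for tasks in (1, 10, 100, 1000):
        fraction = added_overhead_fraction(added_spawn_s, task_s, tasks)
        print(f"maxtasksperchild={tasks}: {fraction:.4%} added overhead")
```

Under this model the relative cost shrinks roughly in proportion to the number of tasks each worker handles, which matches the expectation that only very low `maxtasksperchild=` values could make the change noticeable.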
pyperformance benchmarks
No significant changes, including in `concurrent_imap`, which exercises `multiprocessing.Pool.imap` in that suite.
Microbenchmarks
This does slightly slow down forkserver use. How much so appears to depend on the platform. Modern platforms and simple platforms are less impacted. This PR adds additional IPC round trips to the control socket to tell forkserver to spawn a new process. Systems with potentially high latency IPC are naturally impacted more.
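As a rough way to see what one extra round trip costs on a given machine, the sketch below times request/reply exchanges over a local socket pair. It is an assumption-laden illustration: `measure_round_trip` is a hypothetical helper, and it uses a thread inside one process rather than two separate processes, so it likely understates the latency of real cross-process, cross-core IPC.

```python
import socket
import threading
import time


def measure_round_trip(count: int = 1000) -> float:
    """Return the mean seconds per request/reply over a local socket pair."""
    near, far = socket.socketpair()

    def echo():
        # Reply to each one-byte request with the same byte.
        for _ in range(count):
            far.sendall(far.recv(1))

    thread = threading.Thread(target=echo)
    thread.start()
    start = time.perf_counter()
    for _ in range(count):
        near.sendall(b"x")
        near.recv(1)
    elapsed = time.perf_counter() - start
    thread.join()
    near.close()
    far.close()
    return elapsed / count


if __name__ == "__main__":
    mean_s = measure_round_trip()
    print(f"mean round trip: {mean_s * 1e6:.1f} microseconds")
```

Multiplying the measured figure by the number of added round trips gives a lower bound on the extra cost per spawn request on that system.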
Using my multiprocessing process-creation-benchmark.py:
I switched between this PR branch and `main` via a simple git checkout after my build; as the changes are pure Python, no rebuild is needed.
On an AMD zen4 system:
889 Procs/sec dropped to 874, about 1.7% slower. Insignificant.
AMD 7800X3D single-CCD 8 cores.
% ../b/python process-creation-benchmark.py 5 forkserver
Process Creation Microbenchmark (max 7 active processes) (5 iterations)
multiprocessing start method: forkserver
sys.version='3.14.0a1+ (~main branch~, Nov 10 2024) [GCC 13.2.0]'
--------------------------------------------------------------------------------
Total Procs/sec Time (s) StdDev
--------------------------------------------------------------------------------
32 666.77 0.049 108.39
128 831.09 0.154 44.05
384 887.16 0.433 9.27
1024 886.02 1.156 1.37
2048 888.99 2.304 2.76
% ./b/python ~/Downloads/process-creation-benchmark.py 5 forkserver
Process Creation Microbenchmark (max 7 active processes) (5 iterations)
multiprocessing start method: forkserver
sys.version='3.14.0a1+ (heads/security/multiprocessing-forkserver-authkey-dirty:07c01d459f8, Nov 10 2024) [GCC 13.2.0]'
--------------------------------------------------------------------------------
Total Procs/sec Time (s) StdDev
--------------------------------------------------------------------------------
32 640.53 0.052 130.27
128 809.62 0.158 38.79
384 867.22 0.443 7.66
1024 873.75 1.172 2.76
2048 873.57 2.344 2.85
Baseline fork (2659 Procs/sec) and spawn (268 Procs/sec) measurements:
```
% ../b/python ~/Downloads/process-creation-benchmark.py 13 fork
Process Creation Microbenchmark (max 7 active processes) (13 iterations)
multiprocessing start method: fork
sys.version='3.14.0a1+ (heads/security/multiprocessing-forkserver-authkey-dirty:07c01d459f8, Nov 10 2024) [GCC 13.2.0]'
--------------------------------------------------------------------------------
Total Procs/sec Time (s) StdDev
--------------------------------------------------------------------------------
32 2,300.78 0.014 78.91
128 2,391.11 0.054 114.68
384 2,650.31 0.145 13.23
1024 2,646.28 0.387 16.47
2048 2,641.08 0.775 13.65
5120 2,659.42 1.925 11.82
% ../b/python ~/Downloads/process-creation-benchmark.py 13 spawn
Process Creation Microbenchmark (max 7 active processes) (13 iterations)
multiprocessing start method: spawn
sys.version='3.14.0a1+ (heads/security/multiprocessing-forkserver-authkey-dirty:07c01d459f8, Nov 10 2024) [GCC 13.2.0]'
--------------------------------------------------------------------------------
Total Procs/sec Time (s) StdDev
--------------------------------------------------------------------------------
32 235.96 0.136 13.91
128 259.53 0.493 0.79
384 267.62 1.435 1.00
1024 267.89 3.822 0.35
```
On an Intel Broadwell Xeon E5-2698 v4 system:
828 Procs/sec dropped to 717, ~13% slower. Significant. BUT if I drop the active processes from 19 to 9, the difference is far smaller: 414 dropped to 398, ~4% slower. Moderate.
20 cores, 2 ring buses, 4 memory controllers, single socket. A large die Broadwell Xeon is complicated. At high parallelism counts, interprocess communication latencies add up. I predict similar results from multi-core-complex-die Zen/EPYC parts and multi-socket systems, and probably also on big.LITTLE-style mixed power/perf core arrangements.
% ../b/python ~/process-creation-benchmark.py 13 forkserver
Process Creation Microbenchmark (max 19 active processes) (13 iterations)
multiprocessing start method: forkserver
sys.version='3.14.0a1+ (~main branch~, Nov 10 2024) [GCC 13.2.0]'
--------------------------------------------------------------------------------
Total Procs/sec Time (s) StdDev
--------------------------------------------------------------------------------
32 535.14 0.062 77.23
128 735.49 0.174 6.53
384 798.69 0.481 4.43
1024 820.84 1.248 1.90
2048 827.63 2.475 4.31
% ../b/python ~/process-creation-benchmark.py 13 forkserver
Process Creation Microbenchmark (max 19 active processes) (13 iterations)
multiprocessing start method: forkserver
sys.version='3.14.0a1+ (heads/security/multiprocessing-forkserver-authkey-dirty:07c01d459f8, Nov 10 2024) [GCC 13.2.0]'
--------------------------------------------------------------------------------
Total Procs/sec Time (s) StdDev
--------------------------------------------------------------------------------
32 449.24 0.073 63.19
128 614.39 0.208 16.66
384 668.49 0.575 11.36
1024 716.77 1.430 18.10
2048 716.73 2.858 13.12
Baseline fork (1265 Procs/sec) and spawn (233 Procs/sec) measurements:
% ../b/python ~/process-creation-benchmark.py 13 fork
Process Creation Microbenchmark (max 19 active processes) (13 iterations)
multiprocessing start method: fork
sys.version='3.14.0a1+ (heads/security/multiprocessing-forkserver-authkey-dirty:07c01d459f8, Nov 10 2024) [GCC 13.2.0]'
--------------------------------------------------------------------------------
Total Procs/sec Time (s) StdDev
--------------------------------------------------------------------------------
32 1,241.39 0.026 51.43
128 1,259.44 0.102 5.01
384 1,254.59 0.306 3.86
1024 1,258.45 0.814 6.77
2048 1,265.48 1.618 8.34
% ./b/python ~/process-creation-benchmark.py 13 spawn
Process Creation Microbenchmark (max 19 active processes) (13 iterations)
multiprocessing start method: spawn
sys.version='3.14.0a1+ (heads/security/multiprocessing-forkserver-authkey-dirty:07c01d459f8, Nov 10 2024) [GCC 13.2.0]'
--------------------------------------------------------------------------------
Total Procs/sec Time (s) StdDev
--------------------------------------------------------------------------------
32 188.08 0.170 2.58
128 221.20 0.579 0.75
384 227.56 1.687 0.85
1024 233.34 4.388 0.54
On a Raspberry Pi 5:
126 Procs/sec dropped to 121, a ~4% slowdown. Moderate.
Raspberry Pi 5 running 32-bit Raspbian.
% ./python ../process-creation-benchmark.py
Process Creation Microbenchmark (max 3 active processes) (5 iterations)
multiprocessing start method: forkserver
sys.version='3.14.0a1+ (~main branch~, Nov 10 2024, 19:06:56) [GCC 12.2.0]'
--------------------------------------------------------------------------------
Total Procs/sec Time (s) StdDev
--------------------------------------------------------------------------------
32 121.23 0.266 9.82
128 125.45 1.020 0.84
384 125.71 3.055 0.27
% ./python ../process-creation-benchmark.py
Process Creation Microbenchmark (max 3 active processes) (5 iterations)
multiprocessing start method: forkserver
sys.version='3.14.0a1+ (heads/security/multiprocessing-forkserver-authkey:07c01d4, Nov 10 2024, 19:06:56) [GCC 12.2.0]'
--------------------------------------------------------------------------------
Total Procs/sec Time (s) StdDev
--------------------------------------------------------------------------------
32 114.57 0.281 10.29
128 119.70 1.069 0.28
384 120.84 3.178 0.41
Baseline fork (973 Procs/sec) and spawn (32 Procs/sec) measurements:
% ./python ../process-creation-benchmark.py 5 fork
Process Creation Microbenchmark (max 3 active processes) (5 iterations)
multiprocessing start method: fork
sys.version='3.14.0a1+ (~main branch~, Nov 10 2024, 19:06:56) [GCC 12.2.0]'
--------------------------------------------------------------------------------
Total Procs/sec Time (s) StdDev
--------------------------------------------------------------------------------
32 933.01 0.034 44.03
128 973.00 0.132 1.33
384 968.48 0.396 1.55
1024 972.78 1.053 0.77
% ./python ../process-creation-benchmark.py 5 spawn
Process Creation Microbenchmark (max 3 active processes) (5 iterations)
multiprocessing start method: spawn
sys.version='3.14.0a1+ (~main branch~, Nov 10 2024, 19:06:56) [GCC 12.2.0]'
--------------------------------------------------------------------------------
Total Procs/sec Time (s) StdDev
--------------------------------------------------------------------------------
32 31.97 1.001 0.12
128 32.46 3.943 0.02
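The headline forkserver figures quoted in each platform section can be cross-checked with a few lines of arithmetic. The sketch below recomputes the relative slowdown from the summary rates and also derives the implied extra time per process created. The helper names are made up for this illustration, and the per-process figure is an aggregate throughput cost with several processes active at once, not the latency of a single spawn request.

```python
def slowdown_percent(before, after):
    """Percentage drop in throughput going from `before` to `after`."""
    return (before - after) / before * 100.0


def added_ms_per_process(before, after):
    """Extra wall-clock milliseconds per process implied by the lower rate."""
    return 1000.0 / after - 1000.0 / before


# (main Procs/sec, PR branch Procs/sec), copied from the summaries above.
RESULTS = {
    "AMD Zen 4 (max 7 active)": (889, 874),
    "Broadwell Xeon (max 19 active)": (828, 717),
    "Broadwell Xeon (9 active)": (414, 398),
    "Raspberry Pi 5 (max 3 active)": (126, 121),
}

if __name__ == "__main__":
    for name, (before, after) in RESULTS.items():
        print(
            f"{name}: {slowdown_percent(before, after):.1f}% slower, "
            f"+{added_ms_per_process(before, after):.3f} ms per process"
        )
```

Looking at the absolute added time rather than only the percentage makes it easier to compare platforms whose baseline process creation rates differ by almost an order of magnitude.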