[Python-Dev] BDFL ruling request: should we block forever waiting for high-quality random bits?

Theodore Ts'o tytso at mit.edu
Fri Jun 10 15:54:11 EDT 2016


I will observe that feelings have gotten a little heated, so without making any suggestions as to how the python-dev community should decide things, let me offer some observations that might shed a little light, and perhaps dispel a little bit of the heat.

As someone who has been working in security for a long time --- before I started getting paid to hack Linux full-time, I worked on Kerberos and was on the Security Area Directorate of the IETF, where among other things I was one of the working group chairs for the IP Security (ipsec) working group --- I tend to cringe a bit when people talk about security in terms of absolutes. For example, the phrase "improving Python's security". Security is best talked about in the context of a specific threat environment, where the value of what you are trying to protect, the capabilities and resources of the attackers, etc., are all well known.

This gets hard for those of us who work on infrastructure which can get used in many different arenas, and that applies both to the Linux kernel and to C-Python: how people will use the tools that we spend so much of our passion crafting is largely out of our control, and we may not even know how they are being used.

As far as /dev/urandom is concerned, it's true that it doesn't block before it has been initialized. If you are a security academic who likes to write papers about how great you are at finding defects in other people's work, this is definitely a weakness.

Is it a fatal weakness? Well, first of all, on most server and desktop deployments, we save 1 kilobyte or so of /dev/urandom output during the shutdown sequence and immediately after the init scripts are completed. This saved entropy is then piped back into the /dev/random infrastructure and used to initialize /dev/random and /dev/urandom very early in the init scripts. On a freshly installed machine this won't help, true, but in practice, on most systems, /dev/urandom will get initialized from interrupt timing sampling within a few seconds after boot. For example, on a sample Google Compute Engine VM which was booted into Debian and then left idle, /dev/urandom was initialized within 2.8 seconds after boot, while the root file system was remounted read-write 1.6 seconds after boot.
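For readers who want to check this on their own machine, here is a minimal sketch of one way to probe whether the kernel's pool has been initialized. It assumes Linux and Python 3.6+, which exposes the getrandom(2) system call as os.getrandom; the GRND_NONBLOCK flag makes the call fail immediately (rather than block) if the pool is not yet ready.

```python
import os

def crng_initialized() -> bool:
    """Best-effort check: has the kernel's random pool been initialized?

    Linux-only sketch. getrandom(2) with GRND_NONBLOCK raises
    BlockingIOError (EAGAIN) until the pool is initialized; once it
    succeeds, the pool is ready for the life of the system.
    """
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False

print(crng_initialized())
```

On any machine that has been up for more than a few seconds this will almost certainly print True; the interesting case is running it from an early-boot script.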

So even on Python pre-3.5.0, realistically speaking, the "weakness" of os.random would only be an issue (a) if it is run within the first few seconds of boot, and (b) if os.random is used to directly generate a long-term cryptographic secret. If you fork openssl or ssh-keygen to generate a public/private keypair, then you aren't using os.random.

Furthermore, if you are running on a modern x86 system with RDRAND, you'll also be fine, because we mix in randomness from the CPU chip via the RDRAND instruction.

So this whole question of whether os.random should block is important in certain very specific cases, and if you are generating long-term cryptographic secrets in Python, maybe you should be worrying about that. But to be honest, there are lots of other things you should be worrying about as well, and I would hope that people writing cryptographic code would be asking questions about how the random number stack is working, not just at the C-Python interpreter level, but also at the OS level.
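As a side note for Python programmers who do need to generate such secrets: since Python 3.6 the stdlib's secrets module is the intended high-level interface, wrapping the OS CSPRNG (the same source behind os.urandom). A minimal sketch:

```python
import secrets

# secrets draws from the OS CSPRNG, so its quality is exactly the
# quality of the underlying OS random number stack discussed above.
key_bytes = secrets.token_bytes(32)    # 256 bits of raw key material
api_token = secrets.token_urlsafe(32)  # URL-safe text token

print(len(key_bytes))
```

Of course, this only pushes the question down one layer: whether those bytes are any good still depends on the OS having an initialized pool.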

My preference would be that os.random should block, because the odds that people would be trying to generate long-term cryptographic secrets within seconds after boot are very small, and if you do block for a second or two, it's not the end of the world. The problem that triggered this was specifically that systemd was trying to use C-Python very early in the boot process to initialize the SIPHASH used for the dictionary, and it's not clear that really needed to be extremely strong, because it wasn't a long-term cryptographic secret --- certainly not in how systemd was using that specific script!

The reason why I think blocking is better is that once you've solved the "don't hang the VM for 90 seconds until python has started up" problem, someone who is using os.random will almost certainly not be on the blocking path of the system boot sequence, and so blocking for 2 seconds before generating a long-term cryptographic secret is not the end of the world.

And if it does block by accident, in a security-critical scenario it will hopefully force the programmer to think, and in a non-security-critical scenario, it should be easy to switch to either a totally non-blocking interface, or to a pseudo-random interface which is more efficient.

HOWEVER, on the flip side, if os.random doesn't block, then in 99.999% of the cases, the python script that is directly generating a long-term secret will not be started 1.2 seconds after the root file system is remounted read/write, so it is also not the end of the world. Realistically speaking, we do know which processes are likely to be generating long-term cryptographic secrets immediately after boot, and they'll most likely be using programs like openssl or ssh-keygen to actually generate the cryptographic key, and in both of those places, (a) it's their problem to get it right, and (b) blocking for two seconds is a completely reasonable thing to do, and they will probably do it, so we're fine.

So either way, I think it will be fine. I may have a preference, but if Python chooses another path, all will be well. There is an old saying that academic politics are often so passionate because the stakes are so small. It may be that one of the reasons why this topic has been so passionate is precisely because of Sayre's Law.

Peace,

                    - Ted

