The cr.yp.to blog
2014.02.05: Entropy Attacks! The conventional wisdom says that hash outputs can't be controlled; the conventional wisdom is simply wrong.
The conventional wisdom is that hashing more entropy sources can't hurt: if H is any modern cryptographic hash function then H(x,y,z) is at least as good a random number as H(x,y), no matter how awful z is. So we pile one source on top of another, hashing them all together and hoping that at least one of them is good.
But what if z comes from a malicious source that can snoop on x and y? For example, imagine a malicious "secure randomness" USB device that's actually spying on all your other randomness sources through various side channels, or—worse—imagine RDRAND microcode that's looking at the randomness pool that it's about to be hashed into. I should note that none of the attacks described below rely on tampering with x or y, or otherwise modifying data outside the malicious entropy source; you can't stop these attacks by double-checking the integrity of data.
Of course, the malicious device will also be able to see other sensitive information, not just x and y. But this doesn't mean that it's cheap for the attacker to exfiltrate this information! The attacker needs to find a communication channel out of the spying device. Randomness generation influenced by the device is a particularly attractive choice of channel, as I'll explain below.
Here's an interesting example of an attack that can be carried out by this malicious source:
1. Generate a random r.
2. Compute H(x,y,r).
3. If H(x,y,r) doesn't start with bits 0000, go back to step 1.
4. Output r as z.
This attack forces H(x,y,z) to start with 0000, even if x and y were perfectly random. It's fast, taking only 16 computations of H on average.
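As a concrete illustration, here is a minimal sketch of the grinding loop in Python, using SHA-256 as a stand-in for H (the post doesn't fix a particular hash function):

```python
import hashlib
import os

def H(*parts: bytes) -> bytes:
    """Stand-in for the pool's hash: SHA-256 over the concatenated inputs."""
    h = hashlib.sha256()
    for p in parts:
        h.update(p)
    return h.digest()

def malicious_z(x: bytes, y: bytes) -> bytes:
    """The malicious source, having snooped x and y, grinds a 'random' z
    so that H(x,y,z) starts with four 0 bits (~16 tries on average)."""
    while True:
        r = os.urandom(16)
        if H(x, y, r)[0] >> 4 == 0:  # top 4 bits of first byte are 0000
            return r

x, y = os.urandom(32), os.urandom(32)
z = malicious_z(x, y)
assert H(x, y, z)[0] >> 4 == 0  # output forced to start 0000
```

On typical hardware this takes microseconds, so the grinding is invisible to the victim.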
Maybe the randomness generator doesn't actually output H(x,y,z); it uses H(x,y,z) as a seed for some generator G, and outputs G(H(x,y,z)). Okay: the attacker changes H to G(H), and again forces the output G(H(x,y,z)) to start 0000. Similarly, the attack isn't stopped by pre-hashing of the entropy source before it's mixed with other entropy sources. Every mix from the malicious entropy source lets the attacker produce another "random" number that starts 0000.
More generally, instead of producing "random" numbers that start with 0000, 0000, 0000, etc., the malicious entropy source can produce "random" numbers that start with successive 4-bit components of AES_k(0), AES_k(1), ..., where k is a secret key known only to the attacker. Nobody other than the attacker will be able to detect this pattern.
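The same grinding loop needs only a one-line change to force a keyed pattern instead of a fixed one. The sketch below uses HMAC-SHA256 as a stdlib stand-in for the AES_k(i) stream described above; the key and names are illustrative:

```python
import hashlib
import hmac
import os

ATTACKER_KEY = b"k" * 32  # known only to the attacker

def prf_bits(k: bytes, i: int) -> int:
    """Attacker's secret 4-bit target for output number i (HMAC-SHA256 here
    as a stand-in for the AES_k(i) components in the text)."""
    return hmac.new(k, i.to_bytes(8, "big"), hashlib.sha256).digest()[0] >> 4

def malicious_z(x: bytes, y: bytes, i: int) -> bytes:
    """Grind z so the i-th 'random' output H(x,y,z) starts with prf_bits(k,i)."""
    target = prf_bits(ATTACKER_KEY, i)
    while True:
        r = os.urandom(16)
        if hashlib.sha256(x + y + r).digest()[0] >> 4 == target:
            return r
```

To anyone without ATTACKER_KEY, the forced 4-bit prefixes are indistinguishable from uniform; there is no statistical test that flags them.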
Recall that DSA and ECDSA require random "nonces" for signatures. It's easy to imagine someone grabbing each nonce as a new "random" number from the system's randomness generator. However, it's well known (see, e.g., https://www.isg.rhul.ac.uk/~sdg/igor-slides.pdf) that an attacker who can predict the first 4 bits of each nonce can quickly compute the user's secret key after a rather small number of signatures. Evidently hashing an extra entropy source does hurt—in the worst possible way; the attacker has the user's secret key!—contrary to the conventional wisdom stated above.
EdDSA (see https://ed25519.cr.yp.to) is different. It uses randomness once to generate a secret key and is then completely deterministic in its signature generation (following 1997 Barwood, 1997 Wigley, et al.). The malicious entropy source can still control 4 bits of the secret key, speeding up discrete-log attacks by a factor of 4, but this isn't a problem—we use curves with ample security margins. The source can control more than 4 bits by carrying out exponentially more H computations, but this computation has to fit into the time available after inspecting x and y and before generating a "random" number.
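The deterministic part of Ed25519 signing can be sketched from RFC 8032 (which later standardized the design): the per-signature nonce is a hash of a key-derived prefix and the message, with no fresh randomness per signature. The scalar arithmetic and point multiplications are omitted here:

```python
import hashlib

L = 2**252 + 27742317777372353535851937790883648493  # Ed25519 group order

def ed25519_nonce(seed: bytes, message: bytes) -> int:
    """Deterministic nonce per RFC 8032: r = SHA-512(prefix || M) mod L,
    where prefix is the second half of SHA-512(seed).
    No per-signature randomness is ever requested."""
    h = hashlib.sha512(seed).digest()
    prefix = h[32:]
    return int.from_bytes(hashlib.sha512(prefix + message).digest(), "little") % L

# Same key, same message: always the same nonce. A malicious entropy
# source gets no new opportunity per signature.
assert ed25519_nonce(b"\x01" * 32, b"msg") == ed25519_nonce(b"\x01" * 32, b"msg")
```

Since the nonce is a function of the key and the message, the DSA/ECDSA-style nonce-bias attack above has nothing to bite on.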
Of course, there are many other uses of randomness in cryptography: for example, if you want forward secrecy then you're constantly generating new ECDH keys. Controlling 4 bits of each secret key isn't nearly as damaging as controlling 4 bits of DSA/ECDSA nonces—it's the same factor of 4 mentioned above—but, as I mentioned above, the malicious entropy source can also use randomness generation as a communication channel to the attacker. For example, the source controls the low bit of each public key with an average cost of just 2 public-key generations, and uses the concatenation of these low bits across public keys to communicate an encryption of the user's long-term key. This channel is undetectable, reasonably high bandwidth, and reasonably low cost.
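The exfiltration channel is equally easy to sketch. The toy key generator below is a stand-in for real ECDH key generation (the "public key" is just a hash of the secret); only the pattern of low bits matters for the illustration:

```python
import hashlib
import os

def toy_keypair():
    """Toy stand-in for ECDH key generation: the 'public key' is a hash of
    the secret. A real malicious source would run the real keygen."""
    priv = os.urandom(32)
    return priv, hashlib.sha256(priv).digest()

def keypair_leaking_bit(bit: int):
    """Regenerate keys until the public key's low bit equals the next bit
    of the message being exfiltrated (about 2 attempts on average)."""
    while True:
        priv, pub = toy_keypair()
        if pub[-1] & 1 == bit:
            return priv, pub

secret_to_leak = [1, 0, 1, 1, 0, 0, 1, 0]  # e.g. bits of an encrypted key
pubs = [keypair_leaking_bit(b)[1] for b in secret_to_leak]
assert [p[-1] & 1 for p in pubs] == secret_to_leak
```

Every public key is individually well-formed and random-looking; only the attacker, reading low bits across many keys, recovers the message.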
On the other hand, there's no actual need for this huge pile of random numbers. If you've somehow managed to generate one secure 256-bit key then from that key you can derive all the "random" numbers you'll ever need for every cryptographic protocol—and you can do this derivation in a completely deterministic, auditable, testable way, as illustrated by EdDSA. (If you haven't managed to generate one secure 256-bit key then you have much bigger problems.)
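One way to sketch this derivation, using HMAC-SHA256 in counter mode (the labels and function names here are illustrative, not a fixed standard):

```python
import hashlib
import hmac

def derive(master: bytes, label: bytes, n: int) -> bytes:
    """Derive n bytes of protocol-specific 'randomness' from one 256-bit
    master key, deterministically, via HMAC-SHA256 in counter mode.
    Distinct labels give independent-looking streams."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hmac.new(master, label + counter.to_bytes(4, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return out[:n]

master = b"\x42" * 32
tls_key = derive(master, b"tls-session-1", 32)
sig_key = derive(master, b"signing-key", 32)
assert tls_key != sig_key                               # domain separation
assert tls_key == derive(master, b"tls-session-1", 32)  # fully reproducible
```

Because every output is a deterministic function of the master key and a label, the whole derivation can be audited and tested, exactly as the text advocates.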
With this as-deterministic-as-possible approach, the entire influence of the malicious entropy source is limited to controlling a few "random" bits somewhere. There are at least two obvious ways to further reduce this control:
- Read less-likely-to-be-malicious entropy sources after completing all reading of the more-likely-to-be-malicious entropy sources. Of course, this doesn't help if the last source turns out to be malicious.
- Increase the amount of processing, memory, etc. involved in H—as in hashcash, proofs of work in general, password hashing, etc. The costs are negligible, since all of this is done only once.
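The second point can be sketched with a deliberately slow KDF in place of a bare hash; PBKDF2 from the Python standard library is used here purely as an example, and the iteration count is illustrative:

```python
import hashlib

def expensive_pool_hash(x: bytes, y: bytes, z: bytes) -> bytes:
    """Hash the collected entropy with a deliberately slow KDF, raising the
    per-guess cost for a grinding attacker. The defender pays this cost
    once; a source grinding 4-bit prefixes pays it ~16 times per output."""
    return hashlib.pbkdf2_hmac("sha256", x + y + z, b"entropy-pool", 200_000)
```

A memory-hard password hash would raise the attacker's cost further, for the same one-time cost to the defender.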
Let me emphasize that what I'm advocating here, for security reasons, is a sharp transition between
- before crypto: the whole system collecting enough entropy;
- after: the system using purely deterministic cryptography, never adding any more entropy.
This is exactly the opposite of what people tend to do today, namely adding new entropy all the time. The reason that new entropy is a problem is that each addition of entropy is a new opportunity for a malicious entropy source to control "random" outputs—breaking DSA, leaking secret keys, etc. The conventional wisdom says that hash outputs can't be controlled; the conventional wisdom is simply wrong.
(There are some special entropy sources for which this argument doesn't apply. For example, an attacker can't exert any serious control over the content of my keystrokes while I'm logged in; I don't see how hashing this particular content into my laptop's entropy pool can allow any attacks. But I also don't see how it helps.)
Is there any serious argument that adding new entropy all the time is a good thing? The Linux /dev/urandom manual page claims that without new entropy the user is "theoretically vulnerable to a cryptographic attack", but (as I've mentioned in various venues) this is a ludicrous argument—how can anyone simultaneously believe that
- we can't figure out how to deterministically expand one 256-bit secret into an endless stream of unpredictable keys (this is what we need from urandom), but
- we can figure out how to use a single key to safely encrypt many messages (this is what we need from SSL, PGP, etc.)?
There are also people asserting that it's important for RNGs to provide "prediction resistance" against attackers who, once upon a time, saw the entire RNG state. But if the attacker sees the RNG state that was used to generate your long-term SSL keys, long-term PGP keys, etc., then what exactly are we gaining by coming up with unpredictable random numbers in the future? I'm reminded of a Mark Twain quote:
Behold, the fool saith, "Put not all thine eggs in the one basket"—which is but a manner of saying, "Scatter your money and your attention;" but the wise man saith, "Put all your eggs in the one basket and—WATCH THAT BASKET."
We obviously need systems that can maintain confidentiality of our long-term keys. If we have such systems, how is the attacker supposed to ever see the RNG state in the first place? Maybe "prediction resistance" can be given a theoretical definition for an isolated RNG system, but I don't see how it makes any sense for a complete cryptographic system.
[Advertisement: If you're interested in these topics, you might want to join the randomness-generation mailing list; I sent this there a few days ago. To subscribe, send email to randomness-generation+subscribe@googlegroups.com. There's also a mailing list cleverly named dsfjdssdfsd for discussion of randomness in IETF protocols. Randomness is also a frequent topic on more general cryptographic mailing lists.]
[2022.01.09 update: Updated links above.]
[2023.03.17 update: I should note related work from 2006, namely Section 3.2 of "Analysis of a strong pseudo random number generator" by Thomas Biege. The 2006 paper and this blog post are both analyzing what a malicious entropy source can do. After that setup, the 2006 work is explaining how a linear hash function gives a malicious entropy source full control over outputs, whereas this blog post is explaining how a modern cryptographic hash function gives a malicious entropy source partial control over outputs.]
**Version:** This is version 2023.03.17 of the 20140205-entropy.html web page.