Unix crypt with SHA-256/512 (original) (raw)
For some time now the Unix crypt() interfaces support more than one way to handle the verification. There is the venerable single DES and the MD5 method which are both widely available. Some other extensions exist as well but they are not universally available.
The Problem
The security departments of some customers look at recommendations they get and evaluate the deployed systems based on this. In a few places this led to the problem that the NIST warns people indiscriminately about the use of MD5 and of course DES. Even though the DES use is weak there has been shown no evidence that the specific of MD5 in crypt() is insecure. This doesn't change the fact that the recommendation is made and some people want to be safe rather than sorry.
There is one weakness in the MD5 implementation as it exists today but it is due to the fact that the time it takes to finish the computation is too short on modern processors.
The result in any case is that customers are calling for a solution.
Existing Practice
There have been solutions to the problem for some time. Specifically the Blowfish encryption-based solution is implemented on a variety of systems, including some Linux distributions. The design of this method is sound: an encryption algorithm which has survived peer review for some time and which is not fast is used together with the obligatory salt. The result is a safe password encoding.
Incidentally, Thomas Ptacek from Matasano just this week published a praise of bcrypt(). His conclusions are not entirely correct since he generalizes. It is certainly possible to design an MD5-based method which addresses the issues of the current design. But if taken as only an evaluation of existing technology it is correct.
So, Just Pick bcrypt Then
Well, there is a problem. I can already hear everybody complaining that I suffer from the NIH syndrome but this is not the reason. The same people who object to MD5 make their decisions on what to use also based on NIST guidelines. And Blowfish is not on the lists of the NIST. Therefore bcrypt() does not solve the problem.
What is on the list is AES and the various SHA hash functions. Both are viable options. The AES variant can be based upon bcrypt(), the SHA variant could be based on the MD5 variant currently implemented.
Since I had to solve the problem and I consider both solutions equally secure I went with the one which involves less code. The solution we use is based on SHA. More precisely, on SHA-256 and SHA-512.
The Solution
The algorithm used is based on the algorithm used for MD5. I implemented a few differences, though:
- In a few steps where the MD5 algorithm adds constant data (i.e., data identical for each invocation) this is avoided.
- The MD5 in repeatedly adds the first letter of the password. I changed that step significantly.
- Inspired by Sun's crypt() implementation I added the possibility to specify the number of rounds the main loop in the algorithm performs in the salt string using rounds=NNN$.
In addition, during the review process in the little working group (see the spec document for names) a few points came up which caused additional changes:
- The maximum salt string length is raised to 16. Not really because 8 bytes (usually about 40 bits of entropy) is not enough. More because varying sizes of the salt string (anything between 8 and 16 should be fine) adds additional entropy when the attacker does not know the salt string.
- Inside the main loop the MD5 algorithm used the salt and password strings. A possible attack could reverse the procedure of creating the hash and start from the back. If one round of the loop could be reversed the password is already available. Additionally, the salt and password strings are ASCII strings which further limits the search space. The SHA variant uses hashes of the strings cut down to the same length as the strings themselves. This addresses the weaknesses mentioned above.
The Result
The resulting specification and a possible implementation is available in the specification document. The contained implementation is released by me into the public domain so that the chance of diverging implementations is reduced. It includes a test suite and has been tested on a number of architectures and platforms. Equivalent code has been included into glibc 2.7.
The code has been reviewed from security teams at HP, IBM, Red Hat, and Sun. See the names in the document. But I wrote the actual document and the code which means I'm more than likely the responsible for remaining problems.
If somebody sees something wrong with this whole approach let me know.