Issue 3026: integer overflow in hashlib causes wrong results for cryptographic hash functions [was: mmap broken with large files on 64bit system]

Created on 2008-06-02 02:24 by donut, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name                  Uploaded                  Description
testbigfile.py             donut, 2008-06-02 02:24   test script
large_digest_update.diff   schmir, 2008-07-14 22:16  patch against svn r64953
Messages (8)
msg67623 - Author: Matthew Mueller (donut) Date: 2008-06-02 02:24
mmap on large files on 64-bit platforms in Python >= 2.5 returns some sort of garbage. In 2.4 it would just throw an exception. Now I get something like this (the script runs md5.md5 on an mmap object, and then runs md5sum via os.system for comparison):

This is python2.5 from Ubuntu 8.04 AMD64:

    /tmp$ python2.5 testbigfile.py
    python mmap md5: 1230552d39b7c1751f86bae5205ec0c8
    abe59e28c9a3f11b883f62c80a3833a5 *bigfile

This is python svn as of 20080601, compiled on the same system:

    /tmp$ python2.6 testbigfile.py
    testbigfile.py:5: DeprecationWarning: the md5 module is deprecated; use hashlib instead
      import md5
    python mmap md5: 1230552d39b7c1751f86bae5205ec0c8
    abe59e28c9a3f11b883f62c80a3833a5 *bigfile

Also note how the python md5 call returns immediately, not something you would expect when md5ing 4GB of data.
msg67624 - Author: Matthew Mueller (donut) Date: 2008-06-02 02:29
Actually, I just realized that this might be a problem with md5 module instead. Either way, something is busted.
msg67701 - Author: Ralf Schmitt (schmir) Date: 2008-06-04 21:16
I tested this with python 2.6 and can confirm the issue. The problem is that unsigned int isn't big enough to hold the size of the objects, but the size is downcast to an unsigned int in several places in _hashopenssl.c. All of these occurrences of Py_SAFE_DOWNCAST seem problematic to me (Py_SAFE_DOWNCAST(len, Py_ssize_t, unsigned int)).
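The effect of such a downcast can be sketched in plain Python. This is only an illustration of the arithmetic, assuming a 32-bit unsigned int and a cast that silently keeps the low 32 bits; the helper name truncated_length is hypothetical and does not appear in _hashopenssl.c.

```python
# Assumption: unsigned int is 32 bits wide, as on common 64-bit platforms.
UINT_RANGE = 2 ** 32


def truncated_length(length):
    """Return the value left after casting `length` to a 32-bit unsigned int.

    Hypothetical helper for illustration; it models a plain C cast,
    which keeps only the low 32 bits of the original value.
    """
    return length % UINT_RANGE


# Lengths just below, at, and above the 4 GiB boundary.
for length in (2 ** 32 - 1, 2 ** 32, 2 ** 32 + 5, 5 * 2 ** 30):
    print(length, "->", truncated_length(length))
```

Under this assumption, a buffer slightly larger than 4 GiB would be hashed as if it were only a few bytes long, which would be consistent with the hash call returning almost immediately.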
msg67775 - Author: Ralf Schmitt (schmir) Date: 2008-06-06 15:13
The same bug also occurs when computing the md5 of a string larger than 2**32 bytes.
msg69642 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-07-14 05:18
So would anybody like to contribute a patch?
msg69664 - Author: Ralf Schmitt (schmir) Date: 2008-07-14 22:16
This patch adds a digest_update function. digest_update calls EVP_DigestUpdate(..) with chunks of 16 MB size and also checks for signals. I didn't write any tests (as they would most probably annoy many people because they would need much memory). testbigfile.py however now works.
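The chunking idea can be approximated at the Python level as a workaround. This is a rough sketch and not the patch itself: the function chunked_update and its parameters are hypothetical names, and the signal checking done by the C patch is not reproduced here.

```python
import hashlib

# 16 MB, the chunk size mentioned in the message above.
CHUNK_SIZE = 16 * 1024 * 1024


def chunked_update(hash_obj, data, chunk_size=CHUNK_SIZE):
    """Feed `data` to `hash_obj` in pieces of at most `chunk_size` bytes.

    Hypothetical helper for illustration. `data` can be any object
    that supports the buffer protocol, such as bytes or an mmap object.
    """
    view = memoryview(data)  # slicing a memoryview does not copy the data
    for start in range(0, len(view), chunk_size):
        hash_obj.update(view[start:start + chunk_size])
    return hash_obj


# A small buffer and a tiny chunk size exercise the same code path.
data = b"abc" * 1000
digest = chunked_update(hashlib.md5(), data, chunk_size=7).hexdigest()
print(digest == hashlib.md5(data).hexdigest())
```

Because every call to update receives at most chunk_size bytes, no single length passed down to the hashing library can exceed the range of a 32-bit unsigned int.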
msg73373 - Author: Ralf Schmitt (schmir) Date: 2008-09-18 11:52
same issue in http://bugs.python.org/issue3886. it's sad that no one took a look at the patch... now, it should probably be closed...
msg73375 - Author: Martin v. Löwis (loewis) * (Python committer) Date: 2008-09-18 12:06
Ok, closing. Thanks for the patch, anyway.
History
Date User Action Args
2022-04-11 14:56:35 admin set github: 47276
2008-09-18 12:06:26 loewis set status: open -> closed; resolution: out of date; messages: +
2008-09-18 11:52:53 schmir set messages: +
2008-08-05 22:28:45 schmir set title: mmap broken with large files on 64bit system -> integer overflow in hashlib causes wrong results for cryptographic hash functions [was: mmap broken with large files on 64bit system]
2008-07-14 22:16:27 schmir set files: + large_digest_update.diff; keywords: + patch; messages: +
2008-07-14 05:18:31 loewis set nosy: + loewis; messages: +
2008-06-12 06:00:56 georg.brandl set priority: critical; versions: + Python 3.0
2008-06-06 15:13:27 schmir set messages: +
2008-06-04 21:16:50 schmir set messages: +
2008-06-04 20:43:00 schmir set nosy: + schmir
2008-06-02 02:29:53 donut set messages: +
2008-06-02 02:24:57 donut create