msg47773
Author: Gregory P. Smith (gregory.p.smith) *
Date: 2005-02-13 01:33
The md5 and sha (sha1) modules should use OpenSSL for the algorithms when it is available, as its implementations are much faster than Python's own. Attached is an initial patch to use OpenSSL for the sha module. It's not ready for committing as-is yet, but with a little more work it is set up to be a generic base for all OpenSSL hashes. I'm tossing this out there so people can see how trivial it is and enjoy the speedups. The diff is against HEAD, but it should apply to 2.4 just fine.
|
msg47774
Author: Gregory P. Smith (gregory.p.smith) *
Date: 2005-02-17 06:46
hashes-openssl-002.patch replaces the sha and md5 modules with a general hashes module that wraps all hashes that OpenSSL supports. Note that OpenSSL's implementations are much faster than the previous Python versions, as it chooses versions optimized for your particular hardware. In case Python is compiled without OpenSSL, the hashes wrapper falls back on the old Python sha and md5 module implementations. Side note: this may be sufficient for the Debian folks to work around their odd licensing issue. Just have Debian's Python depend on OpenSSL, use this, and remove the old md5 module code that wouldn't get used anyway.
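The fallback described above (prefer OpenSSL, fall back to the bundled sha/md5 code when Python is built without it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patch's code: `make_new` and `_builtin_constructor` are hypothetical names, the preferred backend is assumed to expose `new(name, data)` and to raise ValueError for digests it does not support, and the "built-in" path simply reuses today's hashlib so the sketch runs anywhere.

```python
import hashlib
import importlib


def _builtin_constructor(name):
    # Hypothetical stand-in for the old bundled _sha/_md5 code. For this
    # sketch it just reuses the standard library so it runs anywhere.
    return lambda data=b"": hashlib.new(name, data)


def make_new(preferred_backend):
    """Return a new(name, data) function that prefers `preferred_backend`.

    `preferred_backend` names a module assumed to expose new(name, data),
    e.g. an OpenSSL wrapper; if it cannot be imported, the built-in code
    is used instead.
    """
    try:
        backend = importlib.import_module(preferred_backend)
    except ImportError:
        backend = None  # e.g. Python built without OpenSSL

    def new(name, data=b""):
        if backend is not None:
            try:
                return backend.new(name, data)
            except ValueError:
                pass  # digest not supported by the backend; fall through
        return _builtin_constructor(name)(data)

    return new


if __name__ == "__main__":
    new = make_new("_hashlib")  # CPython's private OpenSSL wrapper, if built
    print(new("sha1", b"abc").hexdigest())
```

In current CPython, `_hashlib` is the private OpenSSL extension module; it is an implementation detail and may be absent from builds without OpenSSL, which is exactly the case the fallback covers.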
|
msg47775
Author: Jim Jewett (jimjjewett)
Date: 2005-02-18 19:21
Should the private modules (such as _sha) be placed in a crypto package, instead of directly in the parent/everything library?
|
msg47776
Author: Gregory P. Smith (gregory.p.smith) *
Date: 2005-02-28 18:11
A much-updated patch (hashlib-patch-004.patch). It incorporates some suggestions and includes SF patch 935454's sha256/224 and sha512/384 implementations. It's still not complete, but it shows the direction it's going in (I see a segfault partway through the test suite, after the sha512 tests run). As for putting the private modules under another package, I see no reason to do that since there aren't very many (how does that work for binary modules anyway?).
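The SHA-2 code arrives in two pairs (sha256/224 and sha512/384) because each shorter variant is a truncated form of the longer one, computed from different initial values, so the members of a pair share a block size. A small sketch using today's hashlib to show the sizes; the `describe` helper is illustrative only:

```python
import hashlib

# SHA-224 and SHA-384 are truncated variants of SHA-256 and SHA-512 that
# start from different initial values, so each pair shares a block size.
PAIRS = [("sha224", "sha256"), ("sha384", "sha512")]


def describe(name):
    """Return (digest_size, block_size) in bytes for a hashlib algorithm."""
    h = hashlib.new(name)
    return h.digest_size, h.block_size


if __name__ == "__main__":
    for short, full in PAIRS:
        print(short, describe(short), full, describe(full))
    # Different initial values mean SHA-384 is not just a prefix of SHA-512.
    print(hashlib.sha384(b"abc").digest() == hashlib.sha512(b"abc").digest()[:48])  # False
```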
|
msg47777
Author: Gregory P. Smith (gregory.p.smith) *
Date: 2005-03-01 09:14
hashlib-005.patch now passes its test suite and no problems appear in valgrind.
|
msg47778
Author: Gregory P. Smith (gregory.p.smith) *
Date: 2005-03-03 21:15
hashlib-006.patch adds fast constructors and a speed test. Documentation is the next step.
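hashlib's documentation today describes the named constructors (such as `hashlib.sha1()`) as faster than the generic `hashlib.new("sha1")`. A rough way to compare the two with `timeit`, sketched under the assumption that "fast constructors" refers to those named constructors; `compare_constructors` is an illustrative name, and absolute numbers vary by machine, so no output is shown:

```python
import hashlib
import timeit


def compare_constructors(number=20000):
    """Time `number` creations via the named constructor and via new()."""
    # Both statements are wrapped in a lambda so the call overhead matches.
    named = timeit.timeit(lambda: hashlib.sha1(), number=number)
    generic = timeit.timeit(lambda: hashlib.new("sha1"), number=number)
    return named, generic


if __name__ == "__main__":
    named, generic = compare_constructors()
    print(f"hashlib.sha1():      {named:.3f} seconds")
    print(f'hashlib.new("sha1"): {generic:.3f} seconds')
```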
|
msg47779
Author: Gregory P. Smith (gregory.p.smith) *
Date: 2005-03-10 08:09
The 007 patch improves the speed of the constructor. There is still a potential speed issue with the constructor/destructor to work on:

    greg@spiff src $ ./python Lib/test/test_hashlib_speed.py _sha
    testing speed of old _sha legacy interface
    0.06 seconds [20000 creations]
    0.24 seconds [20000 "" digests]
    0.15 seconds 20 x 106201 bytes [huge data]
    0.15 seconds 200 x 10620 bytes [large data]
    0.17 seconds 2000 x 1062 bytes [medium data]
    0.35 seconds 20020 x 106 bytes [small data]
    1.37 seconds 106200 x 20 bytes [digest_size data]
    2.75 seconds 212400 x 10 bytes [tiny data]

    greg@spiff src $ ./python Lib/test/test_hashlib_speed.py sha1
    testing speed of hashlib.sha1 <built-in function openssl_sha1>
    0.22 seconds [20000 creations]
    0.57 seconds [20000 "" digests]
    0.09 seconds 20 x 106201 bytes [huge data]
    0.09 seconds 200 x 10620 bytes [large data]
    0.15 seconds 2000 x 1062 bytes [medium data]
    0.71 seconds 20020 x 106 bytes [small data]
    3.39 seconds 106200 x 20 bytes [digest_size data]
    6.70 seconds 212400 x 10 bytes [tiny data]

I suspect the cause is the shared OpenSSL library call overhead, the OpenSSL EVP abstraction interface, or both. The speed results are very similar to the above regardless of which digest is used (the above was on a Celeron 333 MHz running Linux).
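Each row of the speed-test output hashes roughly 2.1 MB in total (for example 20 x 106201 and 212400 x 10 bytes), so the rows differ mainly in how many separate messages are hashed. The sketch below reproduces that shape, assuming each row creates a fresh hash object per message, which is what would let constructor and destructor cost dominate the small-data rows. `ROWS` and `time_messages` are illustrative names, not the contents of test_hashlib_speed.py:

```python
import hashlib
import time

# (messages, bytes per message) for each row of the speed-test output.
ROWS = [
    (20, 106201),    # huge data
    (200, 10620),    # large data
    (2000, 1062),    # medium data
    (20020, 106),    # small data
    (106200, 20),    # digest_size data
    (212400, 10),    # tiny data
]


def time_messages(constructor, count, size):
    """Hash `count` separate `size`-byte messages, one new object each."""
    message = b"Z" * size
    start = time.perf_counter()
    for _ in range(count):
        constructor(message).digest()
    return time.perf_counter() - start


if __name__ == "__main__":
    for count, size in ROWS:
        elapsed = time_messages(hashlib.sha1, count, size)
        print(f"{elapsed:.2f} seconds {count} x {size} bytes")
```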
|
msg47780
Author: Gregory P. Smith (gregory.p.smith) *
Date: 2005-03-13 01:13
I linked a _hashlib.so library statically against OpenSSL and reran the speed test: no change. That means it's not shared library overhead causing the higher startup time but just an artifact of the OpenSSL EVP interface. Next up: analyze what sizes of data common heavy SHA-1 users (BitTorrent and such) regularly hash.
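One way to carry out the analysis proposed above (finding which message sizes real applications feed to SHA-1) is to wrap a hash object and record the length of every update() call. A minimal sketch; `SizeRecordingHash` is a hypothetical helper, not part of hashlib:

```python
import collections
import hashlib


class SizeRecordingHash:
    """Hypothetical wrapper that counts how often each update() size occurs."""

    def __init__(self, name):
        self._hash = hashlib.new(name)
        self.sizes = collections.Counter()  # chunk length -> number of calls

    def update(self, data):
        self.sizes[len(data)] += 1
        self._hash.update(data)

    def hexdigest(self):
        return self._hash.hexdigest()


if __name__ == "__main__":
    h = SizeRecordingHash("sha1")
    for piece in (b"a" * 20, b"b" * 20, b"c" * 1062):
        h.update(piece)
    print(h.sizes.most_common())  # [(20, 2), (1062, 1)]
```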
|
msg47781
Author: Gregory P. Smith (gregory.p.smith) *
Date: 2005-06-12 03:21
OK, this patch is ready; documentation has been added. I'll bring it up on python-dev for discussion/approval with a link to the HTML version of the documentation. The speedups are great for any application hashing a lot of data when OpenSSL is used. It also adds sha224, sha256, sha384 and sha512 support.
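For the "hashing a lot of data" case, the usual pattern is incremental hashing: feed the data to update() in chunks rather than building one huge string. A minimal sketch with today's hashlib; the 64 KiB chunk size is an arbitrary assumption and `hash_stream`/`hash_file` are illustrative names:

```python
import hashlib
import io


def hash_stream(f, name="sha256", chunk_size=1 << 16):
    """Hash a binary file-like object incrementally, chunk_size bytes at a time."""
    h = hashlib.new(name)
    for chunk in iter(lambda: f.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def hash_file(path, name="sha256", chunk_size=1 << 16):
    """Hash a file on disk without reading it into memory all at once."""
    with open(path, "rb") as f:
        return hash_stream(f, name, chunk_size)


if __name__ == "__main__":
    print(hash_stream(io.BytesIO(b"abc")))
    # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```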
|
msg47782
Author: Armin Rigo (arigo) *
Date: 2005-06-12 12:18
On a side note, maybe it makes sense for a new module like this one to promote and use the modern (>=2.2) ways of defining C types. What I have in mind is using tp_methods instead of Py_FindMethod, and generally not resorting to strcmp(). In this case, constants like 'digest_size' would be best stored as class attributes, if possible. Indeed, allowing expressions like "hashlib.md5.digest_size" conveys the idea that the result doesn't depend on a particular instance, unlike "hashlib.md5().digest_size". (Of course, class attributes are also readable from the instance, as usual.) I can give it a try if you don't want to invest more time in this patch than you already have (for which we are grateful to you :-)
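The class-attribute idea can be illustrated at the Python level. This is a minimal sketch, not the C type API discussed above: `HashInfo` is a hypothetical class, and in C the equivalent constant would live on the type object rather than in a Python class body.

```python
import hashlib


class HashInfo:
    """Hypothetical pure-Python analogue of a hash type with class-level constants."""

    digest_size = 16  # stored once on the type, shared by every instance
    block_size = 64

    def __init__(self, data=b""):
        self.data = bytes(data)


# Readable from the type itself, with no instance needed ...
print(HashInfo.digest_size)       # 16
# ... and, as usual, from an instance as well.
print(HashInfo().digest_size)     # 16
# The real module exposes digest_size on instances:
print(hashlib.md5().digest_size)  # 16
```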
|
msg47783
Author: Terry J. Reedy (terry.reedy) *
Date: 2005-06-12 20:35
Re doc page: As a somewhat naive (relative to the subject) reader, the title and first sentence implied that 'secure hash' and 'message digest' are two separate things, whereas, judging from the .digest() blurb, they both seem to be 16-byte hashes. So I would prefer that this equivalence and the actual meaning be made clear at the top. Something like "This module implements a common interface to several secure hash or message digest algorithms that produce 16-byte hashes." If, as I presume, xx.hexdigest() == binascii.hexlify(xx.digest()), then I would say so and reference binascii for the interconversion functions one would need when comparing the two forms or converting after extraction.
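The presumed equivalence can be checked directly. In Python 3, `binascii.hexlify()` returns bytes, so it needs decoding before comparison with the `str` returned by `hexdigest()`. The sketch also shows that digest lengths differ by algorithm (16 bytes for MD5, 20 for SHA-1), which is worth keeping in mind for the suggested wording:

```python
import binascii
import hashlib

h = hashlib.md5(b"abc")
raw = h.digest()      # bytes
hexed = h.hexdigest() # str of lowercase hex digits

# hexdigest() is the hex form of digest(); hexlify() returns bytes in Python 3.
assert hexed == binascii.hexlify(raw).decode("ascii")
assert hexed == raw.hex()
# unhexlify() converts the hex form back to the raw digest.
assert binascii.unhexlify(hexed) == raw

print(len(raw), hashlib.sha1(b"abc").digest_size)  # 16 20
```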
|
msg47784
Author: Gregory P. Smith (gregory.p.smith) *
Date: 2005-08-01 01:29
Per arigo's suggestion, I have a version of _hashopenssl.c in my sandbox modified to use the more modern C type API. The constructor is slightly faster (~1-2%), and it does seem like a better way to do things. I'll post it after polishing it up.
|
msg47785
Author: Gregory P. Smith (gregory.p.smith) *
Date: 2005-08-15 03:28
tjreedy's and arigo's comments have been taken into consideration. An updated patch (009) has been attached. It uses the Python >= 2.2 interface for defining methods and member variables rather than the getattr function with manual strcmp() calls. I was unable to make digest_size and such class attributes because the hashes are not classes. The hashlib.md5 function, for instance, is a constructor function that returns an appropriate internal HASH object. The goal of those constructors is to be as fast as possible; wrapping them in Python to make them actual classes would be too slow, and I did not see an obvious way to do it from C. I believe this patch is ready to commit. Further improvements or refinements can be made to it in CVS. The HTML documentation, for easy viewing, has been updated at http://electricrain.com/greg/hashlib-py25-doc/
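The design described above (named constructor functions that all return instances of one shared internal type) can be sketched in Python to show why digest_size ends up on the instance rather than on `md5` itself. `_HASH`, `md5` and `sha1` here are hypothetical stand-ins that delegate to the real hashlib; they are not the C implementation:

```python
import hashlib


class _HASH:
    """Hypothetical single internal type shared by every digest."""

    __slots__ = ("name", "_impl")

    def __init__(self, name, data=b""):
        self.name = name
        self._impl = hashlib.new(name, data)  # delegate the real work

    @property
    def digest_size(self):
        # Only known once an instance exists, because the type is shared.
        return self._impl.digest_size

    def update(self, data):
        self._impl.update(data)

    def hexdigest(self):
        return self._impl.hexdigest()


def md5(data=b""):
    """Constructor function, not a class: returns a _HASH instance."""
    return _HASH("md5", data)


def sha1(data=b""):
    """Constructor function, not a class: returns a _HASH instance."""
    return _HASH("sha1", data)


if __name__ == "__main__":
    print(type(md5()) is type(sha1()))            # True: one type for all digests
    print(md5().digest_size, sha1().digest_size)  # 16 20
    print(hasattr(md5, "digest_size"))            # False: md5 is just a function
```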
|
msg47786
Author: Armin Rigo (arigo) *
Date: 2005-08-15 08:47
I see that it would indeed be messy to have 'md5' be a type and 'digest_size' a class attribute, given that 'md5' can come from various places depending on what is installed; moreover, in the _hashopenssl.c file, unless I'm mistaken, all hashes use the same Python type. Fine by me.
|
msg47787
Author: Gregory P. Smith (gregory.p.smith) *
Date: 2005-08-21 18:50
hashlib has been committed to HEAD for inclusion in Python 2.5. I've attached hashlib-010.patch, which is the exact CVS diff of what I committed after further testing.
|