msg226419 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2014-09-05 11:55 |
This is regarding the Python 3 documentation for binascii.crc32(), <https://docs.python.org/dev/library/binascii.html#binascii.crc32>. It repeatedly recommends correcting the sign by doing crc32() & 0xFFFFFFFF, but it is not immediately clear why. Only after reading the Python 2 documentation does one realise that the value is always unsigned for Python 3, so you only really need the workaround if you want to support earlier Pythons. I also suggest documenting the initial CRC input value, which is zero. Suggested wording: binascii.crc32(data[, crc]) Compute CRC-32, the 32-bit checksum of “data”, starting with the given “crc”. The default initial value is zero. The algorithm is consistent with the ZIP file checksum. Since the algorithm is designed for use as a checksum algorithm, it is not suitable for use as a general hash algorithm. Use as follows: print(binascii.crc32(b"hello world")) # Or, in two pieces: crc = binascii.crc32(b"hello", 0) crc = binascii.crc32(b" world", crc) print('crc32 = {:#010x}'.format(crc)) I would simply drop the notice box with the workaround, because I gather that the Python 3 documentation generally omits Python 2 details. (There are no “new in version 2.4 tags” for instance.) Otherwise, clarify if “packed binary format” is a reference to the “struct” module, or something else. Similar fixes are probably appropriate for zlib.crc32() and zlib.alder32(). Also, what is the relationship between binascii.crc32() and zlib.crc32()? I vaguely remember reading that “zlib” is not always available, so I tend to use “binascii” instead. Is there any advantage in using the “zlib” version? The “hashlib” documentation points to “zlib” without mentioning “binascii” at all. |
|
|
msg232927 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2014-12-19 05:14 |
Here is a patch that fixes the binascii, zlib.crc32() and adler32() documentation as I suggested. I’m still interested why there are two ways to do a CRC-32, each equally non-obvious as the other. |
|
|
msg236881 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2015-02-28 12:06 |
crc & 0xffffffff is still used in gzip, zipfile and tarfile. And some comments say about signess of 32-bit checksums. |
|
|
msg237589 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-03-09 02:28 |
Posting a new patch that also removes the masking from the gzip, zipfile, tarfile, and test_zlib modules. I removed the comment about signedness in tarfile; let me know if you saw any others. |
|
|
msg238715 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2015-03-20 19:35 |
These notes was added by Gregory in r68535. Ask him if they still are needed. |
|
|
msg238716 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2015-03-20 19:36 |
See also . |
|
|
msg238723 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2015-03-20 21:07 |
I do not object to the removal of the & 0xfffffff from the stdlib library code if these functions have actually been fixed to always return unsigned now. (double check the behavior, and if good, just do it!) But I think the docs should still mention that the & 0xffffffff is a good practice if code needs to be compatible with Python versions prior to X.Y (list the last release before the behavior was corrected). Possibly within a .. versionchanged: section. People often use the latest docs when writing code in any version of Python as we continually improve docs and are pretty good about noting old behaviors and when behaviors changed. |
|
|
msg238735 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-03-20 22:52 |
Hi Gregory, I think the three functions have been fixed since Python 3.0. It looks like you changed them in revisions 6a7fa8421902 (zlib module) and 132cf3a79126 (crc32 functions). Looking at the 3.0 branch, all three functions use PyLong_FromUnsignedLong(), but in the 2.7 branch, they use PyInt_FromLong(). This is tested in test_zlib.ChecksumTestCase.test_crc32_adler32_unsigned(). See also my changes to test_penguins() in the patch here for further testing. There is no explicit testing of binascii.crc32() unsignedness, but it is implicitly tested by test_same_as_binascii_crc32(), because the expected CRC is beyond the positive 32 bit signed limit. I am happy to put back the suggestion of masking for backwards compatibility if you really think it is necessary. But since it only involves Python < 3.0 compatibility, I thought it would be okay to remove it; see my original post. The documentation is often pretty good about noting when Python 3 behaviours changed, but you usually have to look elsewhere to see the differences from Python 2. |
|
|
msg239599 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-03-30 12:19 |
Patch v3: * Reverted to original crc32(b"hello") example call with the implicit initial CRC * Added “Changed in version 3.0” notices, restoring a brief version of the suggestion to use the bit mask, along with an explanation. Python 2 compatibility information is generally unprecedented in the Python 3 documentation though, but hopefully this version should make more sense to people not already familiar with the Python 2 odd behaviour. |
|
|
msg239602 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-03-30 12:40 |
V4 fixes a merge conflict with recent gzip changes. |
|
|
msg256107 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2015-12-08 09:45 |
The patch LGTM. But while we are here, wouldn't be worth to unify the name of second parameter of binascii and zlib CRC calculation functions? And unify the description of crc_hqx() with other functions. |
|
|
msg256138 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-12-09 04:47 |
In crc-sign.v5.patch I have changed the binascii.crc_hqx() and crc32() documentation to use “value”, matching the zlib documentation. Is that what you had in mind? |
|
|
msg256140 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2015-12-09 06:56 |
Yes, thank you Martin. |
|
|
msg256198 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2015-12-11 06:04 |
New changeset 03c4ca1a34e2 by Martin Panter in branch '3.5': Issue #22341: Drop Python 2 workaround and document CRC initial value https://hg.python.org/cpython/rev/03c4ca1a34e2 New changeset 360c1326d8d1 by Martin Panter in branch 'default': Issue #22341: Merge CRC doc from 3.5 https://hg.python.org/cpython/rev/360c1326d8d1 |
|
|
msg256200 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2015-12-11 06:07 |
Thanks for the reviews |
|
|