msg56130 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2007-09-25 11:31 |
The functions zlib.crc32() and zlib.adler32() return a signed value in the range(-2147483648, 2147483648) on 32-bit platforms, but an unsigned value in the range(0, 4294967296) on 64-bit platforms. This means that half the possible answers are numerically different on these two kinds of platforms. Ideally, this should be fixed by having them always return unsigned numbers (their C return type is unsigned too). It's unclear if we can do this without breaking existing code, though :-( |
|
|
msg56146 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2007-09-26 16:51 |
Since it's basically a magic cookie, not a meaningful numeric value, I'd propose sticking with backwards compatibility and fixing the 64-bit version to return a signed version. return x - ((x & 0x80000000) <<1) anyone? |
|
|
msg58336 - (view) |
Author: Tim Lesher (tlesher) * |
Date: 2007-12-10 01:45 |
Both CRC-32 and ADLER32 are standards (described in ISO 3309 and RFC 1950 respectively); whatever fix implemented should make sure that the output complies. ISO 3309 isn't available online as far as I can see, but CRC-32 reference code is published in RFC 1952; RFC 1950 contains reference code for ADLER32. |
|
|
msg58344 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2007-12-10 12:29 |
The C reference code in rfc1950 for Adler-32 and in rfc1952 for CRC-32 compute with and return "unsigned long" values. From this point of view, returning negative values on 32-bit machines from CPython's zlib module can be considered a bug. That only leaves open the question of backward compatibility. |
|
|
msg58362 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2007-12-10 19:40 |
How about, in Python 2.6 we make the 64-bit version return a signed value for better compatibility with the 32-bit version, and in Python 3.0 we make both versions return the signed value for better compliance with the standard? |
|
|
msg58375 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2007-12-10 22:23 |
Obscure but reasonable. (I suspect you meant to say that py3k should return the *unsigned* value for better compliance with the standard.) |
|
|
msg58376 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2007-12-10 22:25 |
> Obscure but reasonable. (I suspect you meant to say that py3k should > return the *unsigned* value for better compliance with the standard.) Yes. :) |
|
|
msg63664 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2008-03-17 16:27 |
working on this now. foo = 'abcdefghijklmnop' 2.x 32-bit long: zlib.crc32(foo) returns -1808088941 2.x 64-bit long: zlib.crc32(foo) returns 2486878355 This is because PyInt_FromLong() happily fits the value into a signed long internally to the integer object on 64-bit platforms. They are both the same number if considered with & 0xffffffff. I'm doing as guido suggests and leaving this slightly odd behavior for 2.x so that crc32 and adler32 always return an integer object. in 3.0 they'll always return an unsigned value. |
|
|
msg63666 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2008-03-17 16:33 |
question: should I also make 64-bit 2.x return a signed value as well to be consistent with 32bit python 2.x? Consistency in case someone ever pickles the value and sends it to another python instance of a different word length would be good... |
|
|
msg63668 - (view) |
Author: Guido van Rossum (gvanrossum) *  |
Date: 2008-03-17 16:35 |
Sure. (Though sending pickles to 3.0 would still cause problems. Consumers of pickled checksums would do wise to *always* take the CRC mod 2**32 before doing comparisons.) |
|
|
msg63676 - (view) |
Author: Jesús Cea Avión (jcea) *  |
Date: 2008-03-17 17:05 |
I store CRC in reed-solomon schema of mine. I compare with equality, so, I think we should enforce "CRC(python 32 bits) == CRC(python 64 bits)". I will need to touch my code in python 3.0, but that will be inevitable anyway... |
|
|
msg63756 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2008-03-17 20:25 |
fixed. 3.0 always returns unsigned. 2.6 always returns signed, 2**31...2**31-1 come back as negative integers. trunk r61449 branches/py3k r61459 |
|
|
msg64416 - (view) |
Author: Michael Becker (mbecker) |
Date: 2008-03-24 14:58 |
In case it isn't obvious the work around for pre 3.0 to get the right sum is something like: x=zlib.adler32(str) if x < 0: x=(long(x) + 4294967296L) # 2^32, long may or may not be needed here |
|
|
msg64420 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2008-03-24 16:55 |
The workaround I prefer to use is: x = zlib.adler32(mystr) & 0xffffffffL |
|
|
msg74380 - (view) |
Author: Facundo Batista (facundobatista) *  |
Date: 2008-10-06 16:10 |
Let me reopen this, I think we have an issue with this fix. The conclusion of this discussion so far is that in 3.0 the crc32 will behave like the standard, which is a good thing (tm), but in 2.6 it will not: it should return a signed integer. I agree with this outcome! The documentation for 2.6, the commit message for the fix and what it's said here states that: "2.6 always returns signed, 2**31...2**31-1 come back as negative integers." This is *not* actually happening: >>> s = "*"*100000 >>> print zlib.crc32(s) # 2.6, 32 bits -872059092 >>> print zlib.crc32(s) # 2.6, 64 bits 3422908204 The problem in the code is, IMHO, that the "32b rounding" is being forced by assigning the result to an int (Modules/zlibmodule.c, line 929), but this "rounding" does not actually work for 64b (because the int has 64 bit, and even as it's signed, the number stays big and positive). Thank you! |
|
|
msg74384 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2008-10-06 17:53 |
An int is 32-bits on all popular platforms. Anyways i'll double check things. What platforms did you run your test on? |
|
|
msg79408 - (view) |
Author: J. David Ibáñez (jdavid.ibp@gmail.com) |
Date: 2009-01-08 12:13 |
I believe I have hit this bug. With Python 2.6.1 in a Gentoo Linux 64 bits. This code: from zipfile import ZipFile inzip = ZipFile('Document.odt') outzip = ZipFile('out.zip', 'w') for f in inzip.infolist(): if f.filename != 'content.xml': print f.filename, '(CRC: %s)' % f.CRC outzip.writestr(f, inzip.read(f.filename)) outzip.close() Produces this output: ... styles.xml (CRC: 3930581528) test_odt.py:10: DeprecationWarning: 'l' format requires -2147483648 <= number <= 2147483647 outzip.writestr(f, inzip.read(f.filename)) ... The CRC is not a 32bits signed, and then the module struct complains, here: zipfile.py:1098 self.fp.write(struct.pack("<lLL", zinfo.CRC, zinfo.compress_size, Apparently 'struct.pack' expects 'zinfo.CRC' to be a 32 signed it, which is not. I can attach the 'Document.odt' file if you want. |
|
|
msg79527 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2009-01-10 03:09 |
seems there are bugs with it not staying signed as it should on some 64bit platforms. i'll be looking into this shortly. its a good candidate bug for 2.6.x and 3.0.x releases. |
|
|
msg79528 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2009-01-10 03:10 |
err not 3.0.x, 3.0 is always unsigned like anyone sane would want. :) |
|
|
msg85557 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2009-04-05 19:13 |
J. David Ibáñez - do you still happen to have that Document.odt laying around? |
|
|
msg85626 - (view) |
Author: J. David Ibáñez (jdavid.ibp@gmail.com) |
Date: 2009-04-06 09:42 |
There it is. |
|
|
msg86192 - (view) |
Author: Gaëtan de Menten (gdementen) |
Date: 2009-04-20 09:13 |
Regarding the issue J. David Ibáñez has, I have a few comments to add: - It's also present on 32bit. - AFAICT: * it's present in both 2.6 branch & trunk (as of 68886), * it's a problem with line 1110 (in 2.6 branch), or line 1122 in trunk, which should read "<LLL" instead of "<lLL", * it is an omission from changeset 61591: http://svn.python.org/view/python/trunk/Lib/zipfile.py?view=diff&r1=61590&r2=61591 which changed line 964 (at the time) http://svn.python.org/view/python/trunk/Lib/zipfile.py?annotate=61591#l964 but not the corresponding one in writestr. I'm not sure whether or not a separate bug report should be opened for this issue. |
|
|
msg89723 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2009-06-26 08:19 |
fix for J. David's issue submitted to trunk r73565 and py3k r73566 just in time for the 3.1 release. release30-maint r73567 release26-maint r73568 |
|
|