[Python-Dev] Decimal <-> float comparisons in py3k. (original) (raw)

Case Vanhorsen casevh at gmail.com
Fri Mar 19 14:17:53 CET 2010


On Fri, Mar 19, 2010 at 3:07 AM, Mark Dickinson <dickinsm at gmail.com> wrote:

On Fri, Mar 19, 2010 at 9:37 AM, Mark Dickinson <dickinsm at gmail.com> wrote:

Making hashes of int, float, Decimal and Fraction all compatible with one another, efficient for ints and floats, and not grossly inefficient for Fractions and Decimals, is kinda hairy, though I have a couple of ideas of how it could be done. To elaborate, here's a cute scheme for the above, which is actually remarkably close to what Python already does for ints and floats, and which doesn't require any of the numeric types to figure out whether it's exactly equal to an instance of some other numeric type. After throwing out infinities and nans (which can easily be dealt with separately), everything we care about is a rational number, so it's enough to come up with some mapping from the set of all rationals to the set of possible hashes, subject to the requirement that the mapping can be computed efficiently for the types we care about. For any prime P there's a natural 'reduce modulo P' map  reduce : {rational numbers} --> {0, 1, ..., P-1, infinity} defined in pseudocode by:  reduce(m/n) = ((m % P) * inv(n, P)) % P  if n % P != 0  else infinity where inv(n, P) represents the modular inverse to n modulo P. Now let P be the handy Mersenne prime P = 2**31-1 (on a 64-bit machine, the almost equally handy prime 2**61-1 could be used instead), and define a hash function from the set of rationals to the set [-231, 231) by: hash(0) = 0 hash(m/n) = 1 + reduce(m/n - 1) if m/n > 0   # i.e., reduce m/n modulo P, but to [1..P] rather than [0..P-1]. hash(m/n) = -hash(-m/n) if m/n < 0. and in the last two cases, map a result of infinity to the unused hash value -2**31. For ints, this hash function is almost identical to what Python already has, except that the current int hash does a reduction modulo 232-1 or 264-1 rather than 2**31-1.  For all small ints, hash(n) == n, as currently.  Either way, the hash can be computed digit-by-digit in exactly the same manner.  For floats, it's also easy to compute:  express the float as m * 2**e for some integers m and e, compute hash(m), and rotate e bits in the appropriate direction.  And it's straightforward to implement for the Decimal and Fraction types, too. Will this change the result of hashing a long? I know that both gmpy and SAGE use their own hash implementations for performance reasons. I understand that CPython's hash function is an implementation detail, but there are external modules that rely on the existing hash behavior.

FWIW, I'd prefer 2.7 and 3.2 have the same behavior. I don't mind the existing 3.1 behavior and I'd rather not have a difference between 3.1 and 3.2.

casevh

(One minor detail:  as usual, some postprocessing would be required to replace a hash of -1 with something else, since a hash value of -1 is invalid.) Mark


Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/casevh%40gmail.com



More information about the Python-Dev mailing list