Issue 1334662: int(string, base) wrong answers

This affects all Python releases to date. See

<http://mail.python.org/pipermail/python-dev/2005-October/057510.html>

for discussion. The overflow check in PyOS_strtoul() is too clever, and causes these wrong results on a 32-bit box (on a box with sizeof(long) == 8, other examples would fail):

    int('102002022201221111211', 3) = 0
    int('32244002423141', 5) = 0
    int('1550104015504', 6) = 0
    int('211301422354', 7) = 0
    int('12068657454', 9) = 0
    int('1904440554', 11) = 0
    int('9ba461594', 12) = 0
    int('535a79889', 13) = 0
    int('2ca5b7464', 14) = 0
    int('1a20dcd81', 15) = 0
    int('a7ffda91', 17) = 0
    int('704he7g4', 18) = 0
    int('4f5aff66', 19) = 0
    int('3723ai4g', 20) = 0
    int('281d55i4', 21) = 0
    int('1fj8b184', 22) = 0
    int('1606k7ic', 23) = 0
    int('mb994ag', 24) = 0
    int('hek2mgl', 25) = 0
    int('dnchbnm', 26) = 0
    int('b28jpdm', 27) = 0
    int('8pfgih4', 28) = 0
    int('76beigg', 29) = 0
    int('5qmcpqg', 30) = 0
    int('4q0jto4', 31) = 0
    int('3aokq94', 33) = 0
    int('2qhxjli', 34) = 0
    int('2br45qb', 35) = 0
    int('1z141z4', 36) = 0
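To illustrate how an overflow check can miss these inputs, here is a minimal sketch in modern Python 3 that models a 32-bit unsigned long with a mask. The "divide back" check in `parse_divide_back` is an assumed form of such a check, not code copied from PyOS_strtoul(), and it does not model any base-10 special case; both function names are hypothetical. The point is that dividing back catches a wrapped multiplication but not a wrap caused only by adding the last digit, which is what happens when 2**32 is written in a base that is not a power of two.

```python
import string

MASK32 = 0xFFFFFFFF  # models a 32-bit unsigned long
DIGITS = string.digits + string.ascii_lowercase  # '0'..'9', 'a'..'z'


def digit_value(ch, base):
    """Return the numeric value of one digit character, or raise ValueError."""
    value = DIGITS.index(ch.lower())  # ValueError if ch is not alphanumeric
    if value >= base:
        raise ValueError("invalid digit %r for base %d" % (ch, base))
    return value


def parse_divide_back(text, base):
    """Accumulate with 32-bit wraparound and check by dividing back.

    Returns (value, overflowed). Assumed form of the check, for
    illustration only.
    """
    result = 0
    for ch in text:
        digit = digit_value(ch, base)
        previous = result
        result = (result * base + digit) & MASK32
        # Detects a wrapped multiplication, but not a wrap that is
        # caused only by adding the digit.
        if ((result - digit) & MASK32) // base != previous:
            return result, True
    return result, False


def parse_check_first(text, base):
    """Compare against the limit before accumulating, so no wrap is missed."""
    result = 0
    for ch in text:
        digit = digit_value(ch, base)
        if result > (MASK32 - digit) // base:
            return result, True  # the next step would exceed 2**32 - 1
        result = result * base + digit
    return result, False


if __name__ == "__main__":
    for text, base in [("1z141z4", 36), ("1904440554", 11), ("100000000", 16)]:
        print(base, parse_divide_back(text, base), parse_check_first(text, base))
```

In this model, `parse_divide_back` returns a value of 0 with no overflow flag for the base 36 and base 11 strings, while the base 16 string is flagged because its multiplication step wraps. `parse_check_first` flags all three.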

Logged In: YES user_id=1064183

There is a special case (added by Guido back in 1997, r9327) that detects overflow only for base 10, so that '4294967296' does not get interpreted as 0. It seems that special-casing base 10 is no longer needed since the adoption of Python long integers in case of overflow. Applying the base-10 check to all bases (see the attached 1334662-mystrtoul.c.diff) does the trick. Given this script::

    print 2, int('100000000000000000000000000000000', 2)
    print 3, int('102002022201221111211', 3)
    print 4, int('10000000000000000', 4)
    print 5, int('32244002423141', 5)
    print 6, int('1550104015504', 6)
    print 7, int('211301422354', 7)
    print 8, int('40000000000', 8)
    print 9, int('12068657454', 9)
    print 10, int('4294967296', 10)
    print 11, int('1904440554', 11)
    print 12, int('9ba461594', 12)
    print 13, int('535a79889', 13)
    print 14, int('2ca5b7464', 14)
    print 15, int('1a20dcd81', 15)
    print 16, int('100000000', 16)
    print 17, int('a7ffda91', 17)
    print 18, int('704he7g4', 18)
    print 19, int('4f5aff66', 19)
    print 20, int('3723ai4g', 20)
    print 21, int('281d55i4', 21)
    print 22, int('1fj8b184', 22)
    print 23, int('1606k7ic', 23)
    print 24, int('mb994ag', 24)
    print 25, int('hek2mgl', 25)
    print 26, int('dnchbnm', 26)
    print 27, int('b28jpdm', 27)
    print 28, int('8pfgih4', 28)
    print 29, int('76beigg', 29)
    print 30, int('5qmcpqg', 30)
    print 31, int('4q0jto4', 31)
    print 32, int('4000000', 32)
    print 33, int('3aokq94', 33)
    print 34, int('2qhxjli', 34)
    print 35, int('2br45qb', 35)
    print 36, int('1z141z4', 36)

The old output is::

    2 4294967296
    3 0
    4 4294967296
    5 0
    6 0
    7 0
    8 4294967296
    9 0
    10 4294967296
    11 0
    12 0
    13 0
    14 0
    15 0
    16 4294967296
    17 0
    18 0
    19 0
    20 0
    21 0
    22 0
    23 0
    24 0
    25 0
    26 0
    27 0
    28 0
    29 0
    30 0
    31 0
    32 4294967296
    33 0
    34 0
    35 0
    36 0

And the new one::

    2 4294967296
    3 4294967296
    4 4294967296
    5 4294967296
    6 4294967296
    ...
    32 4294967296
    33 4294967296
    34 4294967296
    35 4294967296
    36 4294967296

The old bugs should be tested for in test_builtin.py, but I don't know the Python test infrastructure too well. I will give it a try, nonetheless.
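As a starting point, a regression test could look roughly like the sketch below. It is written for modern Python 3 with the standard unittest module and is not taken from test_builtin.py; the class name, the test name, and the choice of cases are hypothetical. The cases are a subset of the strings from this report, each of which should parse to 2**32.

```python
import unittest

# (string, base) pairs from the report; each one represents 2**32.
CASES = [
    ("40000000000", 8),
    ("4294967296", 10),
    ("1904440554", 11),
    ("100000000", 16),
    ("4000000", 32),
    ("2br45qb", 35),
    ("1z141z4", 36),
]


class IntBaseOverflowTest(unittest.TestCase):
    """Hypothetical regression test for int(string, base) at 2**32."""

    def test_two_to_the_32(self):
        for text, base in CASES:
            # subTest reports each failing (text, base) pair separately.
            with self.subTest(text=text, base=base):
                self.assertEqual(int(text, base), 4294967296)
```

Extending `CASES` to the full list of bases from 2 to 36 would cover every string in the script above.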