[Python-Dev] Memory size overflows

Armin Rigo arigo@ulb.ac.be
Sat, 12 Oct 2002 19:45:02 +0200


Hello everybody,

Throughout the C code there are potential problems with objects of very large sizes (http://www.python.org/sf/618623). The problem is that to allocate a variable-sized object of type 't' with 'n' elements we compute 'n*t->tp_itemsize', which can overflow even if 'n' is a perfectly legal value. If the truncated result is small, the subsequent malloc() succeeds, and we segfault when we try to access more memory than was reserved. The same problem exists elsewhere, more or less everywhere we add to or multiply a number that could be user-supplied. For example, Guido just fixed '%2147483647d'%-123. A rather artificial example, I agree, but a hole anyway.

To fix this I suggest introducing a few new macros in pymem.h that compute sizes with overflow checking. I can see a few approaches based on special values that mean "overflow":

  1. there is just one special "overflow" value, e.g. ((size_t)-1), that the macros return and propagate when an overflow is detected. This might be error-prone: if we forget even once to use the macros when adding a few bytes to a size, the special value wraps around to a small legal value, and we segfault.

  2. same as above, but with a whole range of overflow values. For example, just assume (or decide) that no malloc of more than half the largest number that fits in a size_t can succeed. Then we don't need any macro to add a (reasonable) constant to a size. We need a multiplication macro that, upon overflow, returns the first number of the "overflow" range. An addition macro is still needed to sum two potentially large numbers.

  3. we compute all sizes with signed integers (int or long), as is currently (erroneously?) done in many places. Any negative integer is regarded as "overflow", but the multiplication macro returns the most negative integer on overflow, so that, as above, no addition macro is needed for the simple cases.

This will require a "multiplication hunt party" :-)

Also, approaches 2 and 3 require fixes to ensure that 'malloc(any-overflow-size)' always fails, for each of the several malloc implementations found in the code. Even with approach 1, I would not trust the platform malloc to correctly fail on malloc(-1); I guess it might "round up" the value to 0 before it proceeds...

Armin