[Python-Dev] Memory size overflows
Armin Rigo arigo@ulb.ac.be
Mon, 14 Oct 2002 00:23:34 +0200
- Previous message: [Python-Dev] Memory size overflows
- Next message: [Python-Dev] Memory size overflows
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hello Tim,
Let me quote from stringobject.c the "heartbreaking code such as string_repeat's":
    nbytes = size * sizeof(char);
    if (nbytes / sizeof(char) != (size_t)size ||
        nbytes + sizeof(PyStringObject) <= nbytes)
            <OverflowError>
    op = (PyStringObject *)
        PyObject_MALLOC(sizeof(PyStringObject) + nbytes);
    if (op == NULL)
            <MemoryError>

With the proper macro, the whole code above can be written as:
    nbytes = Py_MUL_ADD(size, sizeof(char), sizeof(PyStringObject));
    op = (PyStringObject *) PyObject_MALLOC(nbytes);
    if (op == NULL)
            <MemoryError>

which seems much cleaner to me.
For this reason I do not believe it is a good idea here to define a macro a la Zope C:
    DO_MULADD_OR(nbytes, size, sizeof(char), sizeof(PyStringObject), goto Overflow);
for two reasons: first, there is no real need for a special "Overflow" case, as long as the returned 'nbytes' is guaranteed to be un-malloc-able; second, no existing macro (e.g. _PyObject_VAR_SIZE) can grow an extra "goto Overflow" argument without breaking the C code that already depends on it.
There is however a difference between the two above code fragments, in that the former will raise an OverflowError for "really too large values" and a MemoryError for "reasonable value but still too large for me to allocate right now". Whether this is a correct behavior is debatable. I would argue that raising MemoryError in both cases is fine: after all, "123"*9999999999999999 could be perfectly fine in another Python implementation with lazy strings. It is thus only a memory limit that makes this fail in the current implementation.
Note that if we define MAX_SIZE to ((size_t)-1), the overflow check in the present stringobject.c could be more efficiently written down as:
    if (size > (MAX_SIZE - sizeof(PyStringObject)) / sizeof(char))
            <OverflowError>

where everything to the right of '>' reduces to a compile-time constant. The Py_MUL_ADD macro can be implemented with this idea in mind, which works best if the 2nd and 3rd arguments are constants.
> note that anything/nonconstant can be incredibly expensive!
Yes. Couldn't a better overflow-checking algorithm be designed for these cases? For example, to multiply the non-constants 'a' and 'b', we can be sure that there is no overflow if both 'a' and 'b' are small enough (which is the common case anyway). I believe I can even think about a single Py_MUL_ADD macro that is as efficient as above with constant arguments, and otherwise uses this trick.
About using 64-bit computations everywhere (as suggested by Christian): if the length of objects (e.g. lists) is still constrained to 31 bits, then it's fine; otherwise, the problem is the same. Do all supported 64-bit platforms have a 32-bit 'int', and thus a 31-bit limit on object lengths?
By the way, this makes me think of a good efficient overflow-detection method for platforms with a "long long" type longer than size_t:
    unsigned long long bytes = ((unsigned long long)a) * ((unsigned long long)b) + c;
    if (bytes > ((size_t)-1))
            <OverflowError>
This compiles to very good assembly code on x86.
Mmmh, in summary I would recommend that we define a common "multiply-and-add" macro, returning a size_t or OVERFLOW_VALUE in case of overflow, which is implemented in several different ways on different platforms, as selected by #if's. The same macro is used for just "multiply" or just "add" by relying on the compiler's strength reduction.
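The platform selection could be skeletonized roughly as follows (all macro and config names here are invented for illustration, not taken from any actual CPython header):

```c
/* Hypothetical skeleton: pick the cheapest overflow check the
   platform supports.  Py_MUL_ADD(a, 1, c) and Py_MUL_ADD(a, b, 0)
   reduce to a plain add or multiply by strength reduction. */
#define PY_OVERFLOW_VALUE ((size_t)-1)   /* guaranteed un-malloc-able */

#if SIZEOF_LONG_LONG > SIZEOF_SIZE_T
/* Compute in unsigned long long, compare once against size_t max. */
#  define Py_MUL_ADD(a, b, c)  /* ... wide-multiply version ... */
#else
/* Constant-divisor test when b and c are compile-time constants,
   "both operands small" fast path otherwise. */
#  define Py_MUL_ADD(a, b, c)  /* ... division-based version ... */
#endif
```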
Armin