[Python-Dev] Memory size overflows
Armin Rigo arigo@ulb.ac.be
Mon, 14 Oct 2002 00:23:34 +0200
- Previous message: [Python-Dev] Memory size overflows
- Next message: [Python-Dev] Memory size overflows
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hello Tim,
Let me quote from stringobject.c the "heartbreaking code such as string_repeat's":
    nbytes = size * sizeof(char);
    if (nbytes / sizeof(char) != (size_t)size ||
        nbytes + sizeof(PyStringObject) <= nbytes)
            <OverflowError>
    op = (PyStringObject *)
        PyObject_MALLOC(sizeof(PyStringObject) + nbytes);
    if (op == NULL)
            <MemoryError>

With the proper macro, the whole code above can be written as:
    nbytes = Py_MUL_ADD(size, sizeof(char), sizeof(PyStringObject));
    op = (PyStringObject *) PyObject_MALLOC(nbytes);
    if (op == NULL)
            <MemoryError>

which seems much cleaner to me.
For this reason I do not believe it is a good idea here to define a macro a la Zope C:
    DO_MULADD_OR(nbytes, size, sizeof(char), sizeof(PyStringObject), goto Overflow);
for two reasons: first, there is no real need for a special "Overflow" case, as long as the returned 'nbytes' is guaranteed to be un-malloc-able; second, no existing macro (e.g. _PyObject_VAR_SIZE) can grow an extra "goto Overflow" argument without breaking the C code that already depends on it.
There is however a difference between the two above code fragments, in that the former will raise an OverflowError for "really too large values" and a MemoryError for "reasonable value but still too large for me to allocate right now". Whether this is a correct behavior is debatable. I would argue that raising MemoryError in both cases is fine: after all, "123"*9999999999999999 could be perfectly fine in another Python implementation with lazy strings. It is thus only a memory limit that makes this fail in the current implementation.
Note that if we define MAX_SIZE to ((size_t)-1), the overflow check in the present stringobject.c could be more efficiently written down as:
    if (size > (MAX_SIZE - sizeof(PyStringObject)) / sizeof(char))
            <OverflowError>

where everything to the right of '>' reduces to a compile-time constant. The Py_MUL_ADD macro can be implemented with this idea in mind, which works best if the 2nd and 3rd arguments are constants.
> note that anything/nonconstant can be incredibly expensive!
Yes. Couldn't a better overflow-checking algorithm be designed for these cases? For example, to multiply the non-constants 'a' and 'b', we can be sure that there is no overflow if both 'a' and 'b' are small enough (which is the common case anyway). I believe I can even think about a single Py_MUL_ADD macro that is as efficient as above with constant arguments, and otherwise uses this trick.
About using 64-bit computations everywhere (as suggested by Christian): if the length of objects (e.g. lists) is still constrained to 31 bits, then it's fine; otherwise, the problem is the same. Do all supported 64-bit platforms have a 32-bit 'int', and thus a 31-bit limit on object lengths?
By the way, this makes me think of a good efficient overflow-detection method for platforms with a "long long" type longer than size_t:
    unsigned long long bytes = ((unsigned long long)a) * ((unsigned long long)b) + c;
    if (bytes > ((size_t)-1))
            <OverflowError>
This compiles to very good assembly code on x86.
Mmmh, in summary I would recommend that we define a common "multiply-and-add" macro, returning a size_t or OVERFLOW_VALUE in case of overflow, which is implemented in several different ways on different platforms, as selected by #if's. The same macro is used for just "multiply" or just "add" by relying on the compiler's strength reduction.
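The platform selection could be skeletonized roughly as follows (all macro and config names here are invented for illustration, not taken from any actual CPython header):

```c
/* Hypothetical skeleton: pick the cheapest overflow check the
   platform supports.  Py_MUL_ADD(a, 1, c) and Py_MUL_ADD(a, b, 0)
   reduce to a plain add or multiply by strength reduction. */
#define PY_OVERFLOW_VALUE ((size_t)-1)   /* guaranteed un-malloc-able */

#if SIZEOF_LONG_LONG > SIZEOF_SIZE_T
/* Compute in unsigned long long, compare once against size_t max. */
#  define Py_MUL_ADD(a, b, c)  /* ... wide-multiply version ... */
#else
/* Constant-divisor test when b and c are compile-time constants,
   "both operands small" fast path otherwise. */
#  define Py_MUL_ADD(a, b, c)  /* ... division-based version ... */
#endif
```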
Armin