[Python-Dev] pymalloc killer

Martin v. Loewis martin@v.loewis.de
29 Mar 2002 23:15:40 +0100


Tim Peters <tim.one@comcast.net> writes:

> > Instead, if anything needs to be done, I'd be in favour of using a
> > constant-time algorithm, even if it means that a littler more memory
> > overhead is necessary.
> >
> > I have the following idea: each chunk allocated by ought
>                                                  ^ Allocated by what?
> Some crucial words are missing here.

Sorry: each chunk allocated by pymalloc ought to have a pointer to its pool.

> Note that I am not proposing to change PyMem_{MALLOC, Malloc}

I understand that (I think). Neither do I.

> > to have a pointer to its pool, immediately preceding the memory block.

> In what fundamental way does this differ from pymalloc's current scheme
> of storing the pool address at an address calculated from the pointer
> address? I suspect the real difference resides in the next point:
>
> > This will make an overhead of 4 bytes per allocation. Chunks allocated
> > by the system allocator will have a null pointer preceding them.
>
> That's the killer. We have no control over "the system allocator", by
> which I mean libc's malloc() and realloc().

We sure do. In _PyMalloc_Malloc, replace the line

    return (void *)PyMem_MALLOC(nbytes);

with

    void **result;

    result = (void **)PyMem_MALLOC(nbytes + 8);
    result[1] = NULL;
    return (void *)(result + 2);

Likewise, replace

    offset = (off_t)p & POOL_SIZE_MASK;
    pool = (poolp)((block *)p - offset);
    if (pool->pooladdr != pool || pool->magic != (uint)POOL_MAGIC) {
        PyMem_FREE(p);
        return;
    }

with

    void **block = (void **)p;
    if (block[-1] == NULL) {
        PyMem_FREE(block - 2);
        return;
    }

> As above, trying to hijack them is unattractive due to thread issues,
> even for just the PyMem_{Malloc, MALLOC, Realloc, REALLOC, New, NEW,
> Resize, RESIZE} spellings.

I can't see how this approach would add thread issues.

> If we take over the system malloc, we need also to mimic the platform
> malloc's alignment.

I'm not sure what you mean by "take over". It will still be used as an allocator for arenas, and for large requests.

> Pymalloc's object allocator can get away with 8-byte alignment because
> the core never requires worse-than-double alignment, and I'm willing to
> say that this is a language implementation restriction. I don't believe
> we can also restrict PyMem_umpteen_ways_to_spell_get_memory that way

I'm not proposing anything like this.

> > This approach would allow to remove the requirement that each pool
> > must be 4k-aligned.

> OK, that's why it really stores a whole pointer. Agreed then, but the
> memory lost to achieve page alignment once per-arena is likely small
> compared to adding 4 bytes to every allocation unit. The former loses
> at most 4KB per 256KB.

Indeed, this approach might be slightly more expensive. In return, it is safe and (time-)efficient.

It is actually not that much more expensive. For 96-byte objects, you
currently get 42 objects in a pool. With my change, a 96-byte object needs
100 bytes, which falls into the 104-byte size class, allowing for 39
objects in a pool, thus wasting 3*96 = 288 bytes per pool, or 18KB per
256KB. For 100-byte objects, nothing is wasted, since there are already
4 bytes of padding in pymalloc today.

> Primarily that hijacking the system malloc is fraught with new
> difficulties.

That might be, but I'm not proposing this (or I don't understand the word "hijacking").

> The scheme I sketched is as hairy as it is exactly because it does the
> right thing even when freeing an address obtained directly from the
> platform malloc/realloc.

That safeguard isn't needed, IMO. Returning memory allocated directly through malloc (and not via _PyMalloc_Malloc) to _PyMalloc_Free is clearly a bug, and there are more efficient ways to detect this kind of bug (like your pymalloc debug mechanism). In addition, there are so many other kinds of bugs that this doesn't protect against (like not taking the GC header into account when releasing memory) that this alone isn't convincing.

Regards, Martin