[Python-Dev] pymalloc killer (original) (raw)

Tim Peters tim.one@comcast.net
Fri, 29 Mar 2002 20:03:02 -0500


[Tim]

Note that I am not proposing to change PyMem{MALLOC, Malloc)

[martin@v.loewis.de]

I understand that (I think). Neither do I.

I think you have to under this scheme, although I'm not clear on what "the problem" is you're trying to solve. I'm trying to cater to allocating via PyMem_{MALLOC, Malloc} then free'ing via PyMalloc_Free. That latter is because mixing PyObject_New with PyMem{DEL, FREE, Del, Free} is frequent in real-life code now. If Guido is to get his wish that we keep the PyObject_XXX interface and not introduce a new PyMalloc_XXX object interface, and I'm to get my wish that we don't break mountains of existing works-fine code, that means PyObject_XXX and PyMem_{DEL, FREE, Del, Free} have to funnel thru the pymalloc package. Else existing "get object" and "free object" mixtures will break. That in turn means memory allocated by PyMem_{Malloc, MALLOC, Realloc, REALLOC, Resize, RESIZE, New, NEW} also has to be suitable for feeding into pymalloc's free. But I'm not proposing to change PyMem_{Malloc, MALLOC, Realloc, REALLOC, Resize, RESIZE, New, NEW} at all, meaning they'll continue to call platform malloc() and realloc() directly without any tricky wrappers.

That's the killer. We have no control over "the system allocator", by which I mean libc's malloc() and realloc().

We sure do. In PyMallocMalloc, replace the line

This can't affect anything obtained via PyMem_{MALLOC, Malloc}, because they call the platform malloc() directly. We don't "wrap" them in any significant way.

return (void *)PyMemMALLOC(nbytes);

with void **result; result = (void**)PyMemMALLOC(nbytes+8); result[1] = NULL; return (result+2);

Yes, PyMalloc_Malloc can do anything at all, but PyMem{MALLOC, Malloc} don't go thru this code.

Likewise, replace

offset = (offt )p & POOLSIZEMASK; pool = (poolp )((block *)p - offset); if (pool->pooladdr != pool || pool->magic != (uint )POOLMAGIC) { PyMemFREE(p); return; } with void block = (void)p; if(block[-1] == NULL){ PyMemFREE(block-2); return; }

Something allocated via PyMem_{MALLOC, Malloc} that goes through this code will read random gibberish (or carefully constructed hostile gibberish) at block[-1]. So it's not safe for what I'm trying to accomplish.

As above, trying to hijack them is unattractive due to thread issues, even for just the PyMem{Malloc, MALLOC, Realloc, REALLOC, New, NEW, Resize, RESIZE} spellings.

I can't see how this approach would add thread issues.

That's because it seems you're not trying to make PyMem_{MALLOC, Malloc} safe for mixed use at all. If you did, I suppose they could funnel through a wrapper other than _PyMalloc_Malloc (at the cost of other complications).

If we take over the system malloc, we need also to mimic the platform malloc's alignment.

I'm not sure what you mean by "take over".

I mean stop PyMem_XXX from calling platform malloc/realloc/free directly, so that they can be made safe for mixing with _PyMalloc_Free.

It will still be used as an allocator for arenas, and for large requests.

... Indeed, this approach might be slightly more expensive. In return, it is safe and (time-)efficient.

It's safe under some set of assumptions I'm not sure about, but not safe enough under the assumptions I am sure about .

Primarily that hijacking the system malloc is fraught with new difficulties.

That might be, but I'm not proposing this (or I don't understand the work "hijacking").

It means "take over" . If a scheme can't handle free'ing memory safely when that memory was returned directly from the platform malloc/realloc, it can't handle the sum of the things Guido and I want here (he's keener to preserve PyObject_XXX; I'm keener not to break existing code, even code that doesn't play by what some subset of Python-Dev'vers thinks might be "the rules").

... That safe-guard isn't needed, IMO. Returning memory allocated directly through malloc (and not via PyMallocMalloc) to PyMallocFree is clearly a bug,

Then you either have to change PyMem_{Malloc, MALLOC, etc etc etc} too, or you have to tell extension authors that every extension type they wrote for 1.5.2 will now fail in horrible ways (data-dependent and intermittent segfaults, data corruption, etc), and tell users that every "old" extension they pick up on the web may suffer such problems too even if it compiles cleanly for 2.3.

and there are more efficient ways to detect this kind of bug (like your pymalloc debug mechanism). In addition, there are so many other kinds of bugs that this doesn't protect against (like not taking the GC header into account when releasing memory) that this alone isn't convincing.

The difference is that I don't consider mixing PyObject_XXX with PyMem_XXX to be "a bug". In 1.5.2 such mixing was necessary, and still works fine in 2.2. I worry about extension authors who do bother to fix code that works fine .