[Python-Dev] Darwin's realloc(...) implementation never shrinks allocations

Bob Ippolito bob at redivi.com
Mon Jan 3 07:08:24 CET 2005


On Jan 3, 2005, at 12:13 AM, Tim Peters wrote:

> [Bob Ippolito]
>> Quite a few notable places in the Python sources expect realloc(...) to relinquish some memory if the requested size is smaller than the currently allocated size.
>
> I don't know what "relinquish some memory" means. If it means something like "returns memory to the OS, so that the reported process size shrinks", then no, nothing in Python ever assumes that. That's simply because "returns memory to the OS" and "process size" aren't concepts in the C standard, and so nothing can be said about them in general -- not in theory, and neither in practice, because platforms (OS+libc combos) vary so widely in behavior here. As a pragmatic matter, I expect that a production-quality realloc() implementation will at least be able to reuse released memory, provided that the amount released is at least half the amount originally malloc()'ed (and, e.g., reasonable buddy systems may not be able to do better than that).
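The "at least half" remark can be made concrete with a toy binary buddy model. This sketch assumes that every request is rounded up to a power-of-two block and that an in-place shrink can only give back whole buddy halves. The helper names are hypothetical, and this is not how any particular libc implements realloc().

```c
#include <stddef.h>

/* Toy binary-buddy model (hypothetical helpers, not a real libc API).
   Every request is rounded up to the next power of two.
   Assumes n is far below SIZE_MAX so the doubling cannot overflow. */
static size_t buddy_block_size(size_t n)
{
    size_t block = 1;
    while (block < n)
        block <<= 1;              /* double until the request fits */
    return block;
}

/* Bytes a buddy allocator could reclaim when a block holding old_size
   bytes is shrunk in place to new_size bytes: the difference between
   the two rounded block sizes. Nothing comes back unless the new size
   fits in half (or less) of the old block. */
static size_t buddy_reclaimable(size_t old_size, size_t new_size)
{
    size_t old_block = buddy_block_size(old_size);
    size_t new_block = buddy_block_size(new_size);
    return new_block < old_block ? old_block - new_block : 0;
}
```

For example, shrinking a 1000-byte request to 600 bytes leaves it in the same 1024-byte block, so nothing is reclaimed. Shrinking it to 400 bytes frees the upper 512-byte buddy.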

This is what I meant by relinquish (c/o Merriam-Webster):
    a : to stop holding physically : RELEASE
    b : to give over possession or control of : YIELD

Your expectation is not correct for Darwin's memory allocation scheme. It seems that Darwin creates allocations of immutable size. The only way ANY part of an allocation will ever be used by ANYTHING else is if free() is called with that allocation. free() can be called either explicitly, or implicitly by calling realloc() with a size larger than the size of the allocation. In that case, realloc() creates a new allocation of at least the requested size, copies the contents of the original allocation into it (probably with copy-on-write pages if it's large enough, so it might be cheap), and free()s the original allocation. When realloc() is asked for a size that is not greater than the allocation's size, it simply returns the given allocation and has no side effects whatsoever.
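One way to observe this behavior is to ask the allocator how large a block really is before and after a shrinking realloc(). The sketch below assumes malloc_size() from <malloc/malloc.h> on Darwin and glibc's malloc_usable_size() from <malloc.h> elsewhere. On other platforms it reports 0, meaning "unknown". The helper name is hypothetical.

```c
#include <stdlib.h>

#if defined(__APPLE__)
#include <malloc/malloc.h>            /* malloc_size() */
#define ALLOC_SIZE(p) malloc_size(p)
#elif defined(__GLIBC__)
#include <malloc.h>                   /* malloc_usable_size() */
#define ALLOC_SIZE(p) malloc_usable_size(p)
#else
#define ALLOC_SIZE(p) ((size_t)0)     /* no introspection available */
#endif

/* Allocate big_size bytes, shrink the block to small_size with realloc(),
   and report the usable size the allocator claims before and after.
   Returns 0 on success, -1 on bad arguments or allocation failure. */
static int probe_shrink(size_t big_size, size_t small_size,
                        size_t *before, size_t *after)
{
    char *p;
    char *q;

    if (big_size == 0 || small_size == 0)
        return -1;                    /* realloc(p, 0) is implementation-defined */
    p = malloc(big_size);
    if (p == NULL)
        return -1;
    p[0] = 'x';                       /* touch the block */
    *before = ALLOC_SIZE(p);

    q = realloc(p, small_size);
    if (q == NULL) {                  /* on failure the original block is still valid */
        free(p);
        return -1;
    }
    *after = ALLOC_SIZE(q);           /* per the report above, unchanged on Darwin */
    free(q);
    return 0;
}
```

According to the report above, `after` would equal `before` on the Darwin releases examined. An allocator that gives memory back would report a smaller `after`.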

Was this a good decision? Probably not! However, it is our (in the "I know you use Windows but I am not the only one that uses Mac OS X" sense) problem so long as Darwin is a supported platform, because it is highly unlikely that Apple will backport any "fix" to the allocator unless we can prove it has some security implications in software shipped with their OS. I attempted to look for some easy ones by performing a quick audit of Apache, OpenSSH, and OpenSSL.
Unfortunately, their developers did not share your expectation. I found one sprintf-like routine in Apache that could be affected by this behavior, and one instance of immutable string creation in Apple's CoreFoundation CFString implementation, but I have yet to find an easy way to exploit this behavior from the outside. I should probably be looking at PHP and Perl instead ;)

>> ... but I figure Darwin does this as an "optimization" and because Darwin probably can't resize mmap'ed memory (at least it can't from Python, but this probably means it doesn't have this capability at all).
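For what it's worth, POSIX does let a process shrink a mapping in place: munmap() may be applied to just the tail pages of a region. The sketch below assumes an anonymous private mapping and page-granular sizes. It says nothing about whether Darwin's allocator actually uses this technique, and the helper names are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1       /* expose MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON  /* older BSD/Darwin spelling */
#endif

/* Round n up to a whole number of pages. */
static size_t round_to_pages(size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (n + page - 1) / page * page;
}

/* Shrink an mmap'ed region from old_len to new_len bytes by unmapping
   its tail pages. The start address never moves. Returns 0 on success,
   -1 on failure (errno set by munmap). */
static int shrink_mapping(void *addr, size_t old_len, size_t new_len)
{
    size_t keep = round_to_pages(new_len);
    size_t total = round_to_pages(old_len);

    assert(addr != NULL);
    if (keep >= total)
        return 0;               /* nothing to release */
    return munmap((char *)addr + keep, total - keep);
}
```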

>> It is possible to "fix" this for Darwin,
>
> I don't understand what's "broken". Small objects go thru Python's own allocator, which has its own realloc policies and its own peculiarities (chiefly that pymalloc never free()s any memory allocated for small objects).
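The split between small objects and everything else can be pictured as a size-based dispatch. This is a simplified model, not obmalloc's actual code. It assumes a 256-byte small-request threshold and 8-byte size classes, chosen to resemble pymalloc of that era, and all names are hypothetical. Only requests above the threshold ever reach the platform realloc(), which is why Darwin's downsize behavior matters only for large blocks.

```c
#include <stddef.h>

#define SMALL_THRESHOLD 256   /* assumed small-request limit, in bytes */
#define SIZE_CLASS_STEP 8     /* assumed alignment / size-class granularity */

enum alloc_route { ROUTE_POOL, ROUTE_SYSTEM };

/* Decide which allocator serves a request of n bytes.
   Zero-byte and large requests go to the system allocator. */
static enum alloc_route route_request(size_t n)
{
    return (n > 0 && n <= SMALL_THRESHOLD) ? ROUTE_POOL : ROUTE_SYSTEM;
}

/* Size-class index for a pooled request (n > 0):
   1..8 bytes -> 0, 9..16 -> 1, ..., 249..256 -> 31. */
static size_t size_class_index(size_t n)
{
    return (n - 1) / SIZE_CLASS_STEP;
}
```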

What's broken is that there are several places in Python that seem to assume that you can allocate a large chunk of memory, and make it smaller in some meaningful way with realloc(...). This is not true with Darwin. You are right about small objects. They don't matter because they're small, and because they're handled by Python's allocator.
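The pattern at issue looks roughly like this: reserve a buffer for the largest possible result, fill only part of it, then realloc() it down to the bytes actually used. This is a hypothetical helper modeled on the recv-then-resize sequence discussed in this thread, not code from the Python sources. On an allocator that never shrinks, the final realloc() leaves the full reservation in place.

```c
#include <stdlib.h>
#include <string.h>

/* Copy up to max_len bytes from src (of length src_len) into a freshly
   malloc'ed buffer sized for the worst case, then realloc() it down to
   the number of bytes actually used. *out_len receives that count.
   Returns NULL on allocation failure. */
static char *read_then_shrink(const char *src, size_t src_len,
                              size_t max_len, size_t *out_len)
{
    size_t n = src_len < max_len ? src_len : max_len;
    char *buf = malloc(max_len > 0 ? max_len : 1);  /* worst-case reservation */
    char *shrunk;

    if (buf == NULL)
        return NULL;
    memcpy(buf, src, n);            /* stands in for recv() filling n bytes */
    *out_len = n;

    /* The step that assumes memory is given back: a conforming realloc()
       may legally return buf unchanged and keep all max_len bytes. */
    shrunk = realloc(buf, n > 0 ? n : 1);
    return shrunk != NULL ? shrunk : buf;  /* keep the big block if realloc fails */
}
```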

>> because you can ask the default malloc zone how big a particular allocation is, and how big an allocation of a given size will actually be (see: <malloc/malloc.h>). The obvious place to put this would be PyObject_Realloc, because this is at least called by _PyString_Resize (which will fix <http://python.org/sf/1092502>).
>
> The diagnosis in the bug report seems to leave it pointing at socket.py's _fileobject.read(), although I suspect the real cause is in socketmodule.c's sock_recv(). We've had other reports of various problems when people pass absurdly large values to socket recv(). A better fix here would probably amount to rewriting sock_recv() to refuse to pass enormous numbers to the platform recv() (it appears that many platform recv() implementations simply don't expect a recv() argument to be much bigger than the native network buffer size, and screw up when that's not so).
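One way to read the suggestion of a more defensive sock_recv() is to clamp the size passed to the platform recv(), and therefore the size of the buffer allocated for it. The following is a minimal POSIX sketch. The RECV_CHUNK_MAX value and the function name are hypothetical, and this is not socketmodule.c's actual code.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/socket.h>

#define RECV_CHUNK_MAX (64 * 1024)   /* hypothetical per-call cap, in bytes */

/* Receive at most `requested` bytes from fd, but never ask recv() (or
   malloc()) for more than RECV_CHUNK_MAX bytes in one call.
   Returns a malloc'ed buffer holding *got bytes, or NULL on error. */
static char *capped_recv(int fd, size_t requested, size_t *got)
{
    size_t want = requested < RECV_CHUNK_MAX ? requested : RECV_CHUNK_MAX;
    char *buf = malloc(want > 0 ? want : 1);
    ssize_t n;

    if (buf == NULL)
        return NULL;
    n = recv(fd, buf, want, 0);
    if (n < 0) {
        free(buf);
        return NULL;
    }
    *got = (size_t)n;
    return buf;      /* at most RECV_CHUNK_MAX bytes were ever reserved */
}
```

recv() is already allowed to return fewer bytes than requested, so callers that loop until they have enough data are unaffected by the cap. An absurd request no longer triggers an equally absurd allocation that later has to be shrunk.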

You are correct. The real cause is in sock_recv(), and/or _PyString_Resize(), depending on how you look at it.

>> Note that all versions of Darwin that I've looked at (6.x, 7.x, and 8.0b1 corresponding to publicly available WWDC 2004 Tiger code) have this "issue", but it might go away by Mac OS X 10.4 or some later release.
>
> It would be good to rewrite sock_recv() more defensively in any case. Best I can tell, this implementation of realloc() is standard-conforming but uniquely brain dead in its downsize behavior.

Presumably this can happen at other places (including third party extensions), so a better place to do this might be _PyString_Resize(). list_resize() is another reasonable place to put this. I'm sure there are other places that use realloc() too, and the majority of them do this through obmalloc. So maybe instead of trying to track down all the places where this can manifest, we should just "gunk up" Python and patch PyObject_Realloc()? Since we are both pretty confident that other allocators aren't like Darwin, this "gunk" can be #ifdef'ed to the __APPLE__ case.
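The #ifdef'ed "gunk" might look something like the sketch below. It uses malloc_size() and malloc_good_size() from <malloc/malloc.h> to detect a shrink that Darwin's realloc() would ignore, and performs the shrink by hand with malloc(), memcpy() and free(). The function name and the "less than half" threshold are assumptions for illustration. On other platforms the function falls through to plain realloc().

```c
#include <stdlib.h>
#include <string.h>
#ifdef __APPLE__
#include <malloc/malloc.h>   /* malloc_size(), malloc_good_size() */
#endif

/* Hypothetical realloc() wrapper. On Darwin, if the new size would fit in
   well under half of the existing block, move the data to a fresh, smaller
   block so the old one is actually released. Elsewhere, defer to realloc(). */
static void *shrinking_realloc(void *p, size_t n)
{
#ifdef __APPLE__
    if (p != NULL && n > 0) {
        size_t have = malloc_size(p);          /* bytes really reserved */
        if (malloc_good_size(n) < have / 2) {  /* assumed threshold */
            void *q = malloc(n);
            if (q == NULL)
                return p;      /* old block still holds the data; keep it */
            memcpy(q, p, n);   /* n < have, so this stays in bounds */
            free(p);
            return q;
        }
    }
#endif
    return realloc(p, n);
}
```

Putting the check in one wrapper leaves every caller (_PyString_Resize(), list_resize(), third-party extensions) unchanged, which is the appeal of patching a single choke point.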

> I don't expect the latter will last (as you say on your page, "probably plenty of other software" also makes the same pragmatic assumptions about realloc downsize behavior), so I'm not keen to gunk up Python to worm around it.

As I said above, I haven't yet found any other software that makes the same kind of realloc() assumptions that Python does. I'm sure I'll find something, but what's important to me is that Python works well on Mac OS X, so something should happen. If we can't prove that Apple's allocation strategy is a security flaw in some service that ships with the OS, any improvements to this strategy are very unlikely to be backported to current versions of Mac OS X.

-bob


