[Python-Dev] pymalloc and overallocation (unicodeobject.c,2.139,2.140 checkin)

Tim Peters tim.one@comcast.net
Fri, 26 Apr 2002 16:59:23 -0400


[Tim]

But Marc-Andre uses realloc at the end to return the excess. The excess bytes will get reused (and some returned yet again) by the next overallocation, and so on.

[Martin]

Right. I confused this with the fact that PyMem_Realloc won't return the excess memory,

PyMem_Realloc does whatever the system realloc does -- PyMem_Realloc doesn't go thru pymalloc today (except in a PYMALLOC_DEBUG build). Doesn't matter, though, since strings use the PyObject_{Malloc, Free, Realloc} family today, and that does use pymalloc. OTOH, there's no reason PyObject_Realloc has to hang on to all small-block memory on a shrinking realloc, and there's no reason pymalloc couldn't grow another realloc entry point specifying what the caller wants a shrinking realloc to do. These things are all easy to change, but I don't know what's truly desirable.
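To make the "another realloc entry point" idea concrete, here is a purely illustrative sketch of a caller-selectable shrinking policy. The function name, the policy names, and the 3/4 cutoff are all hypothetical; nothing here is an existing pymalloc API, and it only models the decision, not the memory management.

```python
def shrinking_realloc_action(old_size, new_size, policy="keep"):
    """Model what a hypothetical small-block realloc could do with a request.

    policy is a made-up knob for illustration:
      "keep"      - never give back memory on a shrink (hang on to the block)
      "release"   - always move the data to a smaller block on a shrink
      "threshold" - move only if the new size is at most 3/4 of the old size
    """
    if new_size > old_size:
        return "grow"
    if policy == "keep":
        return "keep block"
    if policy == "release":
        return "copy to smaller block"
    if policy == "threshold":
        # Integer form of new_size / old_size > 3/4 (the cutoff is an assumption).
        if 4 * new_size > 3 * old_size:
            return "keep block"
        return "copy to smaller block"
    raise ValueError("unknown policy: %r" % (policy,))
```

With the "keep" policy, shrinking a 160-byte block to 40 bytes leaves 120 bytes tied up for as long as the block lives; the other two policies trade a copy for getting that memory back.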

Note another subtlety: I expect you brought up PyMem_Realloc because unicodeobject.c uses the PyMem_XYZ family for managing the PyUnicodeObject.str member today. That means it normally never uses pymalloc at all, except to allocate fixed-size PyUnicodeObject structs (which use the PyObject_XYZ memory family). I don't know whether that's the best idea, but that's how it is today.

pymalloc gets into this because PyUnicode_EncodeUTF8 returns a plain string object, and the latter uses pymalloc today.
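The overallocate-then-shrink pattern under discussion can be sketched in Python. This is only a model of the idea, not the code in unicodeobject.c: the function name is hypothetical, a bytearray stands in for the malloc'ed buffer, and truncating it stands in for the final shrinking realloc.

```python
def encode_utf8_overallocated(text, factor=4):
    """Encode text to UTF-8 using a worst-case buffer, then trim the excess.

    Returns (encoded_bytes, excess), where excess is the number of bytes
    that were overallocated and handed back by the final trim.
    """
    capacity = factor * len(text)      # worst case: `factor` bytes per character
    buf = bytearray(capacity)          # stands in for the initial allocation
    used = 0
    for ch in text:
        piece = ch.encode("utf-8")     # 1 to 4 bytes per code point
        buf[used:used + len(piece)] = piece
        used += len(piece)
    del buf[used:]                     # stands in for the shrinking realloc
    return bytes(buf), capacity - used
```

Whether the trimmed bytes are actually reusable afterwards depends on what the underlying realloc does on a shrink, which is the question this thread is about.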

so the extra bytes in a small string will be wasted for the lifetime of the string object - that still could cause significant memory wastage.

It could. Python generally aims to optimize the expected case, not jump thru hoops to avoid worst cases (else we wouldn't use dicts at all). But I don't know what the expected case is here, and given how often I use Unicode in my own work it could be I'll never have a clue. Note that the expected uses of Unicode strings make no difference to PyUnicode_EncodeUTF8: what counts there is the expected lifetimes and sizes of the "plain" utf8-encoded PyStringObjects it computes. Indeed, pymalloc has almost no implications for Unicode beyond the encode-as-a-plain-string functions (unless unicodeobject.c is changed to manage the PyUnicodeObject.str member using pymalloc too, as plain strings do today).

MAL, you should keep in mind that pymalloc is also managing the small chunks in your scheme: when you're fiddling with a 40-character Unicode string, an overallocation "by a factor of 4" only amounts to an 80-character UTF8 string.

[I guess this is a terminology, not a math problem:

Nope! Turns out it was an hallucination problem.

a 40 character Unicode string has already 80 bytes; the UTF-8 of it can have up to 160 bytes].

You're right, of course. The conclusion doesn't change, though: that's still in the range of block sizes pymalloc handles (and will remain so unless I reduce pymalloc's small-object threshold below what's needed for pymalloc to handle small dicts on its own -- which I'm unlikely to do).
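The corrected arithmetic can be checked with a few lines of Python. The 2 bytes per Unicode character and the factor of 4 come from the exchange above; the 256-byte small-object threshold is an assumption made for illustration (the thread does not state the value), and the helper names are hypothetical. String-object header overhead is ignored.

```python
# Assumed value for illustration only; the thread does not give the threshold.
ASSUMED_SMALL_OBJECT_THRESHOLD = 256


def worst_case_sizes(n_chars, bytes_per_unicode_char=2, overalloc_factor=4):
    """Return (unicode_bytes, utf8_worst_case_bytes) for an n_chars string."""
    unicode_bytes = n_chars * bytes_per_unicode_char
    utf8_worst_case = n_chars * overalloc_factor
    return unicode_bytes, utf8_worst_case


def handled_as_small_block(n_bytes, threshold=ASSUMED_SMALL_OBJECT_THRESHOLD):
    """True if a request of n_bytes falls at or under the assumed threshold."""
    return n_bytes <= threshold


unicode_bytes, utf8_worst_case = worst_case_sizes(40)
print(unicode_bytes, utf8_worst_case, handled_as_small_block(utf8_worst_case))
```

For a 40-character string this gives 80 bytes of Unicode data and a 160-byte worst-case UTF-8 buffer, which stays under the assumed threshold.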