[Python-Dev] pymalloc and overallocation (unicodeobject.c,2.139,2.140 checkin) (original) (raw)

Tim Peters tim.one@comcast.net
Fri, 26 Apr 2002 14:21:29 -0400

Previous message: [Python-Dev] pymalloc and overallocation (unicodeobject.c,2.139,2.140 checkin)
Next message: [Python-Dev] pymalloc and overallocation (unicodeobject.c,2.139,2.140 checkin)
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

[MAL]

The standard reasoning behind using overallocation for memory management is that typical modern malloc()s don't really allocate the memory until it is used (you know this anyway...),

[Martin]

... Most strings are smaller than a page, so if you allocate lots of strings, every page allocated from the system will be used as well (at least to fill in the string header). With overallocation, you will indeed overallocate real pages, and consume real memory.

But Marc-Andre uses realloc at the end to return the excess. The excess bytes will get reused (and some returned yet again) by the next overallocation, and so on. The only memory touched by him is the only memory he actually needs in the end. Indeed, if strings are always smaller than a page, the aggregate overallocation at any point in a single-threaded program will be at worst a few pages total (overallocation is never more than a factor of 4, and the excess is always given back untouched).

As Martin's benchmark showed, the counting strategy is faster for small chunks and this is probably due to the fact that pymalloc manages these.

I doubt this claim.

If you both run the test with and without pymalloc enabled, the results should speak for themselves. I have not, but suspect the difference has more to do with that caches like small memory areas better than large ones, and especially when you crawl over a memory area twice.

MAL, you should keep in mind that pymalloc is also managing the small chunks in your scheme: when you're fiddling with a 40-character Unicode string, an overallocation "by a factor of 4" only amounts to an 80-character UTF8 string. pymalloc handles blocks that small under either scheme, and indeed your current scheme is getting a speed benefit from that pymalloc currently refuses to copy the data to a smaller block when there's a shrinking realloc at the end.

Since pymalloc cannot know that an algorithm wants to use overallocation as memory allocation strategy, it would probably help to find a way to tell pymalloc about this fact. It could then either redirect the request to the system malloc() or use a different malloc strategy for these chunks.

Possibly, yes. It's still in need of quantifying the speed versus memory tradeoffs. pymalloc's current small-block realloc strategy favors speed, and going to the system malloc instead would be slower. You haven't yet told me that's what you actually want .

Previous message: [Python-Dev] pymalloc and overallocation (unicodeobject.c,2.139,2.140 checkin)
Next message: [Python-Dev] pymalloc and overallocation (unicodeobject.c,2.139,2.140 checkin)
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]