[Python-Dev] RE: [Python-checkins] python/dist/src/Objects unicodeobject.c,2.139,2.140 (original) (raw)
Tim Peters tim.one@comcast.net
Sun, 21 Apr 2002 15:55:48 -0400
- Previous message: [Python-Dev] RE: [Python-checkins] python/dist/src/Objects unicodeobject.c,2.139,2.140
- Next message: [Python-Dev] RE: [Python-checkins] python/dist/src/Objects unicodeobject.c,2.139,2.140
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
... I found the overallocations strategy that this code had just so embarrassing: a single three-byte character will cause an overallocation of three times the memory, which means that the final realloc will certainly truncate lots of bytes.
Well, Python routinely over-allocates string space all over the place, and it doesn't really cost more to throw away a million bytes at the end than to throw away five bytes. That is, the cost of a realloc shrink is at worst proportional to the number of retained bytes; the number of bytes thrown away doesn't matter. For that reason, the thing that puzzles me more about the current state of the code is why it doesn't overallocate by a factor of 4 from the start, and skip all the delicate, repeated "oops! I still didn't get enough memory" logic. It's almost certainly going to do a shrinking realloc at the end anyway.
As a result, we are at the mercy of the realloc implementation here: If realloc copies memory (such as Pymalloc might some day) when shrinking buffers, performance will get worse.
Since pymalloc only handles small blocks by itself, such copies would be of short contiguous memory regions that are almost certainly in L1 cache (since the encoder function just finished crawling over them). For that reason I doubt the speed hit would be monstrous.
Since this appears to be religious, I'm backing the patch out.
It needn't be religious : with sufficient effort we can quantify the tradeoff between the time pymalloc's realloc would take to move shrinking blocks, and the space wasted by realloc leaving them in place. I mentioned before that I want to give pymalloc a new entry point for realloc callers who specifically do or don't want a shrinking realloc to copy memory. For string storage I suspect it's better to move the memory (typically exactly one shrinking realloc, after which the string storage is immutable until death), while for most other kinds of storage it's better to leave the block alone (mutable memory that may shrink and grow repeatedly over time).
- Previous message: [Python-Dev] RE: [Python-checkins] python/dist/src/Objects unicodeobject.c,2.139,2.140
- Next message: [Python-Dev] RE: [Python-checkins] python/dist/src/Objects unicodeobject.c,2.139,2.140
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]