[Python-Dev] pymalloc and overallocation (unicodeobject.c,2.139,2.140 checkin)

M.-A. Lemburg mal@lemburg.com
Fri, 26 Apr 2002 10:26:22 +0200


Tim Peters wrote:

[M.-A. Lemburg]
> I don't know why it is, but Unicode always seems to unnecessarily
> heat up any discussion involving it.

Huh -- I thought I was the only one who noticed this <wink>.

Naa, it's occurred to me several times in the past. Unicode seems to trigger some memory corruption in Brain 2.2 which results in spilling out huge amounts of adrenalin and causes the blood pressure to reach record highs ;-)

> I would really like to know what is causing this: is it a religious
> issue, does it have to do with the people involved or is Unicode
> inherently controversial?

Unicode had nothing to do with my yelling in this thread. I've got very low tolerance for memory corruption, regardless of source. When it happens once I'm on high alert, when it happens twice in the same place I go postal. Had this been in dictobject.c or boolobject.c, I would have been just as unhappy. Now that the memory corruption is thought to be solved, and verified in the debug build regardless, now I'll get cranky about foreigners and their lameass character sets <wink>.

Good to know.

On the technical issues remaining, I don't know how to judge the tradeoff between memory use and speed here. If you do, and pymalloc can help in some way, I'll be happy to help.

First of all, UTF-8 is probably the most common Unicode encoding used today and will certainly become the standard encoding within the next few years. So speed matters a lot in this particular corner of the Unicode implementation.

The standard reasoning behind using overallocation for memory management is that typical modern malloc()s don't really allocate the memory until it is used (you know this anyway...), so overallocation doesn't actually cause bundles of memory chips to heat up. This makes overallocation ideal for the case where you don't know the exact size in advance but where you can estimate a reasonable upper bound.
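To make the overallocation idea concrete, here is a minimal sketch in C. It is not the code from unicodeobject.c; the names put_utf8 and encode_overalloc are made up for illustration, and the input is assumed to be an array of valid code points, so the worst case is 4 bytes per code point. The buffer is reserved at the upper bound, filled in one pass, then shrunk with realloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Write one code point as UTF-8 into out; return the number of bytes written.
   Assumes cp is a valid scalar value (<= 0x10FFFF, not a surrogate). */
static size_t put_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Overallocation: reserve the worst case (4 bytes per code point), encode in
   a single pass, then shrink the block to the size actually used.
   Returns a malloc()ed buffer the caller must free(), or NULL on failure. */
unsigned char *encode_overalloc(const uint32_t *cps, size_t n, size_t *out_len)
{
    size_t cap, used = 0, i;
    unsigned char *buf, *shrunk;

    if (n > SIZE_MAX / 4)
        return NULL;                 /* the upper bound would overflow */
    cap = n ? n * 4 : 1;             /* never request 0 bytes */
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (i = 0; i < n; i++)
        used += put_utf8(cps[i], buf + used);
    assert(used <= cap);

    shrunk = realloc(buf, used ? used : 1);
    *out_len = used;
    return shrunk ? shrunk : buf;    /* a failed shrink leaves buf valid */
}
```

The input is scanned only once, at the price of a temporarily oversized block and a final realloc(). Whether the unused tail costs anything in practice depends on the allocator and the OS.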

Now with pymalloc the situation is a bit different for smaller sized memory areas (larger chunks are handed off to the system malloc() which uses the above strategy).

As Martin's benchmark showed, the counting strategy is faster for small chunks and this is probably due to the fact that pymalloc manages these.
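For comparison, a rough sketch of what a counting strategy can look like: one pass over the input to compute the exact output size, one allocation of exactly that size, and a second pass to encode. This is only an illustration, not Martin's benchmark code; utf8_width, store_utf8 and encode_counted are hypothetical names, and the input is assumed to consist of valid code points.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Number of UTF-8 bytes needed for one code point (assumed valid). */
static size_t utf8_width(uint32_t cp)
{
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

/* Store cp as a w-byte UTF-8 sequence, filling continuation bytes from the
   last byte backwards and finishing with the lead byte. */
static void store_utf8(uint32_t cp, size_t w, unsigned char *out)
{
    static const unsigned char lead[5] = {0x00, 0x00, 0xC0, 0xE0, 0xF0};
    size_t k;

    for (k = w - 1; k > 0; k--) {
        out[k] = (unsigned char)(0x80 | (cp & 0x3F));
        cp >>= 6;
    }
    out[0] = (unsigned char)(lead[w] | cp);
}

/* Counting strategy: size the output exactly before allocating.
   Returns a malloc()ed buffer the caller must free(), or NULL on failure.
   Overflow of the byte count for huge n is not checked in this sketch. */
unsigned char *encode_counted(const uint32_t *cps, size_t n, size_t *out_len)
{
    size_t need = 0, pos = 0, i;
    unsigned char *buf;

    for (i = 0; i < n; i++)              /* pass 1: count the bytes */
        need += utf8_width(cps[i]);

    buf = malloc(need ? need : 1);       /* never request 0 bytes */
    if (buf == NULL)
        return NULL;

    for (i = 0; i < n; i++) {            /* pass 2: encode into place */
        size_t w = utf8_width(cps[i]);
        store_utf8(cps[i], w, buf + pos);
        pos += w;
    }
    assert(pos == need);
    *out_len = pos;
    return buf;
}
```

The input is read twice, but the allocator only ever sees one request of the final size and no realloc(), which is plausibly why this wins when a small-block allocator serves the request.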

Since pymalloc cannot know that an algorithm wants to use overallocation as memory allocation strategy, it would probably help to find a way to tell pymalloc about this fact. It could then either redirect the request to the system malloc() or use a different malloc strategy for these chunks.
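One way such a hint could look, purely as a sketch: nothing below exists in pymalloc. The names hinted_malloc, choose_route, HINT_OVERALLOC and SMALL_THRESHOLD, as well as the threshold value, are assumptions made for illustration, and the pool allocator is a stand-in backed by malloc() so that the example stays self-contained.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cutoff below which a pymalloc-style allocator would serve
   requests from its own pools. The value is an assumption for illustration. */
#define SMALL_THRESHOLD 256

enum alloc_hint  { HINT_NONE, HINT_OVERALLOC };
enum alloc_route { ROUTE_POOL, ROUTE_SYSTEM };

/* Decide which allocator should serve a request. Large requests always go
   to the system malloc(); small ones do too if the caller announces that
   the block is an overallocated upper bound it intends to shrink later. */
enum alloc_route choose_route(size_t nbytes, enum alloc_hint hint)
{
    if (nbytes > SMALL_THRESHOLD)
        return ROUTE_SYSTEM;
    if (hint == HINT_OVERALLOC)
        return ROUTE_SYSTEM;
    return ROUTE_POOL;
}

/* Stand-in for a small-block pool allocator. It is backed by malloc() here
   so that plain free() releases memory from either route. */
static void *pool_alloc(size_t nbytes)
{
    return malloc(nbytes);
}

/* Allocate nbytes, honouring the caller's hint about overallocation. */
void *hinted_malloc(size_t nbytes, enum alloc_hint hint)
{
    assert(nbytes > 0);
    if (choose_route(nbytes, hint) == ROUTE_SYSTEM)
        return malloc(nbytes);
    return pool_alloc(nbytes);
}
```

A real implementation would also have to make sure the matching free and realloc calls recognise which allocator a block came from; the sketch sidesteps that by using malloc() for both routes.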

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH

Company & Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/