[Python-Dev] pymalloc and overallocation (unicodeobject.c,2.139,2.140 checkin)

Tim Peters tim.one@comcast.net
Fri, 26 Apr 2002 17:47:54 -0400


[Guido]

Would it make sense to change the Unicode object to use pymalloc, and to change the UTF-8 codec to count the bytes if the shortest possible output would fit in a pymalloc block?

These are independent questions, and I don't know how to answer either unless you give me a test program that prints the value of the function you're trying to minimize <0.7 wink>.

The Unicode object currently uses quite an elaborate free list, caching both PyUnicodeObject structs (which currently use pymalloc) and their str members (which currently do not). Whether the str member uses pymalloc really doesn't have anything to do with what the UTF-8 encoder function does (it returns plain strings, and those already use pymalloc today -- and it's not entirely clear whether they should, either!).
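For concreteness, here's a sketch of the free-list pattern in question -- invented names and plain malloc, not the actual unicodeobject.c code (which allocates the structs via pymalloc and is pickier about which str buffers it keeps):

    #include <stdlib.h>

    /* Illustrative stand-ins for PyUnicodeObject and Py_UNICODE. */
    typedef struct uobj {
        struct uobj *next_free;  /* threads the free list */
        unsigned short *str;     /* separately allocated buffer */
        size_t length;
    } uobj;

    static uobj *freelist = NULL;
    static size_t freelist_len = 0;
    enum { MAX_FREELIST = 1024 };

    static uobj *
    uobj_new(size_t length)
    {
        uobj *u;
        unsigned short *buf;

        if (freelist != NULL) {
            /* Reuse a parked struct; its old str buffer comes along
               for the ride and may satisfy the realloc cheaply. */
            u = freelist;
            freelist = u->next_free;
            freelist_len--;
        }
        else {
            u = malloc(sizeof(*u));
            if (u == NULL)
                return NULL;
            u->str = NULL;
        }
        buf = realloc(u->str, (length + 1) * sizeof(*buf));
        if (buf == NULL) {
            free(u->str);
            free(u);
            return NULL;
        }
        u->str = buf;
        u->length = length;
        return u;
    }

    static void
    uobj_dealloc(uobj *u)
    {
        if (freelist_len < MAX_FREELIST) {
            /* Park the struct and its buffer both for reuse. */
            u->next_free = freelist;
            freelist = u;
            freelist_len++;
        }
        else {
            free(u->str);
            free(u);
        }
    }

The point is that deallocation parks both pieces, so a later allocation of similar size can often skip the allocator entirely.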

Counting the bytes in the UTF-8 encoder could work well, independent of that: if the result is known to fit in a pymalloc block, just do it; as soon as it's known that it won't, overallocate with assurance that the system realloc will give back everything that isn't used. In the latter case I believe the code could be made much simpler, by doing a factor-of-4 overallocation from the start (it currently tries 2, then 3, then 4, with a bunch of embedded-in-the-loops tests to prevent overwrites; I'm not sure why it bothers with this staggered scheme, since it's going to touch exactly as much memory as it actually needs regardless, and give all the rest back untouched).
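In code, the two-path scheme might look like the following sketch -- not the checkin: it stands in plain UCS-2 for Py_UNICODE, ignores surrogate pairs and error handling, hard-codes the 256-byte pymalloc cutoff, and uses malloc/realloc where the real thing would mix pymalloc and the system allocator:

    #include <stdlib.h>

    #define SMALL_REQUEST_THRESHOLD 256  /* pymalloc's current cutoff */

    typedef unsigned short ucs2_t;  /* stand-in for Py_UNICODE */

    /* Encode n UCS-2 chars into p (caller guarantees room); return
       the number of bytes written. */
    static size_t
    emit_utf8(const ucs2_t *s, size_t n, char *p)
    {
        char *start = p;
        while (n--) {
            unsigned int ch = *s++;
            if (ch < 0x80)
                *p++ = (char)ch;
            else if (ch < 0x800) {
                *p++ = (char)(0xC0 | (ch >> 6));
                *p++ = (char)(0x80 | (ch & 0x3F));
            }
            else {
                *p++ = (char)(0xE0 | (ch >> 12));
                *p++ = (char)(0x80 | ((ch >> 6) & 0x3F));
                *p++ = (char)(0x80 | (ch & 0x3F));
            }
        }
        return (size_t)(p - start);
    }

    char *
    utf8_encode_sketch(const ucs2_t *s, size_t n, size_t *out_len)
    {
        size_t exact = 0, i;
        char *out;

        /* Pass 1: count output bytes, bailing out as soon as it's
           known the result can't fit in a pymalloc block anyway. */
        for (i = 0; i < n && exact <= SMALL_REQUEST_THRESHOLD; i++)
            exact += s[i] < 0x80 ? 1 : s[i] < 0x800 ? 2 : 3;

        if (i == n && exact <= SMALL_REQUEST_THRESHOLD) {
            /* Known to fit: allocate exactly, encode, done. */
            out = malloc(exact ? exact : 1);
            if (out == NULL)
                return NULL;
            *out_len = emit_utf8(s, n, out);
        }
        else {
            /* Known not to fit: factor-of-4 overallocation up front,
               then one realloc to give back the untouched tail. */
            char *tmp;
            out = malloc(n * 4);
            if (out == NULL)
                return NULL;
            *out_len = emit_utf8(s, n, out);
            tmp = realloc(out, *out_len);
            if (tmp != NULL)
                out = tmp;
        }
        return out;
    }

No staggered 2-then-3-then-4 growth, and no overwrite tests buried in the encode loop.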

[Guido]

(I guess this means that the length of the Unicode string should be less than SMALL_REQUEST_THRESHOLD - currently 256.)

For a start, yes. I'd stick a "Py_" in front of that symbol and expose it then. The cutoff test would also have to take into account the size of the result's PyStringObject header (the whole stringobject enchilada counts against the threshold).
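The test itself would be a one-liner; a sketch, with Py_SMALL_REQUEST_THRESHOLD as the hypothetical exposed spelling and the header size passed in (sizeof(PyStringObject) in the real thing, whose trailing ob_sval[1] already covers the '\0'):

    #include <stddef.h>

    /* Hypothetical "Py_"-prefixed export of obmalloc's
       SMALL_REQUEST_THRESHOLD; 256 is its current value. */
    #define Py_SMALL_REQUEST_THRESHOLD 256

    /* The whole object counts against the threshold: header plus
       character bytes (the header's ob_sval[1] already holds the
       trailing '\0'). */
    static int
    result_fits_pymalloc(size_t header_size, size_t nbytes)
    {
        return header_size + nbytes <= Py_SMALL_REQUEST_THRESHOLD;
    }

So the encoder would call result_fits_pymalloc(sizeof(PyStringObject), exact) rather than comparing the character count against the threshold alone.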