[Python-Dev] pymalloc and overallocation (unicodeobject.c,2.139,2.140 checkin)

Tim Peters tim.one@comcast.net
Sat, 27 Apr 2002 01:01:03 -0400


[martin@v.loewis.de]

> That's what I mean (I'm really confused about memory family APIs, ever since everything changed :-)

Here's the in-depth course:

PyMem_xyz calls the platform malloc/realloc/free (fiddled for
    x-platform uniformity in NULL and 0 handling)

PyObject_xyz calls pymalloc's malloc/realloc/free

and instead of a dozen layers of indirection we've now got crushingly straightforward WYSIWYG preprocessor blocks like:

    #ifdef WITH_PYMALLOC
    #ifdef PYMALLOC_DEBUG
    #define PyObject_MALLOC         _PyObject_DebugMalloc
    #define PyObject_Malloc         _PyObject_DebugMalloc
    #define PyObject_REALLOC        _PyObject_DebugRealloc
    #define PyObject_Realloc        _PyObject_DebugRealloc
    #define PyObject_FREE           _PyObject_DebugFree
    #define PyObject_Free           _PyObject_DebugFree

    #else   /* WITH_PYMALLOC && ! PYMALLOC_DEBUG */
    #define PyObject_MALLOC         PyObject_Malloc
    #define PyObject_REALLOC        PyObject_Realloc
    #define PyObject_FREE           PyObject_Free
    #endif

    #else   /* ! WITH_PYMALLOC */
    #define PyObject_MALLOC         PyMem_MALLOC
    #define PyObject_REALLOC        PyMem_REALLOC
    #define PyObject_FREE           PyMem_FREE
    #endif  /* WITH_PYMALLOC */

    #define PyObject_Del            PyObject_Free
    #define PyObject_DEL            PyObject_FREE

    /* for source compatibility with 2.2 */
    #define _PyObject_Del           PyObject_Free

All the names you love are still there, it's just that most of them are redundant now.

> ... I do think that the Unicode data should be managed by pymalloc as well.

Well, that largely depends on how big these suckers are. Calling PyObject_XYZ adds real overhead if pymalloc can't handle the requested size: all the overhead of the system routines, + the overhead of pymalloc figuring out it can't handle it. I expect it's also not good to mix pymalloc with custom free lists: you hold on to one object from a pymalloc pool, and it prevents the entire pool from getting recycled for another size class. So if you want to investigate using pymalloc more heavily for Unicode objects, I suggest two things:

  1. Get rid of the Unicode-specific free list.

  2. Change the object layout to embed the str member storage, just as PyStringObject does.
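The two costs described above (an oversized request paying for a failed size check on top of the system call, and a single live block pinning a whole pool) can be pictured with a toy allocator. This is only a sketch under loud assumptions: `toy_malloc`, `toy_free`, `TOY_SMALL_LIMIT` and `TOY_POOL_BLOCKS` are made-up names and numbers, and the single fixed pool is far simpler than whatever pymalloc really does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_SMALL_LIMIT 64   /* hypothetical cutoff, not pymalloc's real one */
#define TOY_POOL_BLOCKS 8    /* hypothetical pool size */

typedef union {
    max_align_t align;                     /* keep every block suitably aligned */
    unsigned char bytes[TOY_SMALL_LIMIT];
} toy_block;

static toy_block toy_pool[TOY_POOL_BLOCKS];
static int toy_used[TOY_POOL_BLOCKS];
static int toy_live;          /* blocks currently handed out from the pool */
static int toy_system_calls;  /* requests forwarded to the system malloc */

void *toy_malloc(size_t n)
{
    if (n > 0 && n <= TOY_SMALL_LIMIT) {
        for (int i = 0; i < TOY_POOL_BLOCKS; i++) {
            if (!toy_used[i]) {
                toy_used[i] = 1;
                toy_live++;
                return toy_pool[i].bytes;
            }
        }
    }
    /* Too big (or pool full): the check above was pure overhead,
       and we still pay for the system routine. */
    toy_system_calls++;
    return malloc(n ? n : 1);
}

void toy_free(void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)&toy_pool[0];
    uintptr_t hi = (uintptr_t)&toy_pool[TOY_POOL_BLOCKS];

    if (p == NULL)
        return;
    if (addr >= lo && addr < hi) {
        size_t i = (addr - lo) / sizeof(toy_block);
        assert(toy_used[i]);   /* catch a double free in the sketch */
        toy_used[i] = 0;
        toy_live--;
    } else {
        free(p);               /* came from the system allocator */
    }
}

/* The pool can be handed to another size class only when nothing is live. */
int toy_pool_recyclable(void)
{
    return toy_live == 0;
}
```

A custom free list that parks even one freed object without calling `toy_free` leaves `toy_live` above zero, so `toy_pool_recyclable()` stays false no matter how many other blocks were returned.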

#1 is pretty localized, but #2 would require changing a lot of code.
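To make suggestion #2 concrete, here is a minimal sketch of the two layouts side by side. Everything in it is hypothetical (`sep_string`, `emb_string`, `toy_char`, `TOY_MAX_LEN`, the allocation counter); these are not the real object structs, just the general shape of "header points at storage" versus "storage embedded in the header's allocation".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t toy_char;       /* stand-in for the real character type */
#define TOY_MAX_LEN 1000000u     /* hypothetical cap, keeps size math from overflowing */

static int layout_allocs;        /* counts trips to the allocator */

static void *counted_malloc(size_t n)
{
    layout_allocs++;
    return malloc(n);
}

/* Layout A: the header holds a pointer to separately allocated storage. */
typedef struct {
    size_t length;
    toy_char *str;
} sep_string;

/* Layout B: the storage is embedded at the end of the header. */
typedef struct {
    size_t length;
    toy_char str[];              /* C99 flexible array member */
} emb_string;

sep_string *sep_new(size_t length)
{
    sep_string *s;
    assert(length <= TOY_MAX_LEN);
    s = counted_malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->str = counted_malloc((length + 1) * sizeof(toy_char));
    if (s->str == NULL) {
        free(s);
        return NULL;
    }
    s->length = length;
    s->str[length] = 0;          /* trailing terminator */
    return s;
}

void sep_free(sep_string *s)
{
    if (s != NULL) {
        free(s->str);
        free(s);
    }
}

emb_string *emb_new(size_t length)
{
    emb_string *s;
    assert(length <= TOY_MAX_LEN);
    s = counted_malloc(sizeof *s + (length + 1) * sizeof(toy_char));
    if (s == NULL)
        return NULL;
    s->length = length;
    s->str[length] = 0;          /* trailing terminator */
    return s;
}

void emb_free(emb_string *s)
{
    free(s);
}
```

Layout B costs one allocation instead of two, but any code that swaps or resizes `str` on its own must be rewritten to reallocate the whole object, which is presumably where "a lot of code" comes from.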

> Of course, DecodeUTF8 would then raise the same issue: decoding UTF-8 doesn't know how many characters you'll get, either. This currently does not try to be clever, but allocates enough memory for the worst case.

I just put a patch up on SourceForge that's less clever, but shouldn't waste any memory in the end. I expect you'll be happy with it, or rant inconsolably. It's all the same to me.
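For readers who want to see the general "allocate for the worst case, then give the slack back" idea in isolation, here is a rough sketch. It is not the SourceForge patch and not the real decoder: `toy_decode_utf8` is a hypothetical name, it decodes into 32-bit code points, and it only checks lead and continuation bytes (overlong forms and surrogates are not rejected).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Decode n bytes of UTF-8 into a malloc'ed array of code points.
   Stores the number of code points in *out_len.
   Returns NULL on malformed input or allocation failure. */
uint32_t *toy_decode_utf8(const unsigned char *s, size_t n, size_t *out_len)
{
    uint32_t *buf;
    size_t i = 0, count = 0;

    assert(s != NULL || n == 0);
    assert(out_len != NULL);

    /* Worst case: pure ASCII, one code point per input byte. */
    buf = malloc((n ? n : 1) * sizeof *buf);
    if (buf == NULL)
        return NULL;

    while (i < n) {
        unsigned char c = s[i];
        size_t extra;            /* continuation bytes expected */
        uint32_t cp;

        if (c < 0x80)                { extra = 0; cp = c; }
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else { free(buf); return NULL; }             /* bad lead byte */

        if (extra > n - i - 1) { free(buf); return NULL; }   /* truncated */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80) { free(buf); return NULL; }
            cp = (cp << 6) | (cc & 0x3F);
        }
        buf[count++] = cp;
        i += extra + 1;
    }

    /* The real length is known now, so hand back the unused tail. */
    if (count < n) {
        uint32_t *shrunk = realloc(buf, (count ? count : 1) * sizeof *buf);
        if (shrunk != NULL)
            buf = shrunk;        /* on failure the larger buffer is still valid */
    }
    *out_len = count;
    return buf;
}
```

Whether the final `realloc` actually returns memory to anyone depends on the allocator underneath, which is the overallocation question this thread is about.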