[Python-Dev] Debug entry points for PyMalloc
Tim Peters tim.one@comcast.net
Thu, 21 Mar 2002 01:18:24 -0500
The thing I've dreaded most about switching to pymalloc is losing the invaluable memory-corruption clues supplied by the Microsoft debug-build malloc. On more than one occasion, they've found wild stores, out-of-bounds reads, reads of uninitialized memory, and reads of free()ed memory in Python. It does this by spraying special bytes all over malloc'ed memory at various times, then checking the bytes for sanity at free() and realloc() times.
This kind of stuff is going to be pure hell under pymalloc, because there's no padding at all between chunks pymalloc passes out, and pymalloc stores valid addresses at the start of free()ed memory. So a wild store probably can't be detected as any sort of memory corruption, it will simply overwrite part of some other end-user object -- or corrupt pymalloc's internal pointers linking free()ed memory (and pymalloc simply goes nuts then). Several months ago Martin and I took turns thinking about a memory overwrite problem in the Unicode stuff that showed up under pymalloc, and it was indeed pure hell to track it down.
Following is a sketch for teaching pymalloc how to do something similar to the MS scheme. A twist over the MS scheme is adding a "serial number" to the pad bytes, incremented by one for each malloc/realloc. At a crude level, this gives a sense of age to the eyeball; for reproducible memory nightmares, it gives an exact way to set a data, or counting, breakpoint (on the next run) to capture the instant at which a doomed-to-go-bad memory block first gets passed out. I hope that addresses the worst problem the MS scheme still leaves untouched: you can catch memory corruption pretty well with it, but all you know then is that "the byte at this address is bad", and you have no idea what the memory's original purpose in life was.
Sketch of Debug Mode for PyMalloc
- Three new entry points in obmalloc.c (note: stop #include'ing this; hiding code in include files sucks, and naming an include file .c compounds the confusion):
    DL_IMPORT(void *) _PyMalloc_DebugMalloc(size_t nbytes);
    DL_IMPORT(void *) _PyMalloc_DebugRealloc(void *p, size_t nbytes);
    DL_IMPORT(void) _PyMalloc_DebugFree(void *p);
- When WITH_PYMALLOC and PYMALLOC_DEBUG are #define'd, these are mapped to in the obvious way from _PyMalloc_{MALLOC, REALLOC, FREE}:
    #ifdef WITH_PYMALLOC
    DL_IMPORT(void *) _PyMalloc_Malloc(size_t nbytes);
    DL_IMPORT(void *) _PyMalloc_Realloc(void *p, size_t nbytes);
    DL_IMPORT(void) _PyMalloc_Free(void *p);

    DL_IMPORT(void *) _PyMalloc_DebugMalloc(size_t nbytes);
    DL_IMPORT(void *) _PyMalloc_DebugRealloc(void *p, size_t nbytes);
    DL_IMPORT(void) _PyMalloc_DebugFree(void *p);

    #ifdef PYMALLOC_DEBUG
    #define _PyMalloc_MALLOC _PyMalloc_DebugMalloc
    #define _PyMalloc_REALLOC _PyMalloc_DebugRealloc
    #define _PyMalloc_FREE _PyMalloc_DebugFree

    #else   /* WITH_PYMALLOC && !PYMALLOC_DEBUG */
    #define _PyMalloc_MALLOC _PyMalloc_Malloc
    #define _PyMalloc_REALLOC _PyMalloc_Realloc
    #define _PyMalloc_FREE _PyMalloc_Free

    #endif  /* PYMALLOC_DEBUG */

    #else   /* !WITH_PYMALLOC */
    #define _PyMalloc_MALLOC PyMem_MALLOC
    #define _PyMalloc_REALLOC PyMem_REALLOC
    #define _PyMalloc_FREE PyMem_FREE
    #endif  /* WITH_PYMALLOC */
- A debug build implies PYMALLOC_DEBUG, but PYMALLOC_DEBUG can be forced in a release build.
- No changes to the guts of _PyMalloc_{Malloc, Realloc, Free}. Keep them as lean and as clear of #ifdef obscurity as they are now.
- Define three special bit patterns. In hex, they all end with B (for deBug), and begin with a vaguely mnemonic letter. Strings of these are unlikely to be legit memory addresses, ints, 7-bit ASCII, or floats:
    #define PYMALLOC_CLEANBYTE      0xCB    /* uninitialized memory */
    #define PYMALLOC_DEADBYTE       0xDB    /* free()ed memory */
    #define PYMALLOC_FORBIDDENBYTE  0xFB    /* unusable memory */
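To make the "unlikely to be legit" claim a bit more concrete, here is a small sketch (not part of the proposal; the sketch_* names are made up for illustration) that builds a 4-byte run of one pattern and looks at it as an integer and as text. All three patterns have the high bit set, so a run of them is never 7-bit ASCII, and read as a 32-bit signed int it comes out as a large negative number.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PYMALLOC_CLEANBYTE      0xCB    /* uninitialized memory */
#define PYMALLOC_DEADBYTE       0xDB    /* free()ed memory */
#define PYMALLOC_FORBIDDENBYTE  0xFB    /* unusable memory */

/* Hypothetical helper: four copies of `pattern` viewed as a 32-bit
 * unsigned integer.  Byte order doesn't matter, since all four bytes
 * are the same. */
static uint32_t sketch_as_u32(unsigned char pattern)
{
    unsigned char raw[4];
    uint32_t v;
    memset(raw, pattern, sizeof raw);
    memcpy(&v, raw, sizeof v);
    return v;
}

/* Hypothetical helper: 1 if the byte could be 7-bit ASCII, else 0. */
static int sketch_is_7bit_ascii(unsigned char byte)
{
    return (byte & 0x80) == 0;
}
```

In a debugger, a pointer or int that reads 0xCBCBCBCB, 0xDBDBDBDB or 0xFBFBFBFB then points pretty directly at uninitialized, freed, or out-of-bounds memory, respectively.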
- The debug malloc/free/realloc use these as follows. Note that this stuff is done regardless of whether PyMalloc handles the request directly or passes it on to the platform malloc (in fact, the debug entry points won't know and won't care).
- The Debug malloc asks for 16 extra bytes and fills them with useful stuff:
    p[0:4]
        Number of bytes originally asked for.  4-byte unsigned integer, big-endian (easier to read in a memory dump).
    p[4:8]
        Copies of PYMALLOC_FORBIDDENBYTE.  Used to catch under-writes and under-reads.
    p[8:8+n]
        The requested memory, filled with copies of PYMALLOC_CLEANBYTE.  Used to catch reference to uninitialized memory.  &p[8] is returned.  Note that this is 8-byte aligned if PyMalloc handled the request itself.
    p[8+n:8+n+4]
        Copies of PYMALLOC_FORBIDDENBYTE.  Used to catch over-writes and over-reads.
    p[8+n+4:8+n+8]
        A serial number, from a PyMalloc file static, incremented by 1 on each call to _PyMalloc_DebugMalloc and _PyMalloc_DebugRealloc.  4-byte unsigned integer, big-endian.  If "bad memory" is detected later, the serial number gives an excellent way to set a breakpoint on the next run, to capture the instant at which this block was passed out.
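The layout above might be coded up along these lines. This is only a sketch under stated assumptions, not the eventual obmalloc.c code: it sits on top of the platform malloc() instead of _PyMalloc_Malloc so it can stand alone, and sketch_debug_malloc, write4, read4 and serialno are names invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PYMALLOC_CLEANBYTE      0xCB    /* uninitialized memory */
#define PYMALLOC_FORBIDDENBYTE  0xFB    /* unusable memory */

static uint32_t serialno = 0;   /* hypothetical file static */

/* Store n at q as a 4-byte big-endian unsigned integer. */
static void write4(unsigned char *q, uint32_t n)
{
    q[0] = (unsigned char)(n >> 24);
    q[1] = (unsigned char)(n >> 16);
    q[2] = (unsigned char)(n >> 8);
    q[3] = (unsigned char)n;
}

/* Read a 4-byte big-endian unsigned integer from q. */
static uint32_t read4(const unsigned char *q)
{
    return ((uint32_t)q[0] << 24) | ((uint32_t)q[1] << 16) |
           ((uint32_t)q[2] << 8) | (uint32_t)q[3];
}

/* Assumption: malloc() stands in for the real underlying allocator. */
void *sketch_debug_malloc(size_t nbytes)
{
    unsigned char *p;

    if (nbytes > UINT32_MAX - 16)       /* size must fit the 4-byte field */
        return NULL;
    p = malloc(nbytes + 16);
    if (p == NULL)
        return NULL;

    ++serialno;
    write4(p, (uint32_t)nbytes);                        /* p[0:4]         */
    memset(p + 4, PYMALLOC_FORBIDDENBYTE, 4);           /* p[4:8]         */
    memset(p + 8, PYMALLOC_CLEANBYTE, nbytes);          /* p[8:8+n]       */
    memset(p + 8 + nbytes, PYMALLOC_FORBIDDENBYTE, 4);  /* p[8+n:8+n+4]   */
    write4(p + 8 + nbytes + 4, serialno);               /* p[8+n+4:8+n+8] */
    return p + 8;
}
```

To catch block number N on the next run, a conditional breakpoint on serialno == N right after the increment does the job.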
- The Debug free first uses the address to find the number of bytes originally asked for, then checks the 8 bytes on each end for sanity (in particular, that the PYMALLOC_FORBIDDENBYTEs are still intact).
  XXX Make this checking a distinct entry point.
  XXX In case an error is found, print informative stuff, but then what?
  XXX Die or keep going?  Fatal error is probably best.
  Then fills the original N bytes with PYMALLOC_DEADBYTE.  This is to catch references to free()ed memory.  The forbidden bytes are left intact.
  Then calls _PyMalloc_Free.
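A rough sketch of the check-then-free sequence, with the check split out as its own function per the first XXX. Assumptions here: the block was obtained from the platform malloc() with the 16-byte padding described earlier, abort() stands in for whatever "fatal error" ends up meaning, and sketch_check_block / sketch_debug_free are invented names. Only the forbidden bytes are verified; the size and serial number fields are taken on faith.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PYMALLOC_DEADBYTE       0xDB    /* free()ed memory */
#define PYMALLOC_FORBIDDENBYTE  0xFB    /* unusable memory */

/* Read a 4-byte big-endian unsigned integer from q. */
static uint32_t read4(const unsigned char *q)
{
    return ((uint32_t)q[0] << 24) | ((uint32_t)q[1] << 16) |
           ((uint32_t)q[2] << 8) | (uint32_t)q[3];
}

/* vp is the address handed to the user, i.e. &p[8].
 * Returns 0 if both pads are intact, -1 if the leading pad is
 * damaged, 1 if the trailing pad is damaged. */
int sketch_check_block(const void *vp)
{
    const unsigned char *u = vp;
    size_t n = read4(u - 8);    /* bytes originally asked for */
    int i;

    for (i = 1; i <= 4; ++i)
        if (u[-i] != PYMALLOC_FORBIDDENBYTE)
            return -1;
    for (i = 0; i < 4; ++i)
        if (u[n + i] != PYMALLOC_FORBIDDENBYTE)
            return 1;
    return 0;
}

/* Assumption: dying is the chosen response to a damaged pad. */
void sketch_debug_free(void *vp)
{
    unsigned char *u = vp;

    if (u == NULL)
        return;
    if (sketch_check_block(u) != 0) {
        fprintf(stderr, "bad pad bytes around block at %p\n", vp);
        abort();
    }
    memset(u, PYMALLOC_DEADBYTE, read4(u - 8)); /* pads left intact */
    free(u - 8);    /* stand-in for _PyMalloc_Free */
}
```

Having the check as a separate function also means it can be called at arbitrary points while hunting a bug, not just at free time.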
- The Debug realloc first calls _PyMalloc_DebugMalloc with the new request size. Then copies over the original bytes. Then calls _PyMalloc_DebugFree on the original bytes.
  XXX This could, and probably should, be optimized to avoid copying
  XXX every time.
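The unoptimized malloc-copy-free realloc could look something like the sketch below. Same caveats as before: it runs on the platform malloc()/free() so it can stand alone, dbg_malloc and dbg_free are cut-down hypothetical stand-ins for the debug malloc and free (pad checking left out to keep it short), and all names are invented. One detail the prose glosses over: when shrinking, only the first nbytes of the original can be copied, so the copy length here is the smaller of the old and new sizes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PYMALLOC_CLEANBYTE      0xCB
#define PYMALLOC_DEADBYTE       0xDB
#define PYMALLOC_FORBIDDENBYTE  0xFB

static uint32_t serialno = 0;   /* hypothetical file static */

static void write4(unsigned char *q, uint32_t n)
{
    q[0] = (unsigned char)(n >> 24);
    q[1] = (unsigned char)(n >> 16);
    q[2] = (unsigned char)(n >> 8);
    q[3] = (unsigned char)n;
}

static uint32_t read4(const unsigned char *q)
{
    return ((uint32_t)q[0] << 24) | ((uint32_t)q[1] << 16) |
           ((uint32_t)q[2] << 8) | (uint32_t)q[3];
}

/* Cut-down debug malloc: size, pad, clean bytes, pad, serial number. */
static void *dbg_malloc(size_t n)
{
    unsigned char *p;
    if (n > UINT32_MAX - 16 || (p = malloc(n + 16)) == NULL)
        return NULL;
    write4(p, (uint32_t)n);
    memset(p + 4, PYMALLOC_FORBIDDENBYTE, 4);
    memset(p + 8, PYMALLOC_CLEANBYTE, n);
    memset(p + 8 + n, PYMALLOC_FORBIDDENBYTE, 4);
    write4(p + 12 + n, ++serialno);
    return p + 8;
}

/* Cut-down debug free: pad checks omitted in this sketch. */
static void dbg_free(void *vp)
{
    unsigned char *u = vp;
    if (u == NULL)
        return;
    memset(u, PYMALLOC_DEADBYTE, read4(u - 8));
    free(u - 8);
}

void *sketch_debug_realloc(void *p, size_t nbytes)
{
    unsigned char *fresh;
    size_t original;

    if (p == NULL)
        return dbg_malloc(nbytes);
    fresh = dbg_malloc(nbytes);
    if (fresh == NULL)
        return NULL;            /* old block stays valid, as with realloc */
    original = read4((unsigned char *)p - 8);
    memcpy(fresh, p, original < nbytes ? original : nbytes);
    dbg_free(p);
    return fresh;
}
```

Since every realloc goes through the debug malloc, a resized block always gets a fresh serial number, and any bytes past the old size show up as PYMALLOC_CLEANBYTE.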