[Python-Dev] Debug entry points for PyMalloc

Tim Peters tim.one@comcast.net
Sat, 23 Mar 2002 18:59:59 -0500


[Michael Hudson]

Yes. Particularly if you can call it from gdb.

[Tim]

Is something extraordinary required to make that possible? I had in mind nothing fancier than

extern void _PyMalloc_DebugCheckAddress(void* p);

That grew a teensy bit fancier: the arg changed to const. A void _PyMalloc_DebugDumpAddress(const void *p) entry also sprouted, to display info about the memory block p, to stderr. It should really go somewhere else on Windows, but too little bang for the buck for me to bother complicating it more.

[Michael]

Dunno. I ought to learn how to use gdb properly.

Let me know if you hit a snag. They're simple enough that I can get away with calling them in an MSVC "watch window", and I sure hope gdb isn't feebler than that.

[Aahz]

I'm almost certainly betraying my ignorance here, but it sounds to me like malloc isn't doing any sanity checking to make sure that the memory it received isn't already being used.

Well, malloc doesn't receive memory, it allocates it, and _PyMalloc_DebugMalloc just wraps somebody else's malloc. It's not trying to debug the platform malloc, it's trying to debug "the user's" (Python's) use of the memory malloc returns.

Should each PyDebugMalloc() walk through the list of used memory?

There isn't any list for it to walk -- it's not an allocator, it's a wrapper around somebody else's allocator, and has no knowledge of how the allocator(s) it calls work (beyond assuming that they meet the C defns of how malloc() & friends must behave).

One thing we could do, but I'll leave it to someone else: in a debug build, Python maintains a linked list (in the C sense, not the Python sense) of "almost all" live objects. Walking that list and calling _PyMalloc_DebugCheckAddress() on each object should detect "almost all" out-of-bounds stores that may have happened since the last time that was done. That first requires a way to make all calls to all allocators funnel thru the debug malloc wrappers (the code right now only wraps calls to the pymalloc allocator).
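A rough sketch of that walk-and-check idea, in plain C. Everything here is an assumption made for illustration: the names (dbg_malloc, dbg_check_address, dbg_check_all), the guard value, and the block layout are hypothetical and are not pymalloc's actual code, and the list of blocks stands in for Python's debug-build list of live objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTE 0xFB  /* arbitrary pad value chosen for this sketch */
#define GUARD_LEN  8     /* keeps the user pointer 8-byte aligned on common platforms */

/* Hypothetical layout: [header][guard][user bytes][guard]. */
typedef struct dbg_header {
    struct dbg_header *next;  /* intrusive list of live blocks */
    size_t nbytes;            /* size the caller asked for */
} dbg_header;

static dbg_header *live_blocks = NULL;

static unsigned char *user_ptr(dbg_header *h)
{
    return (unsigned char *)(h + 1) + GUARD_LEN;
}

/* Wrap the platform malloc: add guards on both sides and link the block in. */
void *dbg_malloc(size_t nbytes)
{
    const size_t overhead = sizeof(dbg_header) + 2 * GUARD_LEN;
    dbg_header *h;

    if (nbytes > SIZE_MAX - overhead)
        return NULL;
    h = malloc(overhead + nbytes);
    if (h == NULL)
        return NULL;
    h->nbytes = nbytes;
    h->next = live_blocks;
    live_blocks = h;
    memset(h + 1, GUARD_BYTE, GUARD_LEN);
    memset(user_ptr(h) + nbytes, GUARD_BYTE, GUARD_LEN);
    return user_ptr(h);
}

/* Return 1 if both guards around p are intact, 0 if either was overwritten. */
int dbg_check_address(const void *p)
{
    const unsigned char *u = p;
    const unsigned char *lead, *trail;
    const dbg_header *h;
    size_t i;

    assert(p != NULL);
    lead = u - GUARD_LEN;
    h = (const dbg_header *)(const void *)lead - 1;
    trail = u + h->nbytes;
    for (i = 0; i < GUARD_LEN; i++) {
        if (lead[i] != GUARD_BYTE || trail[i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Walk every live block and count the ones with damaged guards. */
size_t dbg_check_all(void)
{
    size_t bad = 0;
    dbg_header *h;

    for (h = live_blocks; h != NULL; h = h->next) {
        if (!dbg_check_address(user_ptr(h)))
            bad++;
    }
    return bad;
}
```

The sketch leaves out free and realloc to stay short. Calling dbg_check_all() at intervals narrows an out-of-bounds store down to "somewhere since the previous call", which is the most a periodic walk can offer.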

[Skip Montanaro]

Any possibility that the __LINE__ or __FILE__:__LINE__ at which a chunk of memory was freed could be imprinted as ASCII in freed memory without changing the API?

Which API? Regardless of the answer, I'm not yet sure it's even possible to get a little integer identifying the "API family" through the macro layers correctly. That's much more important to me, since it addresses a practical widespread problem. If you want to redesign the macros to make that possible, I expect it would also make passing anything else thru the macros possible too.

I'd find something like

<0340><0340><0340><0340><0340> or object.c:0340object.c:0340 more useful than a string of 0xDB0xDB0xDB0xDB0xDB0xDB0xDB0xDB bytes.

I'm not sure that I would. The advantage of 0xdbdbdbdbdb... is two-fold:

  1. In a debugger, the 0xdbdbdb... stuff stands out like an inflamed boil. The second or third time you see it happen in real life, concluding "ah, this PyObject* was already freed!" becomes automatic.

  2. 0xdbdbdbdb is very likely an invalid memory address, so that, e.g., attempting to do op->ob_type on an already-freed PyObject* op is very likely to trigger a memory error.

We may be able to get the best of both worlds by storing ASCII in the tail end of freed memory (for whatever reason, vital pointers tend to show up near the start of a struct).
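One way the "best of both worlds" idea might look, as a sketch only. The names mark_dead and MARK_DEAD are hypothetical, the caller is assumed to know the block size, and the choice to leave at least one pointer-sized slot of 0xDB at the front is an assumption made here so that a stale pointer read from the start of the struct still looks bogus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DEAD_BYTE 0xDB

/* Fill a block that is about to be freed with DEAD_BYTE, then overwrite
 * its tail with an ASCII "file:line" tag.  The first sizeof(void *) bytes
 * are never overwritten by the tag. */
void mark_dead(void *p, size_t nbytes, const char *file, int line)
{
    char tag[64];
    unsigned char *u = p;
    const size_t keep = sizeof(void *);
    size_t taglen, len;
    int n;

    assert(p != NULL);
    memset(u, DEAD_BYTE, nbytes);

    n = snprintf(tag, sizeof tag, "%s:%d", file, line);
    if (n < 0 || nbytes <= keep)
        return;                      /* no tag, or no room for one */
    taglen = (size_t)n < sizeof tag ? (size_t)n : sizeof tag - 1;

    /* If the tag does not fit, keep its end, where the line number is. */
    len = taglen < nbytes - keep ? taglen : nbytes - keep;
    memcpy(u + nbytes - len, tag + (taglen - len), len);
}

/* Convenience wrapper that records the call site. */
#define MARK_DEAD(p, nbytes) mark_dead((p), (nbytes), __FILE__, __LINE__)
```

In a debugger the head of such a block still shows the familiar 0xdbdbdb... pattern, while a hex dump of the tail shows where the free happened.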

As-is, the "serial number" of the block is left behind, so you can determine which call to malloc created (or call to realloc last changed) the block. Then in a second run, you can set a counting breakpoint to trigger on that call to the debug malloc/realloc, and after that triggers set a conditional breakpoint in the debug free (to break when the memory address is free'd). Then you'll catch the free at the time it occurs. There are ways this can fail to work.
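To make the serial-number idea concrete, here is a minimal sketch of a wrapper that stamps each block with a running count. The names (ser_malloc, ser_realloc, ser_serial_of, ser_free) and the header layout are hypothetical and chosen for illustration; this is not how the debug pymalloc actually stores its serial number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Bumped on every successful malloc/realloc through the wrappers. */
static unsigned long serial_counter = 0;

/* Hypothetical header placed in front of the user's bytes.  Its size is a
 * multiple of 8 on common platforms, so the user pointer stays aligned. */
typedef struct {
    unsigned long serial;  /* which call created or last resized the block */
    size_t nbytes;         /* size the caller asked for */
} ser_header;

void *ser_malloc(size_t nbytes)
{
    ser_header *h;

    if (nbytes > SIZE_MAX - sizeof(ser_header))
        return NULL;
    h = malloc(sizeof *h + nbytes);
    if (h == NULL)
        return NULL;
    h->serial = ++serial_counter;
    h->nbytes = nbytes;
    return h + 1;
}

void *ser_realloc(void *p, size_t nbytes)
{
    ser_header *h;

    if (p == NULL)
        return ser_malloc(nbytes);
    if (nbytes > SIZE_MAX - sizeof(ser_header))
        return NULL;
    h = realloc((ser_header *)p - 1, sizeof *h + nbytes);
    if (h == NULL)
        return NULL;
    h->serial = ++serial_counter;  /* a resize counts as a new event */
    h->nbytes = nbytes;
    return h + 1;
}

/* Read back the serial number stamped on a live block. */
unsigned long ser_serial_of(const void *p)
{
    assert(p != NULL);
    return ((const ser_header *)p - 1)->serial;
}

void ser_free(void *p)
{
    if (p != NULL)
        free((ser_header *)p - 1);
}
```

Provided the run is deterministic, a block reported with serial number N was produced by the Nth call into ser_malloc/ser_realloc, so a breakpoint that skips the first N-1 hits lands on the call of interest in the second run.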

I did something similar in a small single-file library I've been working on, though I didn't pay much attention to preserving the malloc/free API because, like I said, it was something small. I simply changed all free() calls to something like

MARKTERRITORY(s, strlen(s), __LINE__); free(s);

(The second arg was appropriate to the size of the memory chunk being freed.)

We can make the new-in-2.3 _PyMalloc_XXX calls do anything you can dream up, but my time on this has been of the "do a right thing and apologize later" flavor. Whoever wants substantially more is going to have to do most of the work to get it.