[Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!

Victor Stinner victor.stinner at gmail.com
Thu Oct 31 02:42:53 CET 2013


2013/10/31 Victor Stinner <victor.stinner at gmail.com>:

Log calls to the memory allocator
---------------------------------

A different approach is to log calls to the malloc(), realloc() and free() functions. Calls can be logged into a file or sent to another computer through the network. Example of a log entry: name of the function, size of the memory block, address of the memory block, Python traceback where the allocation occurred, timestamp.

Logs cannot be used directly: getting the current status of the memory requires parsing previous logs. For example, it is not possible to directly get the traceback of a Python object, as get_object_traceback(obj) does with traces.

Python uses objects with a very short lifetime and so makes extensive use of memory allocators. It has an allocator optimized for small objects (less than 512 bytes) with a short lifetime. For example, the Python test suite calls malloc(), realloc() or free() 270,000 times per second on average. If the size of a log entry is 32 bytes, logging produces 8.2 MB per second or 29.0 GB per hour.

The alternative was rejected because it is less efficient and has fewer features. Parsing logs in a different process or on a different computer is slower than maintaining traces on allocated memory blocks in the same process.
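To make the 8.2 MB/s and 29.0 GB/hour figures concrete, here is a quick back-of-the-envelope sketch in Python. The 32-byte entry layout is only an assumed illustration; the PEP does not define a log format.

    # Back-of-the-envelope check of the log volume quoted above.
    # The exact entry layout is an assumption used only to illustrate
    # a 32-byte fixed-size record.
    import struct

    # Hypothetical entry: function id (1 byte) + padding (3), block size (4),
    # block address (8), traceback id (8), timestamp (8) = 32 bytes.
    ENTRY_FORMAT = "<B3xIQQd"
    ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)      # 32 bytes

    CALLS_PER_SECOND = 270000                       # figure quoted above
    bytes_per_second = CALLS_PER_SECOND * ENTRY_SIZE
    print(bytes_per_second / 2.0 ** 20)             # ~8.2 MB per second
    print(bytes_per_second * 3600 / 2.0 ** 30)      # ~29.0 GB per hour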

"less features": get_object_traceback(obj), get_traces() and Snapshot.statistics() can be computed from the log, but you have to process a lot of data.

How much time does it take to compute statistics on 1 hour of logs? And on 1 week of logs? With tracemalloc you get this information in a few seconds (immediately for get_object_traceback()).
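For comparison, a minimal tracemalloc sketch (assuming Python 3.4 with the PEP 454 API) that answers the same questions directly from the traces, with no log parsing:

    # Minimal sketch using the tracemalloc API proposed by PEP 454.
    import tracemalloc

    tracemalloc.start(25)        # store up to 25 frames per traceback

    data = [bytes(1000) for _ in range(100)]

    # Traceback of a single object: answered immediately from the traces.
    tb = tracemalloc.get_object_traceback(data[0])
    if tb is not None:
        print("\n".join(tb.format()))

    # Aggregated statistics grouped by source line, computed in seconds.
    snapshot = tracemalloc.take_snapshot()
    for stat in snapshot.statistics('lineno')[:5]:
        print(stat)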

It should be possible to compute statistics every N minutes and store the result, so that the whole log file never has to be parsed at once; a rough sketch of that idea is below.
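This is only my own illustration of the incremental approach: the (func, size, address, traceback_id) entry fields are hypothetical, since the PEP does not define a log format.

    # Hedged sketch of incremental aggregation: every N minutes, fold the new
    # log entries into a "live blocks" mapping so that later statistics never
    # require re-reading the whole log.
    def aggregate(entries, live=None):
        """Maintain {address: (size, traceback_id)} for currently live blocks."""
        live = {} if live is None else live
        for func, size, address, traceback_id in entries:
            if func in ('malloc', 'realloc'):
                # simplification: a real realloc entry would also carry the
                # old address so the previous block could be removed first
                live[address] = (size, traceback_id)
            elif func == 'free':
                live.pop(address, None)
        return live

    # Example chunk of entries: two allocations, one of which is freed.
    chunk = [('malloc', 512, 0x1000, 1),
             ('malloc', 256, 0x2000, 2),
             ('free', 0, 0x1000, 1)]
    live_blocks = aggregate(chunk)
    print(sum(size for size, _ in live_blocks.values()))   # 256 bytes still live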

Victor


