[Python-Dev] Updated PEP 454 (tracemalloc): no more metrics! (original) (raw)

Victor Stinner victor.stinner at gmail.com
Thu Oct 24 03:03:21 CEST 2013

Previous message: [Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!
Next message: [Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi,

2013/10/23 Kristján Valur Jónsson <kristjan at ccpgames.com>:

This might be a good place to make some comments. I have discussed some of this in private with Victor, but wanted to make them here, for the record.

Yes, I prefer to discuss the PEP on python-dev. It's nice to get more feedback, I expect to get a better API at the end!

Oh, you have a lot of remarks, I will try to reply to all of them.

1) really, all that is required in terms of data is the traceback.gettraces() function. Further, it need not return addresses since they are not required for analysis. It is sufficient for it to return a list of (traceback, size, count) tuples. I understand that the getstats function is useful for quick information so it can be kept, although it provides no added information, only convenience 2) getobjectaddress() and gettrace(address) functions seem redundant. All that is required is getobjecttraceback(), I think.

The use case of get_traces() + get_object_trace() is to retrieve the traceback of all alive Python objects for tools like Melia, Pympler or Heapy. The only motivation is performance.

I wrote a benchmark using 10^6 objects and... get_traces() x 1 + get_object_address() x N is 40% slower than calling get_object_traceback() x N. So get_object_traceback() is faster for this use case, especially if you don't want to the traceback of all objects, but only a few of them.

Charles-Francois already asked me to remove everything related to address, so let's remove two more functions:

remove get_object_address()
remove get_trace()
get_traces() returns a list
remove 'address' key type of Snapshot.group_by()
'traceback' key type of Snapshot.group_by() groups traces by traceback, instead of (address, traceback) => it is closer to what you suggested me privately (generate "top stats" but keep the whole traceback)

1) really, all that is required in terms of data is the traceback.gettraces() function. Further, it need not return addresses since they are not required for analysis. It is sufficient for it to return a list of (traceback, size, count) tuples. I understand that the getstats function is useful for quick information so it can be kept, although it provides no added information, only convenience

For the get_stats() question, the motivation is also performances. Let's try a benchmark on my laptop.

Test 1. With the Python test suite, 467,738 traces limited to 1 frame:

take a snapshot with traces (call get_traces()): 293 ms
write the snapshot on disk: 167 ms
load the snapshot from disk: 184 ms
group by filename using stats: 24 ms (754 different filenames)
group by line using stats: 28 ms (31827 different lines)
group by traceback using traces: 333 ms (31827 different tracebacks, the traceback is limited to 1 frame)

Test 2. With the Python test suite, 495,571 traces limited to 25 frame:

take a snapshot without traces (call get_stats()): 35 ms
take a snapshot with traces (call get_stats() and get_traces()): 532 ms
write the snapshot on disk: 565 ms
load the snapshot from disk: 739 ms
group by filename using stats: 25 ms (906 different filenames)
group by line using stats: 22 ms (36940 different lines)
group by traceback using traces: 786 ms (66314 different tracebacks)

Test 3. tracemalloc modified to not use get_stats() anymore, only use traces. With the Python test suite, 884719 traces limited to 1 frame:

take a snapshot with traces (call get_traces()): 531 ms
write the snapshot on disk: 278 ms
load the snapshot from disk: 298 ms
group by filename using traces: 706 ms (1329 different filenames)
group by line using traces: 724 ms (55,349 different lines)
group by traceback using traces: 731 ms (55,349 different tracebacks, the traceback is limited to 1 frame)

I'm surprised: it's faster than the benchmark I ran some weeks ago. Maybe I optimized something? The most critical operation, taking a snapshot takes half a second, so it's enough efficient.

Let's remove even more code:

remove get_stats()
remove Snapshot.stats

Snapshot.group_by() can easily recompute statistics by filename and line number from traces.

(To be honest, get_stats() and get_traces() used together have an issue: they may be inconsistent if some objects are allocated between. Snapshot.apply_filters() has to apply filters on traces and stats for example. It's simpler to only manipulate traces.)

3) settracebacklimit(). Truncating tracebacks is bad. Particularly if it is truncated at the top end of the callstack, because then information looses cohesion, namely, the common connection point, the root. If traceback limits are required, I suggest being able to specifiy that we truncate the leaf-end of the tracebacks.

If the traceback is truncated and 90% of all memory is allocated at the same Python line: I prefer to have the get the most recent frame, than the n-th function from main() which may indirectly call 100 different more functions... In this case, how do you guess which function allocated the memory? You get the same issue than Melia/Pympler/Heapy: debug data doesn't help to identify the memory leak.

4) addfilter(). This is unnecessary. Information can be filtered on the python side. Defining Filter as a C type is not necessary. Similarly, module level filter functions can be dropped.

Filters for capture are here for efficiency: attaching a trace to each memory block is expensive. I tried pybench: when using tracemalloc, Python is 2x slower. The memory usage is also doubled. Using filters, the overhead is lower. I don't have numbers for the CPU, but for the memory: ignored traces are not stored, so the memory usage is immediatly reduced. Without filters for capture, I'm not sure that it is even possible to use tracemalloc with 100 frames on a large application.

Anyway, you can remove all filters: in this case, the overhead of filters is zero.

5) Filter, Snapshot, GroupedStats, Statistics: These classes, if required, can be implemented in a .py module.

Snapshot, GroupedStats and Statistics are implemented in Python.

Filter is implemented in C because I want filters for the capture.

6) Snapshot dump/load(): It is unusual to see load and save functions taking filenames in a python module, and a module implementing its own file IO. I have suggested simply to add Pickle support. Alternatively, support file-like objects or bytes (loads/dumps)

In the latest implementation, load/dump is trivial:

def dump(self, filename):
    with open(filename, "wb") as fp:
        pickle.dump(self, fp, pickle.HIGHEST_PROTOCOL)

@staticmethod
def load(filename, traces=True):
    with open(filename, "rb") as fp:
        return pickle.load(fp)

http://hg.python.org/features/tracemalloc/file/85c0cefb92cb/Lib/tracemalloc.py#l164

So you can easily reimplement your own serialization function (using pickle) with your custom file-like object.

I already asked Charles-Francois if he prefers to accept a file-like object as input (in filename, as open() does), but he doesn't feel the need.

I'd also like to point out (just to say "I told you so" :) ) that this module is precisely the reason I suggested we include "const char *file, int lineno" in the API for PEP 445, because that would allow us, in debug builds, to get one extra stack level, namely the position of the actual C allocation in the python source.

In my experience, C functions allocating memory are wrapped in Python objects, it's easy to guess the C function from the Python traceback.

Victor

Previous message: [Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!
Next message: [Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the Python-Dev mailing list