[Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!

Kristján Valur Jónsson kristjan at ccpgames.com
Thu Oct 24 14:34:13 CEST 2013


-----Original Message-----
From: Victor Stinner [mailto:victor.stinner at gmail.com]
Sent: 24. október 2013 01:03
To: Kristján Valur Jónsson
Cc: Python Dev
Subject: Re: [Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!

> The use case of get_traces() + get_object_trace() is to retrieve the
> traceback of all alive Python objects for tools like Melia, Pympler or
> Heapy. The only motivation is performance.

Well, for me, the use of get_traces() is to get the raw data so that I can perform my own analysis on it. With this data, I foresee people wanting to analyse it in novel ways, as I suggested to you privately.

> I wrote a benchmark using 10^6 objects and... get_traces() x 1 +
> get_object_address() x N is 40% slower than calling
> get_object_traceback() x N. So get_object_traceback() is faster for this
> use case, especially if you don't want the traceback of all objects, but
> only a few of them.
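The two access patterns being benchmarked can be sketched with the tracemalloc API as it eventually shipped (a hedged illustration; the function names in the PEP draft under discussion differed, and get_traces()/get_object_address() did not survive into the final module):

```python
import tracemalloc

tracemalloc.start(10)  # trace allocations, keeping up to 10 frames each
data = [bytearray(1000) for _ in range(100)]

# Per-object query: fetch the traceback of the block backing one object.
tb = tracemalloc.get_object_traceback(data[0])

# Bulk query: snapshot every traced block, then filter on the Python side.
snap = tracemalloc.take_snapshot()
big = [t for t in snap.traces if t.size >= 500]

tracemalloc.stop()
```

The per-object call avoids materialising every trace when only a few objects are of interest, which is the performance argument being made above.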

I understand your desire for things to be fast, but let me just re-iterate my view that for this kind of job, performance is completely secondary. Memory debugging and analysis is an off-line, laboratory task. In my opinion, performance should not be driving the design of a module like this. And in particular, it should not be the only reason to write code in C that could just as well be written in .py. This is a lorry. A lorry is for moving refrigerators, on those rare occasions when you need to have refrigerators moved. It doesn't need go-faster stripes.

Well, I think I've made my point on this amply clear now, in this email and the previous, so I won't dwell on it further.

> Charles-Francois already asked me to remove everything related to
> address, so let's remove two more functions:

Great.

> Test 1. With the Python test suite, 467,738 traces limited to 1 frame:
> ... I'm surprised: it's faster than the benchmark I ran some weeks ago.
> Maybe I optimized something? The most critical operation, taking a
> snapshot, takes half a second, so it's efficient enough.

Well, to me anything that happens in under a second is fast :)

> Let's remove even more code:
> - remove get_stats()
> - remove Snapshot.stats

Removal of code is always nice :)

> > 3) set_traceback_limit(). Truncating tracebacks is bad, particularly
> > if they are truncated at the top end of the call stack, because then
> > the information loses cohesion, namely, the common connection point,
> > the root. If traceback limits are required, I suggest being able to
> > specify that we truncate the leaf end of the tracebacks.

> If the traceback is truncated and 90% of all memory is allocated at the
> same Python line: I prefer to get the most recent frame than the n-th
> function from main(), which may indirectly call 100 different
> functions... In this case, how do you guess which function allocated the
> memory? You get the same issue as Melia/Pympler/Heapy: the debug data
> doesn't help to identify the memory leak.

Debugging memory leaks is not the only use case for your module. Analysing memory usage in a non-leaking application is also very important. In my work, I have been asked to reduce the memory overhead of a Python application once it has started up. To do this, you need a top-down view of the application. You need to break it down from the "main" call down towards the leaves. Now, I would personally not truncate the stack, because I can afford the memory, but even if I did, for example to hide a bunch of detail, I would want to throw away the lower details of the stack. It is unimportant to me to know whether memory was allocated in ...;itertools.py;logging.py;stringutil.py but more important to know that it was allocated in main.py;databaseengine.py;enginesettings.py;...

The "main" function here is the one that ties all the different allocations into one tree. If you take a tree, say a nice rowan, and truncate it by leaving only X nodes towards the leaves, you end up with a big heap of small branches. If on the other hand, you trim it so that you leave X nodes beginning at the root, you still have something resembling a tree, albeit a much coarser one.
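The two truncation directions can be sketched in a few lines (hedged: the frame tuples and module names here are invented for illustration, echoing the example above, and are not tracemalloc's internal representation):

```python
# Hypothetical call stack, root (main) first, allocation site last.
frames = [
    ("main.py", 10),
    ("databaseengine.py", 42),
    ("enginesettings.py", 7),
    ("stringutil.py", 3),
    ("logging.py", 88),
]

def keep_root_end(frames, limit):
    """Trim from the leaves: the result still hangs together as a tree."""
    return frames[:limit]

def keep_leaf_end(frames, limit):
    """Trim from the root (what a fixed capture-time limit gives you):
    unrelated stacks lose their common connection point."""
    return frames[-limit:]

trunk = keep_root_end(frames, 3)   # main.py .. enginesettings.py
twigs = keep_leaf_end(frames, 3)   # enginesettings.py .. logging.py
```

With keep_root_end() every truncated stack still starts at main.py, so the stacks can be merged into one tree; with keep_leaf_end() that common root is gone.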

Anyway, this is not so important. I would run this with full traceback myself and truncate the tracebacks during the display stage anyway.

> > 4) add_filter(). This is unnecessary. Information can be filtered on
> > the Python side. Defining Filter as a C type is not necessary.
> > Similarly, module-level filter functions can be dropped.

> Filters for capture are here for efficiency: attaching a trace to each
> memory block is expensive. I tried pybench: when using tracemalloc,
> Python is 2x slower. The memory usage is also doubled. Using filters,
> the overhead is lower. I don't have numbers for the CPU, but for the
> memory: ignored traces are not stored, so the memory usage is
> immediately reduced. Without filters for capture, I'm not sure that it
> is even possible to use tracemalloc with 100 frames on a large
> application. Anyway, you can remove all filters: in this case, the
> overhead of filters is zero.
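For reference, Python-side filtering of the kind argued for here is possible with the tracemalloc module as it eventually shipped, where Filter applies to a snapshot after capture rather than at capture time (a sketch, not the draft API under discussion):

```python
import tracemalloc

tracemalloc.start()
junk = [bytes(200) for _ in range(100)]
snap = tracemalloc.take_snapshot()
tracemalloc.stop()

# Post-capture, Python-side filtering: drop import machinery noise.
snap = snap.filter_traces([
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
])
stats = snap.statistics("lineno")
```

This keeps the C core simple at the cost Victor describes: every trace is stored first, so filtering no longer reduces capture-time memory overhead.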

> > 6) Snapshot dump/load(): It is unusual to see load and save functions
> > taking filenames in a Python module, and a module implementing its own
> > file I/O. I have suggested simply to add pickle support.
> > Alternatively, support file-like objects or bytes (loads/dumps).

> In the latest implementation, load/dump is trivial:

>     def dump(self, filename):
>         with open(filename, "wb") as fp:
>             pickle.dump(self, fp, pickle.HIGHEST_PROTOCOL)
>
>     @staticmethod
>     def load(filename, traces=True):
>         with open(filename, "rb") as fp:
>             return pickle.load(fp)

What does the "traces" argument do in the load() function then?

Anyway, in this case, dump and load can be thought of as convenience functions. That's perfectly fine from my viewpoint.
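Since dump/load are plain pickle wrappers, the file-like/bytes variant asked for above falls out for free: a Snapshot pickles like any other object (a sketch against the module as it shipped):

```python
import pickle
import tracemalloc

tracemalloc.start()
payload = [bytearray(100) for _ in range(50)]
snap = tracemalloc.take_snapshot()
tracemalloc.stop()

# Bytes round-trip (the loads/dumps alternative) needs no module support.
blob = pickle.dumps(snap, pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)
```

So callers who want a file-like object or a network transfer can just use pickle directly, and dump()/load() remain pure conveniences.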

> > I'd also like to point out (just to say "I told you so" :) ) that this
> > module is precisely the reason I suggested we include "const char
> > *file, int lineno" in the API for PEP 445, because that would allow
> > us, in debug builds, to get one extra stack level, namely the position
> > of the actual C allocation in the Python source.

> In my experience, C functions allocating memory are wrapped in Python
> objects; it's easy to guess the C function from the Python traceback.

Often, yes. But there are big black boxes that remain. The most numerous are the big mysterious allocations that can happen as a result of "import mymodule".

But apart from that, a lot of code can have unforeseen side effects, like growing some internal list, or other. This sort of information helps with understanding that.

Not that we are likely to change PEP 445 at this stage, but this was the use case for my suggestion.

Cheers,

Kristján


