[Python-Dev] PEP 454 (tracemalloc): new minimalist version

Nick Coghlan ncoghlan at gmail.com
Sat Oct 19 02:49:49 CEST 2013


On 19 Oct 2013 03:57, "Charles-François Natali" <cf.natali at gmail.com> wrote:

Hi, I'm happy to see this move forward!

Speaking of which... Charles-François, would you be willing to act as BDFL-Delegate for this PEP? This will be a very useful new analysis tool, and between yourself and Victor it looks like you'll be able to come up with a solid API.

I just suggested that approach to Guido and he also liked the idea :)

Cheers, Nick.

> API
> ===
>
> Main Functions
> --------------
>
> cleartraces() function:
>
>     Clear traces and statistics on Python memory allocations, and reset
>     the gettracedmemory() counter.

That's nitpicking, but how about just reset() (I'm probably biased by
oprofile's opcontrol --reset)?

> getstats() function:
>
>     Get statistics on traced Python memory blocks as a dictionary
>     {filename (str): {linenumber (int): stats}} where stats is a
>     (size: int, count: int) tuple, filename and linenumber can
>     be None.

It's probably obvious, but you might want to say once what size and
count represent (and the unit for size).

> gettracemallocmemory() function:
>
>     Get the memory usage in bytes of the tracemalloc module as a
>     tuple: (size: int, free: int).
>
>     * size: total size of bytes allocated by the module,
>       including free bytes
>     * free: number of free bytes available to store data

What's free exactly? I assume it's linked to the internal storage area
used by tracemalloc itself, but that's not clear at all.

Also, is the tracemalloc overhead included in the above stats (I'm
mainly thinking about getstats() and gettracedmemory())? If yes, I find
it somewhat confusing: for example, AFAICT, valgrind's memcheck doesn't
report the memory overhead, although it can be quite large, simply
because it's not interesting.

> Trace Functions
> ---------------
>
> gettracebacklimit() function:
>
>     Get the maximum number of frames stored in the traceback of a trace
>     of a memory block.
>
>     Use the settracebacklimit() function to change the limit.

I didn't see anywhere the default value for this setting: it would be
nice to write it somewhere, and also explain the rationale (memory/CPU
overhead...).

> getobjectaddress(obj) function:
>
>     Get the address of the main memory block of the specified Python object.
>
>     A Python object can be composed by multiple memory blocks, the
>     function only returns the address of the main memory block.

IOW, this should return the same as id() on CPython?
If yes, it could be an interesting note.

> getobjecttrace(obj) function:
>
>     Get the trace of a Python object obj as a ``(size: int,
>     traceback)`` tuple where *traceback* is a tuple of ``(filename: str,
>     lineno: int)`` tuples, *filename* and *lineno* can be ``None``.

I find the "trace" word confusing, so it might be interesting to add a
note somewhere explaining what it is ("callstack leading to the object
allocation", or whatever).

Also, this function leaves me a mixed feeling: it's called
getobjecttrace(), but you also return the object size - well, a vague
estimate thereof. I wonder if the size really belongs here, especially
if the information returned isn't really accurate: it will be for an
integer, but not for e.g. a list, right? How about just using
sys.getsizeof(), which would give a more accurate result?

> gettrace(address) function:
>
>     Get the trace of a memory block as a ``(size: int, traceback)``
>     tuple where *traceback* is a tuple of ``(filename: str, lineno:
>     int)`` tuples, *filename* and *lineno* can be ``None``.
>
>     Return None if the tracemalloc module did not trace the
>     allocation of the memory block.
>
>     See also getobjecttrace(), getstats() and
>     gettraces() functions.

Do you have example use cases where you want to work with raw
addresses?

> Filter
> ------
>
> ``Filter(include: bool, pattern: str, lineno: int=None, traceback:
> bool=False)`` class:
>
>     Filter to select which memory allocations are traced. Filters can be
>     used to reduce the memory usage of the tracemalloc module, which
>     can be read using the gettracemallocmemory() function.
>
>     match(filename: str, lineno: int) method:
>
>         Return True if the filter matches the filename and line number,
>         False otherwise.
>
>     matchfilename(filename: str) method:
>
>         Return True if the filter matches the filename, False otherwise.
>
>     matchlineno(lineno: int) method:
>
>         Return True if the filter matches the line number, False
>         otherwise.
>
>     matchtraceback(traceback) method:
>
>         Return True if the filter matches the traceback, False
>         otherwise.
>
>         traceback is a tuple of (filename: str, lineno: int) tuples.

Are those match methods really necessary for the end user, i.e. are
they worth being exposed as part of the public API?

> StatsDiff
> ---------
>
> StatsDiff(differences, oldstats, newstats) class:
>
>     Differences between two GroupedStats instances.
>
>     The GroupedStats.compareto() method creates a StatsDiff
>     instance.
>
>     sort() method:
>
>         Sort the differences list from the biggest difference to the
>         smallest difference. Sort by abs(sizediff), size,
>         abs(countdiff), count and then by key.
>
>     differences attribute:
>
>         Differences between oldstats and newstats as a list of
>         (sizediff, size, countdiff, count, key) tuples. sizediff,
>         size, countdiff and count are int. The key type depends
>         on the GroupedStats.groupby attribute of newstats: see the
>         Snapshot.topby() method.
>
>     oldstats attribute:
>
>         Old GroupedStats instance, can be None.
>
>     newstats attribute:
>
>         New GroupedStats instance.

Why keep references to oldstats and newstats? datetime.timedelta
doesn't keep references to the date objects it was computed from.

Also, if you sort the difference by default (which is a sensible
choice), then the StatsDiff becomes pretty much useless, since you
would just keep its differences attribute (sorted).

> Snapshot
> --------
>
> ``Snapshot(timestamp: datetime.datetime, traces: dict=None, stats:
> dict=None)`` class:
>
>     Snapshot of traces and statistics on memory blocks allocated by Python.

I'm confused. Why are gettrace(), getobjecttrace(), getstats() etc. not
methods of a Snapshot object? Is it because you don't store all the
necessary information in a snapshot, or are they just some sort of
shorthands, like:

    stats = getstats()

vs

    snapshot = Snapshot.create()
    stats = snapshot.stats

>     write(filename) method:
>
>         Write the snapshot into a file.

I assume it's in a serialized form, only readable by Snapshot.load()?
BTW, it's a nitpick and debatable, but write()/read() or load()/dump()
would be more consistent (see e.g. pickle's load/dump).

> Metric
> ------
>
> Metric(name: str, value: int, format: str) class:
>
>     Value of a metric when a snapshot is created.

Alright, what's a metric again ;-) ?

I don't know if it's customary, but having short examples would IMO be
nice.

cf
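
As an illustration of the kind of short example asked for above, here is
a minimal sketch. It is not taken from the PEP draft under review: it
assumes the tracemalloc module as it later shipped in the Python 3.4
standard library, whose names differ from the ones quoted in this
message (take_snapshot(), Snapshot.dump()/Snapshot.load(),
get_object_traceback() and so on). The helper name
trace_allocation_demo() and the 200,000-byte test allocation are
hypothetical, chosen only for the demonstration. It touches several of
the points raised in the review: the traceback limit, the module's own
overhead, where an object was allocated versus how big it is, and
saving a snapshot to a file.

```python
import os
import sys
import tempfile
import tracemalloc


def trace_allocation_demo(nframe=5):
    """Trace one known allocation and report what tracemalloc recorded.

    Hypothetical helper for illustration; uses the stdlib tracemalloc API
    (Python 3.4+), not the draft API quoted in this thread.
    """
    tracemalloc.start(nframe)  # nframe: max frames stored per trace
    try:
        limit = tracemalloc.get_traceback_limit()

        before = tracemalloc.take_snapshot()
        blob = bytes(200_000)  # one large, easy-to-spot allocation
        after = tracemalloc.take_snapshot()

        # Exclude allocations made by tracemalloc's own Python code.
        hide_self = [tracemalloc.Filter(False, tracemalloc.__file__)]
        before = before.filter_traces(hide_self)
        after = after.filter_traces(hide_self)

        # compare_to() returns a list of StatisticDiff objects.
        diffs = after.compare_to(before, "lineno")

        # Where the object was allocated, as opposed to how big it is.
        allocated_at = tracemalloc.get_object_traceback(blob)
        size = sys.getsizeof(blob)

        # Round-trip a snapshot through a file.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "after.snapshot")
            after.dump(path)
            reloaded = tracemalloc.Snapshot.load(path)

        current, peak = tracemalloc.get_traced_memory()
        overhead = tracemalloc.get_tracemalloc_memory()
    finally:
        tracemalloc.stop()

    return {
        "limit": limit,
        "largest_growth": max(d.size_diff for d in diffs),
        "allocated_at": allocated_at,
        "getsizeof": size,
        "original_traces": len(after.traces),
        "reloaded_traces": len(reloaded.traces),
        "current": current,
        "peak": peak,
        "overhead": overhead,
    }


if __name__ == "__main__":
    for key, value in trace_allocation_demo().items():
        print(key, value)
```

In this sketch the bytes used by tracemalloc itself are reported
separately by get_tracemalloc_memory(), and the object size comes from
sys.getsizeof() rather than from the trace, which is one way to keep
"where was it allocated" and "how big is it" apart.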

