[Python-Dev] PEP 454 (tracemalloc): new minimalist version

Charles-François Natali cf.natali at gmail.com
Fri Oct 18 19:56:30 CEST 2013

Hi,

I'm happy to see this move forward!

API
===

Main Functions
--------------

clear_traces() function:

Clear traces and statistics on Python memory allocations, and reset the get_traced_memory() counter.

That's nitpicking, but how about just reset() (I'm probably biased by oprofile's opcontrol --reset)?
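For comparison, here is a minimal sketch using the tracemalloc module as it eventually shipped in Python 3.4. That module kept the clear_traces() name, but its API differs from this draft in other respects. The sketch shows that after clear_traces() the value reported by get_traced_memory() drops, even though the 1 MB `payload` object (an arbitrary example) is still alive:

```python
import tracemalloc

tracemalloc.start()
payload = bytes(1_000_000)  # ~1 MB allocated while tracing is active
before, _ = tracemalloc.get_traced_memory()

tracemalloc.clear_traces()  # forget every recorded trace; `payload` itself stays alive
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"traced before clear_traces(): {before} B, after: {after} B")
```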

get_stats() function:

Get statistics on traced Python memory blocks as a dictionary {filename (str): {line_number (int): stats}} where stats is a (size: int, count: int) tuple; filename and line_number can be None.

It's probably obvious, but you might want to say once what size and count represent (and the unit for size).
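To make the question concrete, here is a small sketch that consumes a dictionary shaped like the one described above. The data and the totals_per_file() helper are hypothetical. It assumes that size is a total in bytes and count is a number of blocks, which is exactly what the PEP should spell out:

```python
# Hypothetical data in the shape described by the draft:
# {filename: {lineno: (size, count)}}, where filename and lineno may be None.
# ASSUMPTION: size is the total size in bytes of the traced blocks allocated
# at that line, and count is the number of those blocks.
stats = {
    "app.py": {10: (4096, 32), 42: (1024, 8)},
    "lib.py": {7: (512, 4)},
    None: {None: (128, 1)},  # allocation whose origin is unknown
}


def totals_per_file(stats):
    """Hypothetical helper: sum (size, count) over all line numbers of each file."""
    totals = {}
    for filename, per_line in stats.items():
        size = sum(s for s, _ in per_line.values())
        count = sum(c for _, c in per_line.values())
        totals[filename] = (size, count)
    return totals


for filename, (size, count) in totals_per_file(stats).items():
    print(f"{filename}: {size} B in {count} blocks")
```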

get_tracemalloc_memory() function:

Get the memory usage in bytes of the tracemalloc module as a tuple: (size: int, free: int).

* size: total size in bytes allocated by the module, including free bytes
* free: number of free bytes available to store data

What's free exactly? I assume it's linked to the internal storage area used by tracemalloc itself, but that's not clear at all.

Also, is the tracemalloc overhead included in the above stats (I'm mainly thinking about get_stats() and get_traced_memory())? If yes, I find it somewhat confusing: for example, AFAICT, valgrind's memcheck doesn't report its memory overhead, although it can be quite large, simply because it's not interesting.
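For reference, here is a sketch using the tracemalloc module as it shipped in Python 3.4, not the draft's API. In that module, the module's own usage is reported by a separate function that returns a single int with no "free" field, alongside get_traced_memory():

```python
import tracemalloc

tracemalloc.start()
data = [str(i) for i in range(10_000)]  # something to trace

# Memory allocated by the traced program: (current, peak), in bytes.
current, peak = tracemalloc.get_traced_memory()
# Memory used by tracemalloc itself to store traces, in bytes (a single int).
overhead = tracemalloc.get_tracemalloc_memory()
tracemalloc.stop()

print(f"traced: {current} B (peak {peak} B); tracemalloc's own usage: {overhead} B")
```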

Trace Functions
---------------

get_traceback_limit() function:

Get the maximum number of frames stored in the traceback of a trace of a memory block. Use the set_traceback_limit() function to change the limit.

I didn't see the default value for this setting anywhere: it would be nice to state it, and also to explain the rationale (memory/CPU overhead...).
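As a data point, the module that shipped in Python 3.4 documents a default of one frame (the most recent one). A larger limit is passed to start(nframe), at the cost of more memory and CPU. Here is a sketch against that shipped API:

```python
import tracemalloc

tracemalloc.stop()  # no-op if not tracing; ensures start() below applies its limit

tracemalloc.start()  # default: keep only the most recent frame
default_limit = tracemalloc.get_traceback_limit()
tracemalloc.stop()

tracemalloc.start(25)  # keep up to 25 frames per trace: more context, more overhead
deep_limit = tracemalloc.get_traceback_limit()
tracemalloc.stop()

print(default_limit, deep_limit)  # prints: 1 25
```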

get_object_address(obj) function:

Get the address of the main memory block of the specified Python object. A Python object can be composed of multiple memory blocks; the function only returns the address of the main memory block.

IOW, this should return the same value as id() on CPython? If so, it would be worth a note.

get_object_trace(obj) function:

Get the trace of a Python object *obj* as a ``(size: int, traceback)`` tuple where *traceback* is a tuple of ``(filename: str, lineno: int)`` tuples; *filename* and *lineno* can be ``None``.

I find the "trace" word confusing, so it might be interesting to add a note somewhere explaining what it is ("callstack leading to the object allocation", or whatever).

Also, this function leaves me with mixed feelings: it's called get_object_trace(), but you also return the object size (well, a vague estimate thereof). I wonder if the size really belongs here, especially if the information returned isn't really accurate: it will be for an integer, but not for e.g. a list, right? How about just using sys.getsizeof(), which would give a more accurate result?
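For comparison, here is a sketch using the module as shipped in Python 3.4. There, get_object_traceback() returns only the call stack (a Traceback, or None if the allocation was not traced), and the size is left to sys.getsizeof(). The 100 kB bytes object is arbitrary:

```python
import sys
import tracemalloc

tracemalloc.start(10)  # store up to 10 frames per trace
blob = bytes(100_000)
# Call stack where the object was allocated, or None if it was not traced.
tb = tracemalloc.get_object_traceback(blob)
tracemalloc.stop()

for line in tb.format():
    print(line)
# The object's size comes from sys.getsizeof(), not from the trace.
print(sys.getsizeof(blob), "bytes")
```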

get_trace(address) function:

Get the trace of a memory block as a ``(size: int, traceback)`` tuple where *traceback* is a tuple of ``(filename: str, lineno: int)`` tuples; *filename* and *lineno* can be ``None``. Return ``None`` if the tracemalloc module did not trace the allocation of the memory block. See also the get_object_trace(), get_stats() and get_traces() functions.

Do you have example use cases where you want to work with raw addresses?

Filter
------

``Filter(include: bool, pattern: str, lineno: int=None, traceback: bool=False)`` class:

Filter to select which memory allocations are traced. Filters can be used to reduce the memory usage of the tracemalloc module, which can be read using the get_tracemalloc_memory() function.

match(filename: str, lineno: int) method:

Return True if the filter matches the filename and line number, False otherwise.

match_filename(filename: str) method:

Return True if the filter matches the filename, False otherwise.

match_lineno(lineno: int) method:

Return True if the filter matches the line number, False otherwise.

match_traceback(traceback) method:

Return True if the filter matches the traceback, False otherwise. traceback is a tuple of (filename: str, lineno: int) tuples.

Are those match methods really necessary for the end user, i.e. are they worth being exposed as part of the public API?
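For what it's worth, the module that shipped in Python 3.4 applies filters to a snapshot after the fact, using Snapshot.filter_traces(). Its documented Filter class exposes attributes (inclusive, filename_pattern, lineno, all_frames) rather than public match methods. Here is a sketch that hides allocations attributed to the import machinery or to unknown frames; the dict being traced is arbitrary:

```python
import tracemalloc

tracemalloc.start()
cache = {i: str(i) for i in range(1_000)}
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# inclusive=False drops traces whose most recent frame matches the pattern.
filters = [
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    tracemalloc.Filter(False, "<unknown>"),
]
filtered = snapshot.filter_traces(filters)  # returns a new Snapshot

for stat in filtered.statistics("lineno")[:3]:
    print(stat)
```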

StatsDiff
---------

StatsDiff(differences, old_stats, new_stats) class:

Differences between two GroupedStats instances. The GroupedStats.compare_to() method creates a StatsDiff instance.

sort() method:

Sort the differences list from the biggest difference to the smallest difference. Sort by abs(size_diff), size, abs(count_diff), count and then by key.

differences attribute:

Differences between old_stats and new_stats as a list of (size_diff, size, count_diff, count, key) tuples. size_diff, size, count_diff and count are int. The key type depends on the GroupedStats.group_by attribute of new_stats: see the Snapshot.top_by() method.

old_stats attribute:

Old GroupedStats instance, can be None.

new_stats attribute:

New GroupedStats instance.

Why keep references to old_stats and new_stats? datetime.timedelta doesn't keep references to the date objects it was computed from.

Also, if you sort the differences by default (which is a sensible choice), then StatsDiff becomes pretty much useless, since you would just keep its (sorted) differences attribute.
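To illustrate the point, here is a hypothetical compare_stats() helper that returns the sorted differences as a plain list, using the sort order quoted above. The inputs are made up, and keys must be mutually comparable. Sorting on the key in descending order is an assumption, since the draft doesn't say. The Snapshot.compare_to() method that eventually shipped in Python 3.4 likewise returns a plain sorted list.

```python
def compare_stats(old, new):
    """Hypothetical helper: diff two {key: (size, count)} mappings.

    Returns a plain list of (size_diff, size, count_diff, count, key) tuples,
    biggest change first: sorted by abs(size_diff), size, abs(count_diff),
    count and then key, all descending (the key direction is an assumption).
    """
    differences = []
    for key in old.keys() | new.keys():
        old_size, old_count = old.get(key, (0, 0))  # missing key: nothing before
        size, count = new.get(key, (0, 0))          # missing key: freed since
        differences.append((size - old_size, size, count - old_count, count, key))
    differences.sort(
        key=lambda d: (abs(d[0]), d[1], abs(d[2]), d[3], d[4]),
        reverse=True,
    )
    return differences


old = {"a.py:10": (1000, 10), "b.py:5": (500, 5)}
new = {"a.py:10": (1200, 12), "b.py:5": (100, 1), "c.py:3": (300, 3)}
for diff in compare_stats(old, new):
    print(diff)
```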

Snapshot
--------

``Snapshot(timestamp: datetime.datetime, traces: dict=None, stats: dict=None)`` class:

Snapshot of traces and statistics on memory blocks allocated by Python.

I'm confused. Why are get_trace(), get_object_trace(), get_stats() etc. not methods of a Snapshot object? Is it because you don't store all the necessary information in a snapshot, or are they just some sort of shorthands, like:

    stats = get_stats()

vs

    snapshot = Snapshot.create()
    stats = snapshot.stats
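Here is a sketch of the snapshot-centric style, using the module as shipped in Python 3.4. There, statistics are computed from a Snapshot returned by take_snapshot() rather than by module-level functions. The list being traced is arbitrary:

```python
import tracemalloc

tracemalloc.start()
words = ["word%d" % i for i in range(5_000)]
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Statistics grouped by (filename, line number), biggest first.
top_stats = snapshot.statistics("lineno")
for stat in top_stats[:3]:
    print(stat)
```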

write(filename) method:

Write the snapshot into a file.

I assume it's in a serialized form, only readable by Snapshot.load()? BTW, it's a nitpick and debatable, but write()/read() or load()/dump() would be more consistent (see e.g. pickle's load/dump).
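In the module that shipped in Python 3.4, this became Snapshot.dump() and Snapshot.load(), and CPython's implementation uses pickle. Here is a round-trip sketch; the temporary file name is arbitrary:

```python
import os
import tempfile
import tracemalloc

tracemalloc.start()
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "snapshot.pickle")
    snapshot.dump(path)                         # serialize the snapshot to disk
    restored = tracemalloc.Snapshot.load(path)  # read it back

print(len(restored.traces), "traces restored")
```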

Metric
------

Metric(name: str, value: int, format: str) class: Value of a metric when a snapshot is created.

Alright, what's a metric again ;-) ?

I don't know if it's customary, but having short examples would IMO be nice.
