[Python-Dev] PEP 454 (tracemalloc): new minimalist version
Charles-François Natali cf.natali at gmail.com
Fri Oct 18 19:56:30 CEST 2013
- Previous message: [Python-Dev] PEP 454 (tracemalloc): new minimalist version
- Next message: [Python-Dev] PEP 454 (tracemalloc): new minimalist version
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi,
I'm happy to see this move forward!
API
===

Main Functions
--------------

clear_traces() function: Clear traces and statistics on Python memory allocations, and reset the get_traced_memory() counter.
That's nitpicking, but how about just reset() (I'm probably biased by oprofile's opcontrol --reset)?
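For comparison, a minimal sketch against the tracemalloc module as it eventually shipped (the functions exist since Python 3.4; the f-strings assume 3.6+). This is not the draft API quoted above: there, clear_traces() kept its name and get_traced_memory() returns a (current, peak) tuple.

```python
import tracemalloc

tracemalloc.start()

# Allocate roughly 1 MB so the traced counter has something to show.
buffer = bytearray(10**6)
before_current, before_peak = tracemalloc.get_traced_memory()

# Forget every recorded trace; the traced-memory counter restarts near zero
# even though `buffer` itself is still alive (it is simply no longer traced).
tracemalloc.clear_traces()
after_current, after_peak = tracemalloc.get_traced_memory()

tracemalloc.stop()

print(f"before clear_traces(): {before_current} bytes traced")
print(f"after clear_traces():  {after_current} bytes traced")
```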
get_stats() function: Get statistics on traced Python memory blocks as a dictionary {filename (str): {line_number (int): stats}} where stats is a (size: int, count: int) tuple; filename and line_number can be None.
It's probably obvious, but you might want to say once what size and count represent (and the unit for size).
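To make size and count concrete, a sketch using the shipped module (Python 3.4+, f-strings need 3.6+), where Snapshot.statistics("lineno") yields Statistic objects whose size is in bytes and whose count is the number of memory blocks. stats_as_nested_dict() is a hypothetical helper that only rebuilds the draft's {filename: {line_number: (size, count)}} shape; it is not part of any API.

```python
import tracemalloc
from collections import defaultdict


def stats_as_nested_dict(snapshot):
    """Hypothetical helper: regroup shipped Statistic objects into the
    draft's {filename: {line_number: (size, count)}} layout."""
    nested = defaultdict(dict)
    for stat in snapshot.statistics("lineno"):
        # With "lineno" grouping, each traceback holds a single frame.
        frame = stat.traceback[0]
        # stat.size: total bytes; stat.count: number of live memory blocks.
        nested[frame.filename][frame.lineno] = (stat.size, stat.count)
    return dict(nested)


tracemalloc.start()
blocks = [bytearray(1000) for _ in range(100)]  # 100 blocks of ~1 KB each
stats = stats_as_nested_dict(tracemalloc.take_snapshot())
tracemalloc.stop()

for filename, per_line in stats.items():
    for lineno, (size, count) in per_line.items():
        print(f"{filename}:{lineno}: {size} bytes in {count} blocks")
```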
get_tracemalloc_memory() function: Get the memory usage in bytes of the tracemalloc module as a tuple: (size: int, free: int).

* size: total size of bytes allocated by the module, including free bytes
* free: number of free bytes available to store data
What's free exactly? I assume it's linked to the internal storage area used by tracemalloc itself, but that's not clear at all.
Also, is the tracemalloc overhead included in the above stats (I'm mainly thinking about get_stats() and get_traced_memory())? If yes, I find it somewhat confusing: for example, AFAICT, valgrind's memcheck doesn't report its memory overhead, although it can be quite large, simply because it's not interesting.
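For what it's worth, in the module that shipped with Python 3.4, get_tracemalloc_memory() returns a single int (bytes used to store the traces), and as far as I can tell that bookkeeping is not counted by get_traced_memory(), since tracemalloc's own internal allocations are not traced. A sketch assuming CPython 3.6+:

```python
import tracemalloc

tracemalloc.start()
data = [str(i) for i in range(10_000)]  # some allocations worth tracing

# Memory allocated by the traced program (current, peak), in bytes.
traced_current, traced_peak = tracemalloc.get_traced_memory()
# Memory tracemalloc itself uses to store the traces, reported separately.
overhead = tracemalloc.get_tracemalloc_memory()
tracemalloc.stop()

print(f"traced program memory: {traced_current} bytes")
print(f"tracemalloc bookkeeping: {overhead} bytes")
```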
Trace Functions
---------------

get_traceback_limit() function: Get the maximum number of frames stored in the traceback of a trace of a memory block. Use the set_traceback_limit() function to change the limit.
I didn't see the default value for this setting anywhere: it would be nice to write it down, and also to explain the rationale (memory/CPU overhead...).
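In the shipped module (Python 3.4+), the limit defaults to 1 frame and is set through tracemalloc.start(nframe) (or the PYTHONTRACEMALLOC environment variable) rather than a set_traceback_limit() function. A sketch assuming CPython 3.6+; allocate() is a hypothetical helper that only exists to produce a multi-frame traceback.

```python
import tracemalloc


def allocate():
    # Hypothetical helper: adds one frame to the allocation's traceback.
    return bytearray(10_000)


if tracemalloc.is_tracing():
    tracemalloc.stop()  # make sure the new limit below takes effect

tracemalloc.start(25)  # store up to 25 frames per trace (default is 1)
limit = tracemalloc.get_traceback_limit()
obj = allocate()
tb = tracemalloc.get_object_traceback(obj)
tracemalloc.stop()

print(f"traceback limit: {limit} frames")
for line in tb.format():
    print(line)
```

Each extra frame is stored for every traced block, which is where the memory/CPU trade-off comes from.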
get_object_address(obj) function: Get the address of the main memory block of the specified Python object. A Python object can be composed of multiple memory blocks; the function only returns the address of the main memory block.
IOW, this should return the same as id() on CPython? If yes, it could be an interesting note.
get_object_trace(obj) function: Get the trace of a Python object obj as a (size: int, traceback) tuple where traceback is a tuple of (filename: str, lineno: int) tuples; filename and lineno can be None.
I find the word "trace" confusing, so it might be worth adding a note somewhere explaining what it is ("call stack leading to the object allocation", or whatever).
Also, this function leaves me with mixed feelings: it's called get_object_trace(), but it also returns the object size, or rather a vague estimate thereof. I wonder whether the size really belongs here, especially if the information returned isn't accurate: it will be for an integer, but not for e.g. a list, right? How about just using sys.getsizeof(), which would give a more accurate result?
get_trace(address) function: Get the trace of a memory block as a (size: int, traceback) tuple where traceback is a tuple of (filename: str, lineno: int) tuples; filename and lineno can be None. Return None if the tracemalloc module did not trace the allocation of the memory block. See also the get_object_trace(), get_stats() and get_traces() functions.
Do you have example use cases where you want to work with raw addresses?
Filter
------

Filter(include: bool, pattern: str, lineno: int=None, traceback: bool=False) class: Filter to select which memory allocations are traced. Filters can be used to reduce the memory usage of the tracemalloc module, which can be read using the get_tracemalloc_memory() function.

match(filename: str, lineno: int) method: Return True if the filter matches the filename and line number, False otherwise.

match_filename(filename: str) method: Return True if the filter matches the filename, False otherwise.

match_lineno(lineno: int) method: Return True if the filter matches the line number, False otherwise.

match_traceback(traceback) method: Return True if the filter matches the traceback, False otherwise. traceback is a tuple of (filename: str, lineno: int) tuples.
Are those match methods really necessary for the end user, i.e. are they worth being exposed as part of the public API?
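As a data point, in the module that shipped with Python 3.4, Filter exposes attributes such as inclusive, filename_pattern, lineno and all_frames, and filtering is applied through Snapshot.filter_traces(); as far as I know it has no public match methods. A sketch that excludes import-machinery and tracemalloc noise, assuming CPython 3.4+:

```python
import tracemalloc

tracemalloc.start()
data = [str(i) * 10 for i in range(1000)]
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

# Exclusive filters (first argument False) drop matching traces from a
# new snapshot; the original snapshot is left untouched.
filtered = snapshot.filter_traces((
    tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
    tracemalloc.Filter(False, tracemalloc.__file__),
))

for stat in filtered.statistics("lineno")[:5]:
    print(stat)
```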
StatsDiff
---------

StatsDiff(differences, old_stats, new_stats) class: Differences between two GroupedStats instances. The GroupedStats.compare_to() method creates a StatsDiff instance.

sort() method: Sort the differences list from the biggest difference to the smallest difference. Sort by abs(size_diff), size, abs(count_diff), count and then by key.

differences attribute: Differences between old_stats and new_stats as a list of (size_diff, size, count_diff, count, key) tuples. size_diff, size, count_diff and count are int. The key type depends on the GroupedStats.group_by attribute of new_stats: see the Snapshot.top_by() method.

old_stats attribute: Old GroupedStats instance, can be None.

new_stats attribute: New GroupedStats instance.
Why keep references to old_stats and new_stats? datetime.timedelta doesn't keep references to the date objects it was computed from.

Also, if you sort the differences by default (which is a sensible choice), then StatsDiff becomes pretty much useless, since you would just keep its differences attribute (sorted).
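As an illustration of that alternative: in the module that shipped with Python 3.4, Snapshot.compare_to() simply returns a list of StatisticDiff objects, already sorted by abs(size_diff), size, abs(count_diff), count and traceback, and no wrapper keeps either snapshot alive. A sketch assuming CPython 3.6+:

```python
import tracemalloc

tracemalloc.start()
old = tracemalloc.take_snapshot()
grown = [bytearray(512) for _ in range(200)]
new = tracemalloc.take_snapshot()
tracemalloc.stop()

# A plain list, biggest absolute size difference first.
diffs = new.compare_to(old, "lineno")
for diff in diffs[:3]:
    print(f"{diff.traceback}: {diff.size_diff:+d} B ({diff.count_diff:+d} blocks)")
```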
Snapshot
--------

Snapshot(timestamp: datetime.datetime, traces: dict=None, stats: dict=None) class: Snapshot of traces and statistics on memory blocks allocated by Python.
I'm confused. Why are get_trace(), get_object_trace(), get_stats(), etc. not methods of a Snapshot object? Is it because you don't store all the necessary information in a snapshot, or are they just some sort of shorthand, like:

    stats = get_stats()

vs

    snapshot = Snapshot.create()
    stats = snapshot.stats
write(filename) method: Write the snapshot into a file.
I assume it's in a serialized form, only readable by Snapshot.load()? BTW, it's a nitpick and debatable, but write()/read() or load()/dump() would be more consistent (see e.g. pickle's load/dump).
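For reference, the module that shipped with Python 3.4 went with the pickle-style names: Snapshot.dump(filename) writes the snapshot and the static method Snapshot.load(filename) reads it back. A round-trip sketch assuming CPython 3.4+; the temporary file name is arbitrary.

```python
import os
import tempfile
import tracemalloc

tracemalloc.start()
data = [bytearray(256) for _ in range(100)]
snapshot = tracemalloc.take_snapshot()
tracemalloc.stop()

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "memory.snapshot")
    snapshot.dump(path)                         # serialize to disk
    restored = tracemalloc.Snapshot.load(path)  # read it back

print(len(snapshot.traces), len(restored.traces))
```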
Metric
------

Metric(name: str, value: int, format: str) class: Value of a metric when a snapshot is created.
Alright, what's a metric again ;-) ?
I don't know if it's customary, but having short examples would IMO be nice.
cf