[Python-Dev] RFC: PEP 454: Add a new tracemalloc module

Victor Stinner victor.stinner at gmail.com
Wed Sep 4 01:27:15 CEST 2013


Hi,

Antoine Pitrou suggested that I write a PEP to discuss the API of the new tracemalloc module that I proposed to add to Python 3.4. Here it is.

If you prefer to read the HTML version: http://www.python.org/dev/peps/pep-0454/

See also the documentation of the current implementation of the module: http://hg.python.org/features/tracemalloc/file/tip/Doc/library/tracemalloc.rst

The documentation contains examples and a short "tutorial".

PEP: 454
Title: Add a new tracemalloc module to trace Python memory allocations
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner <victor.stinner at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 3-September-2013
Python-Version: 3.4

Abstract

Add a new tracemalloc module to trace Python memory allocations.

Rationale

Common debug tools that trace memory allocations read the C filename and line number. Using such tools to analyze Python memory allocations does not help because most memory allocations are done in the same C function, PyMem_Malloc() for example.

There are debug tools dedicated to the Python language, like Heapy and PySizer. These projects analyze object type and/or content. These tools are useful when most of the leaked memory consists of instances of the same type and this type is allocated in only a few functions. The problem arises when the object type is very common, like str or tuple, and it is hard to identify where these objects are allocated.

Finding reference cycles is also a difficult task. There are different tools to draw a diagram of all references. These tools cannot be used on large applications with thousands of objects because the diagram is too large to be analyzed manually.

Proposal

With PEP 445, it becomes easy to set up a hook on Python memory allocators. The hook can inspect the current Python frame to get the Python filename and line number.
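As a rough illustration of the frame inspection mentioned above, the following pure-Python sketch reads the filename and line number of the calling frame. It is not the hook itself (which would be written in C on top of PEP 445); ``current_location()`` and ``allocate_something()`` are hypothetical names used only for this example, and ``sys._getframe()`` is CPython-specific.

```python
import sys

def current_location(depth=1):
    """Return (filename, line number) of the caller's Python frame."""
    frame = sys._getframe(depth)  # CPython-specific frame access
    return frame.f_code.co_filename, frame.f_lineno

def allocate_something():
    # Pretend that a memory block is allocated here and record the location.
    return current_location()

filename, lineno = allocate_something()
print(filename, lineno)
```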

This PEP proposes to add a new tracemalloc module. It is a debug tool to trace memory allocations made by Python. The module provides the following information:

Command line options

The python -m tracemalloc command can be used to analyze and compare snapshots. The command takes a list of snapshot filenames and has the following options.

-g, --group-per-file

Group allocations per filename, instead of grouping per line number.

-n NTRACES, --number NTRACES

Number of traces displayed per top (default: 10).

--first

Compare with the first snapshot, instead of comparing with the
previous snapshot.

--include PATTERN

Only include filenames matching pattern *PATTERN*. The option can be
specified multiple times.

See ``fnmatch.fnmatch()`` for the syntax of patterns.

--exclude PATTERN

Exclude filenames matching pattern *PATTERN*. The option can be
specified multiple times.

See ``fnmatch.fnmatch()`` for the syntax of patterns.

-S, --hide-size

Hide the size of allocations.

-C, --hide-count

Hide the number of allocations.

-A, --hide-average

Hide the average size of allocations.

-P PARTS, --filename-parts=PARTS

Number of displayed filename parts (default: 3).

--color

Force usage of colors even if ``sys.stdout`` is not a TTY device.

--no-color

Disable colors if ``sys.stdout`` is a TTY device.
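The ``--include`` and ``--exclude`` options rely on ``fnmatch.fnmatch()`` patterns. A minimal sketch of that kind of filtering is shown below; ``select_filenames()`` is a hypothetical helper written for this example, not part of the proposed module.

```python
import fnmatch

def select_filenames(filenames, patterns, include=True):
    """Keep names matching any pattern (include) or drop them (exclude)."""
    def matches(name):
        return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)

    if include:
        return [name for name in filenames if matches(name)]
    return [name for name in filenames if not matches(name)]

names = [
    "/usr/lib/python3.4/os.py",
    "/home/user/app/main.py",
    "/home/user/app/util.py",
]
included = select_filenames(names, ["/home/user/*"], include=True)
excluded = select_filenames(names, ["*/main.py"], include=False)
print(included)
print(excluded)
```

Note that ``*`` in an ``fnmatch`` pattern also matches path separators, so ``/home/user/*`` selects files in subdirectories as well.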

API

To trace as many Python memory allocations as possible, the module should be enabled as early as possible in your application by calling the tracemalloc.enable() function, by setting the PYTHONTRACEMALLOC environment variable to 1, or by using the -X tracemalloc command line option.

Functions

enable() function:

Start tracing Python memory allocations.

disable() function:

Stop tracing Python memory allocations and stop the timer started by
``start_timer()``.

is_enabled() function:

Get the status of the module: ``True`` if it is enabled, ``False``
otherwise.

get_object_address(obj) function:

Get the address of the memory block of the specified Python object.

get_object_trace(obj) function:

Get the trace of a Python object *obj* as a ``trace`` instance.

Return ``None`` if the tracemalloc module did not save the location
when the object was allocated, for example if the module was
disabled.

get_process_memory() function:

Get the memory usage of the current process as a meminfo namedtuple
with two attributes:

* ``rss``: Resident Set Size in bytes
* ``vms``: size of the virtual memory in bytes

Return ``None`` if the platform is not supported.

Use the ``psutil`` module if available.

get_stats() function:

Get statistics on Python memory allocations per Python filename and
per Python line number.

Return a dictionary
``{filename: str -> {line_number: int -> stats: line_stat}}``
where *stats* is a ``line_stat`` instance. *filename* and
*line_number* can be ``None``.

Return an empty dictionary if the tracemalloc module is disabled.

get_traces(obj) function:

Get all traces of Python memory allocations. Return a dictionary ``{pointer: int -> trace}`` where *trace* is a ``trace`` instance.

Return an empty dictionary if the tracemalloc module is disabled.

start_timer(delay: int, func: callable, args: tuple=(), kwargs: dict={}) function:

Start a timer calling ``func(*args, **kwargs)`` every *delay*
seconds.

The timer is based on the Python memory allocator; it is not real
time.  *func* is called after at least *delay* seconds. It is called
later than *delay* seconds if no Python memory allocation occurred
in the meantime.

If ``start_timer()`` is called twice, previous parameters are
replaced.  The timer has a resolution of 1 second.

``start_timer()`` is used by ``DisplayTop`` and ``TakeSnapshot`` to
run a task regularly.

stop_timer() function:

Stop the timer started by ``start_timer()``.
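The "at least *delay* seconds" behaviour of an allocation-driven timer can be modelled in a few lines. This is only a sketch under the assumption that the timer is checked each time a memory block is allocated; ``AllocationTimer`` and ``on_allocation()`` are hypothetical names, and the times are simulated rather than read from a clock.

```python
class AllocationTimer:
    """Hypothetical model of a timer that is only checked on allocations."""

    def __init__(self, delay, func, args=(), kwargs=None):
        self.delay = delay
        self.func = func
        self.args = args
        self.kwargs = kwargs or {}  # avoid sharing a mutable default dict
        self.next_call = None

    def on_allocation(self, now):
        """Called by the (imaginary) allocator hook with the current time."""
        if self.next_call is None:
            self.next_call = now + self.delay
        elif now >= self.next_call:
            self.func(*self.args, **self.kwargs)
            self.next_call = now + self.delay

fired_at = []
current = {"now": 0}
timer = AllocationTimer(10, lambda: fired_at.append(current["now"]))

# Simulated allocation times in seconds: nothing is allocated between 9 and 25.
for now in (0, 3, 9, 25, 26, 36):
    current["now"] = now
    timer.on_allocation(now)

print(fired_at)
```

In this simulation the first call is due at 10 seconds but only fires at 25, the time of the next allocation.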

trace class

trace class:

This class represents debug information of an allocated memory block.

size attribute:

Size in bytes of the memory block.

filename attribute:

Name of the Python script where the memory block was allocated,
``None`` if unknown.

lineno attribute:

Line number where the memory block was allocated, ``None`` if
unknown.

line_stat class

line_stat class:

Statistics on Python memory allocations of a specific line number.

size attribute:

Total size in bytes of all memory blocks allocated on the line.

count attribute:

Number of memory blocks allocated on the line.
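To make the relationship between ``get_traces()`` and ``get_stats()`` concrete, the sketch below groups a ``{pointer: trace}`` dictionary into the nested ``{filename: {line_number: line_stat}}`` layout. The namedtuples are hypothetical stand-ins for the ``trace`` and ``line_stat`` classes, and ``group_traces()`` is an illustrative helper, not a function of the module.

```python
from collections import namedtuple

# Hypothetical stand-ins for the trace and line_stat classes described above.
trace = namedtuple("trace", "size filename lineno")
line_stat = namedtuple("line_stat", "size count")

def group_traces(traces):
    """Build {filename: {lineno: line_stat}} from {pointer: trace}."""
    stats = {}
    for tr in traces.values():
        per_file = stats.setdefault(tr.filename, {})
        old = per_file.get(tr.lineno, line_stat(0, 0))
        per_file[tr.lineno] = line_stat(old.size + tr.size, old.count + 1)
    return stats

traces = {
    0x1000: trace(32, "app.py", 10),
    0x2000: trace(64, "app.py", 10),
    0x3000: trace(128, "app.py", 42),
    0x4000: trace(16, None, None),  # location unknown
}
stats = group_traces(traces)
print(stats["app.py"][10])
```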

DisplayTop class

DisplayTop(count: int=10, file=sys.stdout) class:

Display the list of the *count* biggest memory allocations into
*file*.

display() method:

Display the top once.

start(delay: int) method:

Start a task using ``tracemalloc`` timer to display the top every
*delay* seconds.

stop() method:

Stop the task started by the ``DisplayTop.start()`` method.

color attribute:

If ``True``, ``display()`` uses color. The default value is
``file.isatty()``.

compare_with_previous attribute:

If ``True`` (default value), ``display()`` compares with the
previous snapshot. If ``False``, compare with the first snapshot.

filename_parts attribute:

Number of displayed filename parts (int, default: ``3``). Extra
parts are replaced with ``"..."``.

group_per_file attribute:

If ``True``, group memory allocations per Python filename. If
``False`` (default value), group allocations per Python line number.

show_average attribute:

If ``True`` (default value), ``display()`` shows the average size
of allocations.

show_count attribute:

If ``True`` (default value), ``display()`` shows the number of
allocations.

show_size attribute:

If ``True`` (default value), ``display()`` shows the size of
allocations.

user_data_callback attribute:

Optional callback collecting user data (callable, default:
``None``).  See ``Snapshot.create()``.
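The following sketch shows one way the *count* biggest entries could be selected and how ``filename_parts`` might shorten a path. Both helpers are hypothetical, the statistics are plain ``(size, count)`` tuples, and the exact output format of ``DisplayTop`` (for instance where ``"..."`` is placed) is an assumption.

```python
def shorten_filename(filename, parts=3):
    """Keep the last `parts` components; replace the rest with '...'."""
    components = filename.split("/")  # assumes POSIX-style separators
    if len(components) <= parts:
        return filename
    return "/".join(["..."] + components[-parts:])

def top_lines(stats, count=10):
    """Return the `count` biggest (size, filename, lineno) entries."""
    entries = [
        (size, filename, lineno)
        for filename, lines in stats.items()
        for lineno, (size, _count) in lines.items()
    ]
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries[:count]

stats = {
    "/usr/lib/python3.4/collections/__init__.py": {120: (4096, 8)},
    "app.py": {10: (96, 2), 42: (128, 1)},
}
for size, filename, lineno in top_lines(stats, count=2):
    print(shorten_filename(filename), lineno, size)
```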

Snapshot class

Snapshot() class:

Snapshot of Python memory allocations.

Use ``TakeSnapshot`` to take snapshots regularly.

create(user_data_callback=None) method:

Take a snapshot. If *user_data_callback* is specified, it must be a
callable object returning a list of
``(title: str, format: str, value: int)``.
*format* must be ``'size'``. The list must always have the same
length and the same order to be able to compute differences between
values.

Example: ``[('Video memory', 'size', 234902)]``.

filter_filenames(patterns: list, include: bool) method:

Remove filenames not matching any pattern of *patterns* if *include*
is ``True``, or remove filenames matching a pattern of *patterns* if
*include* is ``False`` (exclude).

See ``fnmatch.fnmatch()`` for the syntax of a pattern.

load(filename) classmethod:

Load a snapshot from a file.

write(filename) method:

Write the snapshot into a file.

pid attribute:

Identifier of the process which created the snapshot (int).

process_memory attribute:

Result of the ``get_process_memory()`` function, can be ``None``.

stats attribute:

Result of the ``get_stats()`` function (dict).

timestamp attribute:

Creation date and time of the snapshot, ``datetime.datetime``
instance.

user_data attribute:

Optional list of user data, result of *user_data_callback* in
``Snapshot.create()`` (default: None).
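The requirement that user data lists keep the same length and order makes sense once two snapshots are compared entry by entry. A minimal sketch of such a comparison, assuming the ``(title, format, value)`` layout described for ``Snapshot.create()``; ``diff_user_data()`` is a hypothetical helper and the second value is made up for the example.

```python
def diff_user_data(old, new):
    """Return [(title, format, new_value - old_value)] for two user data lists."""
    if len(old) != len(new):
        raise ValueError("user data lists must have the same length")
    differences = []
    for (old_title, old_format, old_value), (title, fmt, value) in zip(old, new):
        if (old_title, old_format) != (title, fmt):
            raise ValueError("user data entries must keep the same order")
        differences.append((title, fmt, value - old_value))
    return differences

before = [("Video memory", "size", 234902)]
after = [("Video memory", "size", 250000)]  # illustrative value
delta = diff_user_data(before, after)
print(delta)
```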

TakeSnapshot class

TakeSnapshot class:

Task taking snapshots of Python memory allocations and writing them
into files. By default, snapshots are written in the current directory.

start(delay: int) method:

Start a task taking a snapshot every *delay* seconds.

stop() method:

Stop the task started by the ``TakeSnapshot.start()`` method.

take_snapshot() method:

Take a snapshot.

filename_template attribute:

Template (``str``) used to create a filename. The following
variables can be used in the template:

* ``$pid``: identifier of the current process
* ``$timestamp``: current date and time
* ``$counter``: counter starting at 1 and incremented at each snapshot

The default pattern is ``'tracemalloc-$counter.pickle'``.

user_data_callback attribute:

Optional callback collecting user data (callable, default:
``None``).  See ``Snapshot.create()``.
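The ``$pid``, ``$timestamp`` and ``$counter`` variables use the same syntax as ``string.Template``. The sketch below expands a filename template that way; whether the module actually uses ``string.Template`` and which timestamp format it writes are assumptions, and ``make_filename()`` is a hypothetical helper.

```python
import os
import string
from datetime import datetime

def make_filename(template, counter, pid=None, timestamp=None):
    """Expand $pid, $timestamp and $counter in a filename template."""
    pid = os.getpid() if pid is None else pid
    timestamp = datetime.now() if timestamp is None else timestamp
    return string.Template(template).substitute(
        pid=pid,
        timestamp=timestamp.strftime("%Y%m%d-%H%M%S"),  # assumed format
        counter=counter,
    )

name = make_filename("tracemalloc-$counter.pickle", counter=1)
other = make_filename(
    "trace-$pid-$timestamp.pickle",
    counter=2,
    pid=1234,
    timestamp=datetime(2013, 9, 4, 1, 27, 15),
)
print(name)
print(other)
```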

Links

Python issues:

Similar projects:

Copyright

This document has been placed into the public domain.


