[Python-ideas] Sampling Profiler for Python (original) (raw)

Torsten Landschoff t.landschoff at gmx.net
Wed Jan 4 23:32:04 CET 2012


Hello world,

since I moved from Java development to Python development, I really miss the sampling profiler of jvisualvm (http://visualvm.java.net/). Perhaps it is only me but especially for long running GUI applications, I would really like to know why it sometimes slows down to a crawl.

cProfile is not really useful for this as it makes the software unresponsive alone. Also if using multiple threads (like doing some I/O in background, when using http://www.lag.net/paramiko/) it will give meaningless results.

I looked for an existing profiler and found statprof, mentioned in this blog: http://wingolog.org/archives/2005/10/28/profiling

However, this still fails profiling multiple threads. I don't think one can do any better without touching the interpreter core. So that's what I tried...

My approach implements the profiler from the following parts:

  1. Sample Trigger: A timer that triggers taking samples of all threads at fixed intervals.
  2. Sampling function injection: Ensures that a sampling function is called on each thread at the first opportunity after the sample trigger fired.
  3. Sampling function: A function that assigns the current time slice to the executing code on the thread it is called on.

Only the second part has to be implemented in the Python interpreter core, the sample trigger and sampling functions can be provided by external modules. My implementation of (2) can be found at https://bitbucket.org/bluehorn/sampling_prof

I also prepared a first implementation of (1) and (3) at https://bitbucket.org/bluehorn/profiler_util. Using that library, I can profile pybench with a 3 % slowdown (taking 100 profiling samples per second):

$ export PYTHON_PATH=$HOME/python-work/profiler_util $ python3 -m sampling_profiler -- Tools/pybench/pybench.py -c stock.pybench -f prof.pybench [...] Totals: 3119ms 3072ms +1.5% 3198ms
3108ms +2.9%

This also shows that the string benchmark takes the most time:

Thread MainThread (3842 samples, 38.434401 seconds)

cpu_time (cum) ticks (cum) samples (cum) filename:lineno function 0.548 0.548 55 55 55 55 Tools/pybench/Strings.py:455 test 0.448 0.448 45 45 45 45 Tools/pybench/Strings.py:568 calibrate 0.428 0.428 43 43 43 43 Tools/pybench/Constructs.py:484 test 0.380 0.380 38 38 38 38 Tools/pybench/Calls.py:445 f 0.324 0.324 34 34 34 34 Tools/pybench/Calls.py:122 f

The output is explained as follows:

cpu_time: cpu time in seconds accounted to that line. (cum): This is the cpu time including the cumulative time of the functions called on that line. Gathering that data means that the sampling function has to walk the stack for each sample, which will cause a dramatic slowdown for deeply recursive calls. There should be a switch to disable this. ticks: Number of sampling profiler ticks accounted to that line. For my example (where sample ticks are equidistant in wall time) this is a measure of the time spent in that line, including any blocking calls (especially I/O). (cum): ... including called functions samples: Number of times the sampling profiler function was called for that line. This should equal ticks unless execution of that line took longer than a profiler tick.

The remaining output should be obvious.

I have to admit that this is far from finished, but I hope to get some feedback and perhaps even help by posting it here.

What do you think?

Greetings, Torsten

-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://mail.python.org/pipermail/python-ideas/attachments/20120104/664a8811/attachment.html>



More information about the Python-ideas mailing list