Issue 9609: make cProfile multi-stack aware
One of the problems with the profiling modules provided with Python is that they are not useful in the presence of multiple threads. This is because time spent in a different thread may be falsely attributed to some random place in a thread being profiled.
This patch helps fix that, by making the _lsprof.c module (the engine behind cProfile) multi-stack aware. At every entry into the profiler, a check is made to see which stack is in operation (by looking at the thread state). The previous stack is then paused and profiling commences on the new stack.
Time spent on other stacks is then subtracted from the measured time on each stack.
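The pause/resume bookkeeping just described can be sketched in pure Python. This is an illustrative model only, not the patch's C code; `MultiStackTimer` and its method names are invented for the sketch, and the clock is injectable so the behaviour can be demonstrated deterministically:

```python
import time

class MultiStackTimer:
    """Illustrative sketch: per-stack timers where time spent on
    other stacks is excluded from each stack's measured time."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.active = None      # key of the stack currently running
        self.started = None     # clock value when it was resumed
        self.elapsed = {}       # key -> accumulated own time

    def switch_to(self, key):
        now = self.clock()
        if self.active is not None:
            # Pause the previous stack: bank the time it ran.
            self.elapsed[self.active] = (
                self.elapsed.get(self.active, 0.0) + now - self.started)
        # Resume (or start) the new stack.
        self.active = key
        self.started = now

    def own_time(self, key):
        total = self.elapsed.get(key, 0.0)
        if key == self.active:
            total += self.clock() - self.started
        return total
```

Because every `switch_to()` pauses the previously active stack before resuming the new one, `own_time()` never includes time spent on other stacks; the subtraction described above falls out of the pause/resume banking.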
A complication arises because it is no longer possible to determine the recursion level of each function (or subcall instance) on each stack by looking at the function's entry alone. For this reason, it becomes necessary to walk the stack in cases where there are multiple stacks and more than one recursion in total has been seen for an entry.
This patch has been successfully used, with a modification for Stackless Python, in production at CCP (the modification uses the Tasklet ID rather than the TLS pointer as a key to the stack map).
To be useful, it is important that all threads in the process are set to use the same cProfile.Profile() instance. Currently there is no easy way to do that, and this patch doesn't attempt to fix that. But it is possible that an application designed for profiling would attach the profiler at each thread's start point. (In the version of Stackless Python that this is used on, it is possible to enable tracing/profiling of all tasklets simultaneously.)
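For illustration, here is a minimal Python-level sketch of that per-thread attachment pattern, using sys.setprofile (which, like cProfile's engine, hooks only the calling thread) and a shared collector. The `dispatcher`, `work`, and `worker` names are invented for the sketch; a real profiler would track timings, not just call counts:

```python
import sys
import threading
from collections import Counter

calls = Counter()        # shared collector: function name -> call count
lock = threading.Lock()

def dispatcher(frame, event, arg):
    # Minimal shared "profiler": count call events from every thread
    # that installed it.
    if event == "call":
        with lock:
            calls[frame.f_code.co_name] += 1

def work(n):
    return sum(i * i for i in range(n))

def worker(n):
    sys.setprofile(dispatcher)   # attach at this thread's start point
    try:
        work(n)
    finally:
        sys.setprofile(None)     # detach before the thread exits

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread has to opt in explicitly, which is exactly the inconvenience described above: there is no single call that attaches the profiler process-wide.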
Here is a new patch. When 'allthreads' is specified to cProfile.Profile.enable(), profiling is enabled on all threads. The testsuite checks that all threads do indeed register; it does not attempt to validate the timings.
It turns it on for all threads (in all interpreter states). Maybe this is overdoing it. Interpreter states are something new to me; are they a py3k feature?
Now, the problem with this approach is that it only works with threads already in existence when it is called. The "global" approach that we used to solve a similar problem in Stackless Python, however, will invoke the tracing/profiling functionality for all tasklets, even ones created later. The same would be useful for threads in regular Python, especially in long-running server applications.
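For ordinary threads, the stdlib does already offer a Python-level version of that "global" hook: threading.setprofile() installs a profile function in every thread started afterwards (it does not touch the calling thread or pre-existing ones). A sketch, with illustrative names, of how future threads can be caught:

```python
import threading

seen = set()             # names of threads the hook ran in
lock = threading.Lock()

def profile_hook(frame, event, arg):
    # Installed automatically (via sys.setprofile) in every thread
    # started after the threading.setprofile() call below --
    # including threads that do not exist yet.
    with lock:
        seen.add(threading.current_thread().name)

threading.setprofile(profile_hook)

def work():
    return sum(range(1000))

threads = [threading.Thread(target=work, name=f"w{i}") for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

threading.setprofile(None)   # stop hooking threads started later
```

This only covers threads created through the threading module, and it installs a Python-level hook rather than cProfile's C engine, but it shows the shape a "profile all future threads" feature could take.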
However, this patch stands on its own. This particular shortcoming could be fixed later.