gh-96143: Allow Linux perf profiler to see Python calls by pablogsal · Pull Request #96123 · python/cpython (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation173 Commits50 Checks0 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Jongy reacted with thumbs up emoji tiran, adamchainz, kumaraditya303, jjerphan, erlend-aasland, dgiger42, holmanb, maxbrunet, dalehamel, GalaxySnail, and 2 more reacted with hooray emoji
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea if that assembly code does what you say, but I've reviewed the rest.
Definitely an intriguing idea.
You can use preprocesor macros if you name the file Objects/asm_trampoline.S
or .sx
instead of Objects/asm_trampoline.s
. The .sx
form needs a makefile rule.
You can use preprocesor macros if you name the file
Objects/asm_trampoline.S
or.sx
instead ofObjects/asm_trampoline.s
. The.sx
form needs a makefile rule.
Hummm, not sure I follow, could you maybe show me an example of what we can achieve with this?
You can have multiple implementations in the same file:
.text
.globl _Py_trampoline_func_start
_Py_trampoline_func_start:
#ifdef __x86_64__
push %rbp
mov %rsp,%rbp
mov %rdi,%rax
mov %rsi,%rdi
mov %rdx,%rsi
mov %ecx,%edx
call *%rax
pop %rbp
ret
#endif // __x86_64__
#ifdef __aarch64__
TODO
#endif
.globl _Py_trampoline_func_end
_Py_trampoline_func_end:
pablogsal changed the title
Allow Linux perf profiler to see Python calls gh-96143: Allow Linux perf profiler to see Python calls
Why did you add a Windows build file? How about we do not define _PyPerfTrampoline_Init
unless HAVE_PERF_TRAMPOLINE is defined?
Why did you add a Windows build file? How about we do not define
_PyPerfTrampoline_Init
unless HAVE_PERF_TRAMPOLINE is defined?
Then we need to add more ifdef every place is used but that works as well for sure
they now match the current code.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two thoughts have been running around in my mind:
(1) We've got the command line flag and sys API, but what about just enabling this system wide or at least task/job wide within a container? An environment variable that turns it on would make that easy without plumbing options through. PYTHONPERFSUPPORT=1
as another equivalent to -Xperf
? That's usually done in initconfig.c alongside the -X processing I believe.
Second: multiprocessing spawn start method. If the -Xperf
flag was passed to the parent process I assume the "spawn" method should also pass that to its children.
Two thoughts have been running around in my mind:
(1) We've got the command line flag and sys API, but what about just enabling this system wide or at least task/job wide within a container? An environment variable that turns it on would make that easy without plumbing options through.
PYTHONPERFSUPPORT=1
as another equivalent to-Xperf
? That's usually done in initconfig.c alongside the -X processing I believe.Second: multiprocessing spawn start method. If the
-Xperf
flag was passed to the parent process I assume the "spawn" method should also pass that to its children.
I think (1) makes sense and is indeed coherent with how we handle some of the other options like tracemalloc
so I will add an environment variable alongside the -X option.
Regarding (2) I am not that sure. We don't do this for the rest of the flags that we pass around and you can achieve the same with the environment variable if you want. In any case, as that would require some tests I would prefer to do that in a separate PR as this is already gigantic
I implemented the environment variable and solved a bunch of conflicts. Please check it out when you have time.
I also had to solve a bunch of conflicts. @gpshead if you are ok with the current status I would like us to land if everything looks good as the size of the PR is already attracting a bunch of merge conflicts in the build system, clinic and other files.
Ah wait, I need to document the environment variable. Will push a commit for that soon.
Ah wait, I need to document the environment variable. Will push a commit for that soon.
Done 👍
@erlend-aasland has mentioned that he was a bunch of docs improvements that he will do in a separate PR.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments are easy to address things that don't need further review - minor edits or added note/todo comments to our future selves.
Agreed that the docs can use some polishing up but all the important bits are there to seed that future work, thanks for writing them!
Thanks for taking on adding this feature! I expect we'll see how a backport fares internally.
Damn, @miss-islington has landed the PR without waiting for the CI to build the last commit I pushed so I created #96433.
Thanks, everyone for the fantastic reviews and for helping to get this feature ready ❤️
You all rock 🤘
Thanks for all for pushing for better interoperability with perf(1)
. I highly appreciate this contribution. 🙏
Nice feature!
There is a typo in sys.deactivate_stack_trampoline() docstring: "Dectivate the perf profiler trampoline": Deactivate.
In the doc, python -m sysconfig | grep HAVE_PERF_TRAMPOLINE
can be replaced with python -c "import sysconfig; print(bool(sysconfig.get_config_var('PY_HAVE_PERF_TRAMPOLINE')))"
to not rely on the external grep
command. At least, I suggest to add PY_ to fix the variable name:
python -m sysconfig | grep PY_HAVE_PERF_TRAMPOLINE
The second command can be replaced with python -c "import sysconfig; print('no-omit-frame-pointer' in sysconfig.get_config_var('PY_CORE_CFLAGS'))"
.
There is a typo in sys.deactivate_stack_trampoline() docstring: "Dectivate the perf profiler trampoline": Deactivate.
[...]
See #96445 :)
Is it possible to port these changes, from python 3.12, without breaking to previous python versions, say 3.11, 3.10, 3.9?
Why?
Some libraries use specific versions of python and it will take time until they catch up to 3.12's release. This will defer the insights that can be gained till later on.
Is it possible to port these changes, from python 3.12, without breaking to previous python versions, say 3.11, 3.10, 3.9?
Is possible, but as these versions are only accepting security fixes or bugfixes we cannot backport new features. This means that if you want you can backport them yourself but you need to maintain a fork of the interpreter.
facebook-github-bot pushed a commit to facebookincubator/cinder that referenced this pull request
Summary: Backport the perf-trampoline introduced in python/cpython#96123. The perf trampoline doesn't work properly with the JIT, so we have submitted a PR to have a C-API to unify writing to the perf-map files python/cpython#103546.
Reviewed By: czardoz
Differential Revision: D45419843
fbshipit-source-id: 16bd13d7981e48c9eb7bc0e5eef1c1f4748965f6
Reviewers
gpshead gpshead approved these changes
tiran tiran approved these changes
zooba zooba left review comments
tomytsai1 tomytsai1 left review comments
chalggg chalggg left review comments
kumaraditya303 kumaraditya303 left review comments
markshannon Awaiting requested review from markshannon