bpo-40170: Add _PyObject_CheckBuffer() internal function by shihai1991 · Pull Request #19541 · python/cpython
@vstinner Hi, Victor. It looks like _PyObject_CheckBuffer() should be added, right?
I chose not to add an internal _PyObject_CheckBuffer() on purpose, to keep the implementation simple. I don't think it's worth it.
@methane, @pablogsal: What do you think?
This change is related to my commit ef5c615 which converted the PyObject_CheckBuffer() macro to a function.
I feel this API is not performance-critical, so I agree with @vstinner.
Could you provide a benchmark if you think this API is important?
Oh, I wasn't aware of Victor's intention before :(
I ran ./python -m pyperf timeit --compare-to python3.9d "bytearray(range(10))" three times:
Mean +- std dev: [python3.9d] 1.84 us +- 0.29 us -> [python] 1.76 us +- 0.23 us: 1.04x faster (-4%)
Not significant!
Mean +- std dev: [python3.9d] 1.87 us +- 0.32 us -> [python] 1.82 us +- 0.18 us: 1.02x faster (-2%)
Not significant!
Mean +- std dev: [python3.9d] 1.96 us +- 0.35 us -> [python] 1.84 us +- 0.22 us: 1.06x faster (-6%)
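One way to read the first run: "1.04x faster (-4%)" is the ratio of the two mean timings together with the relative change of the mean. Recomputing both from the rounded means that pyperf prints gives nearly the same figures. The small mismatch with the printed 1.04x is presumably because pyperf works from the unrounded means.

$$
\text{speedup} = \frac{t_{\text{old}}}{t_{\text{new}}} = \frac{1.84\,\mu\text{s}}{1.76\,\mu\text{s}} \approx 1.045,
\qquad
\Delta = \frac{t_{\text{new}} - t_{\text{old}}}{t_{\text{old}}} = \frac{1.76 - 1.84}{1.84} \approx -4.3\%
$$

$$
|t_{\text{new}} - t_{\text{old}}| = 0.08\,\mu\text{s} \;<\; \sigma_{\text{new}} = 0.23\,\mu\text{s} \;<\; \sigma_{\text{old}} = 0.29\,\mu\text{s}
$$

The mean shifted by less than either standard deviation, so the two timing distributions overlap heavily. This is only a rough sanity check, not the statistical test pyperf itself uses to decide "Not significant!".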
Little improvement ;(
Honestly, I don't think it's worth bothering with this micro-optimization. Calling PyObject_CheckBuffer() is likely to take less than 50 nanoseconds. I'm closing the issue.
It's all about tradeoffs. It depends on whether a function is commonly used or not. Here I don't think it's worth bothering with inlining.
Moreover, with LTO, the compiler may be allowed to inline PyObject_CheckBuffer() anyway, especially when using -fno-semantic-interposition, which Clang uses by default. FYI, we modified the Python package in Fedora to use -fno-semantic-interposition so that GCC can inline calls from one libpython function to another (Python is built with --enable-shared on Fedora to get libpython).
Wow, thanks a million, Victor. I learned a lot from your explanation.
Moreover, what does the "d" in "python3.9d" mean? Is it a debug build?
Yes, I use ./configure --with-pydebug --with-trace-refs && make install to install the master version.
Please don't run benchmarks on a debug build: it contains many debug checks that are executed at runtime.
Copy that. Thanks for your guidance.