[Python-Dev] PEP 580 and PEP 590 comparison.
Mark Shannon mark at hotpy.org
Sun Apr 14 07:34:17 EDT 2019
Hi Petr,
Thanks for spending time on this.
I think the comparison of the two PEPs falls into two broad categories, performance and capability.
I'll address capability first.
Let's try a thought experiment.
Consider PEP 580. It uses the old tp_print slot as an offset to mark the location of the CCall structure within the callable. Now suppose instead that it uses a tp_flags bit to mark the presence of an offset field, and that the offset field is moved to the end of the TypeObject. This would not impact the capabilities of PEP 580.
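To make the first step of the thought experiment concrete, here is a minimal sketch of the "flag bit plus offset field" layout. It is a toy model and does not use Python.h; every name in it (ToyType, ToyCCall, TOY_TPFLAGS_HAVE_CCALL, toy_get_ccall and so on) is hypothetical and comes from neither PEP.

```c
#include <stddef.h>

/* Hypothetical flag bit: "this type has a valid ccall_offset field". */
#define TOY_TPFLAGS_HAVE_CCALL (1UL << 0)

/* Toy analogue of a type object. The offset field sits at the end. */
typedef struct {
    unsigned long flags;     /* analogue of tp_flags */
    ptrdiff_t ccall_offset;  /* where the call struct lives in an instance */
} ToyType;

/* Toy analogue of an object header. */
typedef struct {
    const ToyType *type;
} ToyObject;

/* Toy analogue of the CCall structure. */
typedef struct {
    int (*func)(int);
} ToyCCall;

/* A callable that embeds the call struct somewhere after its header. */
typedef struct {
    ToyObject base;
    int unrelated_field;
    ToyCCall ccall;
} ToyCallable;

/* Example C function to store in the call struct. */
int toy_double(int x)
{
    return 2 * x;
}

/* Return the embedded call struct, or NULL if the type does not set the flag. */
ToyCCall *toy_get_ccall(ToyObject *obj)
{
    const ToyType *tp = obj->type;
    if (!(tp->flags & TOY_TPFLAGS_HAVE_CCALL)) {
        return NULL;
    }
    return (ToyCCall *)((char *)obj + tp->ccall_offset);
}
```

A type would set the flag and fill ccall_offset with offsetof(ToyCallable, ccall); the lookup then works the same way wherever the offset field itself is stored in the type.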
Now add a single line
    nargs &= ~PY_VECTORCALL_ARGUMENTS_OFFSET;
here
https://github.com/python/cpython/compare/master...jdemeyer:pep580#diff-1160d7c87cbab324fda44e7827b36cc9R570
which would make PyCCall_FastCall compatible with the PEP 590 vectorcall protocol.
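As a rough illustration of why that one line is enough, the sketch below clears the offset flag from the argument count before handing over to a function that expects a plain count. It is standalone C, not CPython code: TOY_VECTORCALL_ARGUMENTS_OFFSET is a stand-in that I assume to be the top bit of size_t, and toy_fastcall_sum and toy_vectorcall_sum are hypothetical names.

```c
#include <stddef.h>

/* Stand-in for PY_VECTORCALL_ARGUMENTS_OFFSET (assumed: top bit of size_t). */
#define TOY_VECTORCALL_ARGUMENTS_OFFSET ((size_t)1 << (8 * sizeof(size_t) - 1))

/* Strip the flag bit, leaving only the number of positional arguments. */
size_t toy_nargs(size_t nargsf)
{
    return nargsf & ~TOY_VECTORCALL_ARGUMENTS_OFFSET;
}

/* Hypothetical old-style entry point: expects a plain argument count. */
long toy_fastcall_sum(const long *args, size_t nargs)
{
    long total = 0;
    for (size_t i = 0; i < nargs; i++) {
        total += args[i];
    }
    return total;
}

/* Hypothetical vectorcall-style entry point: the count may carry the flag. */
long toy_vectorcall_sum(const long *args, size_t nargsf)
{
    size_t nargs = nargsf;
    nargs &= ~TOY_VECTORCALL_ARGUMENTS_OFFSET;  /* the single added line */
    return toy_fastcall_sum(args, nargs);
}
```

With the mask in place the callee gets the same count whether or not the caller set the flag, so the existing body does not need to know about the new calling convention.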
Now rebase the PEP 580 reference code on top of the PEP 590 minimal implementation and make the vectorcall field of CFunction point to PyCCall_FastCall.
The resulting hybrid is both a PEP 590 conformant implementation and at least as capable as the reference PEP 580 implementation.
Therefore PEP 590 must be at least as capable as PEP 580.
Now performance.
Currently the PEP 590 implementation is intentionally minimal. It does nothing for performance. The benchmark Jeroen provides is a micro-benchmark that calls the same functions repeatedly. This is trivial and unrealistic. So, there is no real evidence either way. I will try to provide some.
The point of PEP 590 is that it allows performance improvements by allowing callables more freedom of implementation. To repeat an example from an earlier email, which may have been overlooked, this code reduces the time to create ranges and small lists by about 30%:
https://github.com/markshannon/cpython/compare/vectorcall-minimal...markshannon:vectorcall-examples
https://gist.github.com/markshannon/5cef3a74369391f6ef937d52cca9bfc8
Speeding up calls to builtin functions by a measurable amount will need some work on Argument Clinic. I plan to have that done before PyCon in May.
Cheers, Mark.