[Python-Dev] Let's change to C API!

Victor Stinner vstinner at redhat.com
Tue Jul 31 06:51:23 EDT 2018


2018-07-31 8:58 GMT+02:00 Antoine Pitrou <solipsis at pitrou.net>:

> What exactly in the C API made it slow or non-promising?

>> The C API requires that your implementations make almost all the
>> same design choices that CPython made 25 years ago (C structures,
>> memory allocators, reference counting, specific GC implementation,
>> GIL, etc.).
>
> Yes, but those choices are not necessarily bad.

I understood that PyPy succeeded in becoming at least 2x faster than CPython by not using reference counting internally.

>> Multiple PyPy developers told me that cpyext remains a blocker
>> issue for using PyPy.
>
> Probably, but we're talking about speeding up CPython here, right?

My project has different goals. I would prefer not to make any promises about speed. So speed is not my first motivation, or at least not the only one :-)

I also want to make the debug build usable.

I also want to allow OS vendors to provide multiple Python versions per OS release: it reduces the maintenance burden, although obviously it still means more work. It's a tradeoff depending on the lifetime of your OS and the pressure from customers to get the newest Python :-) FYI Red Hat already provides recent development tools on top of RHEL (and CentOS and Fedora) because customers are asking for that. We don't work for free :-)

I also want to see more alternative implementations of Python! I would like to see RustPython succeed!

See the latest version of https://pythoncapi.readthedocs.io/ for the full rationale.

> If we're talking about making more C extensions PyPy-compatible,
> that's a different discussion,

For practical reasons, IMHO it makes sense to put everything in the same "new C API" bag.

Obviously, I am proposing many changes, and some of them will be harder to implement than others. My proposal contains many open questions and is made of multiple milestones, with a strong requirement on backward compatibility.

> and one where I think Stefan is right that we should push people
> towards Cython and alternatives, rather than direct use of the C API
> (which people often fail to use correctly, in my experience).

Don't get me wrong: my intent is not to replace Cython. Even though PyPy is pushing cffi hard, many C extensions still use the C API.

Maybe if the C API becomes more annoying and requires developers to adapt their old code bases for the "new C API", some of them will reconsider and use Cython, cffi or something else :-D

But backward compatibility is a big part of my plan, and in fact, I expect that porting most C extensions to the new C API will be somewhere between "free" and "cheap". Obviously, it depends on how many changes we put in the "new C API" :-) I would like to work incrementally.

> But the C API is still useful for specialized uses, including for
> development tools such as Cython.

It seems like https://pythoncapi.readthedocs.io/ didn't explain my intent well. I updated my doc to make it very clear that the "old C API" remains available on purpose. The main question is whether you will be able to use Cython with the "old C API" on a new "experimental runtime", or whether Cython will be stuck on the "regular runtime".

https://pythoncapi.readthedocs.io/runtimes.html

It's just that in the long term (the end of my roadmap), you will have to opt in to the old C API.

> I agree about the overall diagnosis. I just disagree that changing
> the C API will open up easy optimization opportunities.

Ok, please help me rephrase the documentation so that it doesn't make any promises :-)

Currently, it reads:

""" Optimization ideas

Once the new C API succeeds in hiding implementation details, it becomes possible to experiment with radical changes in CPython to implement new optimizations.

See Experimental runtime. """

https://pythoncapi.readthedocs.io/optimization_ideas.html

In my early plan, I wrote "faster runtime". I replaced it with "experimental runtime" :-)

Do you think that it's wrong to promise that a smaller C API without implementation details will make it easier to experiment with optimizations?

> Actually I'd like to see a list of optimizations that you think are
> held up by the C API.

Hum, let me use the "Tagged pointers" example. Most C functions use "PyObject*" as an opaque C type. Good. But since we also give access to the fields of C structures, like PyObject.ob_refcnt or PyListObject.ob_item, C extensions currently dereference pointers directly.

I'm not convinced that tagged pointers will make CPython way faster. I'm just saying that the C API prevents you from even experimenting with such a change to measure its impact on performance.

https://pythoncapi.readthedocs.io/optimization_ideas.html#tagged-pointers-doable
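To make it concrete, here is a minimal sketch (not real CPython code: obj_t, TAG_INT and my_refcnt are made-up names). With tagged pointers, some "PyObject*" values are not memory addresses at all, so an opaque accessor has to check the tag first, while code that dereferences ob_refcnt directly crashes on a tagged value:

    #include <stdint.h>

    typedef struct {
        long ob_refcnt;          /* plays the role of PyObject.ob_refcnt */
    } obj_t;

    #define TAG_INT 0x1          /* assumption: low bit set = small int
                                    stored in the pointer bits */

    static inline int is_tagged_int(obj_t *op) {
        return ((uintptr_t)op & TAG_INT) != 0;
    }

    /* An opaque accessor lets the runtime handle tagged values: */
    static inline long my_refcnt(obj_t *op) {
        if (is_tagged_int(op))
            return 1;            /* tagged int: nothing to dereference */
        return op->ob_refcnt;    /* real object: read the field */
    }

    /* But a macro that expands to a direct field read, like
       (op)->ob_refcnt, dereferences the tagged (fake) pointer
       and crashes. */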

For the "Copy-on-Write" idea, the issue is that many macros access fields of C structures directly, so at the machine-code level the ABI reads data at a fixed offset in memory, whereas my plan is to allow each runtime to use a different memory layout, like putting the GC header (PyGC_Head) elsewhere (or even removing it!!!) and/or putting ob_refcnt elsewhere.

https://pythoncapi.readthedocs.io/optimization_ideas.html#copy-on-write-cow-doable
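Rough sketch of the ABI issue (again with hypothetical names, not the actual CPython macros). Once a macro like MY_REFCNT() is compiled into an extension, the field offset is frozen into its machine code; an opaque function call would leave the memory layout up to the runtime:

    typedef struct {
        long  ob_refcnt;   /* offset 0, baked into every compiled caller */
        void *ob_type;     /* offset 8 on a typical 64-bit ABI */
    } obj_t;

    /* Macro: compiles to "read the word at offset 0 from op" forever. */
    #define MY_REFCNT(op) (((obj_t *)(op))->ob_refcnt)

    /* Opaque call: the runtime is free to move ob_refcnt, or to store
       it in a side table, without recompiling extensions. */
    extern long my_get_refcnt(void *op);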

>> I have to confess that helping Larry is part of my overall plan.
>
> Which is why I'd like to see Larry chime in here.

I already talked a little bit with Larry about my plan, but he wasn't sure that my plan is enough to be able to stop using reference counting internally and move to a different garbage collector. I'm only sure that it's possible to keep reference counting for the C API, since there are solutions for that (e.g. maintain a hash table mapping PyObject* to reference counts).
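Such a side table could look roughly like this (just a sketch with made-up names; a real implementation would at least need locking and entry removal when objects are finalized):

    #include <stdlib.h>

    /* Reference counts kept outside the objects, keyed by address. */
    struct entry { void *obj; long refcnt; struct entry *next; };

    #define NBUCKETS 1024
    static struct entry *refcnt_table[NBUCKETS];

    static size_t bucket_of(void *obj) {
        return ((size_t)obj >> 4) % NBUCKETS;  /* crude pointer hash */
    }

    /* Py_INCREF-equivalent for a runtime whose objects have no
       ob_refcnt field: */
    static void side_incref(void *obj) {
        size_t b = bucket_of(obj);
        for (struct entry *e = refcnt_table[b]; e != NULL; e = e->next) {
            if (e->obj == obj) { e->refcnt++; return; }
        }
        struct entry *e = malloc(sizeof(*e));  /* first C reference */
        e->obj = obj; e->refcnt = 1;           /* (error checks omitted) */
        e->next = refcnt_table[b]; refcnt_table[b] = e;
    }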

Honestly, right now, I'm only convinced of two things:

Victor


