[Python-Dev] Use of Cython

Stefan Behnel stefan_ml at behnel.de
Tue Sep 4 14:55:56 EDT 2018


Yury Selivanov wrote on 04.09.2018 at 18:19:

On Sat, Sep 1, 2018 at 6:12 PM Stefan Behnel wrote:

Yury Selivanov wrote on 07.08.2018 at 19:34:

The first goal is to compile mypy with it to make it faster, so I hope that the project will be completed.

That's not "the first goal". It's the /only/ goal. The only intention of mypyc is to be able to compile and optimise enough of Python to speed up the kind or style of code that mypy uses.

Essentially, mypyc will be similar to Cython, but mypyc is a subset of Python, not a superset.

Which is bad, right? It means that there will be many things that simply don't work, and that you need to change your code in order to make it compile at all. Cython is way beyond that point by now. Even RPython will probably continue to be way better than mypyc for quite a while, maybe forever, who knows.

To be clear I'm not involved with mypyc, but my understanding is that the entire Python syntax will be supported, except some dynamic features like patching globals(), locals(), or classes, or __class__.

No, that's not the goal, at least from what I understood from my discussions with Jukka. The goal is to make it compile mypy, be it by supporting Python features in mypyc or by avoiding Python features in mypy. I'm sure they will take any shortcut they can in order to avoid having to make mypyc too capable, because mypyc is no more than a means to an end. For example, they may easily get away without supporting generators and closures, which are quite difficult to implement in C. But finding a non-trivial piece of Python code out there that uses neither of the two is probably not easy.

I'm also sure they will avoid Python semantics wherever they can, because implementing them in the same way as CPython and Cython would mean that certain constructs cannot safely be statically reasoned about, and thus cannot be optimised. Avoiding (full) Python semantics relieves you from these restrictions, and if you control both sides, the compiler and the code that it compiles, then it becomes much easier to apply arbitrary optimisations at will.

IMHO, what they are implementing is much closer to ShedSkin than to Cython.

Interfacing with C libraries can be easily achieved with cffi.

Except that it will be fairly slow. cffi is not designed for static analysis but for runtime operations.

Could you please clarify this point? My current understanding is that you can build a static compiler with knowledge of cffi so that it can compile calls like ffi.new("something_t[]", 80) to pure C.

I'm sure there is a relatively large subset of cffi's API that could be compiled statically, as long as the declarations and their usage are kept simple and fully visible to the compiler. What that subset is remains to be seen once someone actually tries to do it.

Yeah, statically compiling cffi-enabled code is probably the way to go for mypyc and Cython.

I doubt it, given the expected restrictions and verbosity. But debating this is useless as long as no-one attempts to actually write a static compiler for cffi(-like) code.

Using Cython/C types usually means that you need to use pxd/pyx files which means that the code isn't Python anymore.

I'm aware that this is a very common misconception that is difficult to get out of people's heads. You probably got this idea from wrapping a native library, in which case the only choice you have in order to declare an external C-API is really to use Cython's special syntax. However, this would not apply to most use cases in the CPython project context, and it also does not necessarily apply to most of the code in a Cython module even if it uses external libraries.

Cython has four ways to provide type declarations: cdef statements in Cython code, external .pxd files for Python or Cython files, special decorators and declaration functions, and PEP-484/526 type annotations.

All four have their use cases (e.g. syntax support in different Python versions, efficiency of expression, readability for people with different backgrounds, etc.), and all but the first allow users to keep their module code in Python syntax. As long as you do not call into external native code, it's your choice which of these you prefer for your code base, project context and developer background. You can even mix them at will, if you feel like it.
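
A minimal sketch of the decorator style, assuming Cython's pure Python mode, where `cython.locals` and `cython.int` come from the `cython` module. The function name `count_matches` and the fallback stub are hypothetical; the stub exists only so that the file also runs where Cython is not installed, and a real module would simply `import cython`.

```python
try:
    import cython  # shipped with Cython; importable without compiling
except ImportError:
    class _CythonStub:
        """Hypothetical stand-in so this sketch runs without Cython."""
        int = int

        @staticmethod
        def locals(**_types):
            # Uncompiled, the decorator is a no-op that returns the function.
            return lambda func: func

    cython = _CythonStub()


@cython.locals(count=cython.int)
def count_matches(a, b):
    """Count the positions at which sequences a and b hold equal items."""
    count = 0
    for x, y in zip(a, b):
        if x == y:
            count += 1
    return count


if __name__ == "__main__":
    print(count_matches("abcd", "abxd"))
```

Run as plain Python, this behaves like the undecorated function. Compiled with Cython, `count` would be declared as a C int, while the source file stays valid Python syntax.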

I know that Cython has a mode to use decorators in pure Python code to annotate types, but they are less intuitive than using typing annotations in 3.6+.

You can use PEP-484/526 type annotations to declare Cython types in Python code that you intend to compile. It's entirely up to you, and it's an entirely subjective measure which "is better". Many people prefer Cython's non-Python syntax because it allows them to apply their existing C knowledge. For them, PEP-484 annotations may easily be non-intuitive in comparison.
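
A minimal sketch of the annotation style, assuming Cython's pure Python mode, where `cython.int` names a C int. The function `matching_prefix_length` and the fallback namespace are hypothetical; the fallback only keeps the sketch runnable where Cython is not installed, and a real module would simply `import cython`.

```python
try:
    import cython  # shipped with Cython; importable without compiling
except ImportError:
    # Hypothetical stand-in so this sketch runs without Cython installed.
    from types import SimpleNamespace
    cython = SimpleNamespace(int=int)


def matching_prefix_length(a: str, b: str) -> cython.int:
    """Return the number of leading characters that a and b share."""
    i: cython.int = 0
    limit: cython.int = min(len(a), len(b))
    while i < limit and a[i] == b[i]:
        i += 1
    return i


if __name__ == "__main__":
    print(matching_prefix_length("cython", "cpython"))
```

The interpreter ignores the local annotations at runtime, so the module runs unchanged as ordinary Python 3.6+ code. When compiled, Cython can use the same annotations as type declarations.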

For CPython it means that we'd have to learn Python, C, and Cython to understand code written in Cython. There's a very popular assumption that you have to be proficient in C in order to become a CPython core dev, and people are genuinely surprised when I tell them that it's not a requirement. At the three conferences I've been to this summer, at least 5 people complained to me that they didn't even consider contributing to CPython because they don't know C. Adding yet another language would simply raise this bar even higher, IMHO.

Adding the right language would lower the bar, IMHO. Cython is Python. It allows users with a Python background to implement C things without having to thoroughly learn C /and/ the CPython C-API first. So, the way I see it, rather than /adding/ a "third" language to the mix, it substantially lowers the entry level from the current two and a half languages (Python + C + C-API) to one and a half (Python + Cython).

I'd be +0.5 on using Cython (optionally?) to compile some pure Python code to make it 30-50% faster. asyncio, for instance, would certainly benefit from that.

Since most of this (stdlib) Python code doesn't need to stay syntax compatible with Python < 3.6 (actually 3.8) anymore, you can probably get much higher speedups than that by statically typing some variables and functions here and there. I recently tried that with difflib, and it makes a big difference.

Stefan


