[Python-Dev] reference counting in Py3K

Josiah Carlson jcarlson at uci.edu
Wed Sep 7 09:57:36 CEST 2005


Guido van Rossum <guido at python.org> wrote:

On 9/6/05, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> A better plan would be to build something akin to
> Pyrex into the scheme of things, so that all the
> refcount/GC issues are taken care of automatically.

That sounds exciting. I have to admit that despite hearing many enthusiastic reviews, I've never used it myself -- in fact I've written very little C code in the last few years, and zero new extension modules. (Lots of Java, but that's another story. :-)

Here's a perspective "from the trenches" as it were.

I've been writing quite a bit of code, initially all in Python (27k lines in the last year or so). It worked reasonably well and reasonably fast, but not fast enough. I needed a 25x increase in performance, which would have been easily attainable if I were to rewrite everything in C, but writing a module in pure C is a bit of a pain (as others can attest), so I gave Pyrex a shot (after scipy.weave.inline, ick).

Initial versions ran around 2-3x as fast as pure Python. With various tricks, we are now running 75-100x faster in the pure Pyrex portions, with another 2-3x improvement possible (even using the VC6 compiler on Windows and old versions of gcc on Linux; talk about multi-platform development!).

With experience comes wisdom. Now I write new functionality that needs to be fast in pure C, wrap it with Pyrex as necessary (which is quite simple), and make it all work with Python.

I expect that many standard extensions could benefit from a rewrite in Pyrex, although this might take a lot of work and in some cases not necessarily result in better code (tkinter comes to mind -- though I don't really know why this would be). So this shouldn't be the goal (yet). Instead, we should encourage folks to write new extensions using Pyrex.

I'm not sure this is necessarily desirable. In my limited experience, one starts with a line-by-line translation, using Python objects as variables, etc. Then one starts predefining C variables and working with them, increasing speed by some measurable amount. Then one starts thinking about the data structures that are being passed (lists of lists, dictionaries of lists, lists of dictionaries, ...), at which point one starts digging into PyList_GetItem, etc., manual increfs and decrefs, ..., and one's code starts to take on the ugliness of C modules, minus the braces and semicolons.

Offering it up as a standard library module: cool, +1. Give people one of the best tools for wrapping C code and writing high-performance Python-accessible software.

Encouraging its use for writing new extension modules: ick, -1. Writing pretty yet high-performing Pyrex is an art that I'm not sure anyone can master.

Perhaps a bit further into the future, extending import semantics to notice .pyx files, compare their checksums against an md5 stored in the compiled .pyd/.so, and automatically recompile them if they (or their includes) have changed: +10 (I end up doing this kind of thing by hand with phantom auto-build modules).
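The checksum-and-rebuild idea above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not anything Pyrex or Python actually provides: all function names are hypothetical, the digest is kept in a sidecar `.md5` file next to the compiled extension rather than embedded in the .pyd/.so, the list of included files is passed in explicitly instead of being parsed out of the .pyx, and `compile_fn` is a placeholder for whatever really invokes Pyrex and the C compiler.

```python
import hashlib
import os
from typing import Callable, Iterable

# Hypothetical helpers sketching "rebuild a .pyx on import if it changed".
# None of these names come from Pyrex or the standard library.


def source_digest(pyx_path: str, includes: Iterable[str] = ()) -> str:
    """Return an MD5 hex digest over the .pyx file and all of its includes."""
    h = hashlib.md5()
    # Sort the includes so the digest does not depend on the order given.
    for path in [pyx_path, *sorted(includes)]:
        # Mix in the file name so moving code between files changes the digest.
        h.update(os.path.basename(path).encode("utf-8") + b"\0")
        with open(path, "rb") as f:
            h.update(f.read())
    return h.hexdigest()


def stamp_path(ext_path: str) -> str:
    # Assumption: the digest lives in a sidecar file next to the compiled
    # extension rather than being embedded in the .pyd/.so itself.
    return ext_path + ".md5"


def needs_rebuild(pyx_path: str, ext_path: str, includes: Iterable[str] = ()) -> bool:
    """True if the extension is missing, unstamped, or stale."""
    if not os.path.exists(ext_path):
        return True
    try:
        with open(stamp_path(ext_path), encoding="ascii") as f:
            stored = f.read().strip()
    except FileNotFoundError:
        return True
    return stored != source_digest(pyx_path, includes)


def ensure_built(
    pyx_path: str,
    ext_path: str,
    compile_fn: Callable[[str, str], None],
    includes: Iterable[str] = (),
) -> bool:
    """Recompile via compile_fn when needed; return True if a rebuild ran."""
    includes = list(includes)  # iterated more than once below
    if not needs_rebuild(pyx_path, ext_path, includes):
        return False
    # Hash before compiling so edits made during the build are not masked.
    digest = source_digest(pyx_path, includes)
    # compile_fn is a placeholder for whatever runs Pyrex and the C compiler.
    compile_fn(pyx_path, ext_path)
    with open(stamp_path(ext_path), "w", encoding="ascii") as f:
        f.write(digest)
    return True
```

An import hook (for example a custom finder registered on `sys.meta_path`) could call `ensure_built()` before handing the compiled module to the normal extension loader; that wiring is left out here to keep the sketch short.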


