[Python-Dev] C-level duck typing

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Wed May 16 09:44:10 CEST 2012


Hi python-dev,

these ideas/questions come out of the Cython and NumPy developer lists.

What we want is a way to communicate things on the C level about the extension type instances we pass around. The solution today is often to rely on PyObject_TypeCheck. For instance, hundreds of handcrafted C extensions rely on the internal structure of NumPy arrays, and Cython will check whether objects are instances of a Cython class or not.

However, this creates one-to-many situations; only one implementor of an object API/ABI, but many consumers. What we would like is multiple implementors and multiple consumers of mutually agreed-upon standards. We essentially want more duck typing on the C level.

A similar situation was PEP 3118. But there are many more such things one might want to communicate at the C level, many of which are very domain-specific and not suitable for a PEP at all. Also, PEPs don't backport well to older versions of Python.

What we think we would like (but we want other suggestions!) is an arbitrarily extensible type object, without tying this into the type hierarchy. Say you have

typedef struct {
    unsigned long extension_id;
    void *data;
} PyTypeObjectExtensionEntry;

and then a type object can (somehow!) point to an array of these. The array is linearly scanned by consumers for IDs they recognize (most types would only have one or two entries). Cython could then get a reserved ID space to communicate whatever it wants, NumPy another one, and there could be "unofficial PEPs" where two or more projects get together to draft a spec for a particular type extension ID without having to bother python-dev about it.
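As a rough illustration of the linear scan, here is a minimal sketch in plain C. How the array is terminated is not specified above, so the sentinel ID 0 (EXTENSION_ID_END) and the helper name find_extension are assumptions made purely for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned long extension_id;
    void *data;
} PyTypeObjectExtensionEntry;

/* Assumption for this sketch only: the array ends with an entry whose
 * ID is 0. The proposal itself does not say how the end is marked. */
#define EXTENSION_ID_END 0UL

/* Hypothetical helper: scan the entries linearly and return the data
 * pointer registered for wanted_id, or NULL if the ID is not present. */
static void *find_extension(const PyTypeObjectExtensionEntry *entries,
                            unsigned long wanted_id)
{
    /* The sentinel value cannot be looked up. */
    assert(wanted_id != EXTENSION_ID_END);

    if (entries == NULL) {
        return NULL;  /* type carries no extension table */
    }
    for (; entries->extension_id != EXTENSION_ID_END; ++entries) {
        if (entries->extension_id == wanted_id) {
            return entries->data;
        }
    }
    return NULL;
}
```

With only one or two entries per type, the scan is a handful of integer comparisons, which is the reason a flat array looks attractive compared to a dict lookup.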

And, we want this to somehow work with existing Python; we still support users on Python 2.4.

Options we've thought of so far:

a) Use dicts and capsules to get information across. But performance-wise the dict lookup is not an option for what we want to use this for in Cython.

b) Implement a metaclass which extends PyTypeObject in this way. However, that means a common runtime dependency for libraries that want to use this scheme, which is a big disadvantage to us. Today, Cython doesn't ship a runtime library but only creates standalone compilable C files, and there's no dependency from NumPy on Cython or the other way around.

c) Hijack a free bit in tp_flags (22?) which we use to indicate that the PyTypeObject struct is immediately followed by a pointer to such an array.

The final approach is drafted in more detail at http://wiki.cython.org/enhancements/cep1001 . To us that looks very attractive both for the speed and for the lack of runtime dependencies, and it seems like it should work in existing versions of Python. But do please feel free to tell us we are misguided. Hijacking a flag bit certainly feels dirty.
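To make option (c) concrete, here is a minimal sketch that does not depend on Python headers. MockTypeObject is a stand-in for PyTypeObject, and MOCK_TPFLAGS_HAVE_EXTENSIONS, MockExtendedTypeObject and get_extension_table are hypothetical names invented for this example; bit 22 is simply the tentative bit mentioned above, not a verified free bit.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned long extension_id;
    void *data;
} PyTypeObjectExtensionEntry;

/* Stand-in for PyTypeObject so the sketch compiles without Python.h.
 * Only the flags field matters here. */
typedef struct {
    const char *tp_name;
    unsigned long tp_flags;
} MockTypeObject;

/* Hypothetical flag name; bit 22 is the tentative choice from the text. */
#define MOCK_TPFLAGS_HAVE_EXTENSIONS (1UL << 22)

/* A type that opts in places the table pointer directly after the
 * ordinary type struct. */
typedef struct {
    MockTypeObject base;
    const PyTypeObjectExtensionEntry *extensions;
} MockExtendedTypeObject;

/* Return the extension table if the flag bit is set, otherwise NULL.
 * The cast is only valid because the flag promises the larger layout. */
static const PyTypeObjectExtensionEntry *
get_extension_table(const MockTypeObject *type)
{
    assert(type != NULL);
    if (!(type->tp_flags & MOCK_TPFLAGS_HAVE_EXTENSIONS)) {
        return NULL;
    }
    return ((const MockExtendedTypeObject *)type)->extensions;
}
```

A consumer tests one flag bit and follows one pointer, so no dict lookup or shared runtime library is involved. The cost is that every implementor has to agree on the meaning of that bit.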

Examples of how this would be used:

Ideas?

Dag Sverre Seljebotn


