[Python-Dev] C-level duck typing
Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Wed May 16 09:44:10 CEST 2012
Hi python-dev,
these ideas/questions come out of the Cython and NumPy developer lists.
What we want is a way to communicate things on the C level about the extension type instances we pass around. The solution today is often to rely on PyObject_TypeCheck. For instance, hundreds of handcrafted C extensions rely on the internal structure of NumPy arrays, and Cython will check whether objects are instances of a Cython class or not.
However, this creates one-to-many situations; only one implementor of an object API/ABI, but many consumers. What we would like is multiple implementors and multiple consumers of mutually agreed-upon standards. We essentially want more duck typing on the C level.
PEP 3118 addressed a similar situation. But there are many more such things one might want to communicate at the C level, many of which are very domain-specific and not suitable for a PEP at all. Also, PEPs don't backport well to older versions of Python.
What we think we would like (but we want other suggestions!) is an arbitrarily extensible type object, without tying this into the type hierarchy. Say you have
    typedef struct {
        unsigned long extension_id;
        void *data;
    } PyTypeObjectExtensionEntry;
and then a type object can (somehow!) point to an array of these. The array is linearly scanned by consumers for IDs they recognize (most types would only have one or two entries). Cython could then get a reserved ID space to communicate whatever it wants, NumPy another one, and there could be "unofficial PEPs" where two or more projects get together to draft a spec for a particular type extension ID without having to bother python-dev about it.
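A minimal sketch of the consumer-side linear scan could look like the following. The ID constants are made up for illustration, and the zero-ID terminator is an assumption: the proposal above does not say how the end of the array is marked.

```c
#include <stddef.h>

/* Entry layout as proposed in the text. */
typedef struct {
    unsigned long extension_id;
    void *data;
} PyTypeObjectExtensionEntry;

/* Hypothetical IDs; real ones would come from reserved ID spaces. */
#define EXAMPLE_EXT_ID_NATIVE_CALL 0x43590001UL
#define EXAMPLE_EXT_ID_ARRAY       0x4E500001UL

/* Scan the array for an ID the consumer recognizes.
   Assumption: the array ends with an entry whose extension_id is 0. */
static const PyTypeObjectExtensionEntry *
find_extension(const PyTypeObjectExtensionEntry *entries,
               unsigned long wanted_id)
{
    if (entries == NULL)
        return NULL;
    for (; entries->extension_id != 0; ++entries) {
        if (entries->extension_id == wanted_id)
            return entries;
    }
    return NULL; /* type does not implement this extension */
}
```

With only one or two entries per type, the scan is a handful of integer comparisons, which is the reason a linear search is acceptable here.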
And, we want this to somehow work with existing Python; we still support users on Python 2.4.
Options we've thought of so far:
a) Use dicts and capsules to get information across. But performance-wise the dict lookup is not an option for what we want to use this for in Cython.
b) Implement a metaclass which extends PyTypeObject in this way. However, that means a common runtime dependency for libraries that want to use this scheme, which is a big disadvantage for us. Today, Cython doesn't ship a runtime library but only creates standalone compilable C files, and there's no dependency from NumPy on Cython or the other way around.
c) Hijack a free bit in tp_flags (22?) which we use to indicate that the PyTypeObject struct is immediately followed by a pointer to such an array.
The last approach, (c), is drafted in more detail at http://wiki.cython.org/enhancements/cep1001 . To us that looks very attractive both for the speed and for the lack of runtime dependencies, and it seems like it should work in existing versions of Python. But do please feel free to tell us we are misguided. Hijacking a flag bit certainly feels dirty.
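To make option (c) concrete, here is a rough sketch using a stand-in struct instead of the real PyTypeObject, so that it compiles without Python headers. The names MockTypeObject, MockExtendedTypeObject and MOCK_TPFLAGS_HAS_EXTENSIONS are invented for this illustration, and bit 22 is only the candidate mentioned above, not a confirmed free bit.

```c
#include <stddef.h>

typedef struct {
    unsigned long extension_id;
    void *data;
} PyTypeObjectExtensionEntry;

/* Stand-in for PyTypeObject; the real struct has many more fields. */
typedef struct {
    const char *tp_name;
    long tp_flags;
} MockTypeObject;

/* Hypothetical flag: "an extension array pointer follows this struct". */
#define MOCK_TPFLAGS_HAS_EXTENSIONS (1L << 22)

/* A type that opts in places the pointer right after the base struct. */
typedef struct {
    MockTypeObject base;
    const PyTypeObjectExtensionEntry *extensions;
} MockExtendedTypeObject;

/* Consumers test the flag first and only then look past the base struct. */
static const PyTypeObjectExtensionEntry *
get_extensions(const MockTypeObject *type)
{
    if (!(type->tp_flags & MOCK_TPFLAGS_HAS_EXTENSIONS))
        return NULL;
    return ((const MockExtendedTypeObject *)type)->extensions;
}
```

The flag test is what keeps this safe for ordinary types: without it, a consumer would read whatever memory happens to follow a plain type object.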
Examples of how this would be used:
In Cython, we'd like to use this to annotate callable objects that happen to wrap a C function with their corresponding C function pointers. That way, callables that wrap a C function could be "unboxed", so that Cython could "cast" the Python object "scipy.special.gamma" to a function pointer at runtime and speed up the call by an order of magnitude. SciPy and Cython just need to agree on a spec.
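The unboxing idea might look roughly like this on the consumer side. Everything specific here is an assumption made for illustration: the ID value, the ExampleNativeCall payload, the "d->d" signature string, the zero-ID terminator, and the stand-in function example_half. The sketch also hangs the function pointer directly off the extension table, and leaves open how a real spec would locate per-object data.

```c
#include <stddef.h>
#include <string.h>

/* Entry layout as proposed in the text. */
typedef struct {
    unsigned long extension_id;
    void *data;
} PyTypeObjectExtensionEntry;

/* Hypothetical ID and payload; a real spec would be agreed between projects. */
#define EXAMPLE_EXT_ID_NATIVE_CALL 0x43590001UL

typedef struct {
    const char *signature;   /* hypothetical encoding, e.g. "d->d" */
    void (*func)(void);      /* generic function pointer, cast before calling */
} ExampleNativeCall;

typedef double (*unary_double_fn)(double);

/* Stand-in for a wrapped C function. */
static double example_half(double x)
{
    return 0.5 * x;
}

/* Return a directly callable pointer, or NULL if the caller must fall
   back to the ordinary Python call protocol. */
static unary_double_fn
unbox_unary_double(const PyTypeObjectExtensionEntry *entries)
{
    const ExampleNativeCall *call;

    if (entries == NULL)
        return NULL;
    for (; entries->extension_id != 0; ++entries) {
        if (entries->extension_id != EXAMPLE_EXT_ID_NATIVE_CALL)
            continue;
        call = (const ExampleNativeCall *)entries->data;
        if (call != NULL && strcmp(call->signature, "d->d") == 0)
            return (unary_double_fn)call->func;
    }
    return NULL;
}
```

A caller would try unbox_unary_double() once, keep the returned pointer, and invoke it in its inner loop without creating any Python float objects; a NULL result means the generic call path is still needed.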
Lots of C extensions rely on using PyObject_TypeCheck (or even do an exact check) before calling the NumPy C API with PyArrayObject* arguments. This means that new features all have to go into NumPy; it is rather difficult to create new experimental array libraries. An extensible PyTypeObject would open up the way for other experimental array libraries; NumPy could make the standards, but others could implement them (without getting NumPy as a runtime dependency, which is the consequence of subclassing). Of course, porting over the hundreds (thousands?) of extensions relying on the NumPy C API is a lot of work, but we can at least get started...
Ideas?
Dag Sverre Seljebotn