Issue 27926: ctypes is too slow to convert a Python list to a C array

This is a consequence of several factors. It starts with the __init__ method of ctypes.Array, Array_init. This function doesn't hard-code a call to the base sq_ass_item slot function, Array_ass_item; if it did, it wouldn't be nearly as slow. Instead it calls the abstract function PySequence_SetItem. Doing it this way accommodates an array subclass that overrides __setitem__.
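For reference, the pattern the report is about is the one-shot constructor, which hands every item to Array_init (the size here is just for illustration):

import ctypes

t = list(range(1000000))
# Array.__init__ receives the items as *args and stores them one at a time
# through PySequence_SetItem
arr = (ctypes.c_uint32 * len(t))(*t)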

What I'd like to do here is check whether the sq_ass_item slot is defined as Array_ass_item, and if so call it directly instead of PySequence_SetItem. But it turns out that the slot isn't set to Array_ass_item even if the subclass doesn't override __setitem__, and more than anything this is the real culprit for the relative slowness of Array_init.

If a built-in type such as ctypes.Array defines both mp_ass_subscript and sq_ass_item, then the __setitem__ wrapper_descriptor wraps the more generic mp_ass_subscript slot function. Then for a subclass, update_one_slot in Objects/typeobject.c plays it safe when updating the sq_ass_item slot. It sees that the inherited __setitem__ descriptor doesn't call wrap_sq_setitem, so it defines the slot in the subclass to use the generic function slot_sq_ass_item.

This generic slot function goes the long way around: it looks up and binds the __setitem__ method and converts the Py_ssize_t index to a Python integer, in order to call the wrapper that calls the mp_ass_subscript slot. To add insult to injury, the implementation of this slot for a ctypes Array, Array_ass_subscript, has to convert the index back to a Py_ssize_t integer via PyNumber_AsSsize_t.
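In rough Python terms, the dispatch amounts to something like the following sketch (the real logic is in C; this is only to show the indirection):

# sketch of what slot_sq_ass_item effectively does for a ctypes Array subclass
def slot_sq_ass_item_sketch(obj, i, value):
    setitem = type(obj).__setitem__   # look up and bind __setitem__, i.e. the
                                      # wrapper around mp_ass_subscript
                                      # (Array_ass_subscript)
    setitem(obj, i, value)            # i is passed as a Python int; the slot
                                      # converts it back via PyNumber_AsSsize_t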

I don't know if this can be resolved while preserving the generic design of the initializer. As is, calling PySequence_SetItem in a tight loop is ridiculously slow. I experimented with calling Array_ass_item directly; with this change it's as fast as assigning to a slice of the whole array. Actually with a list it's a bit slower because *t has to be copied to a tuple, but it takes about the same amount of time as assigning to a slice when t is already a tuple, such as tuple(range(1000000)).
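A rough timing sketch of the two paths (the size and repeat count here are arbitrary, not the ones from my tests):

import ctypes
import timeit

t = list(range(100000))

def via_init():
    # one PySequence_SetItem call per item inside Array_init
    return (ctypes.c_uint32 * len(t))(*t)

def via_slice():
    a = (ctypes.c_uint32 * len(t))()
    a[:] = t   # a single Array_ass_subscript call for the whole array
    return a

print('constructor:', timeit.timeit(via_init, number=10))
print('slice      :', timeit.timeit(via_slice, number=10))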

I doubt any amount of tweaking will make ctypes as fast as an array.array. ctypes has a generic design to accommodate simple C data, pointers, and aggregates (arrays, structs, and unions), and that comes with some cost to performance. However, where performance is critical you can and should use the buffer protocol with arrays from the array module or numpy. It's trivial to create a ctypes array from an object that supports the buffer protocol. For example:

import array
import ctypes

v = array.array('I', t)                        # copy the list t into a typed buffer
a = (ctypes.c_uint32 * len(v)).from_buffer(v)  # ctypes array that shares v's memory

There's no need to use the array.array's buffer_info() or ctypes.cast(). The from_buffer() method creates an array that shares the buffer of the source object, so it's relatively fast. It also returns a sized array instead of a lengthless pointer (though it is possible to cast to an array pointer and immediately dereference the array).
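As a quick check of the sharing, continuing from the snippet above:

a[0] = 123
assert v[0] == 123       # the ctypes array and the array.array use the same buffer
assert len(a) == len(v)  # from_buffer gives a sized array, so len() works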

Thank you for these explanations.

I understand that we get a generic function at the cost of performance.

However, I think we should at least mention in the documentation that the constructor (ctypes.c_uint32 * len(t))(*t) is slow and that much faster alternatives exist in some specific cases (e.g. an array of integers).

It would be even better to have some specific method(s) for this in ctypes, instead of having to rely on an array.array just to build a ctypes array from a list. I am not familiar with the CPython code, so I do not know whether that would be easy to do.
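For example, something along these lines (c_array_from_list is just a name I made up, not an existing API) would hide the array.array detour behind one call:

import array
import ctypes

def c_array_from_list(ctype, typecode, seq):
    # hypothetical helper: fill a typed buffer at C speed, then expose it as
    # a ctypes array without copying; the result keeps the buffer alive
    buf = array.array(typecode, seq)
    return (ctype * len(buf)).from_buffer(buf)

ints = c_array_from_list(ctypes.c_uint32, 'I', range(1000000))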