[Python-Dev] Allocation of shape and strides fields in Py_buffer
Nick Coghlan ncoghlan at gmail.com
Wed Dec 10 12:49:47 CET 2008
Antoine Pitrou wrote:
> In all honesty, I admit I am annoyed by all the problems with the buffer API / memoryview object, many of which are caused by its utterly bizarre design (and the fact that the design team went missing in action after imposing such a bizarre and complex design on us), and I'm reluctant to add yet another level of byzantine complexity in order to solve those problems. That explains why I may sound a bit angry at times :-)
>
> If we really need to change things a lot to make them work, we should re-work the buffer API from the ground up, make the Py_buffer struct a true PyObject (that is, a true variable-length object so as to solve the shape and strides allocation issue) and merge it with the current memoryview implementation. It would make things both simpler and more flexible.
I don't see anything wrong with the PEP 3118 protocol. It does exactly what it is designed to do: allow the number crunching crowd to share large datasets between different libraries without copying things around in memory. Yes, the protocol is complicated, but that is because it is trying to handle a complicated problem.
The memoryview implementation on the other hand is pretty broken. I do have a theory on how it ended up in such an unusable state, but I'm not particularly inclined to share it - this kind of thing can happen sometimes, and the important question now is how we fix it.
As I see it, memoryview is actually trying to do two things, but the design for supporting the second of them doesn't appear to have been adequately thought through in the current implementation.
The first use of a memoryview object is merely to allow access to the Py_buffer of a data store. This is pretty simple, and aside from currently getting len() wrong when itemsize > 1, memoryview isn't terrible at it.
If we left memoryview at that, it would just be a simple wrapper around a Py_buffer struct, and its implementation wouldn't be difficult at all.
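As a rough illustration of that "simple wrapper" role, here is a minimal sketch that prints the Py_buffer-style fields a memoryview reports for a data store with itemsize > 1. It assumes a current CPython 3 (not the implementation under discussion in this thread), where len() counts items and nbytes counts bytes; the names store and view are just for the example.

```python
from array import array

# A data store whose items are wider than one byte.
store = array("i", [10, 20, 30, 40])

# The memoryview exposes the exporter's buffer description.
view = memoryview(store)

info = {
    "format": view.format,      # struct-style item format
    "itemsize": view.itemsize,  # bytes per item
    "ndim": view.ndim,          # number of dimensions
    "shape": view.shape,        # items per dimension
    "strides": view.strides,    # bytes to step per dimension
    "nbytes": view.nbytes,      # total bytes exposed
    "len": len(view),           # number of items, not bytes
}
for name, value in info.items():
    print(f"{name:8} = {value!r}")
```

The point of interest is that len(view) and nbytes differ by a factor of itemsize, which is the distinction the len() complaint above is about.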
Where it gets a bit more complicated is if we want to support slices (rather than just indexing) on memoryview objects. When you do that, the memoryview is no longer a simple wrapper around the Py_buffer of the underlying data store, because it isn't exposing the whole data store any more - it is only exposing part of it.
Requesting access to only part of a data buffer is NOT part of the PEP 3118 API, and it doesn't need to be: it can be part of a separate object that adapts from the underlying data store to the desired subview.
The object that is meant to be performing at least simple 1-dimensional cases of that adaptation is memoryview (or more to the point, memoryview slices), but it currently sucks at this because it relies too heavily on the info in the Py_buffer that it got from the underlying object. That Py_buffer describes the whole data store, but a memoryview slice may only be exposing part of it - so while the info in the Py_buffer is accurate for the underlying object, it is not accurate for the memoryview itself.
Fixing that for the 1-dimensional case shouldn't actually be all that difficult - the memoryview just needs to maintain its own shape[0] entry that reflects the number of items in the view rather than the number in the underlying object.
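To make the 1-dimensional case concrete, the sketch below shows what a view with its own shape[0] looks like from Python. It assumes a current CPython 3, where memoryview slices behave this way; it does not reflect the implementation as it stood when this message was written. The names store, whole and part are illustrative only.

```python
from array import array

store = array("i", range(10))   # underlying data store: 10 items
whole = memoryview(store)       # view of the entire store
part = whole[2:7]               # subview of items 2..6, no copy made

# The slice reports its own item count, not the exporter's.
print(whole.shape, part.shape)

# Both views still refer to the same underlying object.
print(part.obj is store)

# Writing through the subview changes the underlying store.
part[0] = 99
print(store[2])
```

The subview is the adapter: the store's own buffer description still covers all 10 items, while the slice carries a shape of 5 and a different starting position.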
The multi-dimensional cases get pretty tricky though, since they will almost always end up dealing with non-contiguous data. The PEP 3118 protocol is up to handling the task, but the implementation of the index mapping to handle these multi-dimensional cases is highly non-trivial, and probably best left to third-party libraries like numpy.
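For a feel of what that index mapping involves, here is a small pure-Python sketch of the basic stride arithmetic: an item's byte position is a start offset plus the sum of index * stride over each dimension. The helper item_at and the column subview are hypothetical, written for this example; they cover only plain strided data and ignore the suboffsets that PEP 3118 also allows.

```python
import struct

ROWS, COLS = 3, 4
ITEM = struct.calcsize("i")  # bytes per item for native int

# Row-major (C-contiguous) 3x4 block holding the ints 0..11.
data = struct.pack(f"{ROWS * COLS}i", *range(ROWS * COLS))


def item_at(buf, indices, strides, offset=0, fmt="i"):
    """Read one item by mapping indices to a byte position via strides."""
    pos = offset + sum(i * s for i, s in zip(indices, strides))
    return struct.unpack_from(fmt, buf, pos)[0]


# Strides for the full 2-D block: one row down, one item across.
full_strides = (COLS * ITEM, ITEM)
print(item_at(data, (2, 3), full_strides))  # 11

# Hypothetical subview: column 1 of every row. It is 1-dimensional,
# steps a whole row at a time, and starts one item into the buffer,
# so its items are not adjacent in memory.
col_shape = (ROWS,)
col_strides = (COLS * ITEM,)
col_offset = 1 * ITEM

column = [
    item_at(data, (r,), col_strides, col_offset)
    for r in range(col_shape[0])
]
print(column)  # [1, 5, 9]
```

Even this simple column view needs its own shape, strides and start offset, distinct from those of the underlying block. General slicing across several dimensions multiplies the bookkeeping.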
Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia