[Python-Dev] Extended Buffer Interface/Protocol

Carl Banks python-dev at aerojockey.com
Fri Mar 23 06:53:55 CET 2007

Previous message: [Python-Dev] minidom and DOM level 2
Next message: [Python-Dev] Extended Buffer Interface/Protocol
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

(cc'ing back to Python-dev; the original reply was intended for it, but I had an email malfunction.)

Travis Oliphant wrote:

Carl Banks wrote:

3. Allow getbuffer to return an array of "dereference offsets", one for each dimension. For a given dimension i, if derefoff[i] is nonnegative, it's assumed that the current position (base pointer + indexing so far) is a pointer to a subarray, and derefoff[i] is the offset in that array where the current position goes for the next dimension. If derefoff[i] is negative, there is no dereferencing. Here is an example of how it'd work:

This sounds interesting, but I'm not sure I totally see it. I probably need a picture to figure out what you are proposing.

I'll get on it sometime. For now I hope an example will do.

The derefoff sounds like some kind of offset. Is that enough? Why not just make derefoff[i] == 0 instead of negative?

I may have misunderstood something. I had thought the values exported by getbuffer could change as the view narrowed, but I'm not sure if it's the case now. I'll assume it isn't for now, because it simplifies things and demonstrates the concept better.

Let's start from the beginning. First, change the prototype to this:

 typedef PyObject *(*getbufferproc)(PyObject *obj, void **buf,
                                    Py_ssize_t *len, int *writeable,
                                    char **format, int *ndims,
                                    Py_ssize_t **shape,
                                    Py_ssize_t **strides,
                                    int **isptr);

"isptr" is an array of flags indicating whether, for a given dimension, the position we've strided to so far is a pointer that should be followed before proceeding with the rest of the strides.

Now here's what a general "get_item_pointer" function would look like, given a set of indices:

void* get_item_pointer(int ndim, void* buf, Py_ssize_t* strides,
                       int* isptr, Py_ssize_t* indices) {
    char* pointer = (char*)buf;
    int i;
    for (i = 0; i < ndim; i++) {
        /* stride to the requested index in this dimension */
        pointer += strides[i]*indices[i];
        if (isptr[i]) {
            /* this position holds a pointer to a subarray: follow it */
            pointer = *(char**)pointer;
        }
    }
    return (void*)pointer;
}

I don't fully understand the PIL example you gave.

Yeah. How about more details. Here is a hypothetical image data object structure:

struct rgba { unsigned char r, g, b, a; };

struct ImageObject {
    PyObject_HEAD
    ...
    struct rgba** lines;
    Py_ssize_t height;
    Py_ssize_t width;
    Py_ssize_t shape_array[2];
    Py_ssize_t stride_array[2];
    Py_ssize_t view_count;
};

"lines" points to a malloced 1-D array of (struct rgba*). Each pointer in THAT block points to a separately malloced array of (struct rgba). Got that?

In order to access, say, the red value of the pixel at x=30, y=50, you'd use "lines[50][30].r".

So what does ImageObject's getbuffer do? Leaving error checking out:

PyObject* getbuffer(PyObject *obj, void **buf, Py_ssize_t *len,
                    int *writeable, char **format, int *ndims,
                    Py_ssize_t **shape, Py_ssize_t **strides,
                    int **isptr) {

 static int _isptr[2] = { 1, 0 };
 struct ImageObject *self = (struct ImageObject *)obj;

 *buf = self->lines;
 *len = self->height*self->width;
 *writeable = 1;
 *ndims = 2;
 self->shape_array[0] = self->height;
 self->shape_array[1] = self->width;
 *shape = self->shape_array;
 self->stride_array[0] = sizeof(struct rgba*);  /* yep */
 self->stride_array[1] = sizeof(struct rgba);
 *strides = self->stride_array;
 *isptr = _isptr;

 self->view_count ++;
 /* create and return view object here, but for what? */

}

There are three essential differences from a regular, contiguous array.

1. buf is set to point at the array of pointers, not directly at the data.
2. The isptr flags: isptr[0] is true to indicate that the first dimension is an array of pointers, not the actual data.
3. strides[0] is sizeof(struct rgba*), not self->width*sizeof(struct rgba) as it would be for a contiguous array. This is because your first stride is through an array of pointers, not the data itself.
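For contrast, here is a minimal sketch of what the same lookup would see for a plain contiguous image. The typedef of Py_ssize_t is only a stand-in so the sketch builds without Python.h, and contiguous_lookup_ok is a hypothetical helper, not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef ptrdiff_t Py_ssize_t;  /* stand-in so the sketch builds without Python.h */

struct rgba { unsigned char r, g, b, a; };

/* Same walk as the get_item_pointer function in this message. */
static void *get_item_pointer(int ndim, void *buf, const Py_ssize_t *strides,
                              const int *isptr, const Py_ssize_t *indices)
{
    char *pointer = (char *)buf;
    int i;
    for (i = 0; i < ndim; i++) {
        pointer += strides[i] * indices[i];
        if (isptr[i]) {
            pointer = *(char **)pointer;
        }
    }
    return pointer;
}

/* Hypothetical helper: returns 1 if the walk lands on pixel (y, x)
   of a contiguous height*width image, 0 otherwise. */
static int contiguous_lookup_ok(Py_ssize_t height, Py_ssize_t width,
                                Py_ssize_t y, Py_ssize_t x)
{
    struct rgba *pixels;
    Py_ssize_t strides[2];
    Py_ssize_t indices[2];
    int isptr[2] = { 0, 0 };  /* no dimension holds pointers */
    int ok;

    assert(y >= 0 && y < height && x >= 0 && x < width);
    pixels = calloc((size_t)(height * width), sizeof *pixels);
    if (pixels == NULL) {
        return 0;
    }
    strides[0] = width * (Py_ssize_t)sizeof(struct rgba);  /* one full row */
    strides[1] = (Py_ssize_t)sizeof(struct rgba);          /* one pixel */
    indices[0] = y;
    indices[1] = x;
    ok = get_item_pointer(2, pixels, strides, isptr, indices)
         == (void *)&pixels[y * width + x];
    free(pixels);
    return ok;
}
```

Here buf is the pixel data itself, both isptr flags are 0, and strides[0] spans a whole row, which are exactly the three points where the ImageObject exporter differs.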

So let's examine what "get_item_pointer" above will do given these values. Once again, we're looking for the pixel at x=30, y=50.

First, we set pointer to buf, that is, self->lines.

Then we take the first stride: we add indices[0]*strides[0], that is, 50*4=200 (with 4-byte pointers), to pointer. pointer now equals &self->lines[50].

Now, we check isptr[0]. We see that it is true. Thus, the position we've strided to is, in fact, a pointer to a subarray where the actual data is. So we follow it: pointer = *pointer. pointer now equals self->lines[50] which equals &self->lines[50][0].

Next dimension. We take the second stride: we add indices[1]*strides[1], that is, 30*4=120, to pointer. pointer now equals &self->lines[50][30].

Now, we check isptr[1]. It's false. No dereferencing this step.

We're done. Return pointer.
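The walk above can be checked with a small standalone sketch. It builds the row-pointer layout described earlier, marks the pixel at x=30, y=50 through lines[y][x], and reads it back through get_item_pointer. The Py_ssize_t typedef is a stand-in for the real one, and row_pointer_lookup is a hypothetical helper written only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef ptrdiff_t Py_ssize_t;  /* stand-in so the sketch builds without Python.h */

struct rgba { unsigned char r, g, b, a; };

static void *get_item_pointer(int ndim, void *buf, const Py_ssize_t *strides,
                              const int *isptr, const Py_ssize_t *indices)
{
    char *pointer = (char *)buf;
    int i;
    for (i = 0; i < ndim; i++) {
        pointer += strides[i] * indices[i];
        if (isptr[i]) {
            pointer = *(char **)pointer;
        }
    }
    return pointer;
}

/* Hypothetical helper: allocates each row separately, sets the red value
   of pixel (y, x) to 200, then looks the pixel up via get_item_pointer.
   Returns the red value read back, or -1 on failure or mismatch. */
static int row_pointer_lookup(Py_ssize_t height, Py_ssize_t width,
                              Py_ssize_t y, Py_ssize_t x)
{
    struct rgba **lines;
    struct rgba *px;
    Py_ssize_t strides[2], indices[2], row;
    int isptr[2] = { 1, 0 };  /* dimension 0 is an array of row pointers */
    int red = -1;

    assert(y >= 0 && y < height && x >= 0 && x < width);
    lines = malloc((size_t)height * sizeof *lines);
    if (lines == NULL) {
        return -1;
    }
    for (row = 0; row < height; row++) {
        lines[row] = NULL;
    }
    for (row = 0; row < height; row++) {
        lines[row] = calloc((size_t)width, sizeof **lines);
        if (lines[row] == NULL) {
            goto done;
        }
    }
    lines[y][x].r = 200;

    strides[0] = (Py_ssize_t)sizeof(struct rgba *);  /* step through pointers */
    strides[1] = (Py_ssize_t)sizeof(struct rgba);    /* step through pixels */
    indices[0] = y;
    indices[1] = x;
    px = get_item_pointer(2, lines, strides, isptr, indices);
    if (px == &lines[y][x]) {
        red = px->r;
    }
done:
    for (row = 0; row < height; row++) {
        free(lines[row]);  /* free(NULL) is a no-op */
    }
    free(lines);
    return red;
}
```

On a platform with 8-byte pointers the first stride is 50*8=400 bytes rather than 200, but the walk is otherwise identical because strides[0] is taken from sizeof(struct rgba*).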

By the way, has anyone signed up to modify the standard library modules? I could do those when the protocol is finalized. And if you're implementing the new buffer protocol in 2.6 (while deprecating but not removing the old protocol, I presume), will the modules also be updated for 2.6?

Nobody has signed up for anything. I'm willing for anyone to help. Many of the standard library modules will need to be modified. And yes, I do want to implement the new protocol for 2.6 (adding it to the current one). Updating the modules for 2.6 would not be high priority (except the struct module), but is desirable.

Ok, then, consider me available for it.

Thanks for the ideas.

Ok, I have two questions, now.

First, I'm not sure why getbuffer needs to return a view object. I expect most views of data to be created separately--for instance, a view of an image is likely to be created in Python using something like this:

imgview = ImageView(image,(left,right),(top,bottom))

I'd expect the ImageView object would call getbuffer and use the data returned in buf, len, writable, etc., and would have no need for a type-specific view object.

Furthermore, I would expect in many cases different views are desirable, and some cases where the viewer is unknown to the exporter.

And, if it does have to return a view for some reason, why bother returning buf, len, and friends in the function? Just return those values in the view object.

Second question: what happens if a view wants to re-export the buffer? Do the views of the buffer ever change? For example, say you create a transposed view of a Numpy array. Now you want a slice of the transposed array. What does the transposed view's getbuffer export?

Naively, I'd expect the "strides" and "shape" arrays to have rearranged indices, but it looks like you might be trying to get rid of this complexity.
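As an illustration of the naive expectation, here is a sketch in which a transposed view of a contiguous 3x4 int array is described simply by swapping the entries of shape and strides. It says nothing about how Numpy actually re-exports; the Py_ssize_t typedef is a stand-in and transpose_ok is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

typedef ptrdiff_t Py_ssize_t;  /* stand-in so the sketch builds without Python.h */

static void *get_item_pointer(int ndim, void *buf, const Py_ssize_t *strides,
                              const int *isptr, const Py_ssize_t *indices)
{
    char *pointer = (char *)buf;
    int i;
    for (i = 0; i < ndim; i++) {
        pointer += strides[i] * indices[i];
        if (isptr[i]) {
            pointer = *(char **)pointer;
        }
    }
    return pointer;
}

/* Hypothetical helper: returns 1 if element (i, j) of the transposed
   view is element (j, i) of the original for every i and j. */
static int transpose_ok(void)
{
    int data[3][4];
    Py_ssize_t shape[2] = { 3, 4 };
    Py_ssize_t strides[2] = { 4 * (Py_ssize_t)sizeof(int),
                              (Py_ssize_t)sizeof(int) };
    Py_ssize_t t_shape[2], t_strides[2], indices[2];
    int isptr[2] = { 0, 0 };  /* contiguous: nothing to dereference */
    Py_ssize_t r, c;

    for (r = 0; r < shape[0]; r++) {
        for (c = 0; c < shape[1]; c++) {
            data[r][c] = (int)(r * 10 + c);
        }
    }

    /* The transposed view shares buf; only the index order changes. */
    t_shape[0] = shape[1];
    t_shape[1] = shape[0];
    t_strides[0] = strides[1];
    t_strides[1] = strides[0];
    assert(t_shape[0] == 4 && t_shape[1] == 3);

    for (r = 0; r < t_shape[0]; r++) {
        for (c = 0; c < t_shape[1]; c++) {
            int *p;
            indices[0] = r;
            indices[1] = c;
            p = get_item_pointer(2, data, t_strides, isptr, indices);
            if (*p != data[c][r]) {
                return 0;
            }
        }
    }
    return 1;
}
```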

The reason I ask is: if things like "buf" and "strides" and "shape" could change when a buffer is re-exported, then it can complicate things for PIL-like buffers. (How would you account for offsets in a dimension that's in a subarray?)
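One possible reading of the "dereference offsets" idea quoted at the top of this message is that it covers exactly this case. The sketch below is only that reading, under stated assumptions: a nonnegative derefoff[i] means "follow the pointer, then add this many bytes", and a negative value means "no dereference". get_item_pointer_derefoff and cropped_lookup_ok are hypothetical names, and the Py_ssize_t typedef is a stand-in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef ptrdiff_t Py_ssize_t;  /* stand-in so the sketch builds without Python.h */

struct rgba { unsigned char r, g, b, a; };

/* Assumed semantics: derefoff[i] >= 0 dereferences and then applies the
   byte offset; derefoff[i] < 0 leaves the pointer alone. */
static void *get_item_pointer_derefoff(int ndim, void *buf,
                                       const Py_ssize_t *strides,
                                       const Py_ssize_t *derefoff,
                                       const Py_ssize_t *indices)
{
    char *pointer = (char *)buf;
    int i;
    for (i = 0; i < ndim; i++) {
        pointer += strides[i] * indices[i];
        if (derefoff[i] >= 0) {
            pointer = *(char **)pointer + derefoff[i];
        }
    }
    return pointer;
}

/* Hypothetical helper: describes a view cropped at (top, left) of a
   row-pointer image and returns 1 if view pixel (vy, vx) resolves to
   lines[top + vy][left + vx], 0 otherwise. */
static int cropped_lookup_ok(Py_ssize_t height, Py_ssize_t width,
                             Py_ssize_t top, Py_ssize_t left,
                             Py_ssize_t vy, Py_ssize_t vx)
{
    struct rgba **lines;
    Py_ssize_t strides[2], derefoff[2], indices[2], row;
    int ok = 0;

    assert(top >= 0 && left >= 0 && vy >= 0 && vx >= 0);
    assert(top + vy < height && left + vx < width);
    lines = malloc((size_t)height * sizeof *lines);
    if (lines == NULL) {
        return 0;
    }
    for (row = 0; row < height; row++) {
        lines[row] = NULL;
    }
    for (row = 0; row < height; row++) {
        lines[row] = calloc((size_t)width, sizeof **lines);
        if (lines[row] == NULL) {
            goto done;
        }
    }

    strides[0] = (Py_ssize_t)sizeof(struct rgba *);
    strides[1] = (Py_ssize_t)sizeof(struct rgba);
    /* The row offset fits in buf; the column offset lives inside each
       subarray, so it has to be applied after the dereference. */
    derefoff[0] = left * (Py_ssize_t)sizeof(struct rgba);
    derefoff[1] = -1;
    indices[0] = vy;
    indices[1] = vx;
    ok = get_item_pointer_derefoff(2, &lines[top], strides, derefoff, indices)
         == (void *)&lines[top + vy][left + vx];
done:
    for (row = 0; row < height; row++) {
        free(lines[row]);  /* free(NULL) is a no-op */
    }
    free(lines);
    return ok;
}
```

Under these assumptions a crop with left == 0 still needs the dereference, so 0 has to remain a valid offset and the "no dereference" marker has to be negative. With only a boolean isptr there is no slot for the column offset at all.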

Carl Banks
