[Python-Dev] PEP 3118: Extended buffer protocol (new version)
Travis Oliphant oliphant.travis at ieee.org
Fri Apr 13 09:03:04 CEST 2007
Carl Banks wrote:
> The thing that bothers me about this whole flags setup is that different flags can do opposite things. Some of the flags RESTRICT the kind of buffers that can be exported (Py_BUF_WRITABLE); other flags EXPAND the kind of buffers that can be exported (Py_BUF_INDIRECT). That is highly confusing, and I'm -1 on any proposal that includes both behaviors. (Mutually exclusive sets of flags are a minor exception: they can be thought of as either RESTRICTING or EXPANDING, so they could be mixed with either.)
The mutually exclusive set is the one example of the restriction that you gave.
I think the flags setup I've described is much closer to your Venn diagram concept than you give it credit for. I've reworded some of the discussion (see http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/numpy/doc/pep_buffer.txt ) to make it clearer that each flag describes what kind of buffer the consumer is prepared to deal with.
For example, if the consumer cares about what's 'in' the array, it uses Py_BUF_FORMAT. Exporters are free to do what they want with this information. I agree that NumPy would not force you to use its buffer only as a region of some specific type, but some other object may want to be much more restrictive and only export to consumers who will recognize the stored data for what it is. I think it's up to the exporters to decide whether or not to raise an error when a certain kind of buffer is requested.
Basically, every flag corresponds to a different property of the buffer that the consumer is requesting:
Py_BUF_SIMPLE -- you are requesting the simplest possible buffer (0x00)
Py_BUF_WRITEABLE -- get a writeable buffer (0x01)
Py_BUF_READONLY -- get a read-only buffer (0x02)
Py_BUF_FORMAT -- get a "formatted" buffer. (0x04)
Py_BUF_SHAPE -- get a buffer with shape information (0x08)
Py_BUF_STRIDES -- get a buffer with stride information (and shape) (0x18)
Py_BUF_OFFSET -- get a buffer with suboffsets (and strides and shape) (0x38)
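Because Py_BUF_STRIDES and Py_BUF_OFFSET carry the bits of the flags they build on (0x18 = 0x10 | 0x08, 0x38 = 0x20 | 0x18), the progression can be checked with plain bit tests. A small sketch: the macro values are copied from this draft list (they are not the PyBUF_* constants CPython eventually shipped), and the helper names are hypothetical.

```c
#include <assert.h>

/* Flag values as listed in this draft; redefined here only to
   illustrate the bit layout, not taken from any CPython header. */
#define Py_BUF_SIMPLE    0x00
#define Py_BUF_WRITEABLE 0x01
#define Py_BUF_READONLY  0x02
#define Py_BUF_FORMAT    0x04
#define Py_BUF_SHAPE     0x08
#define Py_BUF_STRIDES   0x18  /* own bit 0x10 plus Py_BUF_SHAPE   */
#define Py_BUF_OFFSET    0x38  /* own bit 0x20 plus Py_BUF_STRIDES */

/* Nonzero if a request made with `flags` includes every bit of `wanted`. */
int requests(int flags, int wanted)
{
    return (flags & wanted) == wanted;
}

/* The progression: each later flag implies the earlier ones. */
void check_progression(void)
{
    assert(requests(Py_BUF_STRIDES, Py_BUF_SHAPE));
    assert(requests(Py_BUF_OFFSET, Py_BUF_STRIDES));
    assert(!requests(Py_BUF_SHAPE, Py_BUF_STRIDES));
}
```

So a consumer asking for Py_BUF_OFFSET automatically asks for strides and shape as well, while Py_BUF_FORMAT and the read/write bits stay independent.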
This is a logical sequence; there is a progression. Each flag is a bit that indicates something about how the consumer can use the buffer. In other words, the consumer states what kind of buffer it is requesting. The exporter obliges (and can save possibly significant time if the consumer is not requesting information the exporter would otherwise have to produce).
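The saving on the exporter side can be sketched as an exporter that fills only the members the consumer asked for. This assumes a hypothetical, cut-down stand-in for the bufferinfo structure (field names are illustrative) and a toy 2x3 array of doubles; it is not the structure defined in the PEP.

```c
#include <assert.h>
#include <stddef.h>

/* Draft flag values from the list above (only the ones used here). */
#define Py_BUF_SIMPLE  0x00
#define Py_BUF_FORMAT  0x04
#define Py_BUF_SHAPE   0x08
#define Py_BUF_STRIDES 0x18

/* Hypothetical, simplified stand-in for the bufferinfo structure. */
struct bufinfo {
    void            *buf;
    size_t           len;      /* total size in bytes */
    const char      *format;   /* NULL unless Py_BUF_FORMAT was requested */
    const size_t    *shape;    /* NULL unless Py_BUF_SHAPE was requested */
    const ptrdiff_t *strides;  /* NULL unless Py_BUF_STRIDES was requested */
};

/* A toy exporter: a 2x3 C-contiguous array of doubles. */
static double toy_data[2][3];
static const size_t    toy_shape[2]   = {2, 3};
static const ptrdiff_t toy_strides[2] = {(ptrdiff_t)(3 * sizeof(double)),
                                         (ptrdiff_t)sizeof(double)};

/* Fills `view`, skipping every member the consumer did not ask for.
   Returns 0 on success. */
int toy_getbuffer(int flags, struct bufinfo *view)
{
    view->buf     = toy_data;
    view->len     = sizeof toy_data;
    view->format  = (flags & Py_BUF_FORMAT) ? "d" : NULL;
    /* Py_BUF_STRIDES contains the Py_BUF_SHAPE bit, so shape is set too. */
    view->shape   = (flags & Py_BUF_SHAPE) ? toy_shape : NULL;
    view->strides = ((flags & Py_BUF_STRIDES) == Py_BUF_STRIDES)
                        ? toy_strides : NULL;
    return 0;
}
```

Here the members are static, so skipping them saves little; the point is that an exporter which must compute shape or strides on demand can avoid that work entirely for a Py_BUF_SIMPLE request.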
> I originally suggested a small set of flags that expand the set of allowed buffers. Here's a little Venn diagram of buffers to illustrate what I was thinking:
> http://www.aerojockey.com/temp/venn.png
> With no flags, the only buffers allowed to be returned are in the "All" circle but no others. Add Py_BUF_WRITABLE and now you can export writable buffers as well. Add Py_BUF_STRIDED and the strided circle is opened to you, and so on. My recommendation is, any flag should turn on some circle in the Venn diagram (it could be a circle I didn't draw, shaped arrays for example, but it should be some circle).
I don't think your Venn diagram is broad enough, and it unnecessarily limits the use of flags to communicate between consumer and exporter.
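The expand-only model in the quoted proposal reduces to a subset test: each flag opens one more circle, and an export is allowed only if every circle the buffer lies in has been opened. A minimal sketch, assuming the exporter knows which features beyond the plain "All" case its buffer needs; the EXPAND_* names and values are hypothetical and come from no header.

```c
#include <assert.h>

/* Hypothetical flag bits for the expand-only model: each one opens
   one more circle of the Venn diagram beyond "All". */
#define EXPAND_NONE     0x0
#define EXPAND_STRIDED  0x2
#define EXPAND_INDIRECT 0x4

/* `features` lists the circles the exporter's buffer lies in; the
   export is allowed only if the consumer's flags opened all of them. */
int may_export(int features, int flags)
{
    return (features & ~flags) == 0;
}
```

Under this model adding a flag can only enlarge the set of acceptable buffers, which is the property the quoted proposal asks for.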
We don't have to force these flags into that point of view for them to be productive. If you have a specific alternative proposal, or specific criticisms, then I'm very willing to hear them.
I've thought through the flags again, and I'm not sure how I would change them. They make sense to me, especially in light of past usage of the buffer protocol (where most people request read-or-write buffers, i.e., Py_BUF_SIMPLE). I'm also not sure our mental diagrams are oriented the same way. For me, the most restrictive requests are
Py_BUF_WRITEABLE | Py_BUF_FORMAT and Py_BUF_READONLY | Py_BUF_FORMAT
The least restrictive request (the largest circle in my mental Venn diagram) is
Py_BUF_OFFSET, followed by Py_BUF_STRIDES, followed by Py_BUF_SHAPE.
Adding Py_BUF_FORMAT, Py_BUF_WRITEABLE, or Py_BUF_READONLY serves to restrict any of the other circles.
Is this dual use of flags what bothers you, i.e., using some flags to restrict circles in your Venn diagram that other flags turn on? In that scheme, you say Py_BUF_OFFSET | Py_BUF_WRITEABLE to get the intersection of the largest circle (Py_BUF_OFFSET) with the WRITEABLE subset.
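From the exporter's side, this reading can be sketched as a single check: the shape/strides/offset bits say what the consumer can handle, while Py_BUF_WRITEABLE narrows the request. A minimal sketch, assuming a hypothetical exporter whose data is read-only and non-contiguous; the function name is illustrative and the flag values are the draft ones above.

```c
#include <assert.h>

/* Draft flag values from the list above. */
#define Py_BUF_SIMPLE    0x00
#define Py_BUF_WRITEABLE 0x01
#define Py_BUF_READONLY  0x02
#define Py_BUF_SHAPE     0x08
#define Py_BUF_STRIDES   0x18
#define Py_BUF_OFFSET    0x38

/* Hypothetical read-only, strided exporter.
   Returns 0 if it can satisfy the request, -1 otherwise. */
int strided_readonly_check(int flags)
{
    /* Restricting flag: the consumer insists on writing. */
    if (flags & Py_BUF_WRITEABLE)
        return -1;
    /* The data cannot be described without strides, so the consumer
       must have said it can handle them (Py_BUF_OFFSET includes them). */
    if ((flags & Py_BUF_STRIDES) != Py_BUF_STRIDES)
        return -1;
    return 0;
}
```

A request of Py_BUF_OFFSET | Py_BUF_READONLY succeeds, while Py_BUF_OFFSET | Py_BUF_WRITEABLE (the intersection with the WRITEABLE subset) is refused.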
Such concerns are not convincing to me. Just don't think of the flags in that way. Think of them as turning "on" members of the bufferinfo structure.
>>>> Py_BUF_FORMAT: The consumer will be using the format string information, so make sure that member is filled correctly.
>>> Is the idea to throw an exception if there's some other data format besides "b" and this flag isn't set? It seems superfluous otherwise.
>> The idea is that a consumer may not care about the format, and the exporter may want to know that in order to simplify the interface. In other words, the flag is a way for the consumer to communicate whether or not it wants format information.
> I'm -1 on using the flags for this. It's completely out of character compared to the rest of the flags. All other flags are there for the benefit of the consumer; this flag is useless to the consumer. More concretely, all the rest of the flags are there to tell the exporter what kind of buffer they're prepared to accept. This flag, alone, does not do that.
I agree. This flag is used by the consumer to state that it wants, will be making note of, and is prepared to deal with a "formatted" buffer.
I think it's short-sighted to have flags that control whether the other members of the PyBuffer structure are provided, but not this one.
Actually, the "rare" optimization to the exporter can still be significant if most consumers don't care about its format (which perhaps it has to construct at request time).
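The request-time case can be sketched as an exporter that builds its format string lazily and skips the work when Py_BUF_FORMAT is absent. This assumes a hypothetical record exporter whose field count is known only at run time and a struct-module-style format such as "3i"; neither comes from the PEP.

```c
#include <assert.h>
#include <stdio.h>

#define Py_BUF_FORMAT 0x04   /* draft value from the list above */

/* Hypothetical exporter of fixed-size records whose format string can
   only be produced at request time (here: from a runtime field count). */
struct record_exporter {
    int  nfields;   /* number of C int fields per record */
    char fmt[32];   /* storage for the lazily built format string */
    int  builds;    /* how many times the format has been built */
};

/* Returns the format string if the consumer asked for Py_BUF_FORMAT;
   otherwise returns NULL without doing the formatting work. */
const char *record_format(struct record_exporter *e, int flags)
{
    if (!(flags & Py_BUF_FORMAT))
        return NULL;
    snprintf(e->fmt, sizeof e->fmt, "%di", e->nfields);  /* e.g. "3i" */
    e->builds++;
    return e->fmt;
}
```

If most consumers never pass Py_BUF_FORMAT, the `builds` counter stays at zero and the formatting cost is never paid.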
>> Whether the exporter raises an exception if the format is not requested is up to the exporter.
> That seems like a bad idea. Suppose I have a contiguous numpy array of floats and I want to view it as a sequence of bytes. If the exporter's allowed to raise an exception for this, any consumer that wanted a data-neutral view of the data would still have to pass Py_BUF_FORMAT to guard against this. Wouldn't that be ironic?
I agree that NumPy would not do this, as it would allow unformatted views. In fact, most exporters would probably choose not to raise an error. But an exporter that really only wants its data viewed as, e.g., complex numbers would raise an error to force a consumer to be explicit (by providing the Py_BUF_FORMAT flag) about bypassing that desire.
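Such a strict exporter can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical exporter of complex doubles; the function name and the "Zd" format code are assumptions, and a real exporter would also set a Python exception before returning -1.

```c
#include <assert.h>
#include <stddef.h>

#define Py_BUF_FORMAT 0x04   /* draft value from the list above */

/* Hypothetical exporter that wants its memory read only as complex
   doubles. Returns -1 unless the consumer explicitly asked for the
   format; a real exporter would set an exception as well. */
int complex_getformat(int flags, const char **format)
{
    if (!(flags & Py_BUF_FORMAT))
        return -1;
    *format = "Zd";   /* assumed struct-style code for complex double */
    return 0;
}
```

Once the consumer has passed Py_BUF_FORMAT and seen the format, it remains free to treat the memory however it likes; the flag only makes that choice explicit.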
> Ok, but is the indexing row-major or column-major? That has to be decided.
I think it's called row-major, but I don't like that term, because what do you mean for an N-D array? I use 'last index varies the fastest' if I want to be explicit, and C-contiguous if we know what we are talking about. Yes, this is assumed in such cases.
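The 'last index varies the fastest' convention can be pinned down with a little stride arithmetic. A minimal sketch, assuming byte strides and an element size supplied by the caller; the function names are hypothetical. For a 2x3x4 array of 8-byte items it yields strides of 96, 32 and 8 bytes, so element (1, 2, 3) sits at byte 184 and element (0, 0, 1) is the immediate neighbour of (0, 0, 0).

```c
#include <assert.h>
#include <stddef.h>

/* C-contiguous ("last index varies the fastest") strides, in bytes,
   for an array with `ndim` dimensions, the given shape and item size. */
void c_contiguous_strides(int ndim, const size_t *shape,
                          size_t itemsize, ptrdiff_t *strides)
{
    assert(ndim > 0);
    ptrdiff_t step = (ptrdiff_t)itemsize;
    for (int k = ndim - 1; k >= 0; k--) {
        strides[k] = step;              /* last axis gets the item size */
        step *= (ptrdiff_t)shape[k];
    }
}

/* Byte offset of the element at `index`, given per-dimension strides. */
ptrdiff_t byte_offset(int ndim, const size_t *index, const ptrdiff_t *strides)
{
    ptrdiff_t off = 0;
    for (int k = 0; k < ndim; k++)
        off += (ptrdiff_t)index[k] * strides[k];
    return off;
}
```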
-Travis