[Python-Dev] idea for data-type (data-format) PEP (original) (raw)
Travis Oliphant oliphant.travis at ieee.org
Wed Nov 1 21🔞23 CET 2006
- Previous message: [Python-Dev] idea for data-type (data-format) PEP
- Next message: [Python-Dev] idea for data-type (data-format) PEP
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Martin v. Löwis wrote:
Travis E. Oliphant schrieb:
Or, if it does have uses independent of the buffer extension: what are those uses?
So that NumPy and ctypes and audio libraries and video libraries and database libraries and image-file format libraries can communicate about data-formats using the same expressions (in Python). I find that puzzling. In what way can the specification of a data type enable communication? Don't you need some kind of protocol for it (i.e. operations to be invoked)? Also, do you mean that these libraries can communicate with each other? Or with somebody else? If so, with whom? What is puzzling? I've just specified the extended buffer protocol as something concrete that data-format objects are shared through. That's on the C-level. I gave several examples of where such sharing would be useful.
Then, I gave examples in Python of how sharing data-formats would also be useful so that modules could support the same means to construct data-formats (instead of struct using strings, array using typecodes, ctypes using it's type-objects, and NumPy using dtype objects).
What problem do you have in defining a standard way to communicate about binary data-formats (not just images)? I still can't figure out why you are so resistant to the idea. MPI had to do it.
I'm afraid of "dead" specifications, things whose only motivation is that they look nice. They are just clutter. There are a few examples of this already in Python, like the character buffer interface or the multi-segment buffers. O.K. I can understand that concern. But, all you do is make struct, array, and ctypes support the same data-format specification (by support I mean have a way to "consume" and "produce" the data-format object to the natural represenation that they have internally) and you are guaranteed it won't "die." In fact, what would be ideal is for the PIL, NumPy, CVXOpt, PyMedia, PyGame, pyre, pympi, PyVoxel, etc., etc. (there really are many modules that should be able to talk to each other more easily) to all support the same data-format representations. Then, you don't have to learn everybody's re-invention of the same concept whenever you encounter a new library that does something with binary data.
How much time do you actually spend with binary data (sound, video, images, just plain numbers from a scientific experiment) and trying to use multiple Python modules to manipulate it? If you don't spend much time, then I can understand why you don't understand the need.
As for MPI: It didn't just independently define a data types system. Instead, it did that, and specified the usage of the data types in operations such as MPISEND. It is very clear what the scope of this data description is, and what the intended usage is.
Without specifying an intended usage, it is impossible to evaluate whether the specification meets its goals. What is not understood about the intended usage in the extended buffer protocol. What is not understood about the intended usage of giving the array and struct modules a uniform way to represent binary data? Ok, that would be a new usage: I expected that datatype instances always come in pairs with memory allocated and filled according to the description. To me that is the most important usage, but it's not the only one.
If you are proposing to modify/extend the API of the struct and array modules, you should say so somewhere (in a PEP). Sure, I understand that. But, if there is no data-format object, then there is no PEP to "extend the struct and array modules" to support it.
Chicken before the egg, and all that. I expect that the primary readers/users of the PEP would be people who have to write libraries: i.e. people implementing NumPy, struct, array, and people who implement algorithms that operate on data.
Yes, but not only them. If it's a default way to represent data, then users of those libraries that "consume" the representation would also benefit by learning a standard.
So usability of the specification is a matter of how easy it is to write a library that does perform the image manipulation.
If you really want to know. In NumPy it might look like this: Python code: img['r'] = img['g'] img['b'] = img['g'] That's not what I'm asking. Instead, what does the NumPy code look like that gets invoked on these read-and-write operations? Does it only use the void* pointing to the start of the data, and the datatype object? If not, how would C code look like that only has the void* and the datatype object? dtype = img->descr; In this code, is descr a datatype object? ... Yes. But, I have a mistake later... rfield = PyDictGetItemString(dtype,'r'); Actually it should read PyDict_GetItemString(dtype->fields). The r_field is a tuple (data-type object, offset). The fields attribute is (currently) a Python dictionary.
... I guess not, because apparently, it is a dictionary, not a datatype object. Sorry for the confusion.
But, I still don't see how that is relevant to the question of how to represent the data-format to share that information across two extensions.
Well, if NumPy gets the data from a different module, it can't assume there is a descr object that is a dictionary. Instead, it must perform these operations just by using the datatype object. Right. I see. Again, I made a mistake in the code.
img->descr is a data-type object in NumPy.
img->descr->fields is a dictionary of fields keyed by 'name' and returning a tuple (data-type object, offset)
But, the other option (especially for code already written) would be to just convert the data-format specification into it's own internal representation. This is the case that I was thinking about when I said it didn't matter how the library operated on the data.
If new code wanted to use the data-format object as the internal representation, then it would matter.
What else is the purpose of sharing the information, if not to use it to access the data? Of course. I'm sorry my example was incorrect. I guess this falls under the category of "ease of use".
If the data-type format can be the internal representation, then ease of use is optimal because no translation is required. In my ideal world that's the way it would be. But, even if we can't get there immediately, we can at least define a standard for communication.
- Previous message: [Python-Dev] idea for data-type (data-format) PEP
- Next message: [Python-Dev] idea for data-type (data-format) PEP
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]