[Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and immutable support
Guido van Rossum guido at python.org
Tue Sep 11 21:02:41 CEST 2007
On 9/10/07, Travis E. Oliphant <oliphant at enthought.com> wrote:
Guido van Rossum wrote:
> I'd like to see Travis's response to this. It's setting a precedent
> regarding locking objects in read-only mode; I haven't found other
> examples of objects using LOCKDATA (the only mentions of it seem to be
> rejecting it :). I keep getting confused by the two separate lock
> counts (and I think in this version the comment is inconsistent with
> the code). So I'm hoping Travis has a particular way in mind of
> handling LOCKDATA that can be used as a template.
>
> Travis?
The use case I had in mind comes about quite often in NumPy when you want to modify the data-area of an object which may have a non-contiguous chunk of memory, but the algorithm being used expects contiguous data. Imagine, for example, that the exporting object is an image whose rows are stored in different segments. The consumer of the buffer interface, however, may be an extension module that does fast image-processing operations and requires contiguous data. Because it wants to write the results back into the memory area when it is done with the algorithm (which may be thread-safe and may release the GIL), it requests the object to lock its data to read-only so that other consumers do not try to get writeable buffers while it is processing. When the algorithm is done, it alone can write to the memory area and then when it releases the buffer, the original object will restore itself to being writeable. Of course, the exporting object must support this kind of operation and not all objects will. I expect the NumPy array object and the PIL to support it for example, and other media-centric objects.
Hm, so this is completely different from what I thought. It seems you are describing the following:
- acquire the buffer with LOCKDATA
- copy the data out of the buffer into a scratch area
- work on the scratch area
- copy the data from the scratch area back into the buffer
- release the buffer
I would call this an exclusive write lock, which is quite different from the read lock interpretation implemented by Greg in his patch. Could you add some language to PEP 3118 to clarify this usage? Or is it already there? I admit to not having read it in full...
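The five steps listed above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `SegmentedImage`, `lock_data`, `release` and `write_row` are hypothetical names, not part of PEP 3118 or any real API, and the "lock" is a plain flag rather than a real buffer request.

```python
class SegmentedImage:
    """Hypothetical exporter whose rows live in separate segments."""

    def __init__(self, rows):
        self.rows = [bytearray(r) for r in rows]
        self.locked = False  # stands in for an outstanding LOCKDATA request

    def lock_data(self):
        # Step 1: acquire the buffer exclusively.
        if self.locked:
            raise BufferError("data is already locked")
        self.locked = True
        return self.rows

    def release(self):
        # Step 5: release the buffer; the exporter becomes writable again.
        self.locked = False

    def write_row(self, i, data):
        # Ordinary writers are refused while the lock is held.
        if self.locked:
            raise BufferError("data is locked")
        self.rows[i][:] = data


def process_in_place(image, algorithm):
    """Run `algorithm` on a contiguous copy and write the result back.

    Assumes `algorithm` returns exactly as many bytes as it was given.
    """
    rows = image.lock_data()
    try:
        widths = [len(r) for r in rows]
        scratch = bytearray(b"".join(rows))    # step 2: copy out, contiguous
        result = algorithm(scratch)            # step 3: work on the scratch area
        pos = 0
        for row, width in zip(rows, widths):   # step 4: copy back
            row[:] = result[pos:pos + width]
            pos += width
    finally:
        image.release()


def invert(buf):
    # Toy "image processing" step used only for this example.
    return bytes(255 - b for b in buf)


img = SegmentedImage([b"\x00\x01", b"\x02\x03"])
process_in_place(img, invert)
```

While `process_in_place` runs, a call to `img.write_row(...)` from another consumer would raise `BufferError`, which is the exclusive-write behaviour described above.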
It would probably be useful if the bytes object supported it because then other objects could use it as the memory area. To do it correctly, the object exporting the interface must only allow locking if no other writeable interfaces have been exported (which it must keep track of) and then on release must check to see if the buffer that is being released is the one that locked its data.
Right. So it seems you would need a counter of outstanding non-data-locked buffer requests and a single bit indicating whether there's a data-locked request. (Rather than two counters like Greg's patch currently uses.)
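As a rough sketch of that bookkeeping (a counter plus one bit), assuming hypothetical names throughout: `ExportState`, `get` and `release` are invented for illustration and do not correspond to anything in bytesobject.c. The "bit" is represented here by `lock_owner` being set, which also lets release check that the buffer being released is the one that locked the data.

```python
class ExportState:
    """Hypothetical bookkeeping: a counter plus one 'data-locked' bit."""

    def __init__(self):
        self.plain_exports = 0   # outstanding non-data-locked buffers
        self.lock_owner = None   # token of the data-locked buffer, if any

    def get(self, lockdata=False):
        if self.lock_owner is not None:
            raise BufferError("data is locked")
        token = object()         # stands in for the exported buffer
        if lockdata:
            if self.plain_exports:
                raise BufferError("other buffers are outstanding")
            self.lock_owner = token
        else:
            self.plain_exports += 1
        return token

    def release(self, token):
        if token is self.lock_owner:
            # The buffer being released is the one that locked the data.
            self.lock_owner = None
        elif self.plain_exports > 0:
            self.plain_exports -= 1
        else:
            raise BufferError("no such buffer outstanding")
```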
The hacker in me is already exploring the possibility of making the count negative if there's a data-locked request; it sounds like the valid transitions are:
0 -> 1 -> 2 -> ...    (SIMPLE or WRITABLE get)
... -> 2 -> 1 -> ...  (SIMPLE or WRITABLE release)
0 -> -1               (LOCKDATA get)
-1 -> 0               (LOCKDATA release)
Have I got that right? I think that you should only be able to request LOCKDATA if there are no other readers or writers, but that SIMPLE and WRITABLE clients should be able to coexist (any mess that creates would be the requester's own fault). Any nonzero value here would indicate that the buffer can't be moved.
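The signed-count idea might be modelled like this. It is a sketch only: `SignedExportCount` and `can_move` are hypothetical names, and the rules encoded are just the transitions listed above (LOCKDATA only from 0, SIMPLE/WRITABLE coexisting, any nonzero value pinning the buffer).

```python
class SignedExportCount:
    """Hypothetical model: >0 plain exports, -1 LOCKDATA held, 0 idle."""

    def __init__(self):
        self.count = 0

    def get(self, lockdata=False):
        if self.count < 0:
            raise BufferError("data is locked")
        if lockdata:
            if self.count != 0:
                raise BufferError("other readers or writers exist")
            self.count = -1          # 0 -> -1
        else:
            self.count += 1          # 0 -> 1 -> 2 -> ...

    def release(self, lockdata=False):
        if lockdata:
            if self.count != -1:
                raise BufferError("no LOCKDATA buffer outstanding")
            self.count = 0           # -1 -> 0
        else:
            if self.count <= 0:
                raise BufferError("no plain buffer outstanding")
            self.count -= 1          # ... -> 2 -> 1 -> 0

    def can_move(self):
        # Any nonzero value means the buffer must not be moved or resized.
        return self.count == 0
```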
I note that the use case in the bsddb wrapper extension is a bit different -- Greg suspects that BerkeleyDB won't like the data changing while it is using it (e.g. it might violate its own invariant if the key changes between the time its hash is computed and the time it is written to disk). To ensure this, currently LOCKDATA is the only option; but a classic read lock would allow multiple concurrent readers (which is how Greg's patch to bytesobject.c interprets LOCKDATA).
I think this needs to be clarified. Perhaps we need to separate more clearly the type of access (read or write) and the amount of locking desired (can others read? can others write?).
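One possible reading of that separation, purely as an illustration: the `Request` type, its fields, and the three named requests below are all hypothetical and are not flags defined by PEP 3118. Each request states what access it wants and what it tolerates from others, and two requests can coexist only if each tolerates the other's access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    writes: bool            # type of access: False = read, True = write
    others_may_read: bool   # amount of locking: can others read?
    others_may_write: bool  # amount of locking: can others write?


def compatible(held, new):
    """True if `new` could be granted while `held` is outstanding."""
    def allows(a, b):
        # Does request `a` tolerate the kind of access `b` wants?
        return a.others_may_write if b.writes else a.others_may_read
    return allows(held, new) and allows(new, held)


# Classic read lock (roughly the bsddb use case): many readers, no writers.
SHARED_READ = Request(writes=False, others_may_read=True, others_may_write=False)
# Exclusive write lock (roughly the NumPy use case): nobody else at all.
EXCLUSIVE_WRITE = Request(writes=True, others_may_read=False, others_may_write=False)
# Plain writable access with no locking requested.
UNLOCKED_WRITE = Request(writes=True, others_may_read=True, others_may_write=True)
```

Under this model `compatible(SHARED_READ, SHARED_READ)` is true, while `EXCLUSIVE_WRITE` is incompatible with every other request.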
(BTW The current implementation in bytesobject.c allows changing the size as long as it fits within the allocated size; I think this is probably too lenient, and begging for latent bugs.)
(Spelling alert: 'writeable' is apparently not an English word. I hope it's not too late to rename the flag to PyBUF_WRITABLE. I've opened http://bugs.python.org/issue1150 to track this.)
For a real-life example, NumPy has a flag called UPDATEIFCOPY that is a slightly different implementation of the concept. When this flag is set during conversion to an array, then if a copy must be made to satisfy the requirements, the original array is set as read-only and this special flag is set on the array. When the copy is deleted, its memory is automatically copied (and possibly cast, etc.) back into the original array. It is a nice abstraction of the concept of an output data area that was borrowed from Numarray and allows many things to be implemented very quickly in NumPy.
So in terms of locks, this effectively sets read and write locks on the original object (since whatever you might read out of it may be invalidated when the modified copy is written back). But how to enforce that at the Python level? If we had something like this for the bytes object, any use of the bytes object from Python (e.g. iterating over it or indexing or slicing it) should be prohibited. Is this reasonable?
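The copy-only-if-needed and write-back-on-release behaviour described for UPDATEIFCOPY could be imitated in pure Python along these lines. This is not NumPy's implementation: `Segmented` and `contiguous_output` are hypothetical names, and a context manager stands in for "when the copy is deleted".

```python
from contextlib import contextmanager


class Segmented:
    """Hypothetical stand-in for an array stored as separate chunks."""

    def __init__(self, chunks):
        self.chunks = [bytearray(c) for c in chunks]
        self.readonly = False


@contextmanager
def contiguous_output(obj):
    if len(obj.chunks) == 1:
        # Already contiguous: no copy is needed, hand out the data itself.
        yield obj.chunks[0]
        return
    obj.readonly = True                       # original is marked read-only
    copy = bytearray(b"".join(obj.chunks))    # contiguous working copy
    try:
        yield copy
    finally:
        # Write the copy back into the original chunks. This assumes the
        # caller did not change the length of the copy.
        pos = 0
        for chunk in obj.chunks:
            n = len(chunk)
            chunk[:] = copy[pos:pos + n]
            pos += n
        obj.readonly = False
```

Nothing here stops Python code from reading `obj.chunks` while the copy is outstanding, which is exactly the enforcement question raised above.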
One of the main things people use the NumPy C-API for is to get a contiguous chunk of memory from an array in order to do processing in another language (such as C or Fortran). It is nice to be able to specify that the result gets placed back into another chunk of memory (which may or may not be contiguous) in a unified fashion. NumPy handles all the copying for you.
My thinking was that many people will want to be able to get contiguous chunks of memory, do processing, and then copy the result back into a segment of memory from a buffer-exporting object which is passed into the routine as an output object.
This is probably common for numpy; for the bytes object, I expect that it's all much simpler, since it's just a contiguous 1D array of bytes...
I'm not sure if my explanations are helpful. Please let me know if I can explain further.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)