[Python-3000] patch: bytes object PyBUF_LOCKDATA read-only and immutable support
Travis E. Oliphant oliphant at enthought.com
Thu Sep 13 21:27:33 CEST 2007
Guido van Rossum wrote:
On 9/11/07, Travis E. Oliphant <oliphant at enthought.com> wrote:
I'm not sure I understand the difference between a classic read lock and the exclusive write lock concept. Does the classic read-lock just prevent writing to the memory area? In my mind that is a read-only memory buffer and the buffer interface would complain if a writeable buffer was requested.
There are different notions of reading and writing. Sometimes an object is naturally read-only (e.g. a PyString). In that case requesting SIMPLE access should pass but requesting WRITABLE or LOCKDATA access should fail. (I think the other flags are orthogonal to these, right?) Any number of concurrent SIMPLE accesses can coexist since the clients promise they will only read.
Yes, the other flags are orthogonal to this concept.
OTOH suppose we have an object that is naturally writable (e.g. a PyBytes). I understood that in this case any number of SIMPLE or WRITABLE requests would be allowed to be outstanding simultaneously, and any of these would simply prevent the buffer from moving (fixing the object's size). But this doesn't sound like it is how you meant it -- you seem to say that once any SIMPLE (readonly) requests are outstanding, WRITABLE requests should fail.
Wait a minute. I want to clarify that normally any number of SIMPLE or WRITEABLE requests would be possible for an object that is naturally writeable. That is my thinking.
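For readers who want to see the "any number of views, but the buffer must not move" behaviour concretely, here is a small sketch. It uses `memoryview`, `bytes`, and `bytearray` as they exist in present-day Python 3, which postdates this message, so it illustrates where the design ended up rather than the API under discussion here. The SIMPLE/WRITABLE/LOCKDATA flags do not appear because Python-level code cannot pass them directly.

```python
# Naturally read-only object: a view is granted, but writing through it is refused.
ro_view = memoryview(b"abc")
try:
    ro_view[0] = 0
    ro_write_refused = False
except TypeError:
    ro_write_refused = True

# Naturally writable object: several views can be outstanding at once,
# and each of them may write to the shared memory.
data = bytearray(b"hello")
v1 = memoryview(data)
v2 = memoryview(data)
v1[0] = ord("J")
v2[4] = ord("y")

# While any view is outstanding the buffer may not move, so resizing fails.
try:
    data.extend(b"!")
    resize_refused = False
except BufferError:
    resize_refused = True

# Once every view is released, the object is free to resize again.
v1.release()
v2.release()
data.extend(b"!")
```

Note that neither view locks out the other: both writes succeed, which is the unlocked "basic write" situation discussed in this thread.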
The purpose of LOCKDATA is to allow an object to request that the object not be writeable in the future while it holds a view to the object. I did not think that this would be the normal behavior, but exceptional.
What seems to be needed is yet another flag that allows a buffer requester to insist that the object not allow any buffer accesses read or write until its view is done. So, you would have something like
LOCK_FOR_WRITE
LOCK_FOR_READ
I would want to encourage people not to use LOCK_FOR_READ unless there is an important benefit or need to use it. On the other hand, the argument about DMA mechanisms (like moving memory to a video card for processing) needing to make the buffer unavailable temporarily sounds like a reasonable one to me. I can already see applications for it.
And I suppose that only one WRITABLE request ought to be allowed at a time. But then I don't know what the difference between WRITABLE and LOCKDATA would be.
I hope I've clarified the difference between these in my mind.
Then a "classic read lock" would request read access while locking out writers (bsddb would use this);
I did not separate this case in my mind, as I presumed that if something wanted to prevent other writers it would itself want to write. I can see what is wanted here now.
a "classic write lock" would request write access while locking out writers (your scratch area example would use this); others who don't really care if the data changes underneath them as long as it doesn't move (e.g. traditional I/O) could request read access without locking. I'm not sure if there's a use case to be made for write access without locking, but I wouldn't rule it out -- possibly when two threads share a memory area they might have their own protocol for locking it and might just both want to be able to write to (parts of) it.
Yes, I would not rule out write-access without locking either. NumPy actually uses that all the time internally where two or more objects share the same data and can both write to it (although the community warns people about doing this without knowing what you are doing).
What do you think? Another way to look at this would be to consider these 4 cases:
I think I was leaving out the cases
- requesting a read access with future write locking ('classic read lock')
- requesting a read or write access with future read locking.
Let me see how my thinking maps to your list below which at first glance looks pretty good.
- basic read access (I can read, others can read or write)
- locked read access (I can read, others can only read)
- basic write access (I can read and write, others can read or write)
- exclusive write access (I can read and write, no others can read or write)
I guess my original LOCKDATA concept (I can read and write, others can only read) is not even in this list as you discuss below. I'm actually wondering if another function should be added to handle the concept of locking. I can imagine that it will want to grow more fine-grained locking possibilities.
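The four access modes listed above can be written down as a small compatibility table. The sketch below is purely illustrative: the names `Access`, `compatible`, and `Exporter` are hypothetical and are not part of any proposed buffer API. It assumes the strict reading of the list, in which exclusive write access locks out all other readers and writers.

```python
from enum import Enum


class Access(Enum):
    # Hypothetical names for the four cases in the list above.
    BASIC_READ = "basic read"
    LOCKED_READ = "locked read"
    BASIC_WRITE = "basic write"
    EXCLUSIVE_WRITE = "exclusive write"


# What the holder of each mode lets others do: (others_may_read, others_may_write).
ALLOWS_OTHERS = {
    Access.BASIC_READ: (True, True),
    Access.LOCKED_READ: (True, False),
    Access.BASIC_WRITE: (True, True),
    Access.EXCLUSIVE_WRITE: (False, False),
}

WRITE_MODES = {Access.BASIC_WRITE, Access.EXCLUSIVE_WRITE}


def _tolerates(holder, other):
    """True if `holder` permits the kind of access that `other` performs."""
    may_read, may_write = ALLOWS_OTHERS[holder]
    return may_write if other in WRITE_MODES else may_read


def compatible(held, requested):
    """True if `requested` can be granted while `held` is outstanding."""
    return _tolerates(held, requested) and _tolerates(requested, held)


class Exporter:
    """Toy exporter that tracks outstanding access modes (assumed design)."""

    def __init__(self):
        self.outstanding = []

    def acquire(self, mode):
        # Grant the request only if it conflicts with nothing outstanding.
        if all(compatible(held, mode) for held in self.outstanding):
            self.outstanding.append(mode)
            return True
        return False

    def release(self, mode):
        self.outstanding.remove(mode)
```

For example, an exporter with a basic read and a locked read outstanding would refuse a basic write until the locked read is released, and would refuse an exclusive write while any other access is outstanding.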
Except that accessing the object from Python (e.g. iteration or indexing) never gets locked out. (Or perhaps it should be? That can also be done.)
I think if it doesn't go through the buffer interface it is up to the object to decide (i.e. what does the object do with itself when buffers are exported --- that will depend on the object). All it must do is support the buffer interface in the correct way (i.e. not move the memory buffers are relying on and support the access modes correctly that it purports to export).
Actually, writeable is an accepted variant of 'writable' (but it doesn't show up in many spell-check dictionaries). No, it is not too late to change it. Or just define WRITEABLE as WRITABLE. NumPy uses "WRITEABLE" simply because I like that spelling better.
Google found 1.4M occurrences of writeable vs. 3.9M occurrences of writable. I guess you represent a strong minority. :-) I'd still like to see it changed. We can leave WRITEABLE as an alias for WRITABLE for those who are used to seeing it that way in NumPy.
I'm fine with that.
Well, the scratch area scenario you describe makes it iffy to read anything out of the original object since you wouldn't know whether you were reading before, during or after the write back from the scratch area to the object's buffer. The question is, do we really care? If we adopted my 4 access modes above, we could say that basic read access will still be granted when someone has exclusive write access if we don't care, OR we could say that basic reads are locked out by exclusive write access. (And then there's the separate issue of whether python-level access counts as basic read access or doesn't count at all -- though the more I think about it, I think it should be treated the same as basic read access.)
On the other hand, there could be two concepts of locking that a consumer could request from an object:
1) Lock so that no other reads or writes are possible until the lock is released.
2) Lock so that only reads are possible.
I had only thought of #2 for the current buffer interface.
#1 maps to locked read OR exclusive write access in the strict variant. #2 maps to locked read in my scheme.
Let me think about adding a function for read-write locking that is separate from getting a view (which implements memory-location locking). I appreciate the discussion as it is helping me clarify my thinking.
-Travis