Issue 15903: Make rawiobase_read() read directly to bytes object (original) (raw)

Issue15903

Created on 2012-09-10 12:23 by sbt, last changed 2022-04-11 14:57 by admin.

Files
File name Uploaded Description Edit
iobase_read.patch sbt,2012-09-10 12:23 review
iobase_read.patch sbt,2012-09-17 20:17 review
Messages (20)
msg170183 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-09-10 12:23
Currently rawiobase_read() reads to a bytearray object and then copies the data to a bytes object. There is a TODO comment saying that the bytes object should be created directly. The attached patch does that.
msg170226 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2012-09-10 20:01
It works as along as the bytes object cannot leak to Python code, (imagine a custom readinto() method which plays with gc.get_referrers, then calls hash(b)...) This is OK with this patch.
msg170624 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-09-17 20:17
New patch which checks the refcount of the memoryview and bytes object after calling readinto(). If either refcount is larger than the expected value of 1, then the data is copied rather than resized.
msg170627 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-17 22:30
> If either refcount is larger than the expected value of 1, then the > data is copied rather than resized. I think that's a useless precaution. The bytes object cannot "leak" since you are using PyMemoryView_FromMemory(), which doesn't know about the original object. Out of curiousity, have you done any benchmarks?
msg170628 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-09-17 23:04
> I think that's a useless precaution. The bytes object cannot "leak" > since you are using PyMemoryView_FromMemory(), which doesn't know about > the original object. The bytes object cannot "leak" so, as you say, checking that refcount is pointless. But the view might "leak", and since it does not own a reference to the base object we have a problem: we can't deallocate the bytes object for fear of breaking the view. It looks like objects returned by PyMemoryView_FromMemory() must never be allowed to "leak", so I am not sure there are many circumstances in which PyMemoryView_FromMemory() is safe to use. Perhaps using PyBuffer_FillInfo() and PyMemory_FromBuffer() would keep alive the bytes object while the view is alive, without letting the bytes object "leak". > Out of curiousity, have you done any benchmarks? No.
msg170629 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-17 23:14
> The bytes object cannot "leak" so, as you say, checking that refcount > is pointless. But the view might "leak", and since it does not own a > reference to the base object we have a problem: we can't deallocate the > bytes object for fear of breaking the view. Indeed, that's a problem (but your patch does deallocate the bytes object). It's quite fishy, I'm not sure how to solve the issue cleanly. Stefan, do you have an idea?
msg170638 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-18 08:33
So the problem is that readinto(view) might result in several references to view? I don't think that can be solved on the memoryview side. One could do: view = PyMemoryView_FromObject(b); // Lie about writability ((PyMemoryViewObject *)view)->view.readonly = 0; [...] Then the view owns a reference to the bytes object. But that does not solve the problem that writable memoryviews based on a readonly object might be hanging around.
msg170639 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-09-18 10:47
> Then the view owns a reference to the bytes object. But that does not > solve the problem that writable memoryviews based on a readonly object > might be hanging around. How about doing PyObject_GetBuffer(b, &buf, PyBUF_WRITABLE); view = PyMemoryView_FromBuffer(&buf); // readinto view PyBuffer_Release(&buf); Would attempts to access a "leaked" reference to view now result in ValueError("operation forbidden on released memoryview object")? If so then I think this would be safe.
msg170642 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-09-18 12:49
The current non-test uses of PyMemoryView_FromBuffer() are in _io.BufferedReader.read(), _io.BufferedWriter.write(), PyUnicode_Decode(). It looks like they can each be made to leak a memoryview that references a deallocated buffer. (Maybe the answer is Don't Do That.) import codecs, sys def decode(buf): global view view = buf return codecs.latin_1_decode(buf) def getregentry(): return codecs.CodecInfo(name='foobar', decode=decode, encode=codecs.latin_1_encode) @codecs.register def search_function(encoding): if encoding == 'foobar': return codecs.CodecInfo(*getregentry()) b = b'hello'.upper() b.decode('foobar') print(view.tobytes()) # => b'HELLO' del b x = b'dump'.upper() print(view.tobytes()) # => b'DUMP\x00' import io, sys class File(io.RawIOBase): def readinto(self, buf): global view view = buf n = len(buf) buf[:] = b'x'*n return n def readable(self): return True f = io.BufferedReader(File()) f.read(1) print(view[:5].tobytes()) # => b'xxxxx' del f print(view[:5].tobytes()) # => b'\xdd\xdd\xdd\xdd\xdd'
msg170658 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-09-18 18:19
I am rather confused about the ownership semantics when one uses PyMemoryView_FromBuffer(). It looks as though PyMemoryView_FromBuffer() "steals" ownership of the buffer since, when the associated _PyManagedBufferObject is garbage collected, PyBuffer_Release() is called on its copy of the buffer info. However, the _PyManagedBufferObject does not own a reference of the base object, so one still needs to decref the base object (at some time when it is safe to do so). So am I right in thinking that PyObject_GetBuffer(obj, &buf, ...); view = PyMemoryView_FromBuffer(&buf); // view->master owns the buffer, but view->master->obj == NULL ... Py_DECREF(view); // releases buffer (assuming no other exports) Py_XDECREF(buf.obj); has balanced refcounting and is more or less equivalent to view = PyMemoryView_FromObject(obj); ... Py_DECREF(view); The documentation is not very helpful. It just says that calls to PyObject_GetBuffer() must be matched with calls to PyBuffer_Release().
msg170659 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-18 18:20
Richard Oudkerk <report@bugs.python.org> wrote: > PyObject_GetBuffer(b, &buf, PyBUF_WRITABLE); > view = PyMemoryView_FromBuffer(&buf); > // readinto view > PyBuffer_Release(&buf); > > Would attempts to access a "leaked" reference to view now result in ValueError("operation forbidden on released memoryview object")? If so then I think this would be safe. You would need to call memory_release(). Perhaps we can just expose it on the C-API level as PyMemoryView_Release(). IMO the use of PyObject_GetBuffer() should be discouraged. The semantics aren't clear (see #15821). I'd suggest using: 1) A buffer provider is involved (the case here): PyMemoryView_From Object() 2) A piece of memory needs to be wrapped temporarily and no references to the memoryview are "leaked" on the Python level: PyMemoryView_FromMemory() 3) A piece of memory needs to be packaged as a memoryview with automatic cleanup in mbuf_dealloc(): PyMemoryView_FromBufferWithCleanup() (proposed in ) So I think the combination of PyMemoryView_FromObject() with a call to PyMemoryView_Release() should indeed work here.
msg170660 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-18 18:22
Richard Oudkerk <report@bugs.python.org> wrote: > The documentation is not very helpful. It just says that calls > to PyObject_GetBuffer() must be matched with calls to PyBuffer_Release(). Yes, we need to sort that out, see #15821.
msg170675 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-09-18 20:47
> You would need to call memory_release(). Perhaps we can just expose it on the > C-API level as PyMemoryView_Release(). Should PyMemoryView_Release() release the _PyManagedBufferObject by doing mbuf_release(view->mbuf) even if view->mbuf->exports > 0? Doing Py_TYPE(view->mbuf)->tp_clear((PyObject *)view->mbuf); seems to have the desired effect of causing ValueError when I try to access any associated memoryview. > 3) A piece of memory needs to be packaged as a memoryview with automatic > cleanup in mbuf_dealloc(): > > PyMemoryView_FromBufferWithCleanup() (proposed in ) Maybe this should also handle decrefing the base object (given a flag PyManagedBuffer_FreeObj). I do worry about creating memoryviews that survive deallocation of the base object.
msg170676 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-18 20:56
> So I think the combination of PyMemoryView_FromObject() with a call to > PyMemoryView_Release() should indeed work here. I don't think we want to expose a mutable bytes object to outside code, so IMO PyMemoryView_FromMemory() is preferrable.
msg170678 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-18 21:19
Richard Oudkerk <report@bugs.python.org> wrote: > Should PyMemoryView_Release() release the _PyManagedBufferObject by doing mbuf_release(view->mbuf) even if view->mbuf->exports > 0? No, I think it should really just be a wrapper: diff --git a/Objects/memoryobject.c b/Objects/memoryobject.c --- a/Objects/memoryobject.c +++ b/Objects/memoryobject.c @@ -1093,6 +1093,12 @@ return memory_release((PyMemoryViewObject *)self, NULL); } +PyObject * +PyMemoryView_Release(PyObject *m) +{ + return memory_release((PyMemoryViewObject *)m, NULL); +} + We decided in #10181 not to allow releasing a view with exports, since the logic is already quite complex. Is there a reasonable expectation that existing code creates memoryviews of the readinto() argument?
msg170679 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2012-09-18 21:41
Antoine Pitrou <report@bugs.python.org> wrote: > I don't think we want to expose a mutable bytes object to outside code, > so IMO PyMemoryView_FromMemory() is preferrable. I agree, but PyMemoryView_FromMemory(PyBytes_AS_STRING(b), n, PyBUF_WRITE) just hides the fact that a mutable bytes object is exposed. Are we talking about a big speedup here or could we perhaps just keep the existing code?
msg170680 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2012-09-18 21:51
> Antoine Pitrou <report@bugs.python.org> wrote: > > I don't think we want to expose a mutable bytes object to outside code, > > so IMO PyMemoryView_FromMemory() is preferrable. > > I agree, but PyMemoryView_FromMemory(PyBytes_AS_STRING(b), n, PyBUF_WRITE) > just hides the fact that a mutable bytes object is exposed. Except that the mutable bytes object is not exposed to any outside code, so that weird behaviour can't be observed.
msg170681 - (view) Author: Richard Oudkerk (sbt) * (Python committer) Date: 2012-09-18 22:02
> Are we talking about a big speedup here or could we perhaps just keep > the existing code? I doubt it is worth the hassle. But I did want to know if there was a clean way to do what I wanted.
msg190457 - (view) Author: Dwight Guth (dwight.guth) Date: 2013-06-01 23:23
I was programming something today and thought I should let you know I came across a situation where the current behavior of this function is able to expose what seems to be raw memory to the user. import io class A(io.RawIOBase): def readinto(self, b): return len(b) A().read(100)
msg223225 - (view) Author: Mark Lawrence (BreamoreBoy) * Date: 2014-07-16 15:48
Please note this is also referred to from #15994.
History
Date User Action Args
2022-04-11 14:57:35 admin set github: 60107
2019-04-08 13:06:30 BreamoreBoy set nosy: - BreamoreBoy
2019-04-08 13:02:20 methane set nosy: + methane
2014-10-14 18:01:58 skrah set nosy: - skrah
2014-07-16 15:48:19 BreamoreBoy set nosy: + BreamoreBoymessages: + versions: + Python 3.5
2013-06-01 23:23:23 dwight.guth set nosy: + dwight.guthmessages: +
2012-09-18 22:02:18 sbt set messages: +
2012-09-18 21:51:38 pitrou set messages: +
2012-09-18 21:41:23 skrah set messages: +
2012-09-18 21:19:27 skrah set messages: +
2012-09-18 20:56:43 pitrou set messages: +
2012-09-18 20:50:10 vstinner set nosy: + vstinner
2012-09-18 20:47:55 sbt set messages: +
2012-09-18 18:22:55 skrah set messages: +
2012-09-18 18:20:41 skrah set messages: +
2012-09-18 18:19:43 sbt set messages: +
2012-09-18 12:49:29 sbt set messages: +
2012-09-18 10:47:49 sbt set messages: +
2012-09-18 08:33:54 skrah set messages: +
2012-09-17 23:14:26 pitrou set nosy: + skrahmessages: +
2012-09-17 23:04:56 sbt set messages: +
2012-09-17 22:30:01 pitrou set nosy: + pitroumessages: +
2012-09-17 20:59:07 serhiy.storchaka set type: performancecomponents: + Library (Lib), IO
2012-09-17 20:17:09 sbt set files: + iobase_read.patchmessages: +
2012-09-10 20:01:10 amaury.forgeotdarc set nosy: + amaury.forgeotdarcmessages: +
2012-09-10 18:01:46 jcea set nosy: + jceastage: patch reviewversions: + Python 3.4
2012-09-10 12:23:03 sbt create