Issue 22003: BytesIO copy-on-write - Python tracker
Created on 2014-07-17 22:25 by dw, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Messages (37)
Author: David Wilson (dw) *
Date: 2014-07-17 22:25
This is a followup to the thread at https://mail.python.org/pipermail/python-dev/2014-July/135543.html , discussing the existing behaviour of BytesIO copying its source object, and how this regresses compared to cStringIO.StringI.
The goal of posting the patch on list was to try and stimulate discussion around the approach. The patch itself obviously isn't ready for review, and I'm not in a position to dedicate time to it just now (although in a few weeks I'd love to give it full attention!).
Ignoring this quick implementation, are there any general comments around the approach?
My only concern is that it might keep large objects alive in a non-intuitive way in certain circumstances, though I can't think of any obvious ones immediately.
Also interested in comments on the second half of that thread: "a natural extension of this is to do something very similar on the write side: instead of generating a temporary private heap allocation, generate (and freely resize) a private PyBytes object until it is exposed to the user, at which point, _getvalue() returns it, and converts its into an IO_SHARED buffer."
There are quite a few interactions with making that work correctly, in particular:
How BytesIO would implement the buffers interface without causing the under-construction Bytes to become readonly
Avoiding redundant copies and resizes -- we can't simply tack 25% slack on the end of the Bytes and then truncate it during getvalue() without likely triggering a copy and move, however with careful measurement of allocator behavior there are various tradeoffs that could be made - e.g. obmalloc won't move a <500 byte allocation if it shrinks by <25%. glibc malloc's rules are a bit more complex though.
Could also add a private _PyBytes_SetSize() API to allow truncation to the final size during getvalue() without informing the allocator. Then we'd simply overallocate by up to 10% or 1-2kb, and write off the loss of the slack space.
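As a toy illustration of the overallocation tradeoff described above (the 10% / 2 KiB figures come from the message; the function name is invented):

```python
def overallocated_size(n: int) -> int:
    """Grow a write-side buffer request `n` by bounded slack, so that
    getvalue() can later truncate to the final size in place instead of
    forcing the allocator to copy and move the data."""
    slack = min(n // 10, 2048)  # up to 10%, capped at 2 KiB
    return n + slack


assert overallocated_size(100) == 110
assert overallocated_size(1_000_000) == 1_002_048
```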
Notably, this approach completely differs from the one documented in http://bugs.python.org/issue15381 .. it's not clear to me which is better.
Author: David Wilson (dw) *
Date: 2014-07-17 22:30
Submitted contributor agreement. Please consider the demo patch licensed under the Apache 2 licence.
Author: Antoine Pitrou (pitrou) *
Date: 2014-07-17 22:46
Be careful what happens when the original object is mutable:
>>> b = bytearray(b"abc")
>>> bio = io.BytesIO(b)
>>> b[:] = b"defghi"
>>> bio.getvalue()
b'abc'
I don't know what your patch does in this case.
Author: David Wilson (dw) *
Date: 2014-07-18 07:12
Good catch :( There doesn't seem to be a way to ask for an immutable buffer, so perhaps it could just be a little more selective. I think the majority of use cases would still be covered if the sharing behaviour was restricted to bytes objects only.
In that case "Py_buffer initialdata" could become a PyObject*, saving a small amount of memory, and allowing reuse of the struct member if BytesIO was also modified to write directly into a private bytes object.
Author: Stefan Behnel (scoder) *
Date: 2014-07-18 08:42
Even if there is no way to explicitly request a RO buffer, the Py_buffer struct that you get back actually tells you if it's read-only or not. Shouldn't that be enough to enable this optimisation?
Whether or not implementors of the buffer protocol set this flag correctly is another question, but if not then they need fixing on their side anyway. (And in the vast majority of cases, the implementor will be either CPython or NumPy.)
Also, generally speaking, I think such an optimisation would be nice, even if it only catches some common cases (and doesn't break the others :). It could still copy data if necessary, but try to avoid it if possible.
Author: David Wilson (dw) *
Date: 2014-07-20 16:50
This version is tidied up enough that I think it could be reviewed.
Changes are:
Defer `buf' allocation until __init__, rather than __new__ as was previously done. Now upon completion, BytesIO.__new__ returns a valid, closed BytesIO, whereas previously a valid, empty, open BytesIO was returned. Is this interface change permissible?
Move the init guts into a "reinit()", shared with __setstate__, which also previously caused an unnecessary copy. Additionally gather the various ways of deallocating buffers into a single "reset()" function, called by reinit(), _dealloc(), and _close().
Per Stefan's suggested approach, reinit() now explicitly checks for a read-only buffer, falling back to silently performing a copy if the returned buffer is read-write. That seems vastly preferable to throwing an exception, which would probably be another interface change.
Call unshare() any place the buffer is about to be modified. If the buffer needs to be made private, it also accepts a size hint indicating how much less/more space the subsequent operation needs, to avoid a redundant reallocation after the unsharing.
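A rough pure-Python sketch of the size-hint idea (hypothetical names and signature; the real unshare() is a C function inside the patch):

```python
def unshare(buf, shared, size_hint=0):
    """Toy unshare(): if `buf` is still the shared source object, replace
    it with a private bytearray sized for the upcoming operation, so the
    write that follows does not trigger a second reallocation.

    `size_hint` is how many extra bytes the next operation will need
    (negative if it is about to shrink the buffer)."""
    if not shared:
        return buf, False
    private = bytearray(max(len(buf) + size_hint, 0))
    # Copy only the bytes that will survive the subsequent operation.
    keep = min(len(buf), len(private))
    private[:keep] = buf[:keep]
    return private, False


src = b"hello world"
buf, shared = unshare(src, True, size_hint=5)  # about to append 5 bytes
assert len(buf) == 16 and bytes(buf[:11]) == src
```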
Outstanding issues:
- I don't understand why buf_size is a size_t, and I'm certain the casting in unshare() is incorrect somehow. Is it simply to avoid signed overflow?
Author: David Wilson (dw) *
Date: 2014-07-20 17:55
New patch also calls unshare() during getbuffer()
Author: David Wilson (dw) *
Date: 2014-07-20 22:07
I'm not sure the "read only buffer" test is strong enough: having a readonly view is not a guarantee that the data in the view cannot be changed through some other means, i.e. it is read-only, not immutable.
Pretty sure this approach is broken. What about the alternative approach of specializing for Bytes?
Author: Antoine Pitrou (pitrou) *
Date: 2014-07-21 15:31
Pretty sure this approach is broken. What about the alternative approach of specializing for Bytes?
That would certainly sound good enough, to optimize the common case.
Also, it would be nice if you could add some tests to the patch (e.g. to stress the bytearray case). Thank you!
Author: Antoine Pitrou (pitrou) *
Date: 2014-07-21 15:31
As for whether the "checking for a readonly view" approach is broken, I don't know: that part of the buffer API is still mysterious to me. Stefan, would you have some insight?
Author: Stefan Krah (skrah) *
Date: 2014-07-21 16:04
I think checking for a readonly view is fine. The protocol is this:
Use the PyBUF_WRITABLE flag in the request. Then the provider must either have a writable buffer or else deny the request entirely.
Omit the PyBUF_WRITABLE flag in the request. Then the provider can return a writable or a readonly buffer, but must set the readonly flag correctly AND export the same type of buffer to ALL consumers.
It is not possible to ask for a readonly buffer explicitly, but the readonly flag in the Py_Buffer struct should always be set correctly.
It is hard to guess the original intention of the PEP-3118 authors, but in practice "readonly" means "immutable" here. IMO a buffer provider would be seriously broken if a readonly buffer is mutated in any way.
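The readonly flag Stefan describes is visible from Python through memoryview, which simply records what the exporter set:

```python
# bytes is immutable: its exported buffer is flagged read-only.
ro = memoryview(b"abc")
assert ro.readonly is True

# bytearray grants writable requests: its buffer is flagged writable.
rw = memoryview(bytearray(b"abc"))
assert rw.readonly is False
rw[0] = ord("x")  # writing through the writable view works

# Writing through the read-only view is refused with TypeError.
try:
    ro[0] = ord("x")
    raised = False
except TypeError:
    raised = True
assert raised
```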
Author: Stefan Krah (skrah) *
Date: 2014-07-21 16:23
The original wording in the PEP is this:
readonly
an integer variable to hold whether or not the memory is readonly. 1
means the memory is readonly, zero means the memory is writable.
To me this means that a hypothetical compiler that could figure out at compile time that the readonly flag is set would be allowed to put the buffer contents into the read-only data section.
Author: David Wilson (dw) *
Date: 2014-07-21 17:02
Stefan,
Thanks for digging here. As much as I'd love to follow this interpretation, it simply doesn't match existing buffer implementations, including within the standard library.
For example, mmap.mmap(..., flags=mmap.MAP_SHARED, prot=mmap.PROT_READ) will produce a read-only buffer, yet mutability is entirely at the whim of the operating system. In this case, "immutability" may be apparent for years, until some machine has memory pressure, causing the shared mapping to be flushed, and refreshed from (say, incoherent NFS storage) on next access.
I thought it would be worth auditing some of the most popular types of buffer just to check your interpretation, and this was the first, most obvious candidate.
Any thoughts? I'm leaning heavily toward the Bytes specialization approach
Author: David Wilson (dw) *
Date: 2014-07-21 18:24
I'm not sure how much work it would be, or even if it could be made sufficient to solve our problem, but what about extending the buffers interface to include a "int stable" flag, defaulting to 0?
It seems, though, that it would just be making the already arcane buffer interface even more arcane, simply for the benefit of our specific use case.
Author: Stefan Krah (skrah) *
Date: 2014-07-21 19:09
I'm sure many exporters aren't setting the right flags; on the other hand we already hash memoryviews based on readonly buffers, assuming they are immutable.
Author: David Wilson (dw) *
Date: 2014-07-21 22:03
Hi Stefan,
How does this approach in reinit() look? We first ask for a writable buffer, and if the object obliges, immediately copy it. If it refuses, we ask for a read-only buffer, and this time expect that it will never change.
This still does not catch the case of mmap.mmap, and I am not sure how to deal with it: there is no way for mmap to export PROT_READ memory as a read-only buffer that cannot be mutated, so the only options seem to be either a) remove buffer support from mmap, or b) blacklist it in bytesio(!).
Antoine, I have padded out the unit tests a little; test_memoryio.py seems the best place for them. Also modified test_sizeof(), although the way this test is designed seems inherently brittle to begin with. Now it is also sensitive to changes in the Py_buffer struct.
Various other changes:
__new__ once again returns a valid, open, empty BytesIO, since the alternative breaks pickling.
reinit() preserves existing BytesIO state until it knows it can succeed, which fixes another of the pickle tests.
__setstate__() had CHECK_CLOSED() re-added, again for the pickle tests.
Probably the patch guts could be rearranged again, since the definition of the functions is no longer as clear as it was in cow3.patch.
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2014-07-22 07:15
See also .
Author: Antoine Pitrou (pitrou) *
Date: 2014-07-22 18:50
There's also the following code in numpy's getbuffer method:
/*
* If a read-only buffer is requested on a read-write array, we return a
* read-write buffer, which is dubious behavior. But that's why this call
* is guarded by PyArray_ISWRITEABLE rather than (flags &
* PyBUF_WRITEABLE).
*/
if (PyArray_ISWRITEABLE(self)) {
if (array_might_be_written(self) < 0) {
goto fail;
}
}
... which seems to imply that mmap is not the only one with "dubious behaviour" (?).
Author: Stefan Krah (skrah) *
Date: 2014-07-22 19:12
Actually we have an extra safety net in memory_hash() apart from the readonly check: We also check if the underlying object is hashable.
This might be applicable here, too. Unfortunately mmap objects are hashable, leading to some funny results:
>>> import mmap
>>> with open("hello.txt", "wb") as f:
...     f.write(b"xxxxx\n")
...
6
>>> f = open("hello.txt", "r+b")
>>> mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
>>> x = memoryview(mm)
>>> hash(mm)
-9223363309538046107
>>> hash(x)
-3925142568057840789
>>> x.tolist()
[120, 120, 120, 120, 120, 10]
>>> with open("hello.txt", "wb") as g:
...     g.write(b"yyy\n")
...
4
>>> hash(mm)
-9223363309538046107
>>> hash(x)
-3925142568057840789
>>> x.tolist()
[121, 121, 121, 10, 0, 0]
memoryview (rightfully) assumes that hashable objects are immutable and caches the first hash.
I'm not sure why mmap objects are hashable, it looks like a bug to me.
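For reference, memoryview's hashing contract with a well-behaved immutable exporter — hash(m) == hash(m.tobytes()), with writable views refusing to hash at all — looks like this:

```python
data = b"hello"
view = memoryview(data)

# Read-only views over hashable exporters hash like their contents.
assert view.readonly
assert hash(view) == hash(data) == hash(view.tobytes())

# Writable views (e.g. over a bytearray) raise ValueError instead; this
# is the safety net that mmap's spurious hashability slips past.
try:
    hash(memoryview(bytearray(data)))
    raised = False
except ValueError:
    raised = True
assert raised
```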
Author: Stefan Krah (skrah) *
Date: 2014-07-22 19:37
I think the mmap behavior is probably worse than the NumPy example.
I assume that in the example the exporter sets view.readonly=0. mmap objects set view.readonly=1 and can still be mutated.
Author: David Wilson (dw) *
Date: 2014-07-22 20:31
Stefan, I like your new idea. If there isn't some backwards-compatibility argument for keeping mmap.mmap hashable, then it could be considered a bug and fixed in the same hypothetical future release that includes this BytesIO change. The only cost now is that to test for hashability we must hash the object, which touches every byte in it (i.e. almost 50% of the cost of a copy).
If however we can't fix mmap.mmap due to the interface change (I think that's a silly idea -- Python has never been about letting the user shoot themselves in the foot), then the specialized-for-Bytes approach is almost as good (and perhaps even better, since the resulting concept and structure layout is more aligned with Serhiy's patch in ).
tl;dr:
a) mmap.mmap can be fixed - use hashability as a strong test for immutability (instead of an ugly heuristic involving buffer flags)
- undecided: is calling hash(obj) to check for immutability too costly?
b) mmap.mmap can't be fixed - use the Bytes specialization approach.
Author: Antoine Pitrou (pitrou) *
Date: 2014-07-22 23:19
I don't like the idea of trying to hash the object. It may be a time-consuming operation, while the result will be thrown away.
I think restricting the optimization to bytes objects is fine. We can whitelist other types, such as memoryview.
Author: David Wilson (dw) *
Date: 2014-07-24 22:56
This new patch abandons the buffer interface and specializes for Bytes per the comments on this issue.
Anyone care to glance at least at the general structure?
Tests could probably use a little more work.
Microbenchmark seems fine, at least for construction. It doesn't seem likely this patch would introduce severe performance troubles elsewhere, but I'd like to try it out with some heavy BytesIO consumers (any suggestions? Some popular template engine?)
cpython] ./python.exe -m timeit -s 'import i' 'i.readlines()'
lines: 54471
100 loops, best of 3: 13.3 msec per loop

[23:52:55 eldil!58 cpython] ./python-nocow -m timeit -s 'import i' 'i.readlines()'
lines: 54471
10 loops, best of 3: 19.6 msec per loop

[23:52:59 eldil!59 cpython] cat i.py
import io
word = b'word'
line = (word * int(79/len(word))) + b'\n'
ar = line * int((4 * 1048576) / len(line))
def readlines():
    return len(list(io.BytesIO(ar)))
print('lines: %s' % (readlines(),))
Author: Antoine Pitrou (pitrou) *
Date: 2014-07-25 16:26
It doesn't seem likely this patch would introduce severe performance troubles elsewhere, but I'd like to try it out with some heavy BytesIO consumers (any suggestions? Some popular template engine?)
I don't have any specific suggestions, but you could try the benchmark suite here: http://hg.python.org/benchmarks
Author: David Wilson (dw) *
Date: 2014-07-27 12:05
Hey Antoine,
Thanks for the link. I'm having trouble getting reproducible results at present, and running out of ideas as to what might be causing it. Even after totally isolating a CPU for e.g. django_v2 and with frequency scaling disabled, numbers still jump around for the same binary by as much as 3%.
I could not detect any significant change between runs of the old and new binary that could not be described as noise, given the isolation issues above.
Author: Antoine Pitrou (pitrou) *
Date: 2014-07-27 15:48
Even after totally isolating a CPU for e.g. django_v2 and with frequency scaling disabled, numbers still jump around for the same binary by as much as 3%.
That's expected. If the difference doesn't go above 5-10%, then IMO you can pretty much consider that your patch didn't have any impact on those benchmarks.
Author: David Wilson (dw) *
Date: 2014-07-28 12:30
Newest patch incorporates Antoine's review comments. The final benchmark results are below. Just curious, what causes e.g. telco to differ by up to 7% between runs? That's really huge.
Report on Linux k2 3.14-1-amd64 #1 SMP Debian 3.14.9-1 (2014-06-30) x86_64 Total CPU cores: 4
call_method_slots

Min: 0.329869 -> 0.340487: 1.03x slower
Avg: 0.330512 -> 0.341786: 1.03x slower
Significant (t=-216.69)
Stddev: 0.00067 -> 0.00060: 1.1111x smaller

call_method_unknown

Min: 0.351167 -> 0.343961: 1.02x faster
Avg: 0.351731 -> 0.344580: 1.02x faster
Significant (t=238.89)
Stddev: 0.00033 -> 0.00040: 1.2271x larger

call_simple

Min: 0.257487 -> 0.277366: 1.08x slower
Avg: 0.257942 -> 0.277809: 1.08x slower
Significant (t=-845.64)
Stddev: 0.00029 -> 0.00029: 1.0126x smaller

etree_generate

Min: 0.377985 -> 0.365952: 1.03x faster
Avg: 0.381797 -> 0.369452: 1.03x faster
Significant (t=31.15)
Stddev: 0.00314 -> 0.00241: 1.3017x smaller

etree_iterparse

Min: 0.545668 -> 0.565437: 1.04x slower
Avg: 0.554188 -> 0.576807: 1.04x slower
Significant (t=-17.00)
Stddev: 0.00925 -> 0.00956: 1.0340x larger

etree_process

Min: 0.294158 -> 0.286617: 1.03x faster
Avg: 0.296354 -> 0.288877: 1.03x faster
Significant (t=36.22)
Stddev: 0.00149 -> 0.00143: 1.0435x smaller

fastpickle

Min: 0.458961 -> 0.475828: 1.04x slower
Avg: 0.460226 -> 0.481228: 1.05x slower
Significant (t=-109.38)
Stddev: 0.00082 -> 0.00173: 2.1051x larger

nqueens

Min: 0.305883 -> 0.295858: 1.03x faster
Avg: 0.308085 -> 0.297755: 1.03x faster
Significant (t=90.22)
Stddev: 0.00077 -> 0.00085: 1.0942x larger

silent_logging

Min: 0.074152 -> 0.075818: 1.02x slower
Avg: 0.074345 -> 0.076005: 1.02x slower
Significant (t=-96.29)
Stddev: 0.00013 -> 0.00012: 1.0975x smaller

spectral_norm

Min: 0.355738 -> 0.364419: 1.02x slower
Avg: 0.356691 -> 0.365764: 1.03x slower
Significant (t=-126.23)
Stddev: 0.00054 -> 0.00047: 1.1533x smaller

telco

Min: 0.012152 -> 0.013038: 1.07x slower
Avg: 0.012264 -> 0.013157: 1.07x slower
Significant (t=-83.98)
Stddev: 0.00008 -> 0.00007: 1.0653x smaller
The following not significant results are hidden, use -v to show them: 2to3, call_method, chaos, django_v2, etree_parse, fannkuch, fastunpickle, float, formatted_logging, go, hexiom2, iterative_count, json_dump, json_dump_v2, json_load, mako, mako_v2, meteor_contest, nbody, normal_startup, pathlib, pickle_dict, pickle_list, pidigits, raytrace, regex_compile, regex_effbot, regex_v8, richards, simple_logging, startup_nosite, threaded_count, tornado_http, unpack_sequence, unpickle_list.
Author: Stefan Krah (skrah) *
Date: 2014-07-28 14:23
Just curious, what causes e.g. telco to differ up to 7% between runs? That's really huge.
telco.py always varies a lot between runs (up to 10%), even in the big version "telco.py full":
http://bytereef.org/mpdecimal/quickstart.html#telco-benchmark
Using the average of 10 runs, I can't really see a slowdown.
Author: Stefan Krah (skrah) *
Date: 2014-07-28 14:26
So I wonder why the benchmark suite says that the telco slowdown is significant. :)
Author: David Wilson (dw) *
Date: 2014-07-29 18:24
I suspect it's all covered now, but is there anything else I can help with to get this patch pushed along its merry way?
Author: Roundup Robot (python-dev)
Date: 2014-07-29 23:45
New changeset 79a5fbe2c78f by Antoine Pitrou in branch 'default': Issue #22003: When initialized from a bytes object, io.BytesIO() now http://hg.python.org/cpython/rev/79a5fbe2c78f
Author: Antoine Pitrou (pitrou) *
Date: 2014-07-29 23:46
The latest patch is good indeed. Thank you very much!
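The committed optimization is internal, but the snapshot semantics it has to preserve can be checked from Python: BytesIO must always behave as if it copied its argument at construction time, whether or not a physical copy was made:

```python
import io

# bytes input: the new code shares the object internally, but since
# bytes is immutable the snapshot semantics are unchanged.
src = b"abcdef"
bio = io.BytesIO(src)
assert bio.getvalue() == b"abcdef"

# bytearray input: a real copy is still made, so later mutation of the
# source never leaks into the BytesIO (the case Antoine raised above).
ba = bytearray(b"abc")
bio2 = io.BytesIO(ba)
ba[:] = b"defghi"
assert bio2.getvalue() == b"abc"

# Writing to the BytesIO never affects the original object either.
bio.write(b"XYZ")
assert src == b"abcdef"
```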
Author: Mikhail Korobov (kmike)
Date: 2015-02-09 08:13
Shouldn't this fix be mentioned in https://docs.python.org/3.5/whatsnew/3.5.html#optimizations ?
Author: David Wilson (dw) *
Date: 2015-02-09 15:57
Attached trivial patch for whatsnew.rst.
Author: Roundup Robot (python-dev)
Date: 2015-02-14 22:45
New changeset 7ae156f07a90 by Berker Peksag in branch 'default': Add a whatsnew entry for issue #22003. https://hg.python.org/cpython/rev/7ae156f07a90
Author: Piotr Dobrogost (piotr.dobrogost)
Date: 2015-03-04 20:12
This new patch abandons the buffer interface and specializes for Bytes per the comments on this issue.
Why does it abandon the buffer interface? Because of the following?
Thanks for digging here. As much as I'd love to follow this interpretation, it simply doesn't match existing buffer implementations, including within the standard library.
Shouldn't the existing buffer implementations be fixed then, and this feature made to use the buffer interface instead of specializing for bytes? If so, is there at least some information on this in the comments, so that one wouldn't wonder why there is specialization instead of relying on the buffer interface?
Author: David Wilson (dw) *
Date: 2015-03-04 21:04
Hi Piotr,
There wasn't an obvious fix that didn't involve changing the buffer interface itself. There is presently ambiguity in the interface regarding the difference between a "read only" buffer and an "immutable" buffer, which is crucial for its use in this case.
Fixing the interface, followed by every buffer interface user, is a significantly more complicated task than simply optimizing for the most common case, as done here. FWIW I still think this work is worth doing, though I personally don't have time to approach it just now.
We could (and possibly should) approach fixing e.g. mmap.mmap() hashability, possibly causing user-code regressions, but even if such cases were fixed it still wouldn't be enough to rely on for the optimization implemented here.