Issue 22003: BytesIO copy-on-write - Python tracker
Created on 2014-07-17 22:25 by dw, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Messages (37)
Author: David Wilson (dw) *
Date: 2014-07-17 22:25
This is a followup to the thread at https://mail.python.org/pipermail/python-dev/2014-July/135543.html , discussing the existing behaviour of BytesIO copying its source object, and how this regresses compared to cStringIO.StringI.
The goal of posting the patch on list was to try and stimulate discussion around the approach. The patch itself obviously isn't ready for review, and I'm not in a position to dedicate time to it just now (although in a few weeks I'd love to give it full attention!).
Ignoring this quick implementation, are there any general comments around the approach?
My only concern is that it might keep large objects alive in a non-intuitive way in certain circumstances, though I can't think of any obvious ones immediately.
Also interested in comments on the second half of that thread: "a natural extension of this is to do something very similar on the write side: instead of generating a temporary private heap allocation, generate (and freely resize) a private PyBytes object until it is exposed to the user, at which point, _getvalue() returns it, and converts its into an IO_SHARED buffer."
There are quite a few interactions with making that work correctly, in particular:
How BytesIO would implement the buffers interface without causing the under-construction Bytes to become readonly
Avoiding redundant copies and resizes -- we can't simply tack 25% slack on the end of the Bytes and then truncate it during getvalue() without likely triggering a copy and move, however with careful measurement of allocator behavior there are various tradeoffs that could be made - e.g. obmalloc won't move a <500 byte allocation if it shrinks by <25%. glibc malloc's rules are a bit more complex though.
Could also add a private _PyBytes_SetSize() API to allow truncation to the final size during getvalue() without informing the allocator. Then we'd simply overallocate by up to 10% or 1-2kb, and write off the loss of the slack space.
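As a toy illustration of the overallocation tradeoff described above (the 10% / 2 KiB figures come from the message; the function name is invented):

```python
def overallocated_size(n: int) -> int:
    """Grow a write-side buffer request `n` by bounded slack, so that
    getvalue() can later truncate to the final size in place instead of
    forcing the allocator to copy and move the data."""
    slack = min(n // 10, 2048)  # up to 10%, capped at 2 KiB
    return n + slack


assert overallocated_size(100) == 110
assert overallocated_size(1_000_000) == 1_002_048
```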
Notably, this approach completely differs from the one documented in http://bugs.python.org/issue15381 .. it's not clear to me which is better.
Author: David Wilson (dw) *
Date: 2014-07-17 22:30
Submitted contributor agreement. Please consider the demo patch licensed under the Apache 2 licence.
Author: Antoine Pitrou (pitrou) *
Date: 2014-07-17 22:46
Be careful what happens when the original object is mutable:
>>> b = bytearray(b"abc")
>>> bio = io.BytesIO(b)
>>> b[:] = b"defghi"
>>> bio.getvalue()
b'abc'
I don't know what your patch does in this case.
Author: David Wilson (dw) *
Date: 2014-07-18 07:12
Good catch :( There doesn't seem to be a way to ask for an immutable buffer, so perhaps it could just be a little more selective. I think the majority of use cases would still be covered if the sharing behaviour was restricted to bytes objects only.
In that case "Py_buffer initialdata" could become a PyObject*, saving a small amount of memory, and allowing reuse of the struct member if BytesIO was also modified to write directly into a private bytes object.
Author: Stefan Behnel (scoder) *
Date: 2014-07-18 08:42
Even if there is no way to explicitly request a RO buffer, the Py_buffer struct that you get back actually tells you if it's read-only or not. Shouldn't that be enough to enable this optimisation?
Whether or not implementors of the buffer protocol set this flag correctly is another question, but if not then they need fixing on their side anyway. (And in the vast majority of cases, the implementor will be either CPython or NumPy.)
Also, generally speaking, I think such an optimisation would be nice, even if it only catches some common cases (and doesn't break the others :). It could still copy data if necessary, but try to avoid it if possible.
Author: David Wilson (dw) *
Date: 2014-07-20 16:50
This version is tidied up enough that I think it could be reviewed.
Changes are:
Defer `buf' allocation until __init__, rather than __new__ as was previously done. Now upon completion, BytesIO.__new__ returns a valid, closed BytesIO, whereas previously a valid, empty, open BytesIO was returned. Is this interface change permissible?
Move the init guts into a "reinit()", shared with __setstate__, which also previously caused an unnecessary copy. Additionally gather the various ways of deallocating buffers into a single "reset()" function, called by reinit(), _dealloc(), and _close().
Per Stefan's suggested approach, reinit() now explicitly checks for a read-only buffer, falling back to silently performing a copy if the returned buffer is read-write. That seems vastly preferable to throwing an exception, which would probably be another interface change.
Call unshare() any place the buffer is about to be modified. If the buffer needs to be made private, it also accepts a size hint indicating how much less/more space the subsequent operation needs, to avoid a redundant reallocation after the unsharing.
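A rough pure-Python sketch of the size-hint idea (hypothetical names and signature; the real unshare() is a C function inside the patch):

```python
def unshare(buf, shared, size_hint=0):
    """Toy unshare(): if `buf` is still the shared source object, replace
    it with a private bytearray sized for the upcoming operation, so the
    write that follows does not trigger a second reallocation.

    `size_hint` is how many extra bytes the next operation will need
    (negative if it is about to shrink the buffer)."""
    if not shared:
        return buf, False
    private = bytearray(max(len(buf) + size_hint, 0))
    # Copy only the bytes that will survive the subsequent operation.
    keep = min(len(buf), len(private))
    private[:keep] = buf[:keep]
    return private, False


src = b"hello world"
buf, shared = unshare(src, True, size_hint=5)  # about to append 5 bytes
assert len(buf) == 16 and bytes(buf[:11]) == src
```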
Outstanding issues:
- I don't understand why buf_size is a size_t, and I'm certain the casting in unshare() is incorrect somehow. Is it simply to avoid signed overflow?
Author: David Wilson (dw) *
Date: 2014-07-20 17:55
New patch also calls unshare() during getbuffer()
Author: David Wilson (dw) *
Date: 2014-07-20 22:07
I'm not sure the "read only buffer" test is strong enough: having a readonly view is not a guarantee that the data in the view cannot be changed through some other means, i.e. it is read-only, not immutable.
Pretty sure this approach is broken. What about the alternative approach of specializing for Bytes?
Author: Antoine Pitrou (pitrou) *
Date: 2014-07-21 15:31
Pretty sure this approach is broken. What about the alternative approach of specializing for Bytes?
That would certainly sound good enough, to optimize the common case.
Also, it would be nice if you could add some tests to the patch (e.g. to stress the bytearray case). Thank you!
Author: Antoine Pitrou (pitrou) *
Date: 2014-07-21 15:31
As for whether the "checking for a readonly view" approach is broken, I don't know: that part of the buffer API is still mysterious to me. Stefan, would you have some insight?
Author: Stefan Krah (skrah) *
Date: 2014-07-21 16:04
I think checking for a readonly view is fine. The protocol is this:
Use the PyBUF_WRITABLE flag in the request. Then the provider must either have a writable buffer or else deny the request entirely.
Omit the PyBUF_WRITABLE flag in the request. Then the provider can return a writable or a readonly buffer, but must set the readonly flag correctly AND export the same type of buffer to ALL consumers.
It is not possible to ask for a readonly buffer explicitly, but the readonly flag in the Py_Buffer struct should always be set correctly.
It is hard to guess the original intention of the PEP-3118 authors, but in practice "readonly" means "immutable" here. IMO a buffer provider would be seriously broken if a readonly buffer is mutated in any way.
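The readonly flag Stefan describes is visible from Python through memoryview, which simply records what the exporter set:

```python
# bytes is immutable: its exported buffer is flagged read-only.
ro = memoryview(b"abc")
assert ro.readonly is True

# bytearray grants writable requests: its buffer is flagged writable.
rw = memoryview(bytearray(b"abc"))
assert rw.readonly is False
rw[0] = ord("x")  # writing through the writable view works

# Writing through the read-only view is refused with TypeError.
try:
    ro[0] = ord("x")
    raised = False
except TypeError:
    raised = True
assert raised
```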
Author: Stefan Krah (skrah) *
Date: 2014-07-21 16:23
The original wording in the PEP is this:
readonly
an integer variable to hold whether or not the memory is readonly. 1
means the memory is readonly, zero means the memory is writable.
To me this means that a hypothetical compiler that could figure out at compile time that the readonly flag is set would be allowed to put the buffer contents into the read-only data section.
Author: David Wilson (dw) *
Date: 2014-07-21 17:02
Stefan,
Thanks for digging here. As much as I'd love to follow this interpretation, it simply doesn't match existing buffer implementations, including within the standard library.
For example, mmap.mmap(..., flags=mmap.MAP_SHARED, prot=mmap.PROT_READ) will produce a read-only buffer, yet mutability is entirely at the whim of the operating system. In this case, "immutability" may be apparent for years, until some machine has memory pressure, causing the shared mapping to be flushed, and refreshed from (say, incoherent NFS storage) on next access.
I thought it would be worth auditing some of the most popular types of buffer just to check your interpretation, and this was the first, most obvious candidate.
Any thoughts? I'm leaning heavily toward the Bytes specialization approach
Author: David Wilson (dw) *
Date: 2014-07-21 18:24
I'm not sure how much work it would be, or even if it could be made sufficient to solve our problem, but what about extending the buffers interface to include a "int stable" flag, defaulting to 0?
It seems, though, that it would just be making the already arcane buffer interface even more arcane, simply for the benefit of our specific use case.
Author: Stefan Krah (skrah) *
Date: 2014-07-21 19:09
I'm sure many exporters aren't setting the right flags; on the other hand we already hash memoryviews based on readonly buffers, assuming they are immutable.
Author: David Wilson (dw) *
Date: 2014-07-21 22:03
Hi Stefan,
How does this approach in reinit() look? We first ask for a writable buffer, and if the object obliges, immediately copy it. If it refuses, we ask for a read-only buffer, and this time expect that it will never change.
This still does not catch the case of mmap.mmap, and I am not sure how to deal with it: there is no way for mmap to export PROT_READ memory as a read-only buffer that cannot be mutated, so the only options seem to be either a) remove buffer support from mmap, or b) blacklist it in bytesio(!).
Antoine, I have padded out the unit tests a little; test_memoryio.py seems the best place for them. Also modified test_sizeof(), although the way this test is designed seems inherently brittle to begin with. Now it is also sensitive to changes in the Py_buffer struct.
Various other changes:
__new__ once again returns a valid, open, empty BytesIO, since the alternative breaks pickling.
reinit() preserves existing BytesIO state until it knows it can succeed, which fixes another of the pickle tests.
__setstate__() had CHECK_CLOSED() re-added, again for the pickle tests.
Probably the patch guts could be rearranged again, since the definition of the functions is no longer as clear as it was in cow3.patch.
Author: Serhiy Storchaka (serhiy.storchaka) *
Date: 2014-07-22 07:15
See also .
Author: Antoine Pitrou (pitrou) *
Date: 2014-07-22 18:50
There's also the following code in numpy's getbuffer method:
/*
* If a read-only buffer is requested on a read-write array, we return a
* read-write buffer, which is dubious behavior. But that's why this call
* is guarded by PyArray_ISWRITEABLE rather than (flags &
* PyBUF_WRITEABLE).
*/
if (PyArray_ISWRITEABLE(self)) {
if (array_might_be_written(self) < 0) {
goto fail;
}
}
... which seems to imply that mmap is not the only one with "dubious behaviour" (?).
Author: Stefan Krah (skrah) *
Date: 2014-07-22 19:12
Actually we have an extra safety net in memory_hash() apart from the readonly check: We also check if the underlying object is hashable.
This might be applicable here, too. Unfortunately mmap objects are hashable, leading to some funny results:
>>> import mmap
>>> with open("hello.txt", "wb") as f:
...     f.write(b"xxxxx\n")
...
6
>>> f = open("hello.txt", "r+b")
>>> mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
>>> x = memoryview(mm)
>>> hash(mm)
-9223363309538046107
>>> hash(x)
-3925142568057840789
>>> x.tolist()
[120, 120, 120, 120, 120, 10]
>>> with open("hello.txt", "wb") as g:
...     g.write(b"yyy\n")
...
4
>>> hash(mm)
-9223363309538046107
>>> hash(x)
-3925142568057840789
>>> x.tolist()
[121, 121, 121, 10, 0, 0]
memoryview (rightfully) assumes that hashable objects are immutable and caches the first hash.
I'm not sure why mmap objects are hashable, it looks like a bug to me.
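For reference, memoryview's hashing contract with a well-behaved immutable exporter — hash(m) == hash(m.tobytes()), with writable views refusing to hash at all — looks like this:

```python
data = b"hello"
view = memoryview(data)

# Read-only views over hashable exporters hash like their contents.
assert view.readonly
assert hash(view) == hash(data) == hash(view.tobytes())

# Writable views (e.g. over a bytearray) raise ValueError instead; this
# is the safety net that mmap's spurious hashability slips past.
try:
    hash(memoryview(bytearray(data)))
    raised = False
except ValueError:
    raised = True
assert raised
```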
Author: Stefan Krah (skrah) *
Date: 2014-07-22 19:37
I think the mmap behavior is probably worse than the NumPy example.
I assume that in the example the exporter sets view.readonly=0. mmap objects set view.readonly=1 and can still be mutated.
Author: David Wilson (dw) *
Date: 2014-07-22 20:31
Stefan, I like your new idea. If there isn't some backwards-compatibility argument for keeping mmap.mmap hashable, then it could be considered a bug and fixed in the same hypothetical future release that includes this BytesIO change. The only cost now is that to test for hashability we must hash the object, which touches every byte in it (i.e. almost 50% of the cost of a copy).
If however we can't fix mmap.mmap due to the interface change (I think that's a silly idea -- Python has never been about letting the user shoot themselves in the foot), then the specialized-for-Bytes approach is almost as good (and perhaps even better, since the resulting concept and structure layout is more aligned with Serhiy's patch in ).
tl;dr:
a) mmap.mmap can be fixed - use hashability as a strong test for immutability (instead of an ugly heuristic involving buffer flags)
- undecided: is calling hash(obj) to check for immutability too costly?
b) mmap.mmap can't be fixed - use the Bytes specialization approach.
Author: Antoine Pitrou (pitrou) *
Date: 2014-07-22 23:19
I don't like the idea of trying to hash the object. It may be a time-consuming operation, while the result will be thrown away.
I think restricting the optimization to bytes objects is fine. We can whitelist other types, such as memoryview.
Author: David Wilson (dw) *
Date: 2014-07-24 22:56
This new patch abandons the buffer interface and specializes for Bytes per the comments on this issue.
Anyone care to glance at least at the general structure?
Tests could probably use a little more work.
Microbenchmark seems fine, at least for construction. It doesn't seem likely this patch would introduce severe performance troubles elsewhere, but I'd like to try it out with some heavy BytesIO consumers (any suggestions? Some popular template engine?)
cpython] ./python.exe -m timeit -s 'import i' 'i.readlines()'
lines: 54471
100 loops, best of 3: 13.3 msec per loop

[23:52:55 eldil!58 cpython] ./python-nocow -m timeit -s 'import i' 'i.readlines()'
lines: 54471
10 loops, best of 3: 19.6 msec per loop

[23:52:59 eldil!59 cpython] cat i.py
import io
word = b'word'
line = (word * int(79/len(word))) + b'\n'
ar = line * int((4 * 1048576) / len(line))
def readlines():
    return len(list(io.BytesIO(ar)))
print('lines: %s' % (readlines(),))
Author: Antoine Pitrou (pitrou) *
Date: 2014-07-25 16:26
It doesn't seem likely this patch would introduce severe performance troubles elsewhere, but I'd like to try it out with some heavy BytesIO consumers (any suggestions? Some popular template engine?)
I don't have any specific suggestions, but you could try the benchmark suite here: http://hg.python.org/benchmarks
Author: David Wilson (dw) *
Date: 2014-07-27 12:05
Hey Antoine,
Thanks for the link. I'm having trouble getting reproducible results at present, and running out of ideas as to what might be causing it. Even after totally isolating a CPU for e.g. django_v2 and with frequency scaling disabled, numbers still jump around for the same binary by as much as 3%.
I could not detect any significant change between runs of the old and new binary that could not be described as noise, given the isolation issues above.
Author: Antoine Pitrou (pitrou) *
Date: 2014-07-27 15:48
Even after totally isolating a CPU for e.g. django_v2 and with frequency scaling disabled, numbers still jump around for the same binary by as much as 3%.
That's expected. If the difference doesn't go above 5-10%, then IMO you can pretty much consider that your patch didn't have any impact on those benchmarks.
Author: David Wilson (dw) *
Date: 2014-07-28 12:30
Newest patch incorporates Antoine's review comments. The final benchmark results are below. Just curious, what causes e.g. telco to differ by up to 7% between runs? That's really huge.
Report on Linux k2 3.14-1-amd64 #1 SMP Debian 3.14.9-1 (2014-06-30) x86_64 Total CPU cores: 4
call_method_slots

Min: 0.329869 -> 0.340487: 1.03x slower
Avg: 0.330512 -> 0.341786: 1.03x slower
Significant (t=-216.69)
Stddev: 0.00067 -> 0.00060: 1.1111x smaller

call_method_unknown

Min: 0.351167 -> 0.343961: 1.02x faster
Avg: 0.351731 -> 0.344580: 1.02x faster
Significant (t=238.89)
Stddev: 0.00033 -> 0.00040: 1.2271x larger

call_simple

Min: 0.257487 -> 0.277366: 1.08x slower
Avg: 0.257942 -> 0.277809: 1.08x slower
Significant (t=-845.64)
Stddev: 0.00029 -> 0.00029: 1.0126x smaller

etree_generate

Min: 0.377985 -> 0.365952: 1.03x faster
Avg: 0.381797 -> 0.369452: 1.03x faster
Significant (t=31.15)
Stddev: 0.00314 -> 0.00241: 1.3017x smaller

etree_iterparse

Min: 0.545668 -> 0.565437: 1.04x slower
Avg: 0.554188 -> 0.576807: 1.04x slower
Significant (t=-17.00)
Stddev: 0.00925 -> 0.00956: 1.0340x larger

etree_process

Min: 0.294158 -> 0.286617: 1.03x faster
Avg: 0.296354 -> 0.288877: 1.03x faster
Significant (t=36.22)
Stddev: 0.00149 -> 0.00143: 1.0435x smaller

fastpickle

Min: 0.458961 -> 0.475828: 1.04x slower
Avg: 0.460226 -> 0.481228: 1.05x slower
Significant (t=-109.38)
Stddev: 0.00082 -> 0.00173: 2.1051x larger

nqueens

Min: 0.305883 -> 0.295858: 1.03x faster
Avg: 0.308085 -> 0.297755: 1.03x faster
Significant (t=90.22)
Stddev: 0.00077 -> 0.00085: 1.0942x larger

silent_logging

Min: 0.074152 -> 0.075818: 1.02x slower
Avg: 0.074345 -> 0.076005: 1.02x slower
Significant (t=-96.29)
Stddev: 0.00013 -> 0.00012: 1.0975x smaller

spectral_norm

Min: 0.355738 -> 0.364419: 1.02x slower
Avg: 0.356691 -> 0.365764: 1.03x slower
Significant (t=-126.23)
Stddev: 0.00054 -> 0.00047: 1.1533x smaller

telco

Min: 0.012152 -> 0.013038: 1.07x slower
Avg: 0.012264 -> 0.013157: 1.07x slower
Significant (t=-83.98)
Stddev: 0.00008 -> 0.00007: 1.0653x smaller
The following not significant results are hidden, use -v to show them: 2to3, call_method, chaos, django_v2, etree_parse, fannkuch, fastunpickle, float, formatted_logging, go, hexiom2, iterative_count, json_dump, json_dump_v2, json_load, mako, mako_v2, meteor_contest, nbody, normal_startup, pathlib, pickle_dict, pickle_list, pidigits, raytrace, regex_compile, regex_effbot, regex_v8, richards, simple_logging, startup_nosite, threaded_count, tornado_http, unpack_sequence, unpickle_list.
Author: Stefan Krah (skrah) *
Date: 2014-07-28 14:23
Just curious, what causes e.g. telco to differ up to 7% between runs? That's really huge.
telco.py always varies a lot between runs (up to 10%), even in the big version "telco.py full":
http://bytereef.org/mpdecimal/quickstart.html#telco-benchmark
Using the average of 10 runs, I can't really see a slowdown.
Author: Stefan Krah (skrah) *
Date: 2014-07-28 14:26
So I wonder why the benchmark suite says that the telco slowdown is significant. :)
Author: David Wilson (dw) *
Date: 2014-07-29 18:24
I suspect it's all covered now, but is there anything else I can help with to get this patch pushed along its merry way?
Author: Roundup Robot (python-dev)
Date: 2014-07-29 23:45
New changeset 79a5fbe2c78f by Antoine Pitrou in branch 'default': Issue #22003: When initialized from a bytes object, io.BytesIO() now http://hg.python.org/cpython/rev/79a5fbe2c78f
Author: Antoine Pitrou (pitrou) *
Date: 2014-07-29 23:46
The latest patch is good indeed. Thank you very much!
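The committed optimization is internal, but the snapshot semantics it has to preserve can be checked from Python: BytesIO must always behave as if it copied its argument at construction time, whether or not a physical copy was made:

```python
import io

# bytes input: the new code shares the object internally, but since
# bytes is immutable the snapshot semantics are unchanged.
src = b"abcdef"
bio = io.BytesIO(src)
assert bio.getvalue() == b"abcdef"

# bytearray input: a real copy is still made, so later mutation of the
# source never leaks into the BytesIO (the case Antoine raised above).
ba = bytearray(b"abc")
bio2 = io.BytesIO(ba)
ba[:] = b"defghi"
assert bio2.getvalue() == b"abc"

# Writing to the BytesIO never affects the original object either.
bio.write(b"XYZ")
assert src == b"abcdef"
```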
Author: Mikhail Korobov (kmike)
Date: 2015-02-09 08:13
Shouldn't this fix be mentioned in https://docs.python.org/3.5/whatsnew/3.5.html#optimizations ?
Author: David Wilson (dw) *
Date: 2015-02-09 15:57
Attached trivial patch for whatsnew.rst.
Author: Roundup Robot (python-dev)
Date: 2015-02-14 22:45
New changeset 7ae156f07a90 by Berker Peksag in branch 'default': Add a whatsnew entry for issue #22003. https://hg.python.org/cpython/rev/7ae156f07a90
Author: Piotr Dobrogost (piotr.dobrogost)
Date: 2015-03-04 20:12
This new patch abandons the buffer interface and specializes for Bytes per the comments on this issue.
Why does it abandon the buffer interface? Because of the following?
Thanks for digging here. As much as I'd love to follow this interpretation, it simply doesn't match existing buffer implementations, including within the standard library.
Shouldn't the existing buffer implementations be fixed then, and this feature made to use the buffer interface instead of specializing for bytes? If so, is there at least some information on this in the comments, so that one wouldn't wonder why there is specialization instead of relying on the buffer interface?
Author: David Wilson (dw) *
Date: 2015-03-04 21:04
Hi Piotr,
There wasn't an obvious fix that didn't involve changing the buffer interface itself. There is presently ambiguity in the interface regarding the difference between a "read only" buffer and an "immutable" buffer, which is crucial for its use in this case.
Fixing the interface, followed by every buffer interface user, is a significantly more complicated task than simply optimizing for the most common case, as done here. FWIW I still think this work is worth doing, though I personally don't have time to approach it just now.
We could (and possibly should) approach fixing e.g. mmap.mmap() hashability, possibly causing user-code regressions, but even if such cases were fixed it still wouldn't be enough to rely on for the optimization implemented here.