Issue 23352: Document "suboffsets if needed" in 2.7 (original) (raw)
Created on 2015-01-30 01:33 by rhansen, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Messages (23)
Author: Richard Hansen (rhansen) *
Date: 2015-01-30 01:33
According to https://docs.python.org/2/c-api/buffer.html#the-new-style-py-buffer-struct if the suboffsets member of Py_buffer is non-NULL and all members of the array are negative, the buffer may be contiguous.
PyBuffer_IsContiguous() does not behave this way. It assumes that if the suboffsets member is non-NULL then at least one member of the array is non-negative, and thus assumes that the buffer is non-contiguous. This is not always correct.
One consequence of this bug is PyBuffer_ToContiguous() (and thus memoryview.tobytes()) falls back to slow (and currently buggy, see issue #23349) memory copying code.
Attached is a patch that fixes this bug. The patch is against 2.7 but can be trivially modified to apply to 3.x.
Author: Antoine Pitrou (pitrou) *
Date: 2015-01-30 01:39
The patch has an obvious syntax error :-) Other than that, adding a comment would be nice. Bonus points if you can write a test (3.x has infrastructure for that, see Lib/test/test_buffer.py).
Author: Richard Hansen (rhansen) *
Date: 2015-01-30 01:42
The patch has an obvious syntax error :-)
Doh!
Other than that, adding a comment would be nice.
Agreed, will do.
Bonus points if you can write a test (3.x has infrastructure for that, see Lib/test/test_buffer.py).
I'll take a look. Thanks!
Author: Richard Hansen (rhansen) *
Date: 2015-01-30 05:45
I've attached a new version of the patch. Suggestions for simplifying the test code would be appreciated.
Author: Stefan Krah (skrah) *
Date: 2015-01-30 13:10
PEP-3118 says:
Py_buffer.suboffsets:
"If all suboffsets are negative (i.e. no de-referencing is needed, then this must be NULL (the default value)."
I would be inclined to go with the PEP here and make a doc fix instead.
Author: Stefan Krah (skrah) *
Date: 2015-01-30 14:29
For the record: I'm myself guilty of accepting all-negative suboffsets in memoryview.c (see init_slice()) and I think there's even a test for it.
However, while it's ok for a specific function to accept this corner case, I'm not sure about is_contiguous().
People might rely on the fact that contiguous implies suboffsets==NULL.
Sebastian, did this special case ever come up in the context of NumPy?
Author: Richard Hansen (rhansen) *
Date: 2015-01-30 20:10
People might rely on the fact that contiguous implies suboffsets==NULL.
Cython (currently) relies on all-negatives being acceptable and equivalent to suboffsets==NULL. See: https://github.com/cython/cython/pull/367 http://thread.gmane.org/gmane.comp.python.cython.devel/15569
Author: Sebastian Berg (seberg) *
Date: 2015-01-30 20:36
Numpy does not understand suboffsets. The buffers we create will always have them NULL. The other way around.... To be honest, think it is probably ignoring the whole fact that they might exist at all :/, really needs to be fixed if it is the case.
Author: Stefan Krah (skrah) *
Date: 2015-01-31 13:03
Cython doesn't follow the spec though (use Python 3):
from _testbuffer import *
cpdef foo(): cdef unsigned char[:] v = bytearray(b"testing") nd = ndarray(v, getbuf=PyBUF_ND) print(nd.suboffsets) nd = ndarray(v, getbuf=PyBUF_FULL) print(nd.suboffsets)
Cython hands out suboffsets even for a PyBUF_ND request, which is definitely wrong. For a PyBUF_FULL request they should only be provided if needed.
Author: Stefan Krah (skrah) *
Date: 2015-01-31 14:53
To summarize, I think this is different from #22445, which also requests relaxed contiguity tests. #22445 is motivated by the fact that slicing can create a certain class of contiguous buffers that aren't recognized as such.
No such reason exists here: All-negative suboffsets are only created if a buffer provider chooses to hand them out instead of NULL. To my knowledge only Cython does this, and it's against the PEP (though strictly speaking not against the docs).
So I'm -0.5 on this change in general and -1 for 2.7.
I'd reject the issue, but let's wait for Antoine's opinion.
Author: Stefan Krah (skrah) *
Date: 2015-01-31 17:53
Prepare for a partial reversal of opinion. :)
Indexing can actually create all-negative suboffsets in a natural way:
from _testbuffer import * nd = ndarray([1,2,3,4,5,6,7,8,9,10,11,12], shape=[3,4], flags=ND_PIL) nd.tolist() [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
x = nd[1] x.tolist() [5, 6, 7, 8] x.suboffsets (-1,)
This leaves me +-0 for the change, with the caveat that applications might break.
Author: Antoine Pitrou (pitrou) *
Date: 2015-01-31 18:51
I don't have an opinion on this. I've never seen suboffsets in use; but it seems reasonable to detect a "dummy suboffsets" array and recognize it as contiguous.
Author: Richard Hansen (rhansen) *
Date: 2015-02-01 02:29
(The following message is mostly off-topic but I think it is relevant to those interested in this issue. This message is about the clarity of the documentation regarding flag semantics, and what I think the flags should mean.)
Cython doesn't follow the spec though (use Python 3):
from _testbuffer import * cpdef foo(): cdef unsigned char[:] v = bytearray(b"testing") nd = ndarray(v, getbuf=PyBUF_ND) print(nd.suboffsets) nd = ndarray(v, getbuf=PyBUF_FULL) print(nd.suboffsets)
When I compile and run the above (latest Cython from Git master), I get:
()
()
Looking at the Python source code (Modules/_testbuffer.c ndarray_get_suboffsets() and ssize_array_as_tuple()) the above output is printed if the suboffsets member is NULL.
Cython hands out suboffsets even for a PyBUF_ND request, which is definitely wrong. For a PyBUF_FULL request they should only be provided if needed.
suboffsets appears to be NULL in both cases in your example, which seems acceptable to me given:
- the user didn't request suboffsets information in the PyBUF_ND case, and
- there are no indirections in the PyBUF_FULL case.
Even if suboffsets wasn't NULL, I believe the behavior would still be correct for the PyBUF_ND case. My reading of <https://docs.python.org/3/c-api/buffer.html> is that flags=PyBUF_ND implies that shape member MUST NOT be NULL, but strides and suboffsets can be NULL or not (it doesn't matter). This interpretation of mine is due to the sentence, "Note that each flag contains all bits of the flags below it." If flags=PyBUF_ND meant that strides and suboffsets MUST be NULL, then PyBUF_ND and PyBUF_STRIDES would necessarily be mutually exclusive and flags=(PyBUF_ND|PyBUF_STRIDES) would not make sense (the strides member would have to be both NULL and non-NULL).
If (flags & PyBUF_INDIRECT) is false, then the consumer is not interested in the suboffsets member so it shouldn't matter what it points to. (And the consumer should not dereference the pointer in case it points to a junk address.)
IMHO, if the buffer is C-style contiguous then the producer should be allowed to populate the shape, strides, and suboffsets members regardless of whether or not any of the PyBUF_ND, PyBUF_STRIDES, or PyBUF_INDIRECT flags are set. In other words, for C-style contiguous buffers, the producer should be allowed to act as if PyBUF_INDIRECT was always provided because the consumer will always get an appropriate Py_buffer struct regardless of the actual state of the PyBUF_ND, PyBUF_STRIDES, and PyBUF_INDIRECT flags.
It is a bug, however, to dereference the strides or suboffsets members with ndarray(v, getbuf=PyBUF_ND) because the consumer didn't ask for strides or suboffsets information and the pointers might be bogus.
Stepping back a bit, I believe that the flags should be thought of as imposing requirements on the producer. I think those requirements should be (assuming ndim > 0):
- PyBUF_ND: If (flags & PyBUF_ND) is true:
- If (flags & PyBUF_STRIDES) is false and the producer is unable to produce a block of memory at [buf,buf+len) containing all (len/itemsize) entries in a C-style contiguous chunk, then the producer must raise an exception.
- Otherwise, the producer must provide the shape of buffer. If (flags & PyBUF_ND) is false:
- If the producer is unable to produce a contiguous chunk of (len/itemsize) entries (of any shape) in the block of memory at [buf,buf+len), then the producer must raise an exception.
- Otherwise, the producer is permitted to do any of the
following:
- don't touch the shape member (don't set it to NULL or any other value; just leave it as-is)
- set the shape member to NULL
- set the shape member to point to an array describing the shape
- set the shape member to point to any other location, even if dereferencing the pointer would result in a segfault
- PyBUF_STRIDES: If (flags & PyBUF_STRIDES) is true:
- The producer must provide the appropriate strides information. If (flags & PyBUF_STRIDES) is false:
- If the producer is unable to produce a block of memory at [buf,buf+len) containing all (len/itemsize) entries, the producer must raise an exception.
- Otherwise, the producer is permitted to do any of the
following;
- don't touch the strides member (don't set it to NULL or any other value; just leave it as-is)
- set the strides member to NULL
- set the strides member to point to an array describing the strides
- set the strides member to point to any other location, even if dereferencing the pointer would result in a segfault
- PyBUF_INDIRECT: If (flags & PyBUF_INDIRECT) is true:
- If the buffer uses indirections then the producer must set the suboffsets member to point to an array with appropriate entries.
- Otherwise, the producer can either set the suboffsets member to NULL or set it to point to an array of all negative entries. If (flags & PyBUF_INDIRECT) is false:
- If the producer cannot produce a buffer that does not have indirections, then the producer must raise an exception.
- Otherwise, the producer is permitted to do any of the
following:
- don't touch the suboffsets member (don't set it to NULL or any other value; just leave it as-is)
- set the suboffsets member to NULL
- set the suboffsets member to point to an array of all negative entries
- set the suboffsets member to point to any other location, even if dereferencing the pointer would result in a segfault
Author: Richard Hansen (rhansen) *
Date: 2015-02-01 07:57
Attached is a documentation patch that adds what I said in . I doubt everyone will agree with the changes, but maybe it will be a useful starting point.
(Despite not having an asterisk next to my username, I have signed the contributor agreement. It just hasn't been processed yet.)
Author: Richard Hansen (rhansen) *
Date: 2015-02-01 08:45
This leaves me +-0 for the change, with the caveat that applications might break.
How might an application break with this change? Compared to the current PyBuffer_IsContiguous(), the patched version is the same except it returns true for a wider range of actually-contiguous buffers.
Author: Stefan Krah (skrah) *
Date: 2015-02-01 08:59
Richard Hansen added the comment:
Cython doesn't follow the spec though (use Python 3):
from _testbuffer import * cpdef foo(): cdef unsigned char[:] v = bytearray(b"testing") nd = ndarray(v, getbuf=PyBUF_ND) print(nd.suboffsets) nd = ndarray(v, getbuf=PyBUF_FULL) print(nd.suboffsets)
When I compile and run the above (latest Cython from Git master), I get:
() ()
With Cython version 0.20.1post0 I get:
foo.foo() (-1,) (-1,)
If you get the correct output from the latest Cython, it looks like this issue has been fixed.
Author: Stefan Krah (skrah) *
Date: 2015-02-01 09:10
How might an application break with this change?
func1: if (!PyBuffer_IsContiguous(view, 'C')) error();
func2(view);
func2: assert(view->suboffsets == NULL); ...
Yes, I did insert sanity checks like this in some places.
Author: Richard Hansen (rhansen) *
Date: 2015-02-01 09:21
When I compile and run the above (latest Cython from Git master), I get:
() ()
With Cython version 0.20.1post0 I get:
foo.foo() (-1,) (-1,)
If you get the correct output from the latest Cython, it looks like this issue has been fixed.
Oops, I was running a locally modified version of Cython that contained a patch meant to work around issue #23349.
When I run the actual upstream master I get the same behavior that you do.
But either way, I don't see why it's a problem that it prints (-1,) for the PyBUF_ND case. The consumer didn't request suboffsets information so I would expect it to be OK for the producer to put whatever it wants in the suboffsets field, including a junk pointer that would segfault upon dereference.
Author: Richard Hansen (rhansen) *
Date: 2015-02-01 09:23
How might an application break with this change?
assert(view->suboffsets == NULL);
Fair point.
Author: Sebastian Berg (seberg) *
Date: 2015-02-01 09:29
I do not have an opinion either way, but I do think there is a small difference here compared to the other issue. With the other issue, there are cases where we cannot set the strides correctly.
If you ask numpy directly whether the array is contiguous (or create it from numpy) and it says "yes", and then you get it without asking for a contiguous array it is, without the change, possible to get an array that would not be considered contiguous.
On the other hand, setting suboffsets to NULL when all are -1 is always possible, if tedious I admit.
Author: Richard Hansen (rhansen) *
Date: 2015-02-01 09:39
My preference is to apply the patch, of course. There is a legitimate concern that it will break existing code, but I think there are more points in favor of applying the patch:
- there exists code that the current behavior is known to break
- it's easier to remove assert(view->suboffsets == NULL) than to change producers to not provide an all-negative array (though I admit there may be other more subtle ways the patch would break existing code)
- I may be the only one who has spoken up about this, but I doubt I'm the only one who has experienced it
Author: Stefan Krah (skrah) *
Date: 2015-02-01 11:36
But only Cython does not set suboffsets to NULL and you already have a small patch to fix that.
The Python 3 docs say "suboffsets only if needed" and the PEP says the same, so the situation is not completely undocumented.
I think your doc patch goes too far. Buffers propagate in unpredictable ways, and since the original consumer request is not stored in the buffer, setting unneeded fields to NULL provides a way to figure out the original request.
It is actually an optimization to set suboffsets to NULL: Compliant code that uses arbitrary buffers always has to check for:
if (suboffsets != NULL && suboffsets[i] >= 0)
...
If suboffsets are consistently NULL, at least hopefully you get branch prediction to kick in.
As Sebastian pointed out, it's relatively easy even for slicing/indexing functions to check for all-negative suboffsets and handle that case.
All this is probably academic, since no one appears to be using suboffsets at all. :)
Let's keep the status-quo and make this a doc issue for 2.7.
Author: Roundup Robot (python-dev)
Date: 2015-02-01 18:50
New changeset c4c1d68b6301 by Stefan Krah in branch '2.7': Issue #23352: Document that Py_buffer.suboffsets must be NULL if no suboffsets https://hg.python.org/cpython/rev/c4c1d68b6301
New changeset de5c8ee002bf by Stefan Krah in branch '3.4': Issue #23352: Document that Py_buffer.suboffsets must be NULL if no suboffsets https://hg.python.org/cpython/rev/de5c8ee002bf
New changeset 5d097a74766f by Stefan Krah in branch 'default': Issue #23352: Merge from 3.4. https://hg.python.org/cpython/rev/5d097a74766f
History
Date
User
Action
Args
2022-04-11 14:58:12
admin
set
github: 67541
2015-02-02 17:05:41
skrah
set
status: open -> closed
resolution: fixed
stage: resolved
2015-02-01 18:50:57
python-dev
set
nosy: + python-dev
messages: +
2015-02-01 11:36:19
skrah
set
priority: normal -> low
type: behavior -> enhancement
assignee: docs@python
components: + Documentation, - Interpreter Core
title: PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL -> Document "suboffsets if needed" in 2.7
nosy: + docs@python
messages: +
stage: patch review -> (no value)
2015-02-01 09:39:42
rhansen
set
messages: +
2015-02-01 09:29:13
seberg
set
messages: +
2015-02-01 09:23:25
rhansen
set
messages: +
2015-02-01 09:21:18
rhansen
set
messages: +
2015-02-01 09:10:04
skrah
set
messages: +
2015-02-01 08:59:39
skrah
set
messages: +
2015-02-01 08:45:35
rhansen
set
messages: +
2015-02-01 08:14:30
rhansen
set
files: + doc_c-api_buffer.patch
2015-02-01 08:14:14
rhansen
set
files: - doc_c-api_buffer.patch
2015-02-01 08:05:41
rhansen
set
files: + doc_c-api_buffer.patch
2015-02-01 08:05:16
rhansen
set
files: - doc_c-api_buffer.patch
2015-02-01 07:57:34
rhansen
set
files: + doc_c-api_buffer.patch
messages: +
2015-02-01 02:29:22
rhansen
set
messages: +
2015-01-31 18:51:52
pitrou
set
messages: +
2015-01-31 17:53:23
skrah
set
messages: +
2015-01-31 14:53:58
skrah
set
messages: +
2015-01-31 13:03:16
skrah
set
messages: +
2015-01-30 20:36:31
seberg
set
messages: +
2015-01-30 20:10:03
rhansen
set
messages: +
2015-01-30 14:29:25
skrah
set
nosy: + seberg
messages: +
2015-01-30 13:10:57
skrah
set
messages: +
2015-01-30 05:52:11
rhansen
set
versions: - Python 3.2, Python 3.3
2015-01-30 05:51:37
rhansen
set
versions: + Python 3.2, Python 3.3
2015-01-30 05:45:06
rhansen
set
files: + PyBuffer_IsContiguous_v2.patch
messages: +
2015-01-30 01:42:25
rhansen
set
messages: +
2015-01-30 01:39:50
pitrou
set
versions: - Python 3.2, Python 3.3, Python 3.6
nosy: + skrah, pitrou
messages: +
stage: patch review
2015-01-30 01:35:07
rhansen
set
type: behavior
2015-01-30 01:33:45
rhansen
create