Issue 23352: Document "suboffsets if needed" in 2.7 (original) (raw)

Created on 2015-01-30 01:33 by rhansen, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (23)

msg235014 - (view)

Author: Richard Hansen (rhansen) *

Date: 2015-01-30 01:33

According to https://docs.python.org/2/c-api/buffer.html#the-new-style-py-buffer-struct if the suboffsets member of Py_buffer is non-NULL and all members of the array are negative, the buffer may be contiguous.

PyBuffer_IsContiguous() does not behave this way. It assumes that if the suboffsets member is non-NULL then at least one member of the array is non-negative, and thus assumes that the buffer is non-contiguous. This is not always correct.

One consequence of this bug is PyBuffer_ToContiguous() (and thus memoryview.tobytes()) falls back to slow (and currently buggy, see issue #23349) memory copying code.

Attached is a patch that fixes this bug. The patch is against 2.7 but can be trivially modified to apply to 3.x.

msg235016 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2015-01-30 01:39

The patch has an obvious syntax error :-) Other than that, adding a comment would be nice. Bonus points if you can write a test (3.x has infrastructure for that, see Lib/test/test_buffer.py).

msg235017 - (view)

Author: Richard Hansen (rhansen) *

Date: 2015-01-30 01:42

The patch has an obvious syntax error :-)

Doh!

Other than that, adding a comment would be nice.

Agreed, will do.

Bonus points if you can write a test (3.x has infrastructure for that, see Lib/test/test_buffer.py).

I'll take a look. Thanks!

msg235020 - (view)

Author: Richard Hansen (rhansen) *

Date: 2015-01-30 05:45

I've attached a new version of the patch. Suggestions for simplifying the test code would be appreciated.

msg235032 - (view)

Author: Stefan Krah (skrah) * (Python committer)

Date: 2015-01-30 13:10

PEP-3118 says:

Py_buffer.suboffsets:

"If all suboffsets are negative (i.e. no de-referencing is needed, then this must be NULL (the default value)."

I would be inclined to go with the PEP here and make a doc fix instead.

msg235034 - (view)

Author: Stefan Krah (skrah) * (Python committer)

Date: 2015-01-30 14:29

For the record: I'm myself guilty of accepting all-negative suboffsets in memoryview.c (see init_slice()) and I think there's even a test for it.

However, while it's ok for a specific function to accept this corner case, I'm not sure about is_contiguous().

People might rely on the fact that contiguous implies suboffsets==NULL.

Sebastian, did this special case ever come up in the context of NumPy?

msg235052 - (view)

Author: Richard Hansen (rhansen) *

Date: 2015-01-30 20:10

People might rely on the fact that contiguous implies suboffsets==NULL.

Cython (currently) relies on all-negatives being acceptable and equivalent to suboffsets==NULL. See: https://github.com/cython/cython/pull/367 http://thread.gmane.org/gmane.comp.python.cython.devel/15569

msg235053 - (view)

Author: Sebastian Berg (seberg) *

Date: 2015-01-30 20:36

Numpy does not understand suboffsets. The buffers we create will always have them NULL. The other way around.... To be honest, think it is probably ignoring the whole fact that they might exist at all :/, really needs to be fixed if it is the case.

msg235107 - (view)

Author: Stefan Krah (skrah) * (Python committer)

Date: 2015-01-31 13:03

Cython doesn't follow the spec though (use Python 3):

from _testbuffer import *

cpdef foo(): cdef unsigned char[:] v = bytearray(b"testing") nd = ndarray(v, getbuf=PyBUF_ND) print(nd.suboffsets) nd = ndarray(v, getbuf=PyBUF_FULL) print(nd.suboffsets)

Cython hands out suboffsets even for a PyBUF_ND request, which is definitely wrong. For a PyBUF_FULL request they should only be provided if needed.

msg235111 - (view)

Author: Stefan Krah (skrah) * (Python committer)

Date: 2015-01-31 14:53

To summarize, I think this is different from #22445, which also requests relaxed contiguity tests. #22445 is motivated by the fact that slicing can create a certain class of contiguous buffers that aren't recognized as such.

No such reason exists here: All-negative suboffsets are only created if a buffer provider chooses to hand them out instead of NULL. To my knowledge only Cython does this, and it's against the PEP (though strictly speaking not against the docs).

So I'm -0.5 on this change in general and -1 for 2.7.

I'd reject the issue, but let's wait for Antoine's opinion.

msg235122 - (view)

Author: Stefan Krah (skrah) * (Python committer)

Date: 2015-01-31 17:53

Prepare for a partial reversal of opinion. :)

Indexing can actually create all-negative suboffsets in a natural way:

from _testbuffer import * nd = ndarray([1,2,3,4,5,6,7,8,9,10,11,12], shape=[3,4], flags=ND_PIL) nd.tolist() [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

x = nd[1] x.tolist() [5, 6, 7, 8] x.suboffsets (-1,)

This leaves me +-0 for the change, with the caveat that applications might break.

msg235125 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2015-01-31 18:51

I don't have an opinion on this. I've never seen suboffsets in use; but it seems reasonable to detect a "dummy suboffsets" array and recognize it as contiguous.

msg235141 - (view)

Author: Richard Hansen (rhansen) *

Date: 2015-02-01 02:29

(The following message is mostly off-topic but I think it is relevant to those interested in this issue. This message is about the clarity of the documentation regarding flag semantics, and what I think the flags should mean.)

Cython doesn't follow the spec though (use Python 3):

from _testbuffer import *
cpdef foo():
    cdef unsigned char[:] v = bytearray(b"testing")
    nd = ndarray(v, getbuf=PyBUF_ND)
    print(nd.suboffsets)
    nd = ndarray(v, getbuf=PyBUF_FULL)
    print(nd.suboffsets)

When I compile and run the above (latest Cython from Git master), I get:

()
()

Looking at the Python source code (Modules/_testbuffer.c ndarray_get_suboffsets() and ssize_array_as_tuple()) the above output is printed if the suboffsets member is NULL.

Cython hands out suboffsets even for a PyBUF_ND request, which is definitely wrong. For a PyBUF_FULL request they should only be provided if needed.

suboffsets appears to be NULL in both cases in your example, which seems acceptable to me given:

Even if suboffsets wasn't NULL, I believe the behavior would still be correct for the PyBUF_ND case. My reading of <https://docs.python.org/3/c-api/buffer.html> is that flags=PyBUF_ND implies that shape member MUST NOT be NULL, but strides and suboffsets can be NULL or not (it doesn't matter). This interpretation of mine is due to the sentence, "Note that each flag contains all bits of the flags below it." If flags=PyBUF_ND meant that strides and suboffsets MUST be NULL, then PyBUF_ND and PyBUF_STRIDES would necessarily be mutually exclusive and flags=(PyBUF_ND|PyBUF_STRIDES) would not make sense (the strides member would have to be both NULL and non-NULL).

If (flags & PyBUF_INDIRECT) is false, then the consumer is not interested in the suboffsets member so it shouldn't matter what it points to. (And the consumer should not dereference the pointer in case it points to a junk address.)

IMHO, if the buffer is C-style contiguous then the producer should be allowed to populate the shape, strides, and suboffsets members regardless of whether or not any of the PyBUF_ND, PyBUF_STRIDES, or PyBUF_INDIRECT flags are set. In other words, for C-style contiguous buffers, the producer should be allowed to act as if PyBUF_INDIRECT was always provided because the consumer will always get an appropriate Py_buffer struct regardless of the actual state of the PyBUF_ND, PyBUF_STRIDES, and PyBUF_INDIRECT flags.

It is a bug, however, to dereference the strides or suboffsets members with ndarray(v, getbuf=PyBUF_ND) because the consumer didn't ask for strides or suboffsets information and the pointers might be bogus.

Stepping back a bit, I believe that the flags should be thought of as imposing requirements on the producer. I think those requirements should be (assuming ndim > 0):

msg235145 - (view)

Author: Richard Hansen (rhansen) *

Date: 2015-02-01 07:57

Attached is a documentation patch that adds what I said in . I doubt everyone will agree with the changes, but maybe it will be a useful starting point.

(Despite not having an asterisk next to my username, I have signed the contributor agreement. It just hasn't been processed yet.)

msg235146 - (view)

Author: Richard Hansen (rhansen) *

Date: 2015-02-01 08:45

This leaves me +-0 for the change, with the caveat that applications might break.

How might an application break with this change? Compared to the current PyBuffer_IsContiguous(), the patched version is the same except it returns true for a wider range of actually-contiguous buffers.

msg235148 - (view)

Author: Stefan Krah (skrah) * (Python committer)

Date: 2015-02-01 08:59

Richard Hansen added the comment:

Cython doesn't follow the spec though (use Python 3):

from _testbuffer import *
cpdef foo():
    cdef unsigned char[:] v = bytearray(b"testing")
    nd = ndarray(v, getbuf=PyBUF_ND)
    print(nd.suboffsets)
    nd = ndarray(v, getbuf=PyBUF_FULL)
    print(nd.suboffsets)

When I compile and run the above (latest Cython from Git master), I get:

()
()

With Cython version 0.20.1post0 I get:

foo.foo() (-1,) (-1,)

If you get the correct output from the latest Cython, it looks like this issue has been fixed.

msg235150 - (view)

Author: Stefan Krah (skrah) * (Python committer)

Date: 2015-02-01 09:10

How might an application break with this change?

func1: if (!PyBuffer_IsContiguous(view, 'C')) error();

func2(view);

func2: assert(view->suboffsets == NULL); ...

Yes, I did insert sanity checks like this in some places.

msg235151 - (view)

Author: Richard Hansen (rhansen) *

Date: 2015-02-01 09:21

When I compile and run the above (latest Cython from Git master), I get:

()
()

With Cython version 0.20.1post0 I get:

foo.foo() (-1,) (-1,)

If you get the correct output from the latest Cython, it looks like this issue has been fixed.

Oops, I was running a locally modified version of Cython that contained a patch meant to work around issue #23349.

When I run the actual upstream master I get the same behavior that you do.

But either way, I don't see why it's a problem that it prints (-1,) for the PyBUF_ND case. The consumer didn't request suboffsets information so I would expect it to be OK for the producer to put whatever it wants in the suboffsets field, including a junk pointer that would segfault upon dereference.

msg235152 - (view)

Author: Richard Hansen (rhansen) *

Date: 2015-02-01 09:23

How might an application break with this change?

assert(view->suboffsets == NULL);

Fair point.

msg235153 - (view)

Author: Sebastian Berg (seberg) *

Date: 2015-02-01 09:29

I do not have an opinion either way, but I do think there is a small difference here compared to the other issue. With the other issue, there are cases where we cannot set the strides correctly.

If you ask numpy directly whether the array is contiguous (or create it from numpy) and it says "yes", and then you get it without asking for a contiguous array it is, without the change, possible to get an array that would not be considered contiguous.

On the other hand, setting suboffsets to NULL when all are -1 is always possible, if tedious I admit.

msg235155 - (view)

Author: Richard Hansen (rhansen) *

Date: 2015-02-01 09:39

My preference is to apply the patch, of course. There is a legitimate concern that it will break existing code, but I think there are more points in favor of applying the patch:

msg235161 - (view)

Author: Stefan Krah (skrah) * (Python committer)

Date: 2015-02-01 11:36

But only Cython does not set suboffsets to NULL and you already have a small patch to fix that.

The Python 3 docs say "suboffsets only if needed" and the PEP says the same, so the situation is not completely undocumented.

I think your doc patch goes too far. Buffers propagate in unpredictable ways, and since the original consumer request is not stored in the buffer, setting unneeded fields to NULL provides a way to figure out the original request.

It is actually an optimization to set suboffsets to NULL: Compliant code that uses arbitrary buffers always has to check for:

if (suboffsets != NULL && suboffsets[i] >= 0)
    ...

If suboffsets are consistently NULL, at least hopefully you get branch prediction to kick in.

As Sebastian pointed out, it's relatively easy even for slicing/indexing functions to check for all-negative suboffsets and handle that case.

All this is probably academic, since no one appears to be using suboffsets at all. :)

Let's keep the status-quo and make this a doc issue for 2.7.

msg235194 - (view)

Author: Roundup Robot (python-dev) (Python triager)

Date: 2015-02-01 18:50

New changeset c4c1d68b6301 by Stefan Krah in branch '2.7': Issue #23352: Document that Py_buffer.suboffsets must be NULL if no suboffsets https://hg.python.org/cpython/rev/c4c1d68b6301

New changeset de5c8ee002bf by Stefan Krah in branch '3.4': Issue #23352: Document that Py_buffer.suboffsets must be NULL if no suboffsets https://hg.python.org/cpython/rev/de5c8ee002bf

New changeset 5d097a74766f by Stefan Krah in branch 'default': Issue #23352: Merge from 3.4. https://hg.python.org/cpython/rev/5d097a74766f

History

Date

User

Action

Args

2022-04-11 14:58:12

admin

set

github: 67541

2015-02-02 17:05:41

skrah

set

status: open -> closed
resolution: fixed
stage: resolved

2015-02-01 18:50:57

python-dev

set

nosy: + python-dev
messages: +

2015-02-01 11:36:19

skrah

set

priority: normal -> low

type: behavior -> enhancement
assignee: docs@python
components: + Documentation, - Interpreter Core
title: PyBuffer_IsContiguous() sometimes returns wrong result if suboffsets != NULL -> Document "suboffsets if needed" in 2.7
nosy: + docs@python

messages: +
stage: patch review -> (no value)

2015-02-01 09:39:42

rhansen

set

messages: +

2015-02-01 09:29:13

seberg

set

messages: +

2015-02-01 09:23:25

rhansen

set

messages: +

2015-02-01 09:21:18

rhansen

set

messages: +

2015-02-01 09:10:04

skrah

set

messages: +

2015-02-01 08:59:39

skrah

set

messages: +

2015-02-01 08:45:35

rhansen

set

messages: +

2015-02-01 08:14:30

rhansen

set

files: + doc_c-api_buffer.patch

2015-02-01 08:14:14

rhansen

set

files: - doc_c-api_buffer.patch

2015-02-01 08:05:41

rhansen

set

files: + doc_c-api_buffer.patch

2015-02-01 08:05:16

rhansen

set

files: - doc_c-api_buffer.patch

2015-02-01 07:57:34

rhansen

set

files: + doc_c-api_buffer.patch

messages: +

2015-02-01 02:29:22

rhansen

set

messages: +

2015-01-31 18:51:52

pitrou

set

messages: +

2015-01-31 17:53:23

skrah

set

messages: +

2015-01-31 14:53:58

skrah

set

messages: +

2015-01-31 13:03:16

skrah

set

messages: +

2015-01-30 20:36:31

seberg

set

messages: +

2015-01-30 20:10:03

rhansen

set

messages: +

2015-01-30 14:29:25

skrah

set

nosy: + seberg
messages: +

2015-01-30 13:10:57

skrah

set

messages: +

2015-01-30 05:52:11

rhansen

set

versions: - Python 3.2, Python 3.3

2015-01-30 05:51:37

rhansen

set

versions: + Python 3.2, Python 3.3

2015-01-30 05:45:06

rhansen

set

files: + PyBuffer_IsContiguous_v2.patch

messages: +

2015-01-30 01:42:25

rhansen

set

messages: +

2015-01-30 01:39:50

pitrou

set

versions: - Python 3.2, Python 3.3, Python 3.6
nosy: + skrah, pitrou

messages: +

stage: patch review

2015-01-30 01:35:07

rhansen

set

type: behavior

2015-01-30 01:33:45

rhansen

create