Issue 23756: Tighten definition of bytes-like objects (original) (raw)

Created on 2015-03-24 08:25 by martin.panter, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (23)

msg239097 - (view)

Author: Martin Panter (martin.panter) * (Python committer)

Date: 2015-03-24 08:25

There are moves at documenting and implementing support for “bytes-like” objects in more APIs, such as the “io” module (Issue 20699), http.client (Issue 23740). The glossary definition is currently “An object that supports the Buffer Protocol, like bytes, bytearray or memoryview.” This was originally added for Issue 16518. However after reading Issue 23688, I realized that it should probably not mean absolutely any object supporting the buffer protocol. For instance:

reverse_view = memoryview(b"123")[::-1] stdout.buffer.write(reverse_view) Traceback (most recent call last): File "", line 1, in BufferError: memoryview: underlying buffer is not C-contiguous

I think the definition should at least be tightened to only objects with a contiguous buffer, and “contiguous” should be defined (probably in the linked C API page or the memoryview.contiguous flag definition, not the glossary). So far, my understanding is these are contiguous:

and these are not contiguous:

Perhaps the bytes-like definition should tightened further, to match the above error message, to only “C-contiguous” buffers. I understand that C-contiguous means the strides tuple has to be in non-strict decreasing order, e.g. for 2 × 1 × 3 arrays, strides == (3, 3, 1) is C-contiguous, but strides == (1, 3, 3) is not. This also needs documenting.

I’m not so sure about these, but the definition could be tightened even further:

msg239101 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2015-03-24 09:20

Totally agree. Current definition is too wide. Actually in different places the term "bytes-like object" can imply different requirements.

msg239136 - (view)

Author: R. David Murray (r.david.murray) * (Python committer)

Date: 2015-03-24 15:00

This not dissimilar to the problem we have with "file like object" or "file object". The requirements on them are not consistent. I'm not sure what the solution is in either case, but for file like objects we have decided to ignore the issue, I think. (ie: what exactly a file like object needs to support in order to work with a give API depends on that API, and we just live with that).

msg239783 - (view)

Author: Martin Panter (martin.panter) * (Python committer)

Date: 2015-04-01 11:33

After doing a bit of reading and experimenting, I think we should at least restrict bytes-like objects to “C-contiguous”. Any looser definition risks memoryview(byteslike).tobytes() returning the bytes in a different order than their true in-memory order. Fortran-style contiguous arrays aren’t enough:

import _testbuffer, sys fortran = memoryview(_testbuffer.ndarray([11, 12, 21, 22], format="B", flags=0, shape=[2, 2], strides=[1, 2], offset=0)) fortran.f_contiguous True fortran.c_contiguous False fortran.tolist() [[11, 21], [12, 22]] tuple(bytes(fortran)) (11, 21, 12, 22) sys.stdout.buffer.write(fortran) Traceback (most recent call last): File "", line 1, in BufferError: memoryview: underlying buffer is not C-contiguous

So I am proposing a patch which:

msg239784 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2015-04-01 11:44

+1 for the idea overall and the patch LGTM.

msg239795 - (view)

Author: Stefan Krah (skrah) * (Python committer)

Date: 2015-04-01 12:35

I have a somewhat general concern: In the last year or so, issues seem to expand far beyond the scope that's indicated by the issue title.

For example, I don't particularly care about the definition of "bytes-like", but the patch contains changes to areas I do care about.

I don't think all of the changes are an improvement: What is the "flattened length", why does C-contiguous have to be explained?

msg239815 - (view)

Author: R. David Murray (r.david.murray) * (Python committer)

Date: 2015-04-01 14:01

I found the explanation of C-contiguous vs Fortran-contiguous helpful (and I've programmed in both of those languages, though granted not much :). However, based on that it is not obvious to me why having a fortran-contiguous buffer prevents it from being used in the bytes-like object contexts (though granted the order might be surprising to someone who is not thinking about the memory ordering and just assuming C).

I don't have much of an opinion on the other non-glossary-entry changes, but at a quick read I'm not sure how much clarity they add, if any.

msg239816 - (view)

Author: R. David Murray (r.david.murray) * (Python committer)

Date: 2015-04-01 14:05

Oh, and about the general concern: I agree that this issue was apparently about the glossary entry, so making other changes is suspect and at a minimum requires adding relevant people from the experts list to nosy to get proper review of other proposed changes.

msg239817 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2015-04-01 14:24

What people are needed? The patch looks as great improvement to me.

msg239818 - (view)

Author: R. David Murray (r.david.murray) * (Python committer)

Date: 2015-04-01 14:43

Stefan, since he's the current maintainer of the memoryview implementation. Fortunately he spotted the issue :) Maybe Antoine, too; he's done work in this area. I'll add him.

msg239819 - (view)

Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer)

Date: 2015-04-01 15:06

See also .

msg239851 - (view)

Author: Martin Panter (martin.panter) * (Python committer)

Date: 2015-04-01 21:37

I will to pull the stdtypes.rst changes out into a separate patch and issue, if that will make review easier. I think they are an improvement because the previous version was incorrect and misleading, but they are probably not necessary for people to understand what a C-contiguous bytes-like object is.

I don’t think “flattened length” is explicitly defined anywhere, but it is already used in the memoryview() documentation and elsewhere. I took it to mean the number of elements in the nested list, if you ignore the fact that it is nested; i.e. ignore the extra "[" and "]" delimiters in the repr(), and just count the other values inside them.

The reason for defining C-contiguous was that I originally understood “contiguous” to be much more general than what seems to be meant. I assumed both memoryview(contiguous)[::-1] and a 2×2×2 array with stride=[4, 1, 2] would be contiguous, although neither have the C or Fortran array layout.

I think we need to define C-contiguous well enough for people to understand which standard Python objects are bytes-like objects. Maybe not Fortran-contiguous, because it doesn’t seem relevant to standard Python objects. Considering Serhiy asked if array.array() is always C-contiguous, I may not have done enough to explain that. (I think the answer is always yes.)

David: If a Fortran array was allowed in a bytes-like context without memory copying, the order of the array elements would differ from the order returned by the meoryview.tobytes() method, which essentially is defined to copy them out in C-array or flattend-tolist() order.

msg239901 - (view)

Author: Stefan Krah (skrah) * (Python committer)

Date: 2015-04-02 11:10

If you think that the previous version was "incorrect and misleading", the bar for changes to be accepted seems pretty high for me.

"grep -r" doesn't seem to find "flattened length" in Doc/*.

"An object that supports the :ref:bufferobject and is C-contiguous, like :class:bytes, :class:bytearray or :class:memoryview."

This implies that all memoryviews are C-contiguous.

msg239919 - (view)

Author: R. David Murray (r.david.murray) * (Python committer)

Date: 2015-04-02 14:57

If a Fortran array was allowed in a bytes-like context without memory copying, the order of the array elements would differ from the order returned by the meoryview.tobytes() method, which essentially is defined to copy them out in C-array or flattend-tolist() order.

I'm still not seeing how this would cause such an object to throw an error if used in a bytes-like context. I presume by the above that you mean that the results of passing the object directly to a bytes like context differs from the results of calling .tobytes() on it and passing that to the bytes like context. That's not what your suggested documentation change says, though.

msg239971 - (view)

Author: Martin Panter (martin.panter) * (Python committer)

Date: 2015-04-03 11:37

I’m sorry Stefan, I now realize my changes for len(view) were indeed wrong, and the original was much more correct. I still think the tobytes() and maybe tolist() documentation could be improved, but that is a separate issue to the bytes-like definition.

I am posting c-contig.v2.patch. Hopefully you will find it is truer to my original scope :)

David: The result of passing a Fortran array directly in a bytes-like context is typically BufferError. If this were relaxed, then we would get the inconsistency with tobytes().

import _testbuffer, sys layout = [11, 21, 12, 22] fortran_array = _testbuffer.ndarray(layout, format="B", flags=0, shape=[2, 2], strides=[1, 2], offset=0) sys.stdout.buffer.write(fortran_array) Traceback (most recent call last): File "", line 1, in BufferError: ndarray is not C-contiguous list(memoryview(fortran_array).tobytes()) # C-contiguous order! [11, 12, 21, 22]

msg247464 - (view)

Author: Stefan Krah (skrah) * (Python committer)

Date: 2015-07-27 12:36

Sorry, I'm still not convinced that the C-contiguity explanation is in the right place. The docs have to be terse in order to be useful as a reference, and the explanation at that particular location breaks the flow of reading. So, please don't commit that.

The glossary update looks good to me. -- C-contiguity could also be explained in the glossary, but I prefer this explanation from

http://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html

"Specify the order of the array. If order is ‘C’ (default), then the array will be in C-contiguous order (last-index varies the fastest). If order is ‘F’, then the returned array will be in Fortran-contiguous order (first-index varies the fastest)."

msg247558 - (view)

Author: Martin Panter (martin.panter) * (Python committer)

Date: 2015-07-29 04:34

Patch v3:

I found the Num Py explanation a bit brief, assuming what you quoted was the extent of it. I used some of that wording, and added a bit more, although it is still not a complete definition. Let me know if you think this version is acceptable.

msg247603 - (view)

Author: Stefan Krah (skrah) * (Python committer)

Date: 2015-07-29 18:49

c-contig.v3.patch LGTM.

msg248263 - (view)

Author: Stefan Krah (skrah) * (Python committer)

Date: 2015-08-08 12:09

I would commit this, except that I'm not too happy with the use of the term "bytes-like" in general. Yesterday I mistyped this:

import ctypes x = ctypes.c_double m = memoryview(x) Traceback (most recent call last): File "", line 1, in TypeError: memoryview: a bytes-like object is required, not '_ctypes.PyCSimpleType'

The previous error message was (changed in #16518) was:

"_ctypes.PyCSimpleType does not support the buffer interface".

Which I find much clearer. Memoryviews (for better or worse, but PEP-3118 was accepted) are Mini-NumPy-arrays. I'm still not sure if we should hide that from users.

msg248264 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2015-08-08 12:10

The thing is, most users don't know what the buffer protocol is (even Numpy users, probably), while "bytes-like" at least will make some sense - even though I agree it's an imperfect approximation.

msg248265 - (view)

Author: Stefan Krah (skrah) * (Python committer)

Date: 2015-08-08 12:29

For end users it's probably adequate. But several committers find the term confusing (, ). :)

Anyway, I'm going to commit this since it adds clarity.

msg248266 - (view)

Author: Roundup Robot (python-dev) (Python triager)

Date: 2015-08-08 12:34

New changeset d1ef54751412 by Stefan Krah in branch '3.5': Issue #23756: Clarify the terms "contiguous" and "bytes-like object". https://hg.python.org/cpython/rev/d1ef54751412

New changeset c59b2c4f4cac by Stefan Krah in branch 'default': Merge #23756. https://hg.python.org/cpython/rev/c59b2c4f4cac

msg248267 - (view)

Author: Stefan Krah (skrah) * (Python committer)

Date: 2015-08-08 12:37

Martin, thanks for the patch!

History

Date

User

Action

Args

2022-04-11 14:58:14

admin

set

github: 67944

2015-08-08 12:37:35

skrah

set

status: open -> closed
versions: + Python 3.6, - Python 3.4
messages: +

resolution: fixed
stage: patch review -> resolved

2015-08-08 12:34:51

python-dev

set

nosy: + python-dev
messages: +

2015-08-08 12:29:05

skrah

set

messages: +

2015-08-08 12:10:47

pitrou

set

messages: +

2015-08-08 12:09:21

skrah

set

messages: +

2015-07-29 18:49:40

skrah

set

messages: +

2015-07-29 04:34:28

martin.panter

set

files: + c-contig.v3.patch

messages: +

2015-07-27 12:36:46

skrah

set

assignee: skrah ->
messages: +

2015-05-26 23:20:49

martin.panter

set

stage: needs patch -> patch review

2015-04-03 11:37:55

martin.panter

set

files: + c-contig.v2.patch

messages: +

2015-04-02 14:57:07

r.david.murray

set

messages: +

2015-04-02 11:10:07

skrah

set

messages: +

2015-04-02 10:42:47

skrah

set

assignee: docs@python -> skrah

2015-04-01 21:37:20

martin.panter

set

messages: +

2015-04-01 15:06:10

serhiy.storchaka

set

messages: +

2015-04-01 14:43:23

r.david.murray

set

nosy: + pitrou
messages: +

2015-04-01 14:24:00

serhiy.storchaka

set

messages: +

2015-04-01 14:05:23

r.david.murray

set

messages: +

2015-04-01 14:01:45

r.david.murray

set

messages: +

2015-04-01 12:35:06

skrah

set

nosy: + skrah
messages: +

2015-04-01 11:44:28

serhiy.storchaka

set

messages: +

2015-04-01 11:33:05

martin.panter

set

files: + c-contig.patch
keywords: + patch
messages: +

2015-03-24 15:00:54

r.david.murray

set

messages: +

2015-03-24 09:20:16

serhiy.storchaka

set

nosy: + serhiy.storchaka, r.david.murray

messages: +
versions: + Python 3.4, Python 3.5

2015-03-24 08:53:56

ezio.melotti

set

nosy: + ezio.melotti

type: enhancement
stage: needs patch

2015-03-24 08:25:17

martin.panter

create