[Python-Dev] PEP 467: Minor API improvements for bytes & bytearray
Nick Coghlan ncoghlan at gmail.com
Sat Aug 16 07:17:35 CEST 2014
On 16 August 2014 03:48, Guido van Rossum <guido at python.org> wrote:
> This feels chatty. I'd like the PEP to call out the specific proposals and put the more verbose motivation later.

I realised that some of that history was actually completely irrelevant now, so I culled a fair bit of it entirely.

> It took me a long time to realize that you don't want to deprecate bytes([1, 2, 3]), but only bytes(3).

I've split out the four subproposals into their own sections, so hopefully this is clearer now.

> Also your mention of bytes.byte() as the counterpart to ord() confused me -- I think it's more similar to chr().

This was just a case of me using the wrong word - I meant "inverse" rather than "counterpart".

> I don't like iterbytes as a builtin, let's keep it as a method on affected types.

Done. I also added an explanation of the benefits it offers over the more generic "map(bytes.byte, data)", as well as more precise semantics for how it will work with memoryview objects.
New draft is live at http://www.python.org/dev/peps/pep-0467/, as well as being included inline below.
Regards, Nick.
===================================
PEP: 467
Title: Minor API improvements for bytes and bytearray
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan <ncoghlan at gmail.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-03-30
Python-Version: 3.5
Post-History: 2014-03-30 2014-08-15 2014-08-16
Abstract
During the initial development of the Python 3 language specification, the
core ``bytes`` type for arbitrary binary data started as the mutable type
that is now referred to as ``bytearray``. Other aspects of operating in
the binary domain in Python have also evolved over the course of the Python
3 series.
This PEP proposes four small adjustments to the APIs of the ``bytes``,
``bytearray`` and ``memoryview`` types to make it easier to operate entirely
in the binary domain:
- Deprecate passing single integer values to ``bytes`` and ``bytearray``
- Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors
- Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors
- Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
  ``memoryview.iterbytes`` alternative iterators
Proposals
Deprecation of current "zero-initialised sequence" behaviour
Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
argument and interpret it as meaning to create a zero-initialised sequence
of the given size::

    >>> bytes(3)
    b'\x00\x00\x00'
    >>> bytearray(3)
    bytearray(b'\x00\x00\x00')

This PEP proposes to deprecate that behaviour in Python 3.5, and remove it entirely in Python 3.6.
No other changes are proposed to the existing constructors.
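To make the scope of the deprecation concrete, here is a small sketch that runs on a current interpreter and uses only the existing constructors. The variable names are illustrative only; the comments mark which call form the proposal targets.

```python
# Only the single-integer form is proposed for deprecation.
zero_filled = bytes(3)            # integer argument: three NUL bytes (affected)
from_iterable = bytes([1, 2, 3])  # iterable of integers (not affected)
from_buffer = bytes(b"abc")       # bytes-like argument (not affected)

print(zero_filled)    # b'\x00\x00\x00'
print(from_iterable)  # b'\x01\x02\x03'
print(from_buffer)    # b'abc'
```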
Addition of explicit "zero-initialised sequence" constructors
To replace the deprecated behaviour, this PEP proposes the addition of an
explicit ``zeros`` alternative constructor as a class method on both
``bytes`` and ``bytearray``::

    >>> bytes.zeros(3)
    b'\x00\x00\x00'
    >>> bytearray.zeros(3)
    bytearray(b'\x00\x00\x00')

It will behave just as the current constructors behave when passed a single integer.
The specific choice of ``zeros`` as the alternative constructor name is taken
from the corresponding initialisation function in NumPy (although, as these
are 1-dimensional sequence types rather than N-dimensional matrices, the
constructors take a length as input rather than a shape tuple).
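Until such a class method exists, the intended behaviour can be approximated with a plain function. The following is only a sketch: ``zeros`` here is a hypothetical module-level helper, not the proposed method itself, and it assumes the result should match what the current constructors return for a single integer, including rejecting negative lengths.

```python
import operator


def zeros(cls, length):
    """Hypothetical stand-in for the proposed ``cls.zeros(length)``."""
    length = operator.index(length)   # rejects floats, strings, etc.
    if length < 0:
        # The current constructors also refuse negative sizes.
        raise ValueError("negative count")
    return cls(b"\x00") * length


print(zeros(bytes, 3))      # b'\x00\x00\x00'
print(zeros(bytearray, 3))  # bytearray(b'\x00\x00\x00')
```

The helper takes the target type as its first argument so the same code covers both ``bytes`` and ``bytearray``.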
Addition of explicit "single byte" constructors
As binary counterparts to the text ``chr`` function, this PEP proposes the
addition of an explicit ``byte`` alternative constructor as a class method
on both ``bytes`` and ``bytearray``::

    >>> bytes.byte(3)
    b'\x03'
    >>> bytearray.byte(3)
    bytearray(b'\x03')

These methods will only accept integers in the range 0 to 255 (inclusive)::

    >>> bytes.byte(512)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: bytes must be in range(0, 256)
    >>> bytes.byte(1.0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'float' object cannot be interpreted as an integer

The documentation of the ``ord`` builtin will be updated to explicitly note
that ``bytes.byte`` is the inverse operation for binary data, while ``chr``
is the inverse operation for text data.
Behaviourally, ``bytes.byte(x)`` will be equivalent to the current
``bytes([x])`` (and similarly for ``bytearray``). The new spelling is
expected to be easier to discover and easier to read (especially when used
in conjunction with indexing operations on binary sequence types).
As a separate method, the new spelling will also work better with higher
order functions like ``map``.
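The equivalence with ``bytes([x])`` means the proposed behaviour can be tried out today. In the sketch below, ``byte`` is a hypothetical helper function standing in for the proposed class method, and ``functools.partial`` is used only to bind the target type so the helper can be passed to ``map``.

```python
from functools import partial


def byte(cls, value):
    """Hypothetical stand-in for the proposed ``cls.byte(value)``."""
    # Wrapping the value in a list reuses the existing range and type checks.
    return cls([value])


single = byte(bytes, 3)
print(single)                     # b'\x03'
print(ord(single))                # 3, so ord() inverts the helper

# Indexing a bytes object yields an int; the helper turns it back into bytes.
data = b"abc"
print(byte(bytes, data[0]))       # b'a'

# A named callable composes directly with higher order functions.
print(list(map(partial(byte, bytes), data)))   # [b'a', b'b', b'c']
```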
Addition of optimised iterator methods that produce ``bytes`` objects
This PEP proposes that ``bytes``, ``bytearray`` and ``memoryview`` gain an
optimised ``iterbytes`` method that produces length 1 ``bytes`` objects
rather than integers::

    for x in data.iterbytes():
        # x is a length 1 ``bytes`` object, rather than an integer

The method can be used with arbitrary buffer exporting objects by wrapping
them in a ``memoryview`` instance first::

    for x in memoryview(data).iterbytes():
        # x is a length 1 ``bytes`` object, rather than an integer

For ``memoryview``, the semantics of ``iterbytes()`` are defined such that::

    memview.tobytes() == b''.join(memview.iterbytes())

This allows the raw bytes of the memory view to be iterated over without needing to make a copy, regardless of the defined shape and format.
The main advantage this method offers over the ``map(bytes.byte, data)``
approach is that it is guaranteed not to fail midstream with a
``ValueError`` or ``TypeError``. By contrast, when using the ``map`` based
approach, the type and value of the individual items in the iterable are
only checked as they are retrieved and passed through the ``bytes.byte``
constructor.
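Both points can be illustrated with existing APIs. This is a rough sketch rather than the proposed implementation: ``iterbytes`` and ``byte`` below are hypothetical helper functions, and the helper assumes a contiguous buffer so that ``memoryview.cast("B")`` can expose the raw bytes.

```python
import array


def iterbytes(data):
    """Hypothetical stand-in for the proposed ``data.iterbytes()``."""
    # View the buffer as unsigned bytes, whatever its original item format.
    view = memoryview(data).cast("B")
    for i in range(len(view)):
        # A length 1 slice converts to a length 1 bytes object.
        yield view[i:i + 1].tobytes()


print(list(iterbytes(bytearray(b"abc"))))   # [b'a', b'b', b'c']

# The join equivalence holds even when the items are wider than one byte.
words = memoryview(array.array("H", [1, 2]))
assert b"".join(iterbytes(words)) == words.tobytes()


def byte(value):
    """Hypothetical stand-in for the proposed ``bytes.byte(value)``."""
    return bytes([value])


# With map(), a bad item is only detected once iteration reaches it.
collected = []
try:
    for item in map(byte, [65, 66, 300]):
        collected.append(item)
except ValueError:
    failed_midstream = True
else:
    failed_midstream = False

print(collected)          # [b'A', b'B']
print(failed_midstream)   # True
```

Because the helper only ever reads from a buffer of bytes, every item it yields is valid by construction, which is the property the ``map`` based spelling cannot offer for arbitrary iterables.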
Design discussion
Why not rely on sequence repetition to create zero-initialised sequences?
Zero-initialised sequences can be created via sequence repetition::

    >>> b'\x00' * 3
    b'\x00\x00\x00'
    >>> bytearray(b'\x00') * 3
    bytearray(b'\x00\x00\x00')

However, this was also the case when the ``bytearray`` type was originally
designed, and the decision was made to add explicit support for it in the
type constructor. The immutable ``bytes`` type then inherited that feature
when it was introduced in PEP 3137.
This PEP isn't revisiting that original design decision, just changing the
spelling as users sometimes find the current behaviour of the binary sequence
constructors surprising. In particular, there's a reasonable case to be made
that ``bytes(x)`` (where ``x`` is an integer) should behave like the
``bytes.byte(x)`` proposal in this PEP. Providing both behaviours as separate
class methods avoids that ambiguity.
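As an illustration of how the ambiguity can surface in practice, consider code that indexes into a binary sequence and then passes the result back to the constructor. The sketch uses only current behaviour; the variable names are illustrative.

```python
data = b"\x03abc"

first = data[0]               # indexing bytes yields the integer 3
surprising = bytes(first)     # integer argument: three NUL bytes
intended = bytes([first])     # one-item iterable: the single byte 0x03

print(surprising)   # b'\x00\x00\x00'
print(intended)     # b'\x03'
print(data[0:1])    # b'\x03', slicing also preserves the bytes type
```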
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia