[Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

Nick Coghlan ncoghlan at gmail.com
Fri Jun 10 13:40:53 EDT 2016


On 9 June 2016 at 19:21, Barry Warsaw <barry at python.org> wrote:

On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote:

Deprecation of current "zero-initialised sequence" behaviour
------------------------------------------------------------

Currently, the bytes and bytearray constructors accept an integer argument and interpret it as meaning to create a zero-initialised sequence of the given size::

    >>> bytes(3)
    b'\x00\x00\x00'
    >>> bytearray(3)
    bytearray(b'\x00\x00\x00')

This PEP proposes to deprecate that behaviour in Python 3.6, and remove it entirely in Python 3.7.

No other changes are proposed to the existing constructors.

Does it need to be actually removed? That does break existing code for not a lot of benefit. Yes, the default constructor is a little wonky, but with the addition of the new constructors, and the fact that you're not proposing to eventually change the default constructor, removal seems unnecessary.

Besides, once it's removed, what would bytes(3) actually do? The PEP doesn't say.

Raise TypeError, presumably. However, I agree this isn't worth the hassle of breaking working code, especially since truly ludicrous values will fail promptly with MemoryError - the only potential problem is the particular range of values that fit within the limits of the machine but still push it into heavy swapping.
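For context, a minimal sketch of why the integer form is easy to misread in current Python 3: an integer and a one-element list containing that integer give quite different results. The multiplication spelling is just one existing alternative, shown for comparison, not something the PEP mandates.

```python
# Current behaviour: an integer argument means "this many zero bytes".
zeroed = bytes(3)

# An iterable of integers means "exactly these byte values".
single = bytes([3])

# An existing, more explicit spelling of the zero-filled case.
explicit = b"\x00" * 3

# The same integer behaviour applies to the mutable type.
zeroed_array = bytearray(3)
```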

Also, since you're proposing to add bytes.byte(3) have you considered also adding an optional count argument? E.g. bytes.byte(3, count=7) would yield b'\x03\x03\x03\x03\x03\x03\x03'. That seems like it could be useful.

The purpose of bytes.byte() in the PEP is to provide a way to roundtrip ord() calls with binary inputs, since the current spelling is pretty unintuitive:

>>> ord("A")
65
>>> chr(ord("A"))
'A'
>>> ord(b"A")
65
>>> bytes([ord(b"A")])
b'A'

That said, perhaps it would make more sense for the corresponding round-trip to be:

>>> bchr(ord("A"))
b'A'

With the "b" prefix on "chr" reflecting the "b" prefix on the output. This also inverts the chr/unichr pairing that existed in Python 2 (replacing it with bchr/chr), and is hence very friendly to compatibility modules like six and future (future.builtins already provides a chr that behaves like the Python 3 one, and bchr would be much easier to add to that than a new bytes object method).
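As a rough illustration, a pure-Python stand-in for the suggested builtin could look like the sketch below. The name bchr and its exact behaviour are assumptions based on this message; nothing here is an existing builtin, and the error handling is simply whatever the bytes constructor already does.

```python
def bchr(i):
    """Hypothetical bchr(): return a length-1 bytes object for an int in range(256)."""
    # bytes([i]) already raises ValueError for values outside range(256).
    return bytes([i])


# Round trip mirroring chr(ord("A")) on the text side.
round_tripped = bchr(ord(b"A"))
```

A compatibility module could expose a helper of this shape on both Python 2 and 3, which is the portability argument made above.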

In terms of an efficient memory-preallocation interface, the equivalent NumPy operation to request a pre-filled array is "numpy.full": http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.full.html (there's also an in-place mutation operation, "ndarray.fill")

For bytes and bytearray though, that has an unfortunate name collision with "zfill", which refers to zero-padding numeric values for fixed width display.
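To make the collision concrete, here is what the existing zfill method does on binary data. It pads an ASCII numeric string on the left with b"0" digits, which is unrelated to filling a buffer with a chosen byte value.

```python
# zfill pads with ASCII "0" characters to the requested width.
padded = b"42".zfill(5)

# A leading sign is kept in front of the padding.
signed = b"-42".zfill(5)

# The same method exists on bytearray.
padded_array = bytearray(b"7").zfill(3)
```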

If the PEP just added bchr() to complement chr(), and [bytes, bytearray].zeros() as a more discoverable alternative to passing integers to the default constructor, I think that would be a decent step forward, and the question of pre-initialising with arbitrary values can be deferred for now (and perhaps left to NumPy indefinitely)
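A minimal sketch of what these spellings could amount to, written with operations that already exist. The function names zeros and filled are hypothetical helpers for illustration only; the message proposes zeros() as an alternative constructor on the types themselves and does not define any API for arbitrary fill values.

```python
def zeros(count, kind=bytes):
    """Hypothetical stand-in for the suggested [bytes, bytearray].zeros()."""
    # Delegates to the existing integer-argument constructor.
    return kind(count)


def filled(value, count, kind=bytes):
    """Hypothetical helper: `count` copies of one byte value (the deferred case)."""
    # Sequence repetition already covers pre-initialising with one value.
    return kind([value]) * count


zero_bytes = zeros(3)
zero_array = zeros(3, bytearray)
sevens = filled(3, 7)
```

The filled() case corresponds to the bytes.byte(3, count=7) idea quoted earlier, which is why it can reasonably be deferred: repetition of a one-byte sequence already expresses it.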

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia


