[Python-Dev] Python 3.x and bytes

Nick Coghlan ncoghlan at gmail.com
Thu May 19 10:43:54 CEST 2011


OK, summarising the thread so far from my point of view.

  1. There are some aspects of the behaviour of bytes() objects that tempt people to think of them as string-like objects (primarily the b'' literals and their use in repr(), along with the fact that they fill roles that were filled by str in its "arbitrary binary data" incarnation in Python 2.x). The mental model this creates in the reader is incorrect, as bytes() are far closer to array.array('c') in their underlying behaviour (and deliberately so - cf. PEP 358, 3112, 3137).
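To illustrate the point, here is a minimal sketch, assuming Python 3. It shows bytes behaving as a sequence of small integers rather than characters. `array.array('B')` stands in for the 2.x `array.array('c')` mentioned above, because the 'c' typecode no longer exists in Python 3:

```python
import array

data = b"GET"

# Indexing a bytes object yields an int, not a length-1 bytes object.
first = data[0]

# Iteration also yields ints, exactly as with an array of unsigned bytes.
as_ints = list(data)

# array.array('B') (unsigned char) is used as the nearest Python 3 analogue
# of the 2.x array.array('c'); the 'c' typecode no longer exists.
arr = array.array("B", data)

print(first)                    # 71
print(as_ints)                  # [71, 69, 84]
print(arr.tolist() == as_ints)  # True
print(arr.tobytes() == data)    # True
```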

One proposal for addressing this is to add an x'deadbeef' literal and use that in repr() rather than the bytestring literal. Another would be to escape all characters, even printable ASCII, in the bytes() representation. Both of these are undesirable, as they miss the original purpose of this behaviour: making it easier to work with the many ASCII-based wire protocols that are in widespread use.
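For comparison, the sketch below places the current repr() next to a hex-only rendering. Python has no x'...' literal, so the existing `bytes.hex()` (3.5+) and `bytes.fromhex()` methods stand in for what such a representation would display. The message bytes are made up for illustration:

```python
# A made-up protocol line followed by two non-printable bytes.
msg = b"HTTP/1.1 200 OK\r\n\x00\xff"

# Current repr(): printable ASCII stays readable, everything else is escaped.
shown = repr(msg)
print(shown)   # b'HTTP/1.1 200 OK\r\n\x00\xff'

# Hex-only rendering, roughly what an x'...' style repr would display.
as_hex = msg.hex()
print(as_hex)  # 485454502f312e3120323030204f4b0d0a00ff

# The existing spelling of what x'deadbeef' would mean.
raw = bytes.fromhex("deadbeef")
print(raw)     # b'\xde\xad\xbe\xef'
```

The readable protocol text disappears entirely from the hex form. That is the cost of the proposal described above.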

To be honest, I don't think there is a lot we can do here except to further emphasise in the documentation and elsewhere that bytes is not a string type (regardless of any API similarities retained to ease transition from the 2.x series). For example, if we have any lingering references to "byte strings" they should be replaced with "byte sequences" or "bytes objects" (depending on context, as the former phrasing also encompasses bytearray objects).
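The distinction between "bytes objects" and the broader "byte sequences" can be checked directly. This short sketch relies on the `collections.abc` registrations (Python 3.3+ import path):

```python
from collections.abc import MutableSequence, Sequence

frozen = b"abc"
mutable = bytearray(b"abc")

# Both are sequences of ints in range(256) ...
print(isinstance(frozen, Sequence), isinstance(mutable, Sequence))  # True True

# ... but only bytearray is a *mutable* sequence.
print(isinstance(frozen, MutableSequence), isinstance(mutable, MutableSequence))  # False True

# Equality is by content across the two types.
print(frozen == mutable)  # True

# Neither is a string: mixing with str raises instead of converting implicitly.
try:
    frozen + "d"
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)  # False
```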

  2. As a concrete usability issue, it is awkward to programmatically check the value of a specific byte when working with an ASCII-based protocol:

    data[i] == b'a'      # Intuitive, but always False due to type mismatch
    data[i:i+1] == b'a'  # Works, but clumsy
    data[i] == b'a'[0]   # Ditto (but at least susceptible to compiler const-expression optimisation)
    data[i] == ord('a')  # Clumsy and slow
    data[i] == 97        # Hard to read
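The following sketch runs each idiom against a sample buffer to show which ones actually hold. The buffer contents and the module-level `ORD_A` name are illustrative only. They are not a recommended or official API:

```python
data = b"abc"
i = 0

# Each idiom from the list above, evaluated against the same buffer.
results = {
    "data[i] == b'a'": data[i] == b"a",          # int vs bytes: never equal
    "data[i:i+1] == b'a'": data[i:i+1] == b"a",  # bytes vs bytes
    "data[i] == b'a'[0]": data[i] == b"a"[0],    # int vs int
    "data[i] == ord('a')": data[i] == ord("a"),  # int vs int
    "data[i] == 97": data[i] == 97,              # int vs int
}
for expr, value in results.items():
    print(f"{expr:22} -> {value}")

# An illustrative workaround (a convention, not an official API): compute the
# int once and give it a readable name.
ORD_A = ord("a")
print(data[i] == ORD_A)  # True
```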

Proposals to address this include:

For point 2, I'm personally +0 on the idea of having 1-element bytes and bytearray objects delegate hashing and comparison operations to the corresponding integer object. We have the power to make the obvious code the correct code, so let's do that. However, the implications of the additional key collisions in value-based containers may need to be explored further.
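The +0 proposal can be prototyped without touching the interpreter. The sketch below uses a hypothetical `ByteLike` subclass, which is not part of Python. A 1-element instance compares and hashes like its integer value, and the example then shows the container key collision mentioned above.

One caveat applies to the subclass approach. A plain b'a' keeps its normal hash while still comparing equal to `ByteLike(b'a')`, so mixing the two as dict keys is unreliable. A change to bytes itself would presumably not have that particular inconsistency.

```python
class ByteLike(bytes):
    """Hypothetical bytes subclass: a 1-element instance compares and hashes
    like the int it contains. This is NOT how Python's bytes type behaves."""

    def __eq__(self, other):
        # Delegate to int comparison for single-byte values.
        if len(self) == 1 and isinstance(other, int):
            return self[0] == other
        return bytes.__eq__(self, other)

    def __ne__(self, other):
        # bytes.__ne__ would bypass the override above, so mirror __eq__.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Defining __eq__ disables inherited hashing, so restore it,
        # hashing single-byte values like the corresponding int.
        if len(self) == 1:
            return hash(self[0])
        return bytes.__hash__(self)


data = b"abc"
key = ByteLike(b"a")
print(data[0] == key)  # True with the hypothetical type
print(key != 97)       # False

# The container implication: 97 and ByteLike(b"a") now collide as dict keys.
table = {97: "int key"}
table[key] = "bytes key"
print(len(table))      # 1: the second assignment overwrote the first value
print(table[97])       # bytes key
```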

Cheers, Nick.

-- Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia


