[Python-Dev] Python 3.x and bytes

Nick Coghlan ncoghlan at gmail.com
Thu May 19 10:43:54 CEST 2011


OK, summarising the thread so far from my point of view.

There are some aspects of the behaviour of bytes() objects that tempt people to think of them as string-like objects (primarily the b'' literals and their use in repr(), along with the fact that they fill roles that were filled by str in its "arbitrary binary data" incarnation in Python 2.x). The mental model this creates in the reader is incorrect, as bytes() objects are far closer to array.array('c') in their underlying behaviour (and deliberately so - cf. PEPs 358, 3112, 3137).
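
As a small illustration of the array-like behaviour described above (a sketch only; the sample value and variable names are made up for the example), indexing and iteration hand back integers, while slicing hands back another bytes object:

```python
data = b"abc"  # hypothetical sample value

# Indexing yields an int; slicing yields another bytes object.
first = data[0]
head = data[0:1]

# Iteration produces integers in range(256), not 1-character strings.
values = list(data)

# bytearray is the mutable counterpart; item assignment takes an int,
# so a length-1 bytes object such as b"x" would be rejected here.
buf = bytearray(data)
buf[0] = ord("x")

print(type(first).__name__, first)
print(head)
print(values)
print(buf)
```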

One proposal for addressing this is to add an x'deadbeef' literal and use that in repr() rather than the bytestring. Another would be to escape all characters, even printable ASCII, in the bytes() representation. Both of these are undesirable, as they miss the original purpose of this behaviour: making it easier to work with the many ASCII based wire protocols that are in widespread use.

To be honest, I don't think there is a lot we can do here except to further emphasise in the documentation and elsewhere that bytes is not a string type (regardless of any API similarities retained to ease transition from the 2.x series). For example, if we have any lingering references to "byte strings" they should be replaced with "byte sequences" or "bytes objects" (depending on context, as the former phrasing also encompasses bytearray objects).

As a concrete usability issue, it is awkward to programmatically check the value of a specific byte when working with an ASCII based protocol:

    data[i] == b'a'      # Intuitive, but always False due to type mismatch
    data[i:i+1] == b'a'  # Works, but clumsy
    data[i] == b'a'[0]   # Ditto (but at least susceptible to compiler const-expression optimisation)
    data[i] == ord('a')  # Clumsy and slow
    data[i] == 97        # Hard to read
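
For anyone who wants to try these spellings, here is a minimal runnable sketch. The sample payload and the index are assumptions chosen for the example, not taken from any real protocol code:

```python
data = b"abc"  # hypothetical payload
i = 0          # hypothetical position of the byte being checked

# Each spelling from the list above, evaluated against the same byte.
results = {
    "data[i] == b'a'": data[i] == b"a",          # int vs bytes
    "data[i:i+1] == b'a'": data[i:i + 1] == b"a",  # bytes vs bytes
    "data[i] == b'a'[0]": data[i] == b"a"[0],    # int vs int
    "data[i] == ord('a')": data[i] == ord("a"),  # int vs int
    "data[i] == 97": data[i] == 97,              # int vs int
}

for expression, outcome in results.items():
    print(f"{expression:24} -> {outcome}")
```

Only the first spelling evaluates to False, and it does so silently unless the interpreter is started with the -b option, which turns such comparisons into a BytesWarning.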

Proposals to address this include:

1. Introduce a "character" literal to allow c'a' as an alternative to ord('a').
   Potentially workable, but leaves the intuitive answer above silently producing an unexpected answer.

2. Allow 1-element byte sequences to compare equal to the corresponding integer values.
   - would require reworking of bytes.__hash__ to use the hash of the contained element when the data length is exactly 1
   - transitivity of equality would recommend also supporting equivalences such as b'a' == 97.0
   - backwards compatibility concerns arise due to introduction of new key collisions in dictionaries and sets and other value based containers
   - yet more string-like behaviour in a type that is not a string (further reinforcing the mistaken impression from point 1)
   - One thing that isn't a concern from my point of view is the fact that we have ample precedent in decimal.Decimal for supporting implicit coercion in comparison operations while disallowing them in arithmetic operations (Decimal("1") == 1.0 is allowed, but Decimal("1") + 1.0 will raise TypeError).
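
The Decimal precedent and the hashing constraint behind the second proposal can be checked with a short sketch. It shows current behaviour only; the dictionary and variable names are invented for the illustration, and nothing here implements the proposal itself:

```python
from decimal import Decimal

# Precedent: mixed comparison is allowed, mixed arithmetic is not.
mixed_equal = Decimal("1") == 1.0
try:
    Decimal("1") + 1.0
except TypeError:
    mixed_add_raises = True
else:
    mixed_add_raises = False

# Objects that compare equal must hash equal, which is why the proposal
# would need bytes.__hash__ to change for 1-element objects.
ints_and_floats_agree = (97 == 97.0) and (hash(97) == hash(97.0))

# Today b'a' and 97 are distinct keys; under the proposal they would
# collide, which is the backwards compatibility concern.
table = {97: "int key", b"a": "bytes key"}  # hypothetical container
distinct_today = (b"a" != 97) and len(table) == 2

print(mixed_equal, mixed_add_raises, ints_and_floats_agree, distinct_today)
```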

For point 2, I'm personally +0 on the idea of having 1-element bytes and bytearray objects delegate hashing and comparison operations to the corresponding integer object. We have the power to make the obvious code correct code, so let's do that. However, the implications of the additional key collisions in value based containers may need to be explored further.

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
