[Python-3000] PEP 3137: Immutable Bytes and Mutable Buffer

Joel Bender jjb5 at cornell.edu
Thu Sep 27 19:14:53 CEST 2007

Making an iterator over an integer sequence acceptable in the constructor strongly suggests that a byte sequence contains integers between 0 and 255 inclusive, not length 1 byte sequences.

And I think that's the cleanest conceptual model for them as well. A byte sequence doesn't contain length 1 byte sequences, it contains bytes (i.e. numbers between 0 and 255 inclusive).
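As a rough illustration of the point about the constructor, here is a minimal sketch against the `bytes` type as it exists in released Python 3. The helper name `accepts` is hypothetical and only serves to show that values outside 0..255 are rejected.

```python
# Sketch only: the constructor takes any iterable of integers in range(256).
octets = bytes(iter([49, 50, 51, 52]))

def accepts(values):
    """Return True if bytes() accepts the given iterable of integers."""
    try:
        bytes(values)
        return True
    except ValueError:
        # Raised when an element falls outside 0..255.
        return False

print(octets)
print(accepts([0, 255]))
print(accepts([256]))
```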

Using standards language, an octet string contains octets. Since Python blurs the distinction between characters and strings of length 1, shouldn't it also blur the distinction between octets and octet strings of length 1?

The only problematic cases are ones such as iterating over a byte sequence, where we may have an integer and want to compare it to a length 1 byte string.

Why is it problematic? Why does a programmer have to jump through hoops to compare the two?

  >>> x, y = "abc", "a"
  >>> x[0] == y
  True

And the same should be true for octet strings:

  >>> x, y = b"abc", b"a"
  >>> x[0] == y
  True

With just the simple conceptual model...

Python doesn't have a simple conceptual model: there is no distinction between strings of length 1 and characters. The following makes it pretty clear that octet strings contain octets:

 >>> list(b"1234")
 [49, 50, 51, 52]

And you should be able to check for an octet in an octet string:

 >>> 51 in b"1234"
 True

And if I want to specify the same octet in ASCII, I do this:

 >>> b'3' in b"1234"
 True

I don't think it's worth breaking the conceptual model of the data type just to reduce the simplest spelling of that comparison by 3 characters.

The programmer shouldn't have to go through any one of those gyrations. The only reason that saying chr(51) == '3' is necessary is that characters and integers are different types. But octets and "integers in range(256)" are exactly the same thing.

 >>> b'3' == 51
 True
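For anyone trying this against a released Python 3 interpreter, note that `b'3' == 51` evaluates to False there, so the comparison argued for above has to be spelled out by hand. A minimal sketch, assuming two hypothetical helpers named `as_octet` and `same_octet` that are not part of any library:

```python
def as_octet(value):
    """Normalize an int in range(256) or a length-1 byte string to an int."""
    if isinstance(value, int):
        if not 0 <= value <= 255:
            raise ValueError("octet must be in range(256)")
        return value
    if isinstance(value, (bytes, bytearray)) and len(value) == 1:
        # Indexing a byte string yields an integer.
        return value[0]
    raise TypeError("expected an int or a length-1 byte string")

def same_octet(a, b):
    """Compare two octets regardless of which spelling each one uses."""
    return as_octet(a) == as_octet(b)

print(same_octet(b"3", 51))
print(same_octet(b"1234"[2], b"3"))
```

This treats the integer and the length-1 byte string as two spellings of the same value, which is the model described above.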

The fact that octets can be written as an octet string of length 1 is just a happy coincidence of Python, just like characters.

 for val in data.fragments():
     if val == b'x':
         print("Found an x!")

That's a hideous amount of work to just say:

 if b'x' in data:
     print("Found an x!")
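To make the comparison between the two spellings concrete, here is a small sketch using the `bytes` type from released Python 3. The sample value of `data` is made up for illustration, and the loop compares against an integer because iterating over `bytes` yields integers there.

```python
data = b"example"   # hypothetical sample data

# Loop spelling: each element is an integer, so compare against ord("x").
found_by_loop = any(val == ord("x") for val in data)

# Containment spelling: a length-1 byte string works directly with "in".
found_by_in = b"x" in data

# An integer works with "in" as well.
found_by_int = ord("x") in data

print(found_by_loop, found_by_in, found_by_int)
```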

Joel
