[Python-Dev] bytes.from_hex()

Just van Rossum just at letterror.com
Thu Mar 2 09:57:57 CET 2006


Ron Adam wrote:

Josiah Carlson wrote:
> Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>> u = unicode(b)
>> u = unicode(b, 'utf8')
>> b = bytes'utf8'
>> u = unicode'base64' # encoding
>> b = bytes(u, 'base64') # decoding
>> u2 = unicode'piglatin' # encoding
>> u1 = unicode(u2, 'piglatin') # decoding
>
> Your provided semantics feel cumbersome and confusing to me, as
> compared with str/unicode.encode/decode() .
>
> - Josiah

This uses syntax to determine the direction of encoding. It would be easier and clearer to just require two arguments or a tuple.

u = unicode(b, 'encode', 'base64')
b = bytes(u, 'decode', 'base64')

b = bytes(u, 'encode', 'utf-8')
u = unicode(b, 'decode', 'utf-8')

u2 = unicode(u1, 'encode', 'piglatin')
u1 = unicode(u2, 'decode', 'piglatin')

It looks somewhat cleaner if you combine them in a path style string.

b = bytes(u, 'encode/utf-8')
u = unicode(b, 'decode/utf-8')

It gets from bad to worse :(

I always liked the asymmetry between

u = unicode(s, "utf8")

and

s = u.encode("utf8")

which I think was the original design of the unicode API. Kudos to whoever came up with that.

When I saw

b = bytes(u, "utf8")

mentioned for the first time, I thought: why on earth must the bytes constructor be coupled to the unicode API?!?! It makes no sense to me whatsoever. Bytes have so much more use besides encoded text.

I believe (please correct me if I'm wrong) that the encoding argument of bytes() was invented to make it easier to write byte literals. Perhaps a true bytes literal notation is in order after all?

My preference for bytes -> unicode -> bytes API would be this:

u = unicode(b, "utf8")  # just like we have now
b = u.tobytes("utf8")   # like u.encode(), but being explicit
                        # about the resulting type
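A minimal sketch of that round trip, assuming a Python 3 interpreter where str plays the role of unicode. The name tobytes comes from the proposal above; it is not an existing method, so it is written here as a hypothetical free function:

```python
# Hypothetical helper: "tobytes" is the name proposed in this message,
# not part of any real Python API. It wraps str.encode() and makes the
# result type explicit.
def tobytes(text: str, encoding: str) -> bytes:
    """Encode text to bytes, failing loudly if the codec returns anything else."""
    result = text.encode(encoding)
    if not isinstance(result, bytes):
        raise TypeError("codec %r did not produce bytes" % encoding)
    return result


u = "caf\u00e9"              # text
b = tobytes(u, "utf8")       # text -> bytes, direction is clear from the name
u_again = str(b, "utf8")     # bytes -> text, via the constructor as in the proposal

print(b)                     # b'caf\xc3\xa9'
print(u_again == u)          # True
```

The point of the sketch is only that the direction of the conversion is carried by the name of the operation rather than by an extra argument.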

As to base64, while it works as a codec ("Why a base64 codec? Because we can!"), I don't find it a natural API at all for such conversions.

(I do however agree with Greg Ewing that base64 encoded data is text, not ascii-encoded bytes ;-)
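For comparison, a small sketch of base64 done through dedicated functions instead of a codec name, assuming Python 3 and the standard base64 module. The variable names are illustrative only. Note that b64encode() returns bytes there, so the "base64 is text" view needs one explicit ASCII decode:

```python
import base64

raw = b"hello"                               # arbitrary binary data

encoded = base64.b64encode(raw)              # bytes in, bytes out
encoded_text = encoded.decode("ascii")       # treat the base64 form as text

print(encoded_text)                          # aGVsbG8=

# b64decode() accepts the ASCII text form directly.
decoded = base64.b64decode(encoded_text)
print(decoded == raw)                        # True
```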

Just-my-2-cts


