[Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

Guido van Rossum guido at python.org
Wed Feb 15 20:16:51 CET 2006


On 2/15/06, Jason Orendorff <jason.orendorff at gmail.com> wrote:

Instead of byte literals, how about a classmethod bytes.fromhex(), which works like this:

    # two equivalent things
    expectedmd5hash = bytes.fromhex('5c535024cac5199153e3834fe5c92e6a')
    expectedmd5hash = bytes([92, 83, 80, 36, 202, 197, 25, 145,
                             83, 227, 131, 79, 229, 201, 46, 106])

It's just a nicety; the former fits my brain a little better. This would work fine both in 2.5 and in 3.0.

Yes, this looks nice.
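
As a minimal sketch of the proposal, assuming a Python 3 interpreter where bytes.fromhex() is available as a classmethod, the two spellings can be checked against each other. The names HEX_DIGEST and BYTE_VALUES are illustrative and do not come from the thread.

```python
# Hex string from the message above and the equivalent list of byte values.
HEX_DIGEST = '5c535024cac5199153e3834fe5c92e6a'
BYTE_VALUES = [92, 83, 80, 36, 202, 197, 25, 145,
               83, 227, 131, 79, 229, 201, 46, 106]

from_hex = bytes.fromhex(HEX_DIGEST)  # parse consecutive pairs of hex digits
from_ints = bytes(BYTE_VALUES)        # build from integers in range(256)

# Both constructions should yield the same 16-byte value.
print(from_hex == from_ints)

# bytes.hex() (Python 3.5 and later) converts back to the hex string.
print(from_hex.hex() == HEX_DIGEST)
```

The hex form keeps the digest in the notation it is usually published in, which is the readability point being made above.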

I thought about unicode.encode('hex'), but obviously it will continue to return a str in 2.x, not bytes. Also the pseudo-encodings ('hex', 'rot13', 'zip', 'uu', etc.) generally scare me. And now that bytes and text are going to be two very different types, they're even weirder than before. Consider:

    text.encode('utf-8') ==> bytes
    text.encode('rot13') ==> text
    bytes.encode('zip') ==> bytes
    bytes.encode('uu') ==> text (?)

This state of affairs seems kind of crazy to me. Actually users trying to figure out Unicode would probably be better served if bytes.encode() and text.decode() did not exist.

Yeah, the pseudogeneralizations seem to be a mistake -- they are almost universally frowned upon. I'll happily send them to their grave in Py3k.
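
To make the type confusion concrete, here is a rough sketch of how the same calls behave in a recent CPython (assumed to be 3.4 or later), where str stands in for text. The helper result_type() is hypothetical and exists only for this illustration: it reports the type of a call's result, or the name of the LookupError it raised.

```python
import codecs


def result_type(func, *args):
    """Return the result's type name, or the exception's name on LookupError."""
    try:
        return type(func(*args)).__name__
    except LookupError as exc:
        return type(exc).__name__


text = "hello"
data = b"hello"

# A real character encoding: text in, bytes out.
utf8_type = result_type(text.encode, "utf-8")

# A text-to-text pseudo-encoding is refused by the str.encode() method ...
rot13_method_type = result_type(text.encode, "rot13")

# ... but remains reachable through the generic codecs.encode() function.
rot13_codec_type = result_type(codecs.encode, text, "rot13")

# A bytes-to-bytes transform likewise goes through codecs.encode().
hex_type = result_type(codecs.encode, data, "hex")

for label, name in [("utf-8 via method", utf8_type),
                    ("rot13 via method", rot13_method_type),
                    ("rot13 via codecs", rot13_codec_type),
                    ("hex via codecs", hex_type)]:
    print(label, "==>", name)
```

Under these assumptions the encode() method is reserved for text-to-bytes conversions, and the other transforms live behind a separate, explicitly generic entry point.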

It would be better if text.encode() always returned a bytes object. But why deny the bytes object a decode() method if text objects have an encode() method?

I'd say there are two "symmetric" API flavors possible (t and b are text and bytes objects, respectively, where text is a string type, either str or unicode; enc is an encoding name):

I'm not sure why one flavor would be preferred over the other, although having both would probably be a mistake.
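
As a sketch only, and assuming the two flavors are a method pair on the objects and a constructor pair that takes the encoding as an argument, both can be tried in Python 3, with str playing the role of text. The variable names follow the t, b and enc convention used above; the sample string is illustrative.

```python
t = "na\u00efve"  # a text object (str in Python 3) with one non-ASCII character
enc = "utf-8"

# Flavor 1: methods on the objects themselves.
b1 = t.encode(enc)
t1 = b1.decode(enc)

# Flavor 2: constructors that take the encoding as an argument.
b2 = bytes(t, enc)
t2 = str(b2, enc)

# Both flavors should produce the same bytes and round-trip to the same text.
print(b1 == b2)
print(t1 == t2 == t)
```

Both spellings express the same conversion, so the choice between them is mostly a question of which reads better at the call site.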

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


