[Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

M.-A. Lemburg mal at egenix.com
Wed Feb 15 22:07:02 CET 2006


Jason Orendorff wrote:

> Instead of byte literals, how about a classmethod bytes.fromhex(), which
> works like this:
>
>     # two equivalent things
>     expectedmd5hash = bytes.fromhex('5c535024cac5199153e3834fe5c92e6a')
>     expectedmd5hash = bytes([92, 83, 80, 36, 202, 197, 25, 145,
>                              83, 227, 131, 79, 229, 201, 46, 106])
>
> It's just a nicety; the former fits my brain a little better. This would
> work fine both in 2.5 and in 3.0.
>
> I thought about unicode.encode('hex'), but obviously it will continue to
> return a str in 2.x, not bytes. Also the pseudo-encodings ('hex', 'rot13',
> 'zip', 'uu', etc.) generally scare me.

Those are not pseudo-encodings, they are regular codecs.

It's a common misunderstanding that codecs only serve the purpose of converting between Unicode and strings.

The codec system is deliberately designed to be general enough to also work with many other types, e.g. it is easily possible to write a codec that converts between the hex literal sequence you have above and a list of ordinals:

""" Hex string codec

    Converts between a list of ordinals and a two byte hex literal
    string.

    Usage:
    >>> codecs.encode([1,2,3], 'hexstring')
    '010203'
    >>> codecs.decode(_, 'hexstring')
    [1, 2, 3]

    (c) 2006, Marc-Andre Lemburg.

"""
import codecs

class Codec(codecs.Codec):

    def encode(self, input, errors='strict'):

        """ Convert hex ordinal list to hex literal string.
        """
        if not isinstance(input, list):
            raise TypeError('expected list of integers')
        return (
            ''.join(['%02x' % x for x in input]),
            len(input))

    def decode(self, input, errors='strict'):

        """ Convert hex literal string to hex ordinal list.
        """
        if not isinstance(input, str):
            raise TypeError('expected string of hex literals')
        size = len(input)
        if not size % 2 == 0:
            raise TypeError('input string has uneven length')
        return (
            [int(input[(i<<1):(i<<1)+2], 16)
             for i in range(size >> 1)],
            size)

class StreamWriter(Codec, codecs.StreamWriter):
    pass

class StreamReader(Codec, codecs.StreamReader):
    pass

def getregentry():
    return (Codec().encode, Codec().decode, StreamReader, StreamWriter)
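As a rough sketch of how a codec like the one above could be made available under the name 'hexstring' in a current Python 3 interpreter: the class name HexStringCodec and the search function hexstring_search are hypothetical names chosen for this illustration, and the codec body is a condensed stand-in for the module above, not the original code.

```python
import codecs


class HexStringCodec(codecs.Codec):
    # Condensed stand-in for the Codec class in the post (assumed equivalent).
    def encode(self, input, errors='strict'):
        if not isinstance(input, list):
            raise TypeError('expected list of integers')
        return ''.join('%02x' % x for x in input), len(input)

    def decode(self, input, errors='strict'):
        if not isinstance(input, str):
            raise TypeError('expected string of hex literals')
        if len(input) % 2:
            raise TypeError('input string has uneven length')
        pairs = [input[i:i + 2] for i in range(0, len(input), 2)]
        return [int(pair, 16) for pair in pairs], len(input)


def hexstring_search(name):
    # Hypothetical search function: it answers only for 'hexstring'
    # and returns None so that other lookups fall through.
    if name != 'hexstring':
        return None
    codec = HexStringCodec()
    return codecs.CodecInfo(encode=codec.encode,
                            decode=codec.decode,
                            name='hexstring')


codecs.register(hexstring_search)

# The codec is reached through the codecs module, not through str/bytes methods.
encoded = codecs.encode([1, 2, 3], 'hexstring')
decoded = codecs.decode(encoded, 'hexstring')
```

With this in place, encoded holds the hex literal string and decoded holds the original list of ordinals, matching the usage shown in the module docstring.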

> And now that bytes and text are going to be two very different types,
> they're even weirder than before. Consider:
>
>     text.encode('utf-8') ==> bytes
>     text.encode('rot13') ==> text
>     bytes.encode('zip') ==> bytes
>     bytes.encode('uu') ==> text (?)
>
> This state of affairs seems kind of crazy to me.

Really ?

It all depends on what you use the codecs for. Calling them through the .encode() and .decode() methods, as above, is not the only way you can make use of them.

To get full access to the codecs, you'll have to use the codecs module.
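For illustration, here is how the codecs module can be used directly for codecs that do not convert between text and bytes. This sketch assumes a current Python 3 interpreter, where the standard library ships 'hex' as a bytes-to-bytes codec and 'rot13' as a text-to-text codec; the variable names are made up for the example.

```python
import codecs

digest = bytes([92, 83, 80, 36])

# bytes -> bytes: hex-encode the raw bytes, then decode them back.
hex_bytes = codecs.encode(digest, 'hex')
round_trip = codecs.decode(hex_bytes, 'hex')

# text -> text: rot13 maps a str to another str.
scrambled = codecs.encode('Hello', 'rot13')
unscrambled = codecs.decode(scrambled, 'rot13')
```

The functions codecs.encode() and codecs.decode() look up the named codec in the registry and call it without restricting the input or output types, which is what makes these non-text codecs reachable.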

> Actually, users trying to figure out Unicode would probably be better
> served if bytes.encode() and text.decode() did not exist.

You're missing the point: the .encode() and .decode() methods are merely interfaces to the registered codecs. Whether they make sense for a certain codec depends on the codec, not the methods that interface to it, and again, codecs do not only exist to convert between Unicode and strings.
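A small sketch of that relationship, assuming a current Python 3 interpreter: the method call and the codec fetched from the registry via codecs.lookup() produce the same result for 'utf-8'. The sample string and variable names are illustrative only.

```python
import codecs

text = 'caf\u00e9'

# Going through the method ...
via_method = text.encode('utf-8')

# ... and going through the registered codec directly.
info = codecs.lookup('utf-8')
via_registry, consumed = info.encode(text)
decoded_text, _ = info.decode(via_registry)
```

The registry entry returns a (result, length consumed) pair, while the method returns only the result.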

-- Marc-Andre Lemburg eGenix.com



