[Python-Dev] bytes.from_hex()

Ron Adam rrr at ronadam.com
Sat Feb 18 09:35:24 CET 2006


Josiah Carlson wrote:

Bob Ippolito <bob at redivi.com> wrote:

On Feb 17, 2006, at 8:33 PM, Josiah Carlson wrote:

Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:

Stephen J. Turnbull wrote:

"Guido" == Guido van Rossum <guido at python.org> writes:

Guido> - b = bytes(t, enc); t = text(b, enc)

+1 The coding conversion operation has always felt like a constructor to me, and in this particular usage that's exactly what it is. I prefer the nomenclature to reflect that.

This also has the advantage that it completely avoids using the verbs "encode" and "decode" and the attendant confusion about which direction they go in. For example,

 s = text(b, "base64")

makes it obvious that you're going from the binary side to the text side of the base64 conversion.

But you aren't always getting unicode text from the decoding of bytes, and you may be encoding bytes to bytes:

 b2 = bytes(b, "base64")
 b3 = bytes(b2, "base64")

Which direction are we going again?

This is exactly why the current set of codecs are INSANE. unicode.encode and str.decode should be used only for unicode codecs. Byte transforms are entirely different semantically and should be some other method pair.

The problem is that we are overloading data types. Strings (and bytes) can contain both encoded text as well as data, or even encoded data.
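For comparison, here is a minimal sketch of how Python 3.4 and later separates the two cases. It does not use the 2006 API under discussion. Text encodings go through `str.encode()` and `bytes.decode()`. Bytes-to-bytes transforms such as base64 go through the module-level `codecs.encode()` and `codecs.decode()` functions, where the function name fixes the direction.

```python
import codecs

# Text codec: str -> bytes and back; the method name gives the direction.
text = "caf\u00e9"
raw = text.encode("utf-8")
assert raw == b"caf\xc3\xa9"
assert raw.decode("utf-8") == text

# Bytes-to-bytes codec: base64 is reached through codecs.encode/decode.
b = b"hello"
b2 = codecs.encode(b, "base64")   # encode direction, bytes in and bytes out
b3 = codecs.encode(b2, "base64")  # encoding again nests the transform
assert b2 == b"aGVsbG8=\n"
assert codecs.decode(codecs.decode(b3, "base64"), "base64") == b

# bytes.decode() only accepts text encodings, so this is rejected.
try:
    b2.decode("base64")
except LookupError as exc:
    print("rejected:", exc)
```

Because both `b2` and `b3` are plain bytes, only the function name (`encode` or `decode`) tells the reader which way each step goes.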

Right

Educate the users. Raise better exceptions telling people why their encoding or decoding failed, as Ian Bicking already pointed out. If bytes.encode() and the equivalent of text.decode() is going to disappear,

+1 on better documentation all around with regard to encodings and Unicode. The best explanation I've found so far is in PEP 100. The Python docs and built-in help hardly explain more than the minimal argument list for the encoding and decoding methods, and the str and unicode type constructor arguments aren't explained any better.
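As a rough illustration of the "better exceptions" point, consider Python 3. A `UnicodeDecodeError` there already carries the codec name, the offending byte range and a reason, and a caller can turn these into a more helpful message. This is a sketch only. The helper name `explain_decode_error` is hypothetical.

```python
def explain_decode_error(data: bytes, encoding: str) -> str:
    """Decode data, or return a readable explanation of why it failed.

    Hypothetical helper: it only reformats the attributes that Python's
    UnicodeDecodeError already provides (encoding, object, start, end, reason).
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        bad = exc.object[exc.start:exc.end]
        return (f"cannot decode {bad!r} at position {exc.start} "
                f"with {exc.encoding!r}: {exc.reason}")
    except LookupError:
        # Unknown names and bytes-to-bytes codecs such as "base64" land here.
        return f"{encoding!r} is not a known text encoding for bytes.decode()"


print(explain_decode_error(b"abc\xff", "utf-8"))
print(explain_decode_error(b"abc", "base64"))
```

The first call reports the bad byte, its position and the reason ("invalid start byte"). The second explains that base64 is not a text encoding, instead of failing with a bare error.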

Bengt Richter had a good idea with bytes.recode() for strictly bytes-to-bytes transformations (and the equivalent for text), though it is ambiguous as to the direction: are we encoding or decoding with bytes.recode()? In my opinion, this is why it makes sense to keep .encode() and .decode() on both bytes and text. The direction is unambiguous, and if one has even a remote idea of what the heck the codec is, one knows what the result will be.

- Josiah

I like the bytes.recode() idea a lot. +1

It seems to me a far more useful idea than encoding and decoding through overloaded methods, and it could do both and more. It has a lot of potential as an intermediate step for encoding, as well as for many other translations to byte data.
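One way to picture `recode()` as an intermediate step is a helper that pipes bytes through a sequence of bytes-to-bytes codecs. This is only a sketch, and it assumes Python 3's `codecs.encode()` and `codecs.decode()`. The functions `recode` and `unrecode` here are hypothetical and are not the proposed `bytes.recode()` method. The reverse direction still needs its own name, which is Josiah's point about ambiguity.

```python
import codecs


def recode(data: bytes, *codec_names: str) -> bytes:
    """Apply each bytes-to-bytes codec in order (encode direction)."""
    for name in codec_names:
        data = codecs.encode(data, name)
    return data


def unrecode(data: bytes, *codec_names: str) -> bytes:
    """Undo recode(): decode with the same codecs in reverse order."""
    for name in reversed(codec_names):
        data = codecs.decode(data, name)
    return data


payload = b"spam " * 20
packed = recode(payload, "zlib_codec", "base64_codec")  # compress, then base64
assert unrecode(packed, "zlib_codec", "base64_codec") == payload
```

Chaining lets one call do a compound translation. The pair of names, though, is what keeps the direction explicit.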

I think I would prefer that encode and decode be just functions with well defined names and arguments instead of being methods or arguments to string and Unicode types.

I'm not sure exactly how this would work. Maybe it would need two sets of encodings, i.e. decoders and encoders. An exception would be raised if the encoding wasn't found for the direction one was going in.

Roughly, something like this:

 # Both functions would live in the encodings module.
 # `encoders`, `decoders` and bytes.recode() are proposed, not existing.

 def tostr(obj, encoding):
     if encoding not in encoders:
         raise LookupError('encoding not found in encoders')
     # check if obj works with encoding to string
     # ...
     b = bytes(obj).recode(encoding)
     return str(b)

 def tounicode(obj, decoding):
     if decoding not in decoders:
         raise LookupError('decoding not found in decoders')
     # check if obj works with decoding to unicode
     # ...
     b = bytes(obj).recode(decoding)
     return unicode(b)
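Below is a runnable adaptation of the two-registry idea in the sketch. It assumes Python 3, where `str` is text and `bytes` is binary data. Everything here is hypothetical and stands in for the proposed pieces: the `ENCODERS` and `DECODERS` dictionaries, and the functions `encode_bytes` (playing the role of `tostr`) and `decode_text` (playing the role of `tounicode`). A one-way transform (SHA-256) is registered only as an encoder, so asking for it in the decode direction raises `LookupError`.

```python
import base64
import binascii
import hashlib

# Hypothetical direction-specific registries: each maps a name to a
# bytes -> bytes function. A one-way transform appears in only one of them.
ENCODERS = {
    "base64": base64.b64encode,
    "hex": binascii.hexlify,
    "sha256": lambda data: hashlib.sha256(data).digest(),
}
DECODERS = {
    "base64": base64.b64decode,
    "hex": binascii.unhexlify,
}


def encode_bytes(obj, encoding):
    """Rough analogue of tostr(): run obj through an encoder, return bytes."""
    if encoding not in ENCODERS:
        raise LookupError(f"{encoding!r} not found in encoders")
    return ENCODERS[encoding](bytes(obj))


def decode_text(obj, decoding, charset="ascii"):
    """Rough analogue of tounicode(): run obj through a decoder, then read
    the resulting bytes as text in `charset` (an assumption; the sketch
    above leaves this step open)."""
    if decoding not in DECODERS:
        raise LookupError(f"{decoding!r} not found in decoders")
    return DECODERS[decoding](bytes(obj)).decode(charset)


print(encode_bytes(b"hi", "hex"))   # b'6869'
print(decode_text(b"6869", "hex"))  # hi
```

Keeping two registries means the lookup itself encodes the direction. A codec that only makes sense one way simply has no entry on the other side.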

Anyway... food for thought.

Cheers, Ronald Adam


