[Python-Dev] bytes.from_hex()

Stephen J. Turnbull stephen at xemacs.org
Wed Feb 22 10:48:16 CET 2006


"Greg" == Greg Ewing <greg.ewing at canterbury.ac.nz> writes:

Greg> Stephen J. Turnbull wrote:

>> What I advocate for Python is to require that the standard
>> base64 codec be defined only on bytes, and always produce
>> bytes.

Greg> I don't understand that. It seems quite clear to me that
Greg> base64 encoding (in the general sense of encoding, not the
Greg> unicode sense) takes binary data (bytes) and produces
Greg> characters.

Base64 is a (family of) wire protocol(s). It's not clear to me that it makes sense to say that the alphabets used by "baseNN" encodings are composed of characters, but suppose we stipulate that.

Greg> So in Py3k the correct usage would be [bytes<->unicode].

IMHO, as a wire protocol, base64 simply doesn't care what Python's internal representation of characters is. I don't see any case for "correctness" here, only for convenience, both for programmers on the job and students in the classroom. We can choose the character set that works best for us. I think that's 8-bit US ASCII.

My belief is that bytes<->bytes is going to be the dominant use case, although I don't use binary representation in XML myself. However, AFAIK UTF-8 is strongly recommended for XML on the wire, and in that case bytes<->bytes is also efficient for XML, since conversion of base64 bytes to UTF-8 characters is simply a matter of "Simon says, be UTF-8!"
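
To make the bytes<->bytes case concrete, here is a minimal sketch. It assumes a present-day Python 3 interpreter, which postdates this message, and its standard `base64` module; the payload and the `<blob>` element name are made up for illustration.

```python
import base64

# Hypothetical binary payload; any bytes object would do.
payload = bytes([0x00, 0xFF, 0x10, 0x80])

# bytes in, bytes out: no character type is involved at any point.
encoded = base64.b64encode(payload)
assert isinstance(encoded, bytes)

# Every byte of the base64 alphabet is 7-bit ASCII, so the output is
# already valid UTF-8 and can be spliced into a UTF-8 document as-is.
xml_fragment = b"<blob>" + encoded + b"</blob>"

# Going back is also bytes -> bytes.
decoded = base64.b64decode(encoded)
assert decoded == payload
```

Only when a character string is really wanted does a separate step such as `encoded.decode("ascii")` enter the picture, and that step is a decode.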

And in the classroom, you're just going to confuse students by telling them that UTF-8 --[Unicode codec]--> Python string is decoding but UTF-8 --[base64 codec]--> Python string is encoding, when MAL is telling them that --> Python string is always decoding.

Sure, it all makes sense if you already know what's going on. But I have trouble remembering, especially in cases like UTF-8 vs UTF-16 where Perl and Python have opposite internal representations, and glibc has a third which isn't either. If base64 (and gzip, etc) are all considered bytes<->bytes, there just isn't an issue any more. The simple rule wins: to Python string is always decoding.
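
The rule can be sketched in code. This assumes a present-day Python 3, where `str` plays the role of the unicode type meant here, and where the standard `codecs` module exposes `"base64"` as a bytes-to-bytes transform; the sample text is arbitrary.

```python
import codecs

text = "na\u00efve caf\u00e9"            # a Python string (unicode)

# Python string -> bytes is always *encoding*.
utf8_bytes = text.encode("utf-8")
assert isinstance(utf8_bytes, bytes)

# base64 stays entirely on the bytes side of the line, in both directions.
wire = codecs.encode(utf8_bytes, "base64")
assert isinstance(wire, bytes)
unwrapped = codecs.decode(wire, "base64")
assert isinstance(unwrapped, bytes)

# bytes -> Python string is always *decoding*.
round_tripped = unwrapped.decode("utf-8")
assert round_tripped == text
```

Under this arrangement the direction of "encode" and "decode" never depends on which codec is in play: only the character codec ever crosses between bytes and strings.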

Why fight it when we can run away with efficiency gains?

(In the above, "Python string" means the unicode type, not str.)

-- 
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba, Tennodai 1-1-1, Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.
