[Python-Dev] bytes.from_hex()

James Y Knight foom at fuhm.net
Wed Feb 22 17:17:41 CET 2006


On Feb 22, 2006, at 6:35 AM, Greg Ewing wrote:

> I'm thinking of convenience, too. Keep in mind that in Py3k, 'unicode' will be called 'str' (or something equally neutral like 'text') and you will rarely have to deal explicitly with Unicode encodings, this being done mostly for you by the I/O objects. So most of the time, using base64 will be just as convenient as it is today: base64encode(mybytes) and write the result out somewhere.
>
> The reason I say it's correct is that if you go straight from bytes to bytes, you're assuming the eventual encoding is going to be an ASCII superset. The programmer is going to have to know about this assumption, understand all its consequences, decide whether it's right, and if not, do something to change it. Whereas if the result is text, the right thing happens automatically whatever the ultimate encoding turns out to be. You can take the text from your base64 encoding, combine it with other text from any other source to form a complete mail message or XML document or whatever, and write it out through a file object that's using any Unicode encoding at all, and the result will be correct.
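A minimal Python 3 sketch of the contrast drawn in the quoted paragraph, assuming the modern standard-library `base64` module (whose `b64encode` returns bytes, i.e. the bytes-to-bytes design) and a hypothetical helper `b64_text` standing in for the text-returning `base64encode` described above. Text spliced in as `str` survives any output encoding; the same base64 output spliced in as raw bytes only works when the surrounding encoding is an ASCII superset.

```python
import base64

def b64_text(data: bytes) -> str:
    """Hypothetical text-returning base64 encoder (the design argued for above)."""
    # base64 output is pure ASCII, so decoding it as ASCII cannot fail.
    return base64.b64encode(data).decode("ascii")

payload = b"\x00\xffbinary"
expected = "<data>" + b64_text(payload) + "</data>"

# Text-first: combine text, then let the output encoding be anything.
for enc in ("utf-8", "utf-16", "utf-32", "iso-8859-1"):
    wire = expected.encode(enc)
    assert wire.decode(enc) == expected  # round-trips in every encoding tried

# Bytes-first: splice base64 bytes into a document already encoded as UTF-16.
spliced = (
    "<data>".encode("utf-16-le")
    + base64.b64encode(payload)  # ASCII bytes, not UTF-16 code units
    + "</data>".encode("utf-16-le")
)
print(spliced.decode("utf-16-le") == expected)  # False: the ASCII-superset assumption broke
```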

This makes little sense for mail. You combine bytes, in various and
possibly different encodings, to form a mail message. Some MIME
sections might have a base64 Content-Transfer-Encoding, others might
be 8bit encoded, others 7bit, and others quoted-printable. Before the
C-T-E encoding, you will have had to do the Content-Type encoding,
converting your text into bytes with the desired character encoding:
utf-8, iso-8859-1, etc. Having the final mail message be made up of
"characters" right before transmission to the socket would be crazy.
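A minimal sketch of the two-step pipeline described above, using only the standard-library `base64` and `quopri` modules. The helpers `encode_part` and `build_multipart`, the boundary string, and the simplified headers are hypothetical; real messages also need top-level headers, 76-column wrapping of base64 bodies, and header folding, all omitted here. Each part goes text to bytes (Content-Type charset), then bytes to bytes (Content-Transfer-Encoding), and the parts are joined as bytes. Because the finished message mixes charsets, no single character decoding fits it.

```python
import base64
import quopri

def encode_part(text: str, charset: str, cte: str) -> bytes:
    """Hypothetical helper: build one MIME body part as bytes."""
    # Step 1, Content-Type: turn text into bytes in this part's charset.
    raw = text.encode(charset)
    # Step 2, Content-Transfer-Encoding: bytes -> bytes for the wire.
    if cte == "base64":
        body = base64.b64encode(raw)  # no 76-column line wrapping, for brevity
    elif cte == "quoted-printable":
        body = quopri.encodestring(raw)
    elif cte == "8bit":
        body = raw
    else:
        raise ValueError(f"unsupported transfer encoding: {cte}")
    headers = (
        f"Content-Type: text/plain; charset={charset}\r\n"
        f"Content-Transfer-Encoding: {cte}\r\n"
        "\r\n"
    ).encode("ascii")
    return headers + body

def build_multipart(parts: list[bytes], boundary: str = "sketch-boundary") -> bytes:
    """Hypothetical helper: join already-encoded parts into one multipart body."""
    delim = b"--" + boundary.encode("ascii")
    out = bytearray()
    for part in parts:
        out += delim + b"\r\n" + part + b"\r\n"
    out += delim + b"--\r\n"
    return bytes(out)

message = build_multipart([
    encode_part("héllo", "utf-8", "base64"),
    encode_part("héllo", "iso-8859-1", "quoted-printable"),
    encode_part("héllo", "utf-8", "8bit"),
    encode_part("héllo", "iso-8859-1", "8bit"),
])

# The finished message is bytes with mixed charsets, so it is not valid UTF-8.
try:
    message.decode("utf-8")
except UnicodeDecodeError:
    print("no single character encoding fits the whole message")
```

The 8bit iso-8859-1 part is what defeats a UTF-8 decode here; with only base64 and quoted-printable parts the message would happen to be ASCII, but the pipeline is still bytes in, bytes out.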

James


