[Python-Dev] Why does base64 return bytes?

Paul Sokolovsky pmiscml at gmail.com
Tue Jun 14 14:17:16 EDT 2016


Hello,

On Tue, 14 Jun 2016 14:02:02 -0400 Random832 <random832 at fastmail.com> wrote:

> On Tue, Jun 14, 2016, at 13:19, Paul Sokolovsky wrote:
> > Well, it's easy to remember the conclusion - it was decided to
> > return bytes. The reason also wouldn't be hard to imagine -
> > regardless of the fact that base64 uses ASCII codes for digits and
> > letters, it's still essentially binary data.
>
> Only in the sense that all text is binary data. There's nothing in the
> definition of base64 specifying ASCII codes. It specifies characters
> that all happen to be in ASCII's character repertoire.
>
> > And the most natural step for it is to send
> > it down the socket (socket.send() accepts bytes), etc.
>
> How is that more natural than to send it to a text buffer that is

It's more natural because it's more efficient. It's more natural in the same sense that the most natural way to get from point A to point B is a straight line.

> ultimately encoded (maybe not even in an ASCII-compatible encoding...
> though probably) and sent down a socket or written to a file by a layer
> that is outside your control? Yes, everything eventually ends up as
> bytes. That doesn't mean that we should obsessively convert things to
> bytes as early as possible.

It's the other way around - there's no need to obsessively convert the simple, primary type of bytes (everything in computers is bytes) to more complex things like Unicode strings.
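A minimal sketch of the two paths being argued about, assuming Python 3 and using socket.socketpair() as a stand-in for a real connection (the payload and variable names are made up for illustration):

```python
import base64
import socket

payload = b"\x00\x01binary\xff"

# In Python 3, b64encode() takes bytes and returns bytes.
encoded = base64.b64encode(payload)

# socketpair() is only a local stand-in for a real network connection.
left, right = socket.socketpair()
try:
    # The bytes result can go to the socket with no conversion step.
    left.sendall(encoded)
    received = right.recv(1024)
finally:
    left.close()
    right.close()

# If a text layer needs the value, one explicit decode produces a str.
as_text = encoded.decode("ascii")
```

The decode is a single call, so which return type counts as "natural" depends mostly on where the result goes next.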

> I mean if we were gonna do that why bother even having a unicode string
> type at all?

You're trying to raise a topic which has been the subject of gigantic flame wars on python-list for years. Here's my summary: not using the unicode string type at all is better than not using the bytes type at all. So, feel free to use unicode strings only when they're needed, which is only when you accept input from or produce output for a human (like a real human, walking down a street to do grocery shopping). In all other cases, data should stay bytes (mind - stay, as it's bytes in the beginning, and it requires extra effort to convert it to strings).

> > I'd find it a bit more surprising that binascii.hexlify() returns
> > bytes, but I personally got used to it, and consider it a
> > consistency thing on binascii module.
> >
> > Generally, with Python3 by default using (inefficient) Unicode for
> > strings,
>
> Why is it inefficient?

Because bytes is the most efficient basic representation of data. Anything which converts it to something else is, in general, less efficient. Less efficient == inefficient.
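For reference, a small sketch of the hexlify case mentioned above, assuming Python 3.5 or later (bytes.hex() does not exist before 3.5; the sample data is arbitrary). It only shows which type each call returns and that text has to be encoded again before it can be written out. It is not a benchmark:

```python
import binascii

data = b"\xde\xad\xbe\xef"

# binascii.hexlify() returns bytes, consistent with the rest of binascii.
hex_bytes = binascii.hexlify(data)

# bytes.hex() (Python 3.5+) returns a str instead.
hex_text = data.hex()

# A str must be encoded before it reaches a file or socket, and the
# size on the wire then depends on the codec chosen by that layer.
ascii_wire = hex_text.encode("ascii")
utf16_wire = hex_text.encode("utf-16-le")

print(type(hex_bytes).__name__, type(hex_text).__name__)
print(len(ascii_wire), len(utf16_wire))
```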

-- 
Best regards,
 Paul                          mailto:pmiscml at gmail.com


