[Python-Dev] PEP 460 reboot

Donald Stufft donald at stufft.io
Mon Jan 13 01:46:19 CET 2014


On Jan 12, 2014, at 6:55 PM, Guido van Rossum <guido at python.org> wrote:

The key reason for introducing a separate bytes type in Python 3 is to avoid mixing bytes and text. This aims to avoid the classic Python 2 Unicode failure, where str+unicode fails or succeeds based on whether str contains non-ASCII characters or not, which means it is easy to miss in testing.
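Here is a minimal sketch of the Python 3 behaviour described above, assuming any CPython 3 interpreter. The name concat_succeeds is a hypothetical helper used only for illustration. In Python 2, str + unicode only fails on non-ASCII data. In Python 3, bytes + str raises TypeError whatever the data contains, so the bug cannot hide behind ASCII-only test inputs.

```python
def concat_succeeds(data: bytes, text: str) -> bool:
    """Hypothetical helper: report whether bytes + str concatenation works."""
    try:
        data + text  # Python 3 never coerces implicitly between bytes and str
    except TypeError:
        return False
    return True


# The outcome is the same whether or not the bytes happen to be ASCII,
# so the mistake shows up even with ASCII-only test data.
print(concat_succeeds(b"abc", "abc"))          # → False
print(concat_succeeds(b"caf\xc3\xa9", "abc"))  # → False

# Mixing has to be done explicitly, by choosing an encoding.
print(b"abc".decode("ascii") + "abc")          # → abcabc
```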

But this does not mean the bytes type isn't allowed to have a noticeable bias in favor of encodings that are ASCII supersets, even if not all bytes objects contain such data (e.g. image data, compressed data, binary network packets, and so on).

IMO it's totally fine and consistent if b'%d' % 42 returns b'42' and also for b'{}'.format(42) to return b'42'. There are numerous places where bytes are already assumed to use an ASCII superset:

- byte literals: b'abc' (it's a syntax error to have a non-ASCII character here)
- the upper() and lower() methods modify the ASCII letter positions
- int(b'42') == 42, float(b'3.14') == 3.14

Completely Agree.
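The ASCII-superset behaviours listed in the quoted message can be checked directly. This is a minimal sketch assuming Python 3.5 or later. The %-interpolation line depends on PEP 461, which was accepted after this thread. Current Python 3 releases still have no format() method on bytes.

```python
# Byte literals accept only ASCII characters; other byte values need escapes.
assert b"abc" == bytes([0x61, 0x62, 0x63])
try:
    compile("b'\u00e9'", "<example>", "eval")  # the source text here is b'é'
    rejected = False
except SyntaxError:
    rejected = True
assert rejected

# upper()/lower() change only the ASCII letter positions; other bytes pass through.
assert b"abc\xe9".upper() == b"ABC\xe9"
assert b"ABC\xc9".lower() == b"abc\xc9"

# int() and float() parse ASCII digits held in bytes.
assert int(b"42") == 42
assert float(b"3.14") == 3.14

# Python 3.5+ only (PEP 461): %-interpolation on bytes yields ASCII digits.
assert b"%d" % 42 == b"42"
# bytes has no format() method, so b'{}'.format(42) raises AttributeError.
assert not hasattr(b"", "format")
```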

I looked through the example code I recently wrote for asyncio (which uses bytes for all data read or written). There are several places where I have to make a clumsy detour via text strings because I need to include an ASCII-encoded decimal integer (e.g. the Content-Length header) or a hex-encoded one (e.g. for Transfer-Encoding: chunked). Those detours aren't needed for parsing because int() accepts bytes just fine.

I also note that the behavior of the re module is perfect: if the pattern is bytes, it can only match bytes and the extracted data is bytes, and ditto for text -- so it supports both types but doesn't allow mixing them. The urllib module does this too -- at considerable cost in its implementation, but it's the right thing, because there really are good cases to be made for treating URLs as text as well as for treating them as bytes (as with filenames, command line arguments, and environment variables).

I'm sad that the json module in Python 3 doesn't support bytes at all, but at least it is consistent -- it always produces text in ASCII encoding (by default). The same applies to the http module, which IIUC adheres to the standard by treating headers as Latin-1.

--
--Guido van Rossum (python.org/~guido)
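The behaviours described in the quoted message can be sketched in a few lines. This is a minimal illustration assuming a recent Python 3. The helpers content_length_header, chunk_size_line, parse_chunk_size and raises_type_error are hypothetical names, not part of asyncio or any standard-library API. The sketch shows:

- the text detour needed to emit ASCII numbers
- int() parsing bytes directly
- re and urllib.parse refusing to mix bytes and str
- json.dumps() returning ASCII text
- http.client.parse_headers() decoding a header block as Latin-1

```python
import http.client
import io
import json
import re
import urllib.parse


def content_length_header(body: bytes) -> bytes:
    """Hypothetical helper: build a Content-Length header line as bytes."""
    # The detour: format the number as text, then encode it to ASCII bytes.
    return ("Content-Length: %d\r\n" % len(body)).encode("ascii")


def chunk_size_line(chunk: bytes) -> bytes:
    """Hypothetical helper: hex size line used by Transfer-Encoding: chunked."""
    return ("%x\r\n" % len(chunk)).encode("ascii")


def parse_chunk_size(line: bytes) -> int:
    """Hypothetical helper: parsing needs no detour, since int() accepts bytes."""
    return int(line.strip(), 16)


def raises_type_error(func, *args) -> bool:
    """Hypothetical helper: return True if func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


assert content_length_header(b"hello") == b"Content-Length: 5\r\n"
assert chunk_size_line(b"x" * 26) == b"1a\r\n"
assert parse_chunk_size(b"1a\r\n") == 26

# re: a bytes pattern matches bytes and yields bytes; a str subject is rejected.
match = re.search(rb"\d+", b"Content-Length: 42")
assert match is not None and match.group() == b"42"
assert raises_type_error(re.search, rb"\d+", "Content-Length: 42")

# urllib.parse: str in, str out; bytes in, bytes out; mixed arguments fail.
assert urllib.parse.urlsplit("http://example.com/a").netloc == "example.com"
assert urllib.parse.urlsplit(b"http://example.com/a").netloc == b"example.com"
assert raises_type_error(urllib.parse.urljoin, "http://example.com/", b"a")

# json: dumps() returns str, ASCII-only by default (ensure_ascii=True),
# so putting it on the wire takes another explicit encode step.
text = json.dumps({"name": "caf\u00e9"})
assert text == '{"name": "caf\\u00e9"}'
payload = text.encode("ascii")  # would raise if any non-ASCII character remained
assert isinstance(payload, bytes)

# http.client decodes a raw header block as Latin-1 (ISO-8859-1): every byte
# maps to exactly one character, so header values round-trip losslessly.
headers = http.client.parse_headers(io.BytesIO(b"X-Name: caf\xe9\r\n\r\n"))
assert headers["X-Name"] == "caf\u00e9"
assert headers["X-Name"].encode("latin-1") == b"caf\xe9"
```

The two header-building helpers deliberately go through str, mirroring the detour described above. On Python 3.5 and later the same lines can be written directly on bytes, e.g. b"Content-Length: %d\r\n" % len(body), thanks to PEP 461.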


Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


