[Python-Dev] PEP 460 reboot
Guido van Rossum guido at python.org
Mon Jan 13 19:58:24 CET 2014
Let me try rebooting the reboot.
My interpretation of Nick's argument is that he is asking for a bytes formatting language that doesn't have an implicit ASCII assumption.
To me this feels absurd. The formatting codes (%s, %c) themselves are expressed as ASCII characters. If you include anything else in the format string besides formatting codes (e.g. b'<%s>'), you are giving it as ASCII characters. I don't know what characters the EBCDIC codes 37, 99 or 115 encode (these are the ASCII codes for '%', 'c', 's') but it certainly wouldn't be safe to use % when the LHS is EBCDIC-encoded.
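To make the EBCDIC point concrete, here is a small check. It assumes Python's bundled `cp037` codec as a representative EBCDIC code page (Python ships several); the point is only that the bytes for `%`, `c` and `s` differ between the two encodings.

```python
# ASCII byte values of the format-string characters mentioned above.
ascii_codes = [ord(ch) for ch in "%cs"]
print(ascii_codes)  # → [37, 99, 115]

# The same characters encoded with cp037, one EBCDIC code page in the stdlib.
ebcdic = "%cs".encode("cp037")
print(list(ebcdic))

# A bytes % operator scanning for the ASCII percent byte (37) would not find
# the EBCDIC percent sign, so the format string would not be recognised.
print(b"%" in ebcdic)
```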
If I had some byte strings in an unknown encoding (but the same encoding for all) that I needed to concatenate I would never think of '%s%s' % (x, y) -- I would write x+y. (Even in Python 2.)
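A minimal sketch of why plain concatenation is the natural tool when nothing is known about the encoding except that it is shared. The use of `cp037` here is just an illustrative assumption; any single encoding behaves the same way.

```python
# Two byte strings in the same encoding (cp037 chosen only as an example).
x = "Hello, ".encode("cp037")
y = "world".encode("cp037")

# + copies the bytes verbatim and assumes nothing about what they mean.
joined = x + y
print(joined.decode("cp037"))  # → Hello, world

# For many pieces, bytes.join does the same job in one step.
parts = [x, y, "!".encode("cp037")]
print(b"".join(parts).decode("cp037"))  # → Hello, world!
```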
If I see some code using any formatting operation (regardless of whether it's %d, %r, %s or %c), I am going to assume that there is some ASCII-ness, and if there isn't, the code's author has obscured their intent from me.
I hear the objections against b'%s' % 'x' returning b"'x'" loud and clear, and if the noise about that sub-issue is preventing folks from seeing the absurdity in PEP 460, we can talk about a compromise, e.g. using %b, which would require its argument to be bytes. Those bytes should still probably be ASCII-ish, but there's no way to test that. That's fine with me and should be fine with Nick as well -- PEP 460 doesn't check that your encodings match (how could it? :-), nor does plain string concatenation using +.
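A `%b` code along these lines was later added to bytes formatting in Python 3.5 by PEP 461. The sketch below assumes a Python 3.5+ interpreter and shows the three properties described above: bytes pass through, str is rejected, and nothing checks that the bytes are ASCII.

```python
# Assumes Python 3.5+, where PEP 461 added %-formatting to bytes.

# %b interpolates a bytes-like argument verbatim.
tagged = b"<%b>" % b"x"
print(tagged)  # → b'<x>'

# A str argument raises TypeError instead of producing b"<'x'>".
try:
    b"<%b>" % "x"
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # → True

# Nothing checks that the bytes are "ASCII-ish": UTF-8 data passes through.
utf8 = b"<%b>" % "é".encode("utf-8")
print(utf8)  # → b'<\xc3\xa9>'
```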
In my head I make the following classification of situations where you work with bytes and/or text.
(A) Pure binary formats (e.g. most IP-level packet formats, media files, .pyc files, tar/zip files, compressed data, etc.). These are handled using the struct module (e.g. tar/zip) and/or custom C extensions (e.g. gzip).
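A sketch of the `struct`-style handling used for category (A). The record layout (a `DEMO` magic, a 16-bit version and a 32-bit payload length in network byte order) and the helpers `pack_record`/`unpack_record` are hypothetical, invented only to show the technique.

```python
import struct
from typing import Tuple

# Hypothetical header layout, for illustration only:
#   4-byte magic, unsigned 16-bit version, unsigned 32-bit payload length,
#   all in network (big-endian) byte order with no padding.
HEADER = struct.Struct("!4sHI")


def pack_record(version: int, payload: bytes) -> bytes:
    """Prefix the payload with a fixed-size binary header."""
    return HEADER.pack(b"DEMO", version, len(payload)) + payload


def unpack_record(data: bytes) -> Tuple[int, bytes]:
    """Split a record back into (version, payload), checking the magic."""
    magic, version, length = HEADER.unpack_from(data)
    if magic != b"DEMO":
        raise ValueError("bad magic")
    start = HEADER.size
    return version, data[start:start + length]


record = pack_record(2, b"\x00\xffpayload")
print(HEADER.size)            # → 10
print(unpack_record(record))  # → (2, b'\x00\xffpayload')
```

No text semantics are involved anywhere: the payload bytes are opaque and the header is pure numbers.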
(B) Encoded text. Here you should just decode everything into str objects and parse your text at that level. If you really want to manipulate the data as bytes (e.g. because you have a lot of data and the processing is very light), you may be able to do it, but unless it's a verbatim copy, you are probably going to make assumptions about the encoding. You are also probably going to mess up for some encodings (e.g. leave BOM turds in the middle of a file).
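A minimal illustration of the BOM problem, using UTF-16 as the example encoding (Python's `utf-16` codec writes a BOM at the start of every encoded string):

```python
# Two separately encoded UTF-16 documents; each starts with its own BOM.
part1 = "abc".encode("utf-16")
part2 = "def".encode("utf-16")

# Gluing the raw bytes leaves the second BOM mid-stream, where the decoder
# treats it as an ordinary U+FEFF character.
glued = (part1 + part2).decode("utf-16")
print(repr(glued))  # → 'abc\ufeffdef'

# Decoding each piece first and working at the str level avoids the problem.
clean = part1.decode("utf-16") + part2.decode("utf-16")
print(clean)  # → abcdef
```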
(C) Loosely text-based protocols and formats that have an ASCII assumption in the spec. Most classic Internet protocols (FTP, SMTP, HTTP, IRC, etc.) fall in this category; I expect there are also plenty of file formats using similar conventions (e.g. mailbox files). These protocols and formats often require text-ish manipulations, e.g. for case-insensitive headers or commands, or to split things at whitespace. This is where I find uses for the current ASCII-assuming bytes operations (e.g. b.lower(), b.split(), but also int(b)) and where the lack of number formatting (especially %d and %x) is most painful. I see no benefit in forcing programmers who write such protocol-handling code to use more cumbersome ways of converting between numbers and bytes, nor in forcing them to insert an encoding/decoding layer -- these protocols often switch between text and binary data at line boundaries, so the most basic part of parsing (splitting the input into lines) must still happen in the realm of bytes.
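A sketch of category (C) work on a simplified HTTP/1.1-style message. The sample bytes and the helper names `parse_response`, `encode_chunk` and `encode_chunk_pre35` are hypothetical. Parsing uses the ASCII-assuming bytes operations listed above; the chunk encoder shows hex number formatting both with the `%x` support that Python 3.5 later gained (PEP 461) and with the round-trip through `str` needed without it.

```python
from typing import Dict, Tuple

# Made-up response: ASCII header lines, a blank line, then a binary body.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"CONTENT-LENGTH: 4\r\n"
    b"\r\n"
    b"\x00\x01\x02\x03"
)


def parse_response(data: bytes) -> Tuple[int, Dict[bytes, bytes], bytes]:
    # Split headers from body at the blank line, entirely in bytes.
    head, _, rest = data.partition(b"\r\n\r\n")
    status_line, *header_lines = head.split(b"\r\n")
    # int() accepts ASCII digits in a bytes object directly.
    status = int(status_line.split()[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(b":")
        # Header names are case-insensitive, so normalise with bytes.lower().
        headers[name.strip().lower()] = value.strip()
    body = rest[: int(headers[b"content-length"])]
    return status, headers, body


def encode_chunk(chunk: bytes) -> bytes:
    # Chunked transfer coding writes each chunk size in hex.
    # Requires Python 3.5+ for %x and %b on bytes.
    return b"%x\r\n%b\r\n" % (len(chunk), chunk)


def encode_chunk_pre35(chunk: bytes) -> bytes:
    # Without bytes formatting, the number has to go through str and back.
    return format(len(chunk), "x").encode("ascii") + b"\r\n" + chunk + b"\r\n"


status, headers, body = parse_response(raw)
print(status)  # → 200
print(body)    # → b'\x00\x01\x02\x03'
print(b"Content-Length: %d\r\n" % len(body))  # → b'Content-Length: 4\r\n'
```

Note that the body never passes through a decoder: only the header section is treated as ASCII text.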
IMO PEP 460 and the mindset that goes with it don't apply to any of these three cases.
Also, IMO requiring a new type to handle (C) seems to add too much complexity, and it adds to porting efforts. I may have felt differently in the past, but ATM I feel that if newer versions of Python 3 make porting of Python 2 code easier through minor compromises, that's a good thing. (Example: adding u"..." literals to 3.3.)
--Guido van Rossum (python.org/~guido)