[Python-Dev] What to do for bytes in 2.6?
Guido van Rossum guido at python.org
Sun Jan 20 05:26:43 CET 2008
On Jan 19, 2008 5:54 PM, <glyph at divmod.com> wrote:
> On 19 Jan, 07:32 pm, guido at python.org wrote:
> >There is no way to know whether that return value means text or data
> >(plenty of apps legitimately read text straight off a socket in 2.x),
> IMHO, this is a stretch of the word "legitimately" ;-). If you're reading from a socket, what you're getting are bytes, whether they're represented by str() or bytes(); correct code in 2.x must currently do a .decode("ascii") or .decode("charmap") to "legitimately" identify the result as text of some kind. Now, ad-hoc code with a fast and loose definition of "text" can still read arrays of bytes off a socket without specifying an encoding and get away with it, but that's because Python's unicode implementation has thus far been very forgiving, not because the data is cleanly text yet.
I would say that depends on the application, and on arrangements that client and server may have made off-line about the encoding.
In 2.x, text can legitimately be represented as str -- there's even the locale module to further specify how it is to be interpreted as characters.
Sure, this doesn't work for full unicode, and it doesn't work for all protocols used with sockets, but claiming that only fast and loose code ever uses str to represent text is quite far from reality -- this would be saying that the locale module is only for quick and dirty code, which just ain't so.
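To make the two positions above concrete, here is a minimal sketch of reading from a socket and decoding with an encoding that both ends agreed on out of band. It is written for Python 3, where the bytes/text split is enforced; the `AGREED_ENCODING` constant and the `read_text` helper are hypothetical names introduced only for illustration.

```python
import socket

# Assumption: client and server agreed on this encoding off-line.
AGREED_ENCODING = "ascii"


def read_text(sock, nbytes=1024, encoding=AGREED_ENCODING):
    """Read raw bytes from the socket and decode them explicitly."""
    raw = sock.recv(nbytes)      # what comes off the wire is bytes
    return raw.decode(encoding)  # the agreed encoding turns it into text


# A connected pair of local sockets stands in for a real client/server.
server, client = socket.socketpair()
try:
    client.sendall(b"HELLO\r\n")
    line = read_text(server)
finally:
    server.close()
    client.close()
```

In 2.x the `recv()` result would already be a str, so code that skips the `decode()` step keeps working there; whether that counts as legitimate is exactly the point under discussion.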
> Why can't we get that warning in -3 mode just the same from something read from a socket and a b"" literal?
If you really want this, please think through all the consequences, and report back here. While I have a hunch that it'll end up giving too many false positives and at the same time too many false negatives, perhaps I haven't thought it through enough. But if you really think this'll be important for you, I hope you'll be willing to do at least some of the thinking.
I believe that a constraint should be that by default (without -3 or a future import) str and bytes should be the same thing. Or, another way of looking at this, reads from binary files and reads from sockets (and other similar things, like ctypes and mmap and the struct module, for example) should return str instances, not instances of a str subclass by default -- IMO returning a subclass is bound to break too much code. (Remember that there is still lots of code out there that uses "type(x) is types.StringType" rather than "isinstance(x, str)", and while I'd be happy to warn about that in -3 mode if we could, I think it's unacceptable to break that in the default environment -- let it break in 3.0 instead.)
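The breakage described above can be sketched as follows. `TaggedStr` is a hypothetical str subclass standing in for whatever a socket read might return under the subclass proposal; it is not a real or proposed type. The exact-type check is spelled with `str` so the snippet runs on any Python version (2.x code often wrote it as `types.StringType`).

```python
class TaggedStr(str):
    """Hypothetical str subclass, used only to illustrate the problem."""


def is_string_exact(x):
    # Old-style check: compares the exact type, so subclasses fail it.
    return type(x) is str


def is_string_isinstance(x):
    # Subclass-friendly check.
    return isinstance(x, str)


value = TaggedStr("payload")
exact = is_string_exact(value)       # the subclass is rejected here
loose = is_string_isinstance(value)  # the subclass is accepted here
```

Existing code written in the first style would start rejecting values that it accepts today, which is why returning a subclass by default is considered too disruptive.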
> I've written lots of code that aggressively rejects str() instances as text, as well as unicode instances as bytes, and that's in code that still supports 2.3 ;).
Yeah, well, but remember, while keeping you happy is high on my list of priorities, it's not the only priority. :-)
> >Really, the pure aliasing solution is just about optimal in terms of
> >bang per buck. :-)
> Not that I'm particularly opposed to the aliasing solution, either. It would still allow writing code that was perfectly useful in 2.6 as well as 3.0, and it would avoid disturbing code that did checks of type("").
Right.
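A minimal sketch of what the aliasing solution permits, assuming `bytes` is simply another name for `str` in 2.6 and a distinct type in 3.0. The `to_text` helper and the `BYTES_IS_STR` name are hypothetical, introduced only for this illustration.

```python
import sys

# Under pure aliasing this is True on 2.6 and False on 3.x.
BYTES_IS_STR = bytes is str


def to_text(data, encoding="ascii"):
    """Return text for either raw data or already-decoded text."""
    if isinstance(data, bytes):
        # 2.6: every str is "bytes" and gets decoded to unicode.
        # 3.x: only genuine bytes objects take this branch.
        return data.decode(encoding)
    return data


from_literal = to_text(b"abc")
from_text = to_text("abc")
```

The same source runs unchanged on both versions, and on 2.6 `type(b"")` is still `type("")`, so exact-type checks in existing code are undisturbed.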
> It would just remove an opportunity to get one potentially helpful warning.
I worry that the warning wouldn't come often enough, and that too often it would be unhelpful. There will inevitably be some stuff where you just have to try to convert the code using 2to3 and try to run it under 3.0 in order to see if it works. And there's also the concern of those who want to use 2.6 because it offers 2.5 compatibility plus a fair number of new features, but who aren't interested (yet) in moving up to 3.0. I expect that Google will initially be in this category too.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)