[Python-Dev] What to do for bytes in 2.6?

Guido van Rossum guido at python.org
Fri Jan 18 05:43:47 CET 2008


On Jan 17, 2008 7:11 PM, Raymond Hettinger <python at rcn.com> wrote:

> > If we provide some kind of "backport" of bytes (even if it's just an alias for or trivial subclass of str), it should be part of a strategy that makes it easier to write code that runs under 2.6 and can be automatically translated to run under 3.0 with the same semantics.

> If it's just an alias or trivial subclass, then we haven't added anything that can't be done trivially by the 2-to-3 tool.

I suggest you study how the 2to3 tool actually works before asserting this.

Consider the following function.

def stuff(blah):
    foo = ""
    while True:
        bar = blah.read(1024)
        if bar == "":
            break
        foo += bar
    return foo

Is it reading text or binary data from stream blah? We can't tell. If it's meant to be reading text, 2to3 should leave it alone. But if it's meant to be reading binary data, 2to3 should change the string literals to bytes literals (b"" in this case). (If it's used for both, there's no hope.) As it stands, 2to3 hasn't a chance to decide what to do, so it will leave it alone -- but the "translated" code will be wrong if it was meant to be reading bytes.
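To make the failure mode concrete, here is a minimal sketch, assuming it is run under 3.0-style semantics where str and bytes are distinct types. The use of io.StringIO and io.BytesIO as stand-ins for the stream blah is an assumption for illustration, not part of the original example.

```python
import io

def stuff(blah):
    # Unchanged from the example above: plain "" literals,
    # which mean text under 3.0 semantics.
    foo = ""
    while True:
        bar = blah.read(1024)
        if bar == "":
            break
        foo += bar
    return foo

# With a text stream the function behaves as intended.
text_result = stuff(io.StringIO("hello"))

# With a binary stream, read() returns bytes, and appending
# bytes to a text string raises TypeError.
try:
    stuff(io.BytesIO(b"hello"))
    binary_failed = False
except TypeError:
    binary_failed = True
```

Nothing in the source of stuff() itself says which of the two calls reflects the author's intent, which is why 2to3 cannot decide.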

However, if the two empty string literals were changed to b"", we would know it was reading bytes. 2to3 could leave it alone, but at least the untranslated code would be correct for 2.6 and the translated code would be correct for 3.0.
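As a sketch of what that would look like, here is the same function with the two literals changed to b"". The io.BytesIO stream and the sample payload are assumptions added for illustration only.

```python
import io

def stuff(blah):
    # b"" marks the accumulator and the end-of-stream sentinel
    # as binary data, so the intent is explicit in the source.
    foo = b""
    while True:
        bar = blah.read(1024)
        if bar == b"":
            break
        foo += bar
    return foo

# Hypothetical binary payload, including non-ASCII byte values.
data = stuff(io.BytesIO(b"\x00\x01binary\xff"))
```

With b"" available as an alias for "" in 2.6, this source would need no translation to mean the same thing in 3.0.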

This may seem trivial (because we do all the work, and 2to3 just leaves stuff alone), but having b"" and bytes as aliases for "" and str in 2.6 would mean that we could write 2.6 code that correctly expresses the use of binary data -- and we could use u"" and unicode for code using text, and 2to3 would translate those to "" and str and the code would be correct 3.0 text processing code.
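A minimal sketch of that convention, with hypothetical helper names (join_lines and join_chunks are not from any real library): u"" marks text, b"" marks data. It also runs unchanged on Python 3.3 and later, where the u"" prefix is accepted again; for 3.0 itself, 2to3 would drop the u prefix.

```python
def join_lines(lines):
    # Text processing: u"" literals signal text in 2.6-style code.
    return u"\n".join(lines)

def join_chunks(chunks):
    # Binary data: b"" literals signal bytes.
    return b"".join(chunks)

text = join_lines([u"first", u"second"])
blob = join_chunks([b"\x89PNG", b"\r\n"])
```

A reader (or a tool) can tell from the literals alone which function handles text and which handles data.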

Note that we really can't make 2to3 assume that all uses of str and "" are referring to binary data -- that would mistranslate the vast majority of code that does non-Unicode-aware text processing, which I estimate is the majority of small and mid-size programs.

> I'm thinking that this is a deeper change. It doesn't serve either 2.6 or 3.0 to conflate the str/unicode model with the bytes/text model. Mixing the two in one place just creates a mess in that one place.

> I'm sure we're thinking that this is just an optional transition tool, but the reality is that once people write 2.6 tools that use the new model, then 2.6 users are forced to deal with that model. It stops being optional or something in the future, it becomes a mental jump that needs to be made now (while still retaining the previous model in mind for all the rest of the code).

This may be true. But still, 2.6 will run 2.5 code without any effort, so we will be able to mix modules using the 2.5 style and modules using the 3.0 style (or at least some aspects of 3.0 style) in one interpreter. Neither 2.5 nor 3.0 will support this combination. That's why 2.6 is so important: it's a stepping stone.

> I don't think you need a case study to foresee that it will be unpleasant to work with a code base that commingles the two world views.

Well, you shouldn't commingle the two world views in a single module or package. But that would just be bad style -- you shouldn't use competing style rules within a package either (like using words_with_underscores and camelCaseWords for method names).

> One other thought. I'm guessing that apps that would care about the distinction are already using unicode and are already treating text as distinct from arrays of bytes.

Yes, but 99% of these still accept str instances in positions where they require text. The problem is that the str type and its literals are ambiguous -- their use is not enough to be able to guess whether text or data is meant. Just being able to (voluntarily! on a per-module basis!) use a different type name and literal style for data could help forward-looking programmers get started on making the distinction clear, thus getting ready for 3.0 without making the jump just yet (or maintaining a 2.6 and a 3.0 version of the same package easily, using 2to3 to automatically generate the 3.0 version from the 2.6 code base).

> Instead, it's backwards thinking 20th-century neanderthal ascii-bound folks like myself who are going to have transition issues. It would be nice for us knuckle-draggers to not have to face the issue until 3.0.

Oh, you won't. Just don't use the -3 command-line flag and don't put "from __future__ import " at the top of your modules, and you won't have to change your ways at all. You can continue to distribute your packages in 2.5 syntax that will also work with 2.6, and your users will be happy (as long as they don't want to use your code on 3.0 -- but if you want to give them that, that is when you will finally be forced to face the issue. :-)

Note that I believe that the -3 flag should not change semantics -- it should only add warnings. Semantic changes must either be backwards compatible or be requested explicitly with a forward import (which 2to3 can remove).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
