[Python-3000] [Python-Dev] Byte literals (was Re: [Python-checkins] Changing string constants to byte arrays ( r55119 - in python/branches/py3k-struni/Lib: codecs.py test/test_codecs.py ))

Guido van Rossum guido at python.org
Tue May 8 16:10:51 CEST 2007


On 5/8/07, Jason Orendorff <jason.orendorff at gmail.com> wrote:

On 5/7/07, Guido van Rossum <guido at python.org> wrote:
> I don't know how this will work out yet. I'm not convinced that having
> both mutable and immutable bytes is the right thing to do; but I'm
> also not convinced of the opposite. I am slowly working on the
> string/unicode unification, and so far, unfortunately, it is quite
> daunting to get rid of 8-bit strings even at the Python level let
> alone at the C level.

Guido, if 3.x had an immutable bytes type, could 2to3 provide a better guarantee? Namely, "Set your default encoding to None in your 2.x code today, and 2to3 will not introduce bugs around str/unicode."

I don't know. I may be able to tell you when I'm further into the process of unifying str and unicode.

2to3 could produce 3.x code that preserves the 2.x meaning by using 2.x-ish types, including immutable byte strings.

This sounds dangerously close to crippling 3.0 with backwards compatibility. I want to reserve this option as a last resort.

Without this, my understanding is that 2to3 will introduce bugs. Am I wrong?

No -- 2to3 cannot guarantee that your code will work correctly, because it doesn't do any data flow analysis or type inferencing. This is not limited to strings.
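
A minimal sketch of the kind of bug this implies, assuming the bytes/str separation as it ended up in released Python 3 (the function name add_header and the sample values are hypothetical, not taken from this thread). A source-level tool sees only the literal and the `+`; it cannot tell whether callers pass text or 8-bit data:

```python
# Hypothetical helper: in 2.x source the literal "HDR:" was an 8-bit str.
# A purely syntactic translation leaves it as a text (str) literal in 3.x,
# because nothing in the source says what type `payload` has at run time.
def add_header(payload):
    return "HDR:" + payload


# A caller that passes text keeps working after translation.
print(add_header("body"))

# A caller that passes binary data worked in 2.x (both were 8-bit str),
# but in released Python 3 mixing str and bytes raises TypeError.
try:
    add_header(b"\x00\x01")
except TypeError as exc:
    print("TypeError:", exc)
```

Deciding which of the two callers is "right" would need knowledge of where the values come from, which is the data flow analysis 2to3 does not do.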

This might be worth doing even if you decide an immutable 8-bit type is wrong for the core language. The type could be hidden away in an "upgradelib" module somewhere. Surely people will prefer correctness over "producing nice, idiomatic 3.x code" in the 2to3 tool.

With that I agree, at least in general (e.g. d.keys() gets translated to list(d.keys()) and d.iterkeys() to iter(d.keys())). In the current py3k-struni branch I have temporarily kept the 8-bit string type around, renamed to str8. I am hoping I will be able to get rid of it eventually but I may not succeed and then we'll have it available as a backup.
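
To illustrate why the conservative list(d.keys()) form preserves 2.x behaviour, here is a small sketch. It assumes keys() returns a live view, as it does in released Python 3; the dict contents are made up for the example:

```python
d = {"a": 1, "b": 2}

view = d.keys()             # live view onto the dict (Python 3 behaviour)
snapshot = list(d.keys())   # the form 2to3 emits for a 2.x `d.keys()` call

d["c"] = 3

print(sorted(view))      # the view reflects the new key
print(sorted(snapshot))  # the list is a copy taken before the insert

# 2.x code often mutates the dict while looping over d.keys().
# That stays safe only because of the list() wrapper; iterating the
# bare view here would raise RuntimeError when the dict changes size.
for key in list(d.keys()):
    if d[key] > 1:
        del d[key]

print(d)
```

The wrapper costs an extra copy where the original code only iterated, which is the trade of idiomatic output for correctness discussed above.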

For anyone who wants to discuss this more -- please come and help out in the py3k-struni branch first. It is simply too soon to be able to make decisions based on the evidence available so far, and I won't be forced.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


