[Python-Dev] What should the focus for 2.6 be?

Josiah Carlson jcarlson at uci.edu
Mon Aug 21 23:21:30 CEST 2006


Talin <talin at acm.org> wrote: [snip]

I've been thinking about the transition to unicode strings, and I want to put forward a notion that might allow the transition to be done gradually instead of all at once.

The idea would be to temporarily introduce a new name for 8-bit strings - let's call it "ascii". An "ascii" object would be exactly the same as today's 8-bit strings.

There are two parts to the unicode conversion: all literals become unicode, and we no longer have 8-bit strings; we have bytes. Without a bytes object, people can't really convert their code. String literals can be handled with the -U command line option (and perhaps by having the interpreter do the str=unicode assignment during startup).
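For illustration only, here is how that split ended up looking in Python 3 as eventually released. This is an assumption about where the proposal leads, not part of any 2.6 plan, and the variable names are hypothetical. Plain literals are unicode `str`, binary data is `bytes`, and converting between them always requires an explicit encode or decode.

```python
# Python 3 sketch: plain literals are unicode text, b"" literals are bytes.
text = "caf\u00e9"              # str: 4 code points
data = text.encode("utf-8")     # bytes: 5 octets, since U+00E9 needs two in UTF-8

assert isinstance(text, str) and isinstance(data, bytes)
assert data.decode("utf-8") == text   # round-trips only through an explicit codec

# Mixing text and bytes raises TypeError, unlike 2.x, which implicitly
# decoded the 8-bit string as ASCII.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False
```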

In any case, as I look at Py3k and the future of Python, I ask of each release, "what are the compelling features that make me want to upgrade?" Every release in the 1.5-2.5 series that I've looked at has had some compelling feature that basically required me to upgrade, or at least to seriously consider upgrading (fixes for bugs that have bitten me, new syntax that I use, significant increases in speed, etc.).

As we approach Py3k, I again ask, "what are the compelling features?" Wholesale breakage of anything that uses ascii strings as text or binary data? A completely changed IO stack (requiring re-learning of everything known about Python IO)? Dictionary .keys(), .values(), and .items() being their .iter*() equivalents (making it just about impossible to optimize for Py3k dictionary behavior now)?

I understand getting rid of the cruft, really I do (you should see some of the cruft I've been replacing lately). But some of that cruft is useful. More precisely, some of it currently has no alternative, so removing it will require significant rewrites of user code when Py3k is released. When everyone has to rewrite their code, they are going to ask, "Why don't I just stick with the maintenance 2.x? It's going to be maintained for a few more years yet, and I don't need to rewrite all of my disk IO, strings in dictionary code, etc." I will be right along with them (no offense intended to those currently working towards Py3k).

I can code defensively against buffer-saturating DoS attacks in my socket code, but I can't code defensively to handle some (never mind all) of the changes and incompatibilities that Py3k will bring.

Here's my suggestion: for every feature, syntax change, etc., that is slated for Py3k, let us release it bit by bit in the 2.x series. That lets the 2.x series evolve into the 3.x series in a somewhat more natural way than the currently proposed approach, where everything breaks at once. If it takes 1, 2, 3, or 10 more releases in the 2.x series to get to all of the 3.x features, great. That way, people will have a chance to convert, or at least to write code that will still be correct in the future.

Say 2.6 gets bytes and special factories (or a special encoding argument) for file/socket to return bytes instead of strings, and to only accept bytes objects in their .write() methods (unless an encoding was previously given for the file, etc.). Given these bytes objects, it may even make sense to offer the .readinto() method that Alex B has been asking for (which would make 3 built-in objects that could reasonably support readinto: bytes, array, mmap).
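A rough sketch of the file behavior described above, written against Python 3's `io` module. This is an assumption: it shows where the idea ended up, not what 2.6 would necessarily have shipped. In Python 3 the mutable bytes object discussed here roughly corresponds to `bytearray`. The sketch shows a binary stream refusing `str`, a text layer that accepts `str` because an encoding was given, and `readinto()` filling a `bytearray`, an `array.array`, and an anonymous `mmap`. The helper `fill` and the sample payload are hypothetical.

```python
import array
import io
import mmap

PAYLOAD = b"\x00\x01\x02\x03\x04\x05\x06\x07"   # hypothetical sample data

# Binary streams only accept bytes-like objects in write().
raw = io.BytesIO()
raw.write(PAYLOAD)
str_rejected = False
try:
    raw.write("not bytes")
except TypeError:
    str_rejected = True

# A text layer with an explicit encoding accepts str and encodes it.
text_layer = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
text_layer.write("caf\u00e9")
text_layer.flush()                          # push pending text down to the byte buffer
encoded = text_layer.buffer.getvalue()

def fill(buf):
    """Hypothetical helper: readinto() copies into a pre-allocated,
    writable buffer instead of allocating a new bytes object per read."""
    src = io.BytesIO(PAYLOAD)
    return src.readinto(buf)                # returns the number of bytes copied

ba = bytearray(4)
n_ba = fill(ba)                             # only 4 bytes fit

arr = array.array("B", bytes(8))            # 8 zeroed unsigned bytes
n_arr = fill(arr)

mm = mmap.mmap(-1, 8)                       # anonymous 8-byte mapping
n_mm = fill(mm)
mm_contents = mm[:]
mm.close()
```

The point of `readinto()` is that the same call works for any object exposing a writable buffer, so all three containers the message mentions can be filled without an intermediate copy.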

If the IO library is available for 2.6, toss that in there, or offer it on PyPI as an evolving library.

I would suggest pushing off the dict changes until 2.7 or later, as there are 340+ uses of dict.keys() in the Python 2.5b2 standard library, at least half of which are going to need to be changed to list(dict.keys()) or otherwise rewritten. The breakage in user code will likely be at least as substantial.
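To make the dict concern concrete, here is a small sketch in Python 3 terms. The helper names are hypothetical. It shows two common ways a 2.x `dict.keys()` call breaks once it returns a view instead of a list: deleting entries while iterating over the live view, and calling list methods such as `.sort()` on the result. Wrapping the call in `list()` or using `sorted()` keeps the code working on both lines.

```python
def drop_empty(d):
    """Remove entries with falsy values. list() snapshots the keys,
    so deleting from d during the loop is safe."""
    for key in list(d.keys()):
        if not d[key]:
            del d[key]
    return d


def drop_empty_broken(d):
    """The 2.x-style loop: safe when keys() returned a list, but in
    Python 3 keys() is a live view, so deleting raises RuntimeError."""
    for key in d.keys():
        if not d[key]:
            del d[key]
    return d


def sorted_keys(d):
    # The 2.x idiom `k = d.keys(); k.sort()` fails on a view (it has no
    # .sort method); sorted() accepts any iterable, including the dict itself.
    return sorted(d)


sample = {"a": 0, "b": 1, "c": ""}
try:
    drop_empty_broken(dict(sample))
    broken_loop_failed = False
except RuntimeError:                # "dictionary changed size during iteration"
    broken_loop_failed = True
```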

Those are just the examples that come to mind now, but I'm sure there are other changes with similar issues.


