[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

M.-A. Lemburg mal at egenix.com
Tue Feb 14 00:03:35 CET 2006

Phillip J. Eby wrote:

>>>> Why not just have the constructor be:
>>>>
>>>>     bytes(initializer [,encoding])
>>>>
>>>> Where initializer must be either an iterable of suitable integers, or a unicode/string object. If the latter (i.e., it's a basestring), the encoding argument would then be required. Then, there's no need for special codec support for the bytes type, since you call bytes on the thing to be encoded. And of course, no need for a 'b' literal.
>>>
>>> It'd be cruel and unusual punishment though to have to write
>>>
>>>     bytes("abc", "Latin-1")
>>>
>>> I propose that the default encoding (for basestring instances) ought to be "ascii" just like everywhere else. (Meaning, it should really be the system default encoding, which defaults to "ascii" and is intentionally hard to change.)
>>
>> We're talking about Py3k here: "abc" will be a Unicode string, so why restrict the conversion to 7 bits when you can have 8 bits without any conversion problems?
>
> Actually, I thought we were talking about adding bytes() in 2.5.
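For comparison, the `bytes()` builtin that eventually shipped in Python 3 (well after this 2006 thread) takes the constructor shape quoted above: an iterable of integers is used as-is, while a `str` must come with an explicit encoding. This is a sketch of today's Python 3 behaviour, not of the 2.5 proposal being debated; the variable names are illustrative.

```python
# Python 3's bytes() builtin, shown for comparison with the proposal above.

# An iterable of integers in range(256) is used as the byte values directly.
from_ints = bytes([97, 98, 99])

# A str is accepted only together with an explicit encoding.
from_str = bytes("abc", "latin-1")

# Leaving out the encoding for a str raises TypeError instead of guessing one.
try:
    bytes("abc")
    missing_encoding_error = None
except TypeError as exc:
    missing_encoding_error = exc

# Integers outside range(256) are rejected with ValueError.
try:
    bytes([256])
    out_of_range_error = None
except ValueError as exc:
    out_of_range_error = exc

print(from_ints, from_str, repr(missing_encoding_error), repr(out_of_range_error))
```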

Then we'd need to make the "ascii" encoding assumption again, just like Guido proposed.

> However, now that you've brought this up, it actually makes perfect sense to just use latin-1 as the effective encoding for both strings and unicode. In Python 2.x, strings are byte strings by definition, so it's only in 3.0 that an encoding would be required. And again, latin1 is a reasonable, roundtrippable default encoding.

It is. However, it's not a reasonable assumption for the default encoding, since many encodings out there assign their own meanings to the characters 0x80-0xFF; hence the choice of ASCII as the default encoding in Python.
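Both halves of this point can be checked with Python 3's standard codecs: latin-1 maps each byte value n to code point U+00nn, so all 256 byte values round-trip, but another 8-bit codec such as cp1252 reads the 0x80-0xFF range differently, and ASCII rejects that range outright. The choice of cp1252 as the contrasting codec and the variable names are illustrative assumptions.

```python
# latin-1 maps byte value n to code point U+00nn, so all 256 values round-trip.
all_bytes = bytes(range(256))
latin1_roundtrips = all_bytes.decode("latin-1").encode("latin-1") == all_bytes

# Other 8-bit codecs give 0x80-0xFF different meanings.
high = b"\x80"
as_latin1 = high.decode("latin-1")   # '\x80', a C1 control character
as_cp1252 = high.decode("cp1252")    # '\u20ac', the euro sign

# ASCII only covers 0x00-0x7F, so it rejects the byte outright.
try:
    high.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

print(latin1_roundtrips, repr(as_latin1), repr(as_cp1252), repr(ascii_error))
```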

The conversion from Unicode to bytes is different in this respect, since you are converting from a "bigger" type to a "smaller" one. Choosing latin-1 as the default for this conversion would give you all 8 bits, instead of just the 7 bits that ASCII provides.
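A short Python 3 sketch of the Unicode-to-bytes direction: a string whose code points all lie at or below U+00FF encodes cleanly under latin-1, ASCII gives up on anything above U+007F, and latin-1 in turn cannot encode code points beyond U+00FF. The sample strings are arbitrary.

```python
text = "caf\u00e9"   # 'café': every code point is at or below U+00FF

# latin-1 uses all 8 bits, so U+00E9 becomes the single byte 0xE9.
latin1_bytes = text.encode("latin-1")

# ASCII stops at U+007F, so the same string cannot be encoded.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# latin-1 has its own ceiling: code points above U+00FF (here the euro sign) fail.
try:
    "\u20ac".encode("latin-1")
    latin1_failed_on_euro = False
except UnicodeEncodeError:
    latin1_failed_on_euro = True

print(latin1_bytes, ascii_failed, latin1_failed_on_euro)
```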

> So, it sounds like making the encoding default to latin-1 would be a reasonably safe approach in both 2.x and 3.x.

Reasonable for bytes(): yes. In general: no.
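A minimal sketch of the constructor as discussed, assuming latin-1 as the default encoding for string initializers. `make_bytes` is a hypothetical helper written for illustration (it is not the PEP's API or any shipped builtin), and Python 3's `str` stands in for the 2.x `basestring`.

```python
from typing import Iterable, Union


def make_bytes(initializer: Union[str, Iterable[int]], encoding: str = "latin-1") -> bytes:
    """Hypothetical bytes(initializer [, encoding]) with a latin-1 default.

    A str is encoded with `encoding`; any other initializer is treated as an
    iterable of integers in range(256) and handed to the builtin bytes().
    """
    if isinstance(initializer, str):
        return initializer.encode(encoding)
    return bytes(initializer)


print(make_bytes("caf\u00e9"))           # latin-1 default
print(make_bytes("caf\u00e9", "utf-8"))  # explicit codec
print(make_bytes([0, 127, 255]))         # iterable of integers
```

Keeping the encoding as an overridable keyword means callers who need UTF-8 or another codec still name it; only the unspecified case falls back to latin-1.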

>> While we're at it: I'd suggest that we remove the auto-conversion from bytes to Unicode in Py3k and the default encoding along with it. In Py3k the standard lib will have to be Unicode compatible anyway and string parser markers like "s#" will have to go away as well, so there's not much need for this anymore.
>
> I thought all this was already in the plan for 3.0, but maybe I assume too much. :)

Wouldn't want to wait for Py4D :-)
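Python 3 did drop the implicit bytes-to-Unicode coercion asked for here: mixing the two types raises `TypeError`, and the conversion has to name its codec. A sketch of that behaviour, with illustrative variable names:

```python
data = b"abc"
text = "def"

# Python 3 does not coerce bytes to str through a default encoding.
try:
    data + text
    mixing_error = None
except TypeError as exc:
    mixing_error = exc

# The conversion has to name its codec explicitly instead.
combined = data.decode("ascii") + text

print(repr(mixing_error), combined)
```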

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Feb 13 2006)

Python/Zope Consulting and Support ...        http://www.egenix.com/
mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! ::::
