[Python-Dev] str object going in Py3K

Bengt Richter bokr at oz.net
Fri Feb 17 00:15:04 CET 2006

Previous message: [Python-Dev] str object going in Py3K
Next message: [Python-Dev] str.translate vs unicode.translate (was: Re: str object going in Py3K)
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Wed, 15 Feb 2006 21:59:55 -0800, Alex Martelli <aleaxit at gmail.com> wrote:

On Feb 15, 2006, at 9:51 AM, Barry Warsaw wrote:

On Wed, 2006-02-15 at 09:17 -0800, Guido van Rossum wrote:

Regarding open vs. opentext, I'm still not sure. I don't want to generalize from the openbytes precedent to openstr or openunicode (especially since the former is wrong in 2.x and the latter is wrong in 3.0). I'm tempted to hold out for open() since it's most compatible.

If we go with two functions, I'd much rather hang them off of the file type object than add two new builtins. I really do think file.bytes() and file.text() (a.k.a. open.bytes() and open.text()) is better than opentext() or openbytes().

I agree, or, MAL's idea of bytes.open() and unicode.open() is also good. My fondest dream is that we do NOT have an 'open' builtin, which has proven to be very error-prone when used in Windows by newbies (as evidenced by beginner errors as seen on c.l.py, the python-help lists, and other venues) -- defaulting 'open' to text is error-prone, defaulting it to binary doesn't seem the greatest idea either, and the principle "when in doubt, resist the temptation to guess" strongly suggests not having 'open' as a built-in at all. (And name-mangling into openthis and openthat seems less Pythonic to me than exploiting namespaces by making structured names, either this.open and that.open or open.this and open.that.) IOW, I entirely agree with Barry and Marc Andre.

FWIW, I'd vote for file.text and file.bytes.

I don't like bytes.open or unicode.open because I think types in general should not know about I/O (IIRC Guido said that, so pay attention ;-) Especially unicode.

E.g., why should unicode pull in a whole wad of I/O-related code if the user is only using it as an intermediary in some encoding change between low-level binary input and low-level binary output? E.g., consider what you could do with one statement like (untested)

s_str.translate(table, delch).encode('utf-8')

especially if you didn't have to introduce a phony latin-1 decoding and write it as (untested)

s_str.translate(table, delch).decode('latin-1').encode('utf-8')     # use str.translate

or

s_str.decode('latin-1').translate(mapping).encode('utf-8')     # use unicode.translate also for delch

to avoid exceptions if you have non-ASCII in your s_str translation.

It seems s_str.translate(table, delchars) wants to convert the s_str to unicode if table is unicode, and then use unicode.translate (which bombs on delchars!) instead of just effectively defining str.translate as

def translate(self, table, deletechars=None):
    return ''.join(table[ord(x)] for x in self
                   if deletechars is None or x not in deletechars)
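For illustration, here is a minimal sketch of that ord-indexed definition, written for Python 3 bytes objects rather than the Python 2 str type used above (an assumption, chosen so the sketch runs on a current interpreter). The helper name translate_bytes and the sample table are hypothetical. Python 3's builtin bytes.translate(table, delete) works on plain byte values in the same way, so it serves as a cross-check.

```python
def translate_bytes(data, table, deletechars=None):
    """Map each byte through a 256-entry table, dropping bytes in deletechars."""
    if len(table) != 256:
        raise ValueError("table must have exactly 256 entries")
    delete = frozenset(deletechars or b"")
    # Iterating over bytes yields ints in Python 3, so no ord() call is needed.
    return bytes(table[b] for b in data if b not in delete)


# Hypothetical table: identity, except ASCII lowercase maps to uppercase.
upper_table = bytes(b - 32 if 97 <= b <= 122 else b for b in range(256))

sample = b"sch\xf6n, really"   # byte 0xf6 sits at position 3
result = translate_bytes(sample, upper_table, b",")
print(result)
```

No codec is involved at any point, so the non-ASCII byte 0xf6 passes through the table untouched and no decode error can arise.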

IMO, if you want unicode.translate, then write unicode(s_str).translate and use that. Let str.translate just use the str ords, so simple custom decodes can be written without the annoyance of

UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 3: ordinal not in range(128)

Can we change this? Or what am I missing? I certainly would like to miss the above message for str.translate :-(

BTW This would also allow taking advantage of features of both translates if desired, e.g. by s_str.translate(unichartable256, strdelchrs).translate(uniord_to_ustr_mapping). (e.g., the latter permits single to multiple-character substitution)
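A rough sketch of chaining the two translates, written for Python 3 as an assumption. It differs from the one-liner above in one respect: Python 3 needs an explicit decode between the byte-level step and the text-level step, where the chain above assumes a 256-entry unicode table does that job. The sample data and both tables are hypothetical.

```python
# Step 1: byte-level translate with a 256-entry table plus delete characters.
identity_table = bytes(range(256))      # hypothetical table: leaves bytes as-is
raw = b"Gr\xf6\x00\xdfe"                # Latin-1 bytes containing a stray NUL
cleaned = raw.translate(identity_table, b"\x00")

# Step 2: decode explicitly, then apply an ordinal-to-string mapping, which
# allows a single character to expand into several.
expand = {0xF6: "oe", 0xDF: "ss"}
text = cleaned.decode("latin-1").translate(expand)
print(text)
```

The first step handles deletion cheaply on raw bytes; the second handles the single-to-multiple substitution that a fixed 256-entry table cannot express.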

This makes me think a translate method for bytes would be good for py3k (on topic ;-) It is just too handy a high speed conversion goodie to forgo IMO.

BTW, ISTM that it would be nice to have a chunking-iterator-wrapper-returning-method (as opposed to buffering specification) for file.bytes, so you could plug in

file.bytes('path').chunk(1)  # maybe keyword opts for simple common record chunking also?

in places where you might now have to have (untested)

(ord(x) for x in iter(lambda f=open('path','rb'): f.read(1), ''))

or write a helper like

def by_byte_ords(path, bufsize=8192):
    f = open(path, 'rb')
    buf = f.read(bufsize)
    while buf:
        for x in buf:
            yield ord(x)
        buf = f.read(bufsize)

and plug in by_byte_ords(path)
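As a sketch of what such a chunking wrapper might look like today, the generator below uses only the standard library and is written for Python 3 (an assumption). The name chunked and its size parameter are hypothetical; no file.bytes(...).chunk(...) API is implied to exist.

```python
import os
import tempfile
from functools import partial


def chunked(path, size=1):
    """Yield successive chunks of `size` bytes from the file at `path`."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) stops once read() returns b"" at EOF.
        for chunk in iter(partial(f.read, size), b""):
            yield chunk


# Usage: write a small temporary file, then read it back in fixed-size records.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcdefghij")

records = list(chunked(tmp.name, 4))
# With size=1, iterating each chunk yields the byte ordinals directly.
byte_ords = [b for chunk in chunked(tmp.name, 1) for b in chunk]
os.remove(tmp.name)
print(records)
```

The last record may be shorter than the requested size, and the empty bytes object marks EOF, which matches the guess below about the EOF value.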

BTW, bytes([]) would presumably be the file.bytes EOF?

Regards, Bengt Richter
