[Python-3000] bytes and dicts (was: PEP 3137: Immutable Bytes and Mutable Buffer)

Jeffrey Yasskin jyasskin at gmail.com
Sat Sep 29 20:10:07 CEST 2007


On 9/29/07, Phillip J. Eby <pje at telecommunity.com> wrote:

At 07:33 AM 9/29/2007 -0700, Guido van Rossum wrote:
>Until just before 3.0a1, they were unequal. We decided to raise
>TypeError because we noticed many bugs in code that was doing things
>like
>
>     data = f.read(4096)
>     if data == "": break

Thought experiment: what if read() always returned strings, and to read bytes, you had to use something like 'f.readinto(ob, 4096)', where 'ob' is a mutable bytes instance or memory view? In Python 2.x, there's only one read() method because (prior to unicode), there was only one type of reading to do. But as the above example makes clear, in 3.x you simply can't write code that works correctly with an arbitrary file that might be binary or text, at least not without typechecking the return value from read(). (In which case, you might as well inspect the file object.) So, the above problem could be fixed by having .read() raise an error (or simply not exist) on a binary file object.
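To make the thought experiment concrete, here is a minimal sketch of reading binary data through readinto() into a caller-supplied mutable buffer. It uses the one-argument readinto(b) that binary streams in today's io module provide; the two-argument 'f.readinto(ob, 4096)' form above is hypothetical, and the names f, buf and received are illustrative only.

```python
import io

# A binary stream standing in for a file opened in binary mode.
f = io.BytesIO(b"abcdefghij")

# Caller-owned mutable buffer; its size plays the role of the 4096 above.
buf = bytearray(4)
received = bytearray()

while True:
    n = f.readinto(buf)      # fills buf in place, returns the byte count
    if n == 0:               # zero bytes read means end of stream
        break
    received += buf[:n]      # only the first n bytes are valid this round
```

Because the caller has to supply a bytes-like buffer, there is no return value whose type could be mistaken for text.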

Perhaps write

    if len(data) == 0: break

since that's what you really mean.

Any other code that compares the result of read() to either a bytes or a str really is taking a text or binary file object specifically and not working on an arbitrary file.
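A small sketch of the length-based check, assuming in-memory streams from the io module as stand-ins for real text and binary files. The helper name read_all and the chunk size are made up for illustration.

```python
import io


def read_all(f, chunk_size=4096):
    """Read f to exhaustion without caring whether it yields str or bytes."""
    chunks = []
    while True:
        data = f.read(chunk_size)
        if len(data) == 0:      # holds for both "" and b"" at end of stream
            break
        chunks.append(data)
    return chunks


# The same loop terminates correctly on a text stream and on a binary one.
text_chunks = read_all(io.StringIO("hello world"), chunk_size=4)
byte_chunks = read_all(io.BytesIO(b"hello world"), chunk_size=4)
```

The loop never compares the result of read() against a literal of either type, so it does not depend on how bytes and str compare.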

In this way, the problem is fixed at the point where it really occurs: i.e., at the point of not having decided whether the stream is bytes or text.

This also seems to fit better (IMO) with the best practice of enforcing str/unicode/encoding distinctions at the point where data enters the program, rather than delaying the error to later.

>I thought about using warning too, but since nobody wants warnings,
>that would be pretty much the same as raising TypeError except for the
>most dedicated individuals (and if I were really dedicated I'd just
>write my own eq() function anyway).

The use case I'm concerned about is code that's not type-specific getting a TypeError by comparing arbitrary objects. For example, if you write Python code to create a Python code object (e.g. the compiler package or my own BytecodeAssembler), you need to create a list of constants as you generate the code, and you need to be able to search the list for an equal constant. Since strings and bytes can both be constants, a simple list.index() test could now raise a TypeError, as could "item in list".

So raising an error to make bad code fail sooner, will also take down unsuspecting code that isn't really broken, and force the writing of special comparison code -- which won't be usable with things like list.remove and the "in" operator.

In comparison, forcing code to be bytes vs. text aware at the point of I/O directs attention to the place where you can best decide what to do about it. (After all, the comparison that raises the TypeError might occur deep in a library that's expecting to work with text.)

>And the warning would do nothing
>about the issue brought up by Jim Jewett, the unpredictable behavior
>of a dict with both bytes and strings as keys.

I've looked at all of Jim's messages for September, but I don't see this. I do see where raising TypeError for comparisons causes a problem with dictionaries, but I don't see how an unequal comparison creates "unpredictable" behavior (as opposed to predictable failure to match).
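The two behaviours under discussion can be sketched as follows. StrictBytes is a hypothetical stand-in, written only for this illustration, for a bytes type whose == raises TypeError when compared with str. Its __hash__ deliberately collides with the equivalent str so that a dict lookup is forced to call ==. The real bytes type in released Python 3 compares unequal to str by default and is shown for contrast.

```python
class StrictBytes:
    """Hypothetical bytes-like type whose == raises TypeError against str."""

    def __init__(self, value):
        self.value = bytes(value)

    def __eq__(self, other):
        if isinstance(other, str):
            raise TypeError("comparing bytes with str")
        if isinstance(other, StrictBytes):
            return self.value == other.value
        return NotImplemented

    def __hash__(self):
        # Assumption for the demo: force a hash collision with the same
        # characters as a str, so dict lookup must fall back on ==.
        return hash(self.value.decode("latin-1"))


def try_call(func, *args):
    """Return func(*args), or the name of the exception it raised."""
    try:
        return func(*args)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


# A constants table like one a bytecode assembler might build.
constants = ["abc", 42, StrictBytes(b"abc")]

found_str = try_call(constants.index, "abc")      # matched before StrictBytes is reached
missing_str = try_call(constants.index, "xyz")    # search runs into StrictBytes == str
found_bytes = try_call(constants.index, StrictBytes(b"abc"))  # "abc" is compared first

# The same searches with the real bytes type, which compares unequal to str.
real_constants = ["abc", 42, b"abc"]
real_missing = try_call(real_constants.index, "xyz")
real_found = real_constants.index(b"abc")

# With raising comparisons, a dict lookup fails only when hashes collide.
table = {"abc": 1}
dict_lookup = try_call(table.__getitem__, StrictBytes(b"abc"))
```

In this sketch the raising variant breaks list.index() for code that never intended to mix the two types, while the unequal variant produces an ordinary miss (ValueError from index()) or a normal hit.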



--
Namasté,
Jeffrey Yasskin
http://jeffrey.yasskin.info/

"Religion is an improper response to the Divine." — "Skinny Legs and All", by Tom Robbins


