[Python-3000] bytes and dicts (was: PEP 3137: Immutable Bytes and Mutable Buffer)

Guido van Rossum guido at python.org
Sat Sep 29 16:33:01 CEST 2007


On 9/29/07, Phillip J. Eby <pje at telecommunity.com> wrote:

> At 08:08 PM 9/28/2007 -0700, Guido van Rossum wrote:
> >Likely, programmers will attempt to look up keys
> >that they know are in the dict -- and if they use the wrong type,
> >because of the identical hash values, they will get the TypeError as
> >soon as they compare it to the first object at the hashed location.

> I'm coming into this thread a little bit late, but if we don't want strings and bytes to be comparable, shouldn't we just make them unequal? I mean, under normal circumstances, == and != are available on all objects without causing errors, and the same TypeError would occur for things like list.remove().

Until just before 3.0a1, they were unequal. We decided to raise TypeError because we noticed many bugs in code that was doing things like

    data = f.read(4096)
    if data == "": break

where data was bytes and thus the break was never taken. The same happened with checks for certain magic strings (so it wasn't just empty strings).
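The loop above can be sketched in runnable form as follows. This is only an illustration: `read_chunks` and the in-memory `io.BytesIO` stream are made-up stand-ins for a real binary file. The point is that the end-of-file test has to use the empty bytes literal (or plain truthiness), not the empty str.

```python
import io


def read_chunks(f, size=4096):
    """Read a binary stream to EOF and return the list of chunks (hypothetical helper)."""
    chunks = []
    while True:
        data = f.read(size)
        # A binary stream returns bytes, so compare against b"" (or use
        # `if not data`), never against the str literal "".
        if data == b"":
            break
        chunks.append(data)
    return chunks


# io.BytesIO stands in for a file opened in binary mode.
stream = io.BytesIO(b"x" * 10000)
chunks = read_chunks(stream)
total = sum(len(c) for c in chunks)
```

Writing `if not data: break` sidesteps the question of which literal to compare against.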

It is also in line with the policy to refuse things like b"abc".replace("a", "A") or "abc".replace(b"b", b"B").
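A small sketch of that policy, using the two calls quoted above plus their same-type counterparts. The helper `try_replace` is hypothetical and only records whether a TypeError was raised; the wording of the error message is not relied on.

```python
def try_replace(obj, old, new):
    """Return the result of obj.replace(old, new), or "TypeError" if refused."""
    try:
        return obj.replace(old, new)
    except TypeError:
        return "TypeError"


results = [
    try_replace(b"abc", "a", "A"),    # bytes receiver, str arguments: refused
    try_replace("abc", b"b", b"B"),   # str receiver, bytes arguments: refused
    try_replace(b"abc", b"a", b"A"),  # all bytes: fine
    try_replace("abc", "b", "B"),     # all str: fine
]
```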

> This seems a lot like Oleg's question on Python-Dev the other day, about raising a TypeError from __nonzero__: i.e., changing a significant expectation about all "normal" objects.

> While it's true that it would be good to know when you've unintentionally mixed bytes and strings, surely there could be less fatal ways to find this, like perhaps a command-line option that causes byte/string comparisons to output a warning?

I thought about using a warning too, but since nobody wants warnings, that would be pretty much the same as raising TypeError except for the most dedicated individuals (and if I were really dedicated I'd just write my own eq() function anyway). And the warning would do nothing about the issue brought up by Jim Jewett, the unpredictable behavior of a dict with both bytes and strings as keys.
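The dict issue can be modelled with a rough sketch. Everything here is an assumption made for illustration: `StrictBytes` is a hypothetical bytes subclass that imitates the comparison rule discussed in this thread (comparing with str raises TypeError), and its `__hash__` is deliberately forced to match the hash of the equivalent text, to stand in for the "identical hash values" mentioned earlier. The helper `lookup` is likewise made up.

```python
class StrictBytes(bytes):
    """Hypothetical bytes type whose comparison with str raises TypeError."""

    def __eq__(self, other):
        if isinstance(other, str):
            raise TypeError("cannot compare bytes with str")
        return bytes.__eq__(self, other)

    def __hash__(self):
        # Assumption: force the same hash as the equivalent ASCII text,
        # modelling bytes and str that hash identically.
        return hash(self.decode("ascii"))


def lookup(d, key):
    """Return d.get(key, "missing"), or "TypeError" if the lookup raised one."""
    try:
        return d.get(key, "missing")
    except TypeError:
        return "TypeError"


d = {StrictBytes(b"abc"): 1}

same_text = lookup(d, "abc")                 # hashes match, so __eq__ runs and raises
other_text = lookup(d, "xyz")                # hashes differ, so no comparison happens
same_bytes = lookup(d, StrictBytes(b"abc"))  # same type: ordinary successful lookup
```

In this model a wrong-typed key fails loudly only when its hash lines up with a stored key; otherwise the lookup just reports the key as missing, so the outcome depends on which keys happen to be in the dict.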

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


