[Python-3000] bytes and dicts (was: PEP 3137: Immutable Bytes and Mutable Buffer)

Guido van Rossum guido at python.org
Sat Sep 29 05:08:06 CEST 2007


On 9/28/07, Terry Reedy <tjreedy at udel.edu> wrote:

> "Guido van Rossum" <guido at python.org> wrote in message
> news:ca471dc20709281140q2ef95c2ap8bbc7b7d3d46ebc0 at mail.gmail.com...
> | Well, if we wanted "x" and b"x" to compare unequal instead of raising
> | an exception, we could just define it that way (it was that way until
> | just before 3.0a1). But we're explicitly defining it to raise a
> | TypeError so as to catch buggy code. I think trying to fix dict lookup
> | so that it, and only it, treats this as unequal, would be adding too
> | many quirks.
> |
> | We could choose to kill the TypeError altogether. If we keep it, we
> | should consistently let it raise TypeError everywhere.
> |
> | The question is whether it's worth the effort to raise TypeError when
> | the potential exists that a certain hash sequence could raise this
> | TypeError. I'm less and less convinced -- after all, we're making the
> | exception only for bytes/str, not for other types that might raise
> | TypeError upon comparison.
> |
> | So, I think that after all this was a bad idea. Sorry.
>
> If you mean making a special case exception for string/bytes equality
> test, I agree. Would a restricted key dict (say, rdict, in collections)
> solve the problem you are aiming at?
>
> import collections
> adict = rdict(str)
> bdict = rdict(bytes)
>
> Now any buggy insertions get caught.

That sounds like a completely different use case -- a typechecking dict.

The use case we started with is to catch programmers who accidentally mix str and bytes as dict keys -- those programmers aren't likely to have thought much about their key type, so they're not likely to go out of their way to use the rdict you propose above.
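For comparison, a typechecking dict of the kind proposed above might look roughly like the sketch below. `RDict` is a hypothetical name used only for illustration (no such class exists in `collections`), and building it on `collections.UserDict` is an assumption, not part of the proposal. Note that it only guards insertions: a lookup with the wrong key type still just misses.

```python
from collections import UserDict


class RDict(UserDict):
    """Hypothetical restricted-key dict: accepts keys of a single type only."""

    def __init__(self, key_type):
        super().__init__()
        self.key_type = key_type

    def __setitem__(self, key, value):
        # Reject keys of any other type at insertion time.
        if not isinstance(key, self.key_type):
            raise TypeError(
                f"key must be {self.key_type.__name__}, "
                f"got {type(key).__name__}"
            )
        super().__setitem__(key, value)


adict = RDict(str)
adict["spam"] = 1          # accepted: the key is a str

try:
    adict[b"spam"] = 2     # rejected: the key is bytes
    insertion_rejected = False
except TypeError:
    insertion_rejected = True
```

The programmer has to opt in by choosing `RDict` over a plain dict, which is exactly the extra step the accidental-mixing case cannot count on.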

But here's a clever trick that might just do the job, without any extra effort: make it so that the hash() of a bytes string containing only ASCII bytes is the same as that of a text string containing only ASCII characters. Likely, programmers will attempt to look up keys that they know are in the dict -- and if they use the wrong type, because of the identical hash values, they will get the TypeError as soon as they compare it to the first object at the hashed location.

Even better, in the proposal we'll be reusing the old PyString type for the new immutable bytes type, and its hash already is equal to that of a PyUnicode object if they both contain the same ASCII bytes only. (This used to be by design in 2.x, and I maintained this property when I made PyUnicode's hash a lot faster.)
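The mechanism can be illustrated with a small stand-in, since released Python versions do not necessarily raise TypeError when comparing str and bytes. `StrictKey` and `lookup_raises` below are hypothetical helpers, not real APIs: the wrapper hashes text and bytes with the same ASCII content identically and refuses cross-type comparison, which is the behaviour described above.

```python
class StrictKey:
    """Hypothetical key wrapper mimicking the proposed str/bytes behaviour."""

    def __init__(self, value):
        self.value = value  # expected to be a str or bytes object

    def _ascii_bytes(self):
        # Normalise to bytes so equal ASCII content gives an equal hash.
        if isinstance(self.value, str):
            return self.value.encode("ascii")
        return bytes(self.value)

    def __hash__(self):
        return hash(self._ascii_bytes())

    def __eq__(self, other):
        if not isinstance(other, StrictKey):
            return NotImplemented
        if type(self.value) is not type(other.value):
            # Stand-in for the TypeError proposed for str == bytes.
            raise TypeError("cannot compare str and bytes")
        return self.value == other.value


def lookup_raises(mapping, key):
    """Return True if looking up key in mapping raises TypeError."""
    try:
        mapping.get(key)
    except TypeError:
        return True
    return False


d = {StrictKey("spam"): 1}

same_hash = hash(StrictKey("spam")) == hash(StrictKey(b"spam"))
right_type_ok = d[StrictKey("spam")] == 1
wrong_type_raises = lookup_raises(d, StrictKey(b"spam"))
```

Because both keys land on the same hash, the dict has to compare the probe against the stored key, and that comparison is where the TypeError surfaces. A wrong-type key with different content would most likely hash elsewhere and simply miss, which is why the trick targets lookups of keys the programmer believes are present.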

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
