Message 151961 - Python tracker

Sorry; hit the wrong key... intended message below:

On Wed, Jan 25, 2012 at 6:06 AM, Dave Malcolm <dmalcolm@redhat.com> added the comment:

[lots of good stuff]

hybrid-approach-dmalcolm-2012-01-25-001.patch

> As per haypo's random-8.patch, a randomization seed is read at startup.

Why not wait until it is needed? I suspect a lot of scripts will never need it for any dict, so why add the overhead to startup?
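
A minimal sketch of the lazy alternative, in Python rather than the C of the patch: the seed is read only when something first asks for it. The name `get_hash_seed`, the 8-byte size, and the use of `os.urandom` are assumptions for illustration, not taken from either patch.

```python
import os

_hash_seed = None  # deliberately not read at startup


def get_hash_seed():
    """Return the process-wide seed, reading it on first use only."""
    global _hash_seed
    if _hash_seed is None:
        # Assumed source and size; the real patch may read it differently.
        _hash_seed = os.urandom(8)
    return _hash_seed
```

A script that never pushes a dict into paranoid mode would never call `get_hash_seed()`, and so would never pay for the read.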

> Once a dict has transitioned to paranoid mode, it isn't using PyObject_Hash anymore, and thus isn't using cached object values

The alternative hashes could be stored in an id-keyed WeakKeyDictionary; that would handle at least the normal case of using exactly the same string for the lookup.

> Note that if a paranoid dict goes small again (ma_table == ma_smalltable), it stays paranoid.

As I read it, that couldn't happen, because paranoid dicts couldn't shrink at all. (Not letting them shrink beneath 2*PyDict_MINSIZE does seem like a reasonable solution.)

Additional TODOs...

The checks for Unicode and Dict should not be exact; a subclass is fine so long as it uses the same lookdict (and, for unicode, the same eq).

Additionally, small strings should be excluded from the new hash, to avoid giving away the secret. At a minimum, single-char strings should be excluded, and I would prefer to exclude all strings of length <= N (where N defaults to 4).
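
A rough sketch of that length cutoff, again in Python for brevity. The names `paranoid_hash` and `SHORT_STRING_MAX` are hypothetical, and keyed BLAKE2 is only a stand-in for whatever randomized hash the patch actually uses.

```python
import hashlib

SHORT_STRING_MAX = 4  # the "N" above; hypothetical name


def paranoid_hash(s, seed, n=SHORT_STRING_MAX):
    """Hash s with the secret seed, unless s is short enough to leak it."""
    if len(s) <= n:
        # Short strings keep the ordinary hash, so their hash values
        # carry no information about the seed.
        return hash(s)
    data = s.encode("utf-8", "surrogatepass")
    # Stand-in keyed hash (assumption); seed must be bytes, at most 64 long.
    digest = hashlib.blake2b(data, key=seed, digest_size=8).digest()
    return int.from_bytes(digest, "little")
```

With this shape, `paranoid_hash("a", seed)` is the same for every seed, while longer strings hash differently under different seeds.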