[Python-Dev] Counting collisions for the win (original) (raw)

Frank Sievertsen pydev at sievertsen.de
Mon Jan 23 09:53:06 CET 2012


Hello,

I'd still prefer to see a randomized hash()-function (at least for 3.3).

But to protect against the attacks it would be sufficient to use randomization for collision resolution in dicts (and sets).

What if we use a second (randomized) hash-function in case there are many collisions in ONE lookup. This hash-function is used only for collision resolution and is not cached.

The benefits:

The drawback:

The following code is meant for explanation purpose only:

for(perturb = hash; ; perturb >>= 5) { i = (i << 2) + i + perturb + 1;

 if((collisions++) == 20) {
     // perturb is already zero after 13 rounds.
     // 20 collisions are rare.

     // you can add && (ma_mask > 256) to make 100% sure
     // that it's not used for smaller dicts.

     if(Py_TYPE(key)->tp_flags & Py_TPFLAGS_HAVE_RANDOMIZED_HASH) {
         // If type has a randomized hash, use this now for lookup
         i = perturb = PyObject_RandomizedHash(key));
     }
.....

If I got this right we could add a new function "tp_randomized_hash" to 3.3 release. But can we also add functions in older releases, without breaking ABI?

If not, can we implement this somehow using a flag?

FOR OLDER RELEASE < 3.3:

Py_hash_t PyObject_RandomizedHash(PyVarObject *o) { PyTypeObject *tp = Py_TYPE(v); if(! (tp->tp_flags & Py_TPFLAGS_HAVE_RANDOMIZED_HASH)) return -1;

 global_flags_somewhere->USE_RANDOMIZED_HASH = 1;
 return (*tp->tp_hash)(v);

}

.... and in unicodeobject.c: (and wherever we need randomization)

static Py_hash_t unicode_hash(PyUnicodeObject *self) { Py_ssize_t len; Py_UNICODE *p; Py_hash_t x; Py_hash_t prefix=0; Py_hash_t suffix=0;

 if(global_flags_somewhere->USE_RANDOMIZED_HASH) {
     global_flags_somewhere->USE_RANDOMIZED_HASH = 0;
     initialize_rng_if_not_already_done_and_return_seed(&prefix, 

&suffix);

 ..... (and don't cache in this case) .....

It's ugly, but if I understand this correctly, the GIL will protect us against race-conditions, right?

Hello, internals experts: Would this work or is there a better way to do this without breaking the ABI?

Frank



More information about the Python-Dev mailing list