Message 151847 - Python tracker
I'm attaching an attempt at backporting haypo's random-8.patch to 2.7.
Changes relative to random-8.patch:
The randomization is off by default, and must be enabled by setting a new environment variable, PYTHONHASHRANDOMIZATION, to a non-empty string. (When it is enabled, PYTHONHASHSEED, if provided, still works in the same way as in haypo's patch.)
All of the various "Py_hash_t" become "long" again (Py_hash_t was added in 3.2)
I expanded the randomization from just PyUnicodeObject to also cover these types:
PyStringObject
PyBufferObject
The randomization does not cover numeric types: if we change the hash of int so that hash(i) no longer equals i, we also have to change it consistently for long, float, complex, decimal.Decimal and fractions.Fraction. However, there are 3rd-party numeric types that have their own hash implementation that mimics int.__hash__ (see e.g. numpy).
As noted in http://bugs.python.org/issue13703#msg151063 and http://bugs.python.org/issue13703#msg151064, it's not possible to directly create a dict with integer keys via JSON or XML-RPC.
This seems like a tradeoff between the risk of attack via other means and the breakage induced in pre-existing code if hash() no longer agrees across the various equivalent numerical representations.
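The invariant at stake in the numeric discussion above can be checked directly. The following is a small sketch, written for a current Python 3 interpreter rather than 2.7, showing that equal numeric values of different types share one hash and therefore one dict slot; the variable names are illustrative only.

```python
from decimal import Decimal
from fractions import Fraction

# The same mathematical value, expressed in several numeric types.
one_in_many_types = [1, 1.0, complex(1, 0), Decimal(1), Fraction(1, 1)]

# All of them compare equal to the int 1 ...
assert all(value == 1 for value in one_in_many_types)

# ... so they must all hash alike, or dict/set lookups would break.
hashes = {hash(value) for value in one_in_many_types}

# A dict keyed by the int can be queried with any equivalent value.
table = {1: "found"}
lookups = [table[value] for value in one_in_many_types]
```

A third-party numeric type that copies the int hash would silently fall out of this group if only the built-in types were randomized, which is the breakage the tradeoff refers to.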
To support my expanded usage of the random secret, I moved:
PyAPI_DATA(_Py_unicode_hash_secret_t) _Py_unicode_hash_secret
from unicodeobject.h to object.h and renamed it to:
PyAPI_DATA(_Py_HashSecret_t) _Py_HashSecret;
This also exposes it for use by C extension modules, just in case they need it (Murphy's Law suggests we will need it if we don't expose it). This is an extension of the API, but warranted, I feel. My plan for downstream RHEL is to add this explicitly to the RPM metadata as a "Provides" of the RPM providing libpython.so, so that if something needs to use it, it can express a "Requires" on it; I assume that something similar is possible with .deb.
generalized test_unicode.HashTest to support the new env var and the additional types. In my version, get_hash takes a repr string rather than an object, so that I can test it with a buffer(). Arguably the tests should thus be moved from test_unicode to somewhere else, but this location keeps things consistent with haypo's patch.
haypo: in random-8.patch, within test_unicode.HashTest.test_null_hash, "hash_empty" seems to be misnamed
dropped various selftest fixes where the corresponding selftests don't exist in 2.7
adds a description of the new environment variables to the manpage; arguably this should be done for the patch for the default branch also
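As a rough illustration of how a process-wide secret such as _Py_HashSecret can be mixed into the string, unicode and buffer hashes listed above, here is a pure-Python model of a multiplicative string hash with a secret folded in at the start and end. It is a sketch under assumptions, not the patch itself: the two-field secret (prefix, suffix), the 64-bit long, the function names, and the fixed hash of 0 for empty input are all assumptions made for the example.

```python
MASK = (1 << 64) - 1  # assumption: model a 64-bit C long


def to_signed(x):
    """Reinterpret an unsigned 64-bit value as a signed C long."""
    x &= MASK
    return x - (1 << 64) if x >> 63 else x


def secret_hash(data, prefix=0, suffix=0):
    """Hash a bytes object, mixing in a (prefix, suffix) secret.

    With prefix == suffix == 0 this reduces to the unrandomized hash.
    """
    if not data:
        # Assumption: empty input hashes to 0 so the secret is not exposed.
        return 0
    x = prefix & MASK
    x ^= (data[0] << 7) & MASK
    for byte in data:
        x = ((1000003 * x) & MASK) ^ byte
    x ^= len(data)
    x ^= suffix & MASK
    result = to_signed(x)
    # -1 is reserved as an error return in the C API.
    return -2 if result == -1 else result
```

Because the secret is fixed for the lifetime of the process, equal strings still hash equally within one run; only the values seen across runs with different secrets change.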
Caveats:
only tested on Linux (Fedora 15 x86_64); not tested on Windows. Tested via "make test" both with and without PYTHONHASHRANDOMIZATION=1
not yet benchmarked
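The with/without comparison mentioned in the caveats, and the repr-string form of get_hash described earlier, could look roughly like the sketch below. It is written for Python 3 and is not the code from the patch: the signature, the `seed` and `randomize` parameters, and the use of `sys.executable` are assumptions. PYTHONHASHRANDOMIZATION is specific to this backport, so an interpreter without the patch will simply ignore it.

```python
import os
import subprocess
import sys


def get_hash(repr_, seed=None, randomize=True):
    """Return hash(<repr_>) as computed by a fresh child interpreter.

    repr_ is source text (e.g. "'abc'"), not an object, so that values
    which cannot be passed across processes can still be tested.
    """
    env = dict(os.environ)
    # Start from a clean slate so the parent's settings do not leak in.
    env.pop("PYTHONHASHSEED", None)
    env.pop("PYTHONHASHRANDOMIZATION", None)
    if randomize:
        env["PYTHONHASHRANDOMIZATION"] = "1"  # only read by this backport
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%s))" % repr_], env=env)
    return int(out.strip())
```

With a fixed seed, two separate child processes should report the same hash, which is what makes the randomized mode testable at all.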
 Doc/using/cmdline.rst                      |   28 ++
 Include/object.h                           |    7
 Include/pythonrun.h                        |    2
 Lib/lib-tk/test/test_ttk/test_functions.py |    2
 Lib/os.py                                  |   19 -
 Lib/test/mapping_tests.py                  |    2
 Lib/test/regrtest.py                       |    5
 Lib/test/test_gdb.py                       |   15 +
 Lib/test/test_inspect.py                   |    1
 Lib/test/test_os.py                        |   47 +++-
 Lib/test/test_unicode.py                   |   55 +++++
 Makefile.pre.in                            |    1
 Misc/python.man                            |   22 ++
 Modules/posixmodule.c                      |  126 ++----------
 Objects/bufferobject.c                     |    8
 Objects/object.c                           |    2
 Objects/stringobject.c                     |    8
 Objects/unicodeobject.c                    |   17 +
 PCbuild/pythoncore.vcproj                  |    4
 Python/pythonrun.c                         |    2
 b/Python/random.c                          |  284 +++++++++++++++++++++++++++++
 21 files changed, 510 insertions(+), 147 deletions(-)