[Python-Dev] Hashes in Python3.5 for tuples and frozensets

Anthony Flury anthony.flury at btinternet.com
Thu May 17 10:15:59 EDT 2018


Chris, I entirely agree. The same questioner also asked which data type is fastest to use as a key in a dictionary, and which data structure is fastest. I get the impression the person is very into micro-optimization, without profiling their application. It seems every choice is made based on the speed of that operation, without consideration of how often that operation is used.

On 17/05/18 09:16, Chris Angelico wrote:

> On Thu, May 17, 2018 at 5:21 PM, Anthony Flury via Python-Dev <python-dev at python.org> wrote:

>> Victor, Thanks for the link, but to be honest it will just confuse people - neither the link nor the related bpo entries state that the fix is limited to strings. They simply talk about hash randomization - which in my opinion implies ALL hash algorithms; which is why I asked the question.

>> I am not sure how much should be exposed about the scope of security fixes, but you can understand my (and others') confusion. I am aware that applications shouldn't make assumptions about the value of any given hash value - apart from some simple assumptions based on hash value equality (i.e. if two objects have different hash values they can't be the same value).

> The hash values of Python objects are calculated by the __hash__ method, so arbitrary objects can do what they like, including degenerate algorithms such as:
>
>     class X:
>         def __hash__(self): return 7

Agreed - I should have said the default hash algorithm. Hashes for custom objects are entirely application dependent.

> So it's impossible to randomize ALL hashes at the language level. Only str and bytes hashes are randomized, because they're the ones most likely to be exploitable - for instance, a web server will receive a query like "http://spam.example/target?a=1&b=2&c=3" and provide a dictionary {"a":1, "b":2, "c":3}. Similarly, a JSON decoder is always going to create string keys in its dictionaries (JSON objects). Do you know of any situation in which an attacker can provide the keys for a dict/set as integers?

I was just asking the question - rather than critiquing the fault-fix. I am actually more concerned that the documentation relating to the fix doesn't make it clear that only strings have their hashes randomised.
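For anyone who wants to check which hashes are affected, here is a minimal sketch. It assumes CPython 3.3 or later and sets PYTHONHASHSEED explicitly for each child interpreter; the helper name hash_in_fresh_process is made up for illustration.

```python
import os
import subprocess
import sys


def hash_in_fresh_process(expression, seed):
    """Return hash(<expression>) as computed by a new interpreter
    started with PYTHONHASHSEED set to the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expression}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Compare the same value under two different seeds.
for expression in ('"spam"', 'b"spam"', "42", "(1, 2, 3)", "frozenset({1, 2, 3})"):
    first = hash_in_fresh_process(expression, seed=1)
    second = hash_in_fresh_process(expression, seed=2)
    verdict = "varies" if first != second else "stable"
    print(f"{expression:22} {verdict}")
```

I would expect the str and bytes rows to vary with the seed, and the int, tuple-of-int and frozenset-of-int rows to stay the same. A tuple or frozenset that contains strings should vary too, since its hash is built from the hashes of its elements.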

>> BTW: This question was prompted by a question on a social media platform about whether hash values are transferable across platforms. Everything I could find stated that after Python 3.3 ALL hash values were randomized - but that clearly isn't the case; and the original questioner identified that some hash values are randomized and others aren't.

> That's actually immaterial. Even if the hashes weren't actually randomized, you shouldn't be making assumptions about anything specific in the hash, save that within one Python process, two equal values will have equal hashes (and therefore two objects with unequal hashes will not be equal).

Entirely agree - I was just trying to get to the bottom of the difference - especially considering that the documentation I could find implied that all hash algorithms had been randomized.

>> I did suggest strongly to the original questioner that relying on the same hash value across different platforms wasn't a clever solution - their original plan was to store hash values in a cross system database to enable quick retrieval of data (!!!). I did remind the OP that a hash value wasn't guaranteed to be unique anyway - and they might come across two different values with the same hash - and no way to distinguish between them if all they have is the hash. Hopefully their revised design will store the key, not the hash.

> Uhh.... if you're using a database, let the database do the work of being a database. I don't know what this "cross system database" would be implemented in, but if it's a proper multi-user relational database engine like PostgreSQL, it's already going to have way better indexing than anything you'd do manually. I think there are WAY better solutions than worrying about Python's inbuilt hashing.

Agreed

> If you MUST hash your data for sharing and storage, the easiest solution is to just use a cryptographic hash straight out of hashlib.py.
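As a rough illustration of the hashlib suggestion, here is a minimal sketch. The function name stable_digest, the choice of SHA-256 and the UTF-8 encoding are all my own assumptions, not anything the original questioner specified.

```python
import hashlib


def stable_digest(key: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 encoding of key.

    Unlike the built-in hash(), this depends only on the bytes of the
    key, so it is the same on every platform and in every process.
    """
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


print(stable_digest("spam"))
```

Even so, a digest is only an identifier derived from the key; the key itself still has to be stored if it ever needs to be recovered.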
As stated before - I think the original questioner was intent on micro-optimizations - and they had hit on the idea that storing an integer would be quicker than storing a string - entirely ignoring both the practicality of trying to code all strings into a value (since hashes aren't guaranteed not to collide), and the issues of trying to reverse that translation once the stored key had been retrieved.

> ChrisA
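To make the collision point concrete, here is a small sketch built on the degenerate class from earlier in the thread. The name attribute and the __eq__ method are additions of mine for illustration.

```python
class X:
    """Every instance hashes to 7, so all instances collide."""

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 7

    def __eq__(self, other):
        return isinstance(other, X) and self.name == other.name


a, b = X("a"), X("b")

# Keyed by the objects themselves: the dict falls back on __eq__ to
# tell colliding keys apart, so both entries survive.
by_key = {a: "first", b: "second"}
print(len(by_key), by_key[X("b")])

# Keyed by the hash alone: the second entry overwrites the first and
# there is no way to get back to the original key.
by_hash = {}
for obj in (a, b):
    by_hash[hash(obj)] = obj.name
print(len(by_hash), by_hash[7])
```

A dictionary copes with colliding hashes because it also compares the keys; a store that keeps only the hash has nothing left to compare.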



Thanks for your comments :-)

--

Anthony Flury email : Anthony.flury at btinternet.com Twitter : @TonyFlury <https://twitter.com/TonyFlury/>


