[Python-Dev] Python 3.6 dict becomes compact and gets a private version; and keywords become ordered
Antoine Pitrou solipsis at pitrou.net
Thu Sep 15 12:13:54 EDT 2016
On Thu, 15 Sep 2016 08:02:10 -0700
Raymond Hettinger <raymond.hettinger at gmail.com> wrote:
> Eric is correct on this one.  The consecutive hashes make a huge
> difference for Python 3.5.  While there is a full table scan, the check
> for NULL entries becomes a predictable branch when all the keys are in
> consecutive positions.  There is an astonishingly well written Stack
> Overflow post that explains this effect clearly:
> http://stackoverflow.com/questions/11227809
> With normal randomized keys, the Python 3.6 loop is dramatically better
> than the Python 3.5 loop: [...]
You're jumping to conclusions. While there is a difference, there is no evidence that the difference is due to better branch prediction.
Actually, let's do a quick back-of-the-envelope calculation and show that it can't be attributed mostly to branch prediction:
~/py36 $ ./python.exe -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
100 loops, best of 3: 12.3 msec per loop

~/py35 $ ./python.exe -m timeit -s "d=dict.fromkeys(map(str,range(10**6)))" "list(d)"
10 loops, best of 3: 54.7 msec per loop
For 10**6 elements, this is a 42ns difference per dict element.
A 2.6 GHz Haswell doesn't stall for 42 ns on a branch mispredict. According to the Internet, the branch mispredict penalty for a Haswell CPU is around 15 cycles, which is about 5.8 ns at 2.6 GHz, far short of the observed 42 ns.
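For concreteness, here is that back-of-the-envelope arithmetic as a tiny script (the timings and the 15-cycle mispredict figure are simply the numbers quoted above, not new measurements):

    # Per-element difference between the two timeit runs above.
    py35 = 54.7e-3   # seconds per list(d) call on 3.5
    py36 = 12.3e-3   # seconds per list(d) call on 3.6
    n = 10**6
    print("%.1f ns per element" % ((py35 - py36) / n * 1e9))    # ~42.4 ns

    # Cost of one branch mispredict on a 2.6 GHz Haswell (~15 cycles).
    print("%.1f ns per mispredict" % (15 / 2.6e9 * 1e9))        # ~5.8 ns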
A 42 ns difference, however, is consistent with another possible effect: a main memory access following a last-level cache miss.
And indeed, Serhiy showed that this micro-benchmark is actually dependent on insertion order on Python 3.6:
$ ./python -m timeit -s "l = [str(i) for i in range(10**6)]; d=dict.fromkeys(l)" "list(d)" -> 100 loops, best of 3: 20 msec per loop
$ ./python -m timeit -s "import random; l = [str(i) for i in range(10**6)]; random.shuffle(l); d=dict.fromkeys(l)" "list(d)" -> 10 loops, best of 3: 55.8 msec per loop
The only way the table scan (without NULL checks, since this is Python 3.6) can depend on insertion order is that iterating over the table elements needs to INCREF each element; that is, this benchmark doesn't only scan the table in a nice, prefetcher-friendly linear sequence, it also accesses object headers at arbitrary places in Python's heap memory.
Since this micro-benchmark creates the keys in order just before filling the dict with them, randomizing the insertion order destroys the temporal locality of object header accesses when iterating over the dict keys. This looks like the right explanation, not branch mispredicts due to NULL checks.
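(A crude way to visualize that locality, assuming CPython where id() is the object's address, and assuming freshly created strings mostly get increasing addresses, which is only approximately true: check how often iterating the 3.6 dict visits the key objects in ascending address order.)

    import random

    def ascending_fraction(d):
        # Fraction of consecutive keys, in dict iteration order, whose
        # object headers sit at increasing addresses (id() is the address
        # in CPython; this is only a rough locality heuristic).
        addrs = [id(k) for k in d]
        up = sum(1 for a, b in zip(addrs, addrs[1:]) if b > a)
        return up / (len(addrs) - 1)

    keys = [str(i) for i in range(10**5)]
    shuffled = keys[:]
    random.shuffle(shuffled)

    # On 3.6 (insertion-ordered iteration) the first figure should be
    # close to 1.0 and the second close to 0.5.
    print("ordered:  %.2f" % ascending_fraction(dict.fromkeys(keys)))
    print("shuffled: %.2f" % ascending_fraction(dict.fromkeys(shuffled)))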
This also shows that a micro-benchmark which merely looks reasonable can actually be a terrible proxy for actual performance.
As a further validation of this theory, let's dramatically decrease the working set size on the initial benchmark:
$ ./python -m timeit -s "d=dict.fromkeys(map(str,range(10**3)))" "list(d)"
-> Python 3.5: 100000 loops, best of 3: 10.9 usec per loop -> Python 3.6: 100000 loops, best of 3: 9.72 usec per loop
When the working set fits in the CPU cache, this micro-benchmark is only about 12% slower on 3.5 than on 3.6. That much smaller difference (a mere 1.2 ns per dict element) could be attributed to eliminating the NULL checks, or to any other streamlining of the core iteration logic.
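(Same arithmetic as before, this time for the cache-resident case; again these are just the numbers quoted above:)

    # Per-element difference when the working set fits in the cache.
    py35 = 10.9e-6   # seconds per list(d) call on 3.5
    py36 = 9.72e-6   # seconds per list(d) call on 3.6
    n = 10**3
    print("%.0f%% slower on 3.5" % ((py35 / py36 - 1) * 100))   # ~12%
    print("%.2f ns per element" % ((py35 - py36) / n * 1e9))    # ~1.2 ns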
Regards
Antoine.