Gain up to 2% speed on Intel Silvermont & Haswell processors. by npaglieri · Pull Request #91 · google/gemmlowp
This speedup is achieved by reordering SSE kernel instructions to lower contention on CPU execution units.
All instruction dependencies are preserved, so this change should not introduce any difference in behavior.
The overall code structure may, however, seem a bit less straightforward because instruction sequences are now interleaved.
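As a rough illustration of what such a reordering looks like, the sketch below writes the same two independent multiply-accumulate chains in a grouped order and in an interleaved order. It is not the kernel code touched by this PR: the function names (`Load`, `Store`, `MaddGrouped`, `MaddInterleaved`, `OrdersAgree`) are hypothetical, it uses SSE2 intrinsics, and a compiler is free to reschedule intrinsics, so the statement order here only conveys the idea.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86 only)
#include <cassert>
#include <cstdint>

// Hypothetical helpers: unaligned 128-bit load and store.
inline __m128i Load(const void* p) {
  return _mm_loadu_si128(static_cast<const __m128i*>(p));
}
inline void Store(void* p, __m128i v) {
  _mm_storeu_si128(static_cast<__m128i*>(p), v);
}

// Grouped order: both multiplies first, then both adds, then both stores.
// Instructions of the same kind sit next to each other.
inline void MaddGrouped(const int16_t* a, const int16_t* b, int32_t* acc) {
  assert(a && b && acc);
  __m128i p0 = _mm_madd_epi16(Load(a), Load(b));
  __m128i p1 = _mm_madd_epi16(Load(a + 8), Load(b + 8));
  __m128i s0 = _mm_add_epi32(Load(acc), p0);
  __m128i s1 = _mm_add_epi32(Load(acc + 4), p1);
  Store(acc, s0);
  Store(acc + 4, s1);
}

// Interleaved order: the two chains are woven together so that multiplies,
// adds, loads and stores alternate. Each value is still produced before it
// is consumed, so the data dependencies are unchanged.
inline void MaddInterleaved(const int16_t* a, const int16_t* b, int32_t* acc) {
  assert(a && b && acc);
  __m128i p0 = _mm_madd_epi16(Load(a), Load(b));
  __m128i acc0 = Load(acc);
  __m128i p1 = _mm_madd_epi16(Load(a + 8), Load(b + 8));
  __m128i s0 = _mm_add_epi32(acc0, p0);
  __m128i acc1 = Load(acc + 4);
  Store(acc, s0);
  __m128i s1 = _mm_add_epi32(acc1, p1);
  Store(acc + 4, s1);
}

// Hypothetical check: both orderings must produce identical accumulators.
inline bool OrdersAgree() {
  int16_t a[16], b[16];
  for (int i = 0; i < 16; ++i) {
    a[i] = static_cast<int16_t>(i + 1);  // 1, 2, ..., 16
    b[i] = 2;
  }
  int32_t g[8] = {0}, h[8] = {0};
  MaddGrouped(a, b, g);
  MaddInterleaved(a, b, h);
  for (int i = 0; i < 8; ++i) {
    if (g[i] != h[i]) return false;
  }
  // First lane: 1*2 + 2*2 = 6. Last lane: 15*2 + 16*2 = 62.
  return g[0] == 6 && g[7] == 62;
}
```

Calling `OrdersAgree()` exercises both variants on the same input. Whether the interleaved form actually runs faster depends on the target microarchitecture and on how the instructions are finally scheduled, which is why the change is backed by measurements.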
The metrics below result from averaging 100 single-threaded benchmark executions.
Silvermont
benchmark size | original GFLOP/s | optimized GFLOP/s | performance ratio | performance gain |
---|---|---|---|---|
10x10x10 | 1.172 | 1.183 | 1.009 | 0.94% |
20x20x20 | 3.193 | 3.205 | 1.004 | 0.38% |
30x30x30 | 4.297 | 4.308 | 1.003 | 0.26% |
40x40x40 | 5.633 | 5.658 | 1.004 | 0.44% |
50x50x50 | 5.909 | 5.952 | 1.007 | 0.73% |
60x60x60 | 7.853 | 7.897 | 1.006 | 0.56% |
64x256x147 | 10.170 | 10.330 | 1.016 | 1.57% |
100x100x1 | 1.236 | 1.237 | 1.001 | 0.08% |
100x100x100 | 9.017 | 9.117 | 1.011 | 1.11% |
100x1000x100 | 11.350 | 11.530 | 1.016 | 1.59% |
1000x1000x1 | 1.488 | 1.510 | 1.015 | 1.48% |
1000x1000x10 | 7.818 | 7.917 | 1.013 | 1.27% |
1000x1000x100 | 12.580 | 12.770 | 1.015 | 1.51% |
1000x1000x1000 | 13.100 | 13.390 | 1.022 | 2.21% |
average gain | | | | 1.01% |
Haswell
benchmark size | original GFLOP/s | optimized GFLOP/s | performance ratio | performance gain |
---|---|---|---|---|
10x10x10 | 3.423 | 3.442 | 1.006 | 0.56% |
20x20x20 | 9.404 | 9.518 | 1.012 | 1.21% |
30x30x30 | 12.760 | 12.950 | 1.015 | 1.49% |
40x40x40 | 17.350 | 17.520 | 1.010 | 0.98% |
50x50x50 | 18.280 | 18.540 | 1.014 | 1.42% |
60x60x60 | 23.860 | 24.250 | 1.016 | 1.63% |
64x256x147 | 31.260 | 31.830 | 1.018 | 1.82% |
100x100x1 | 3.822 | 3.825 | 1.001 | 0.08% |
100x100x100 | 27.550 | 27.920 | 1.013 | 1.34% |
100x1000x100 | 34.910 | 35.530 | 1.018 | 1.78% |
1000x1000x1 | 4.958 | 5.016 | 1.012 | 1.17% |
1000x1000x10 | 24.950 | 25.310 | 1.014 | 1.44% |
1000x1000x100 | 38.900 | 39.590 | 1.018 | 1.77% |
1000x1000x1000 | 40.310 | 41.120 | 1.020 | 2.01% |
average gain | | | | 1.34% |
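For readers who want to reproduce the derived columns, here is a minimal sketch of the arithmetic. It assumes each throughput value is a plain arithmetic mean over the benchmark runs, that the ratio is optimized divided by original, and that the gain is taken from the unrounded ratio (for example, 1.183 / 1.172 gives 0.94%, not 0.9%). The function names are hypothetical and not part of gemmlowp.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Arithmetic mean of per-run throughput samples (e.g. 100 runs per size).
inline double MeanThroughput(const std::vector<double>& samples) {
  assert(!samples.empty());
  return std::accumulate(samples.begin(), samples.end(), 0.0) /
         static_cast<double>(samples.size());
}

// "performance ratio" column: optimized throughput over original throughput.
inline double PerformanceRatio(double original, double optimized) {
  assert(original > 0.0);
  return optimized / original;
}

// "performance gain" column, in percent: (ratio - 1) * 100.
inline double PerformanceGainPercent(double original, double optimized) {
  return (PerformanceRatio(original, optimized) - 1.0) * 100.0;
}
```

With the 1000x1000x1000 rows as input, `PerformanceGainPercent(13.100, 13.390)` and `PerformanceGainPercent(40.310, 41.120)` round to the 2.21% and 2.01% reported for Silvermont and Haswell respectively.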