RFR: 8046943: RSA Acceleration (original) (raw)
Andrew Haley aph at redhat.com
Fri Jun 12 11:06:22 UTC 2015
- Previous message: RFR: 8077279: assert(ic->is_clean()) failed: IC should be clean
- Next message: RFR: 8046943: RSA Acceleration
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
http://cr.openjdk.java.net/~aph/8046943-hs-1/ http://cr.openjdk.java.net/~aph/8046943-jdk-1/
Before:
rsa 512 bits 0.000134s 0.000009s 7444.2 111853.3 rsa 1024 bits 0.000674s 0.000029s 1483.9 34456.9 rsa 2048 bits 0.003945s 0.000100s 253.5 9994.7 rsa 4096 bits 0.027015s 0.000373s 37.0 2681.6
After:
rsa 512 bits 0.000059s 0.000004s 17022.3 224141.1 rsa 1024 bits 0.000258s 0.000013s 3871.5 78851.0 rsa 2048 bits 0.001506s 0.000042s 664.1 23844.3 rsa 4096 bits 0.010214s 0.000153s 97.9 6516.0
There are some issues we need to discuss.
The runtime code is in sharedRuntime_x86_64.cpp even though it's mostly written in C. My thinking here is that porters will want to tweak the code for their back ends, or maybe write it in assembler. It's little-endian and would need some reworking for big-endian machines. But maybe there should be a common version of the code in share/ ?
Should it be in optoRuntime instead? It could be called from C1 or even the interpreter, but it's C2-only right now.
I've done nothing about 32-bit targets, but I think they would benefit.
I had to make some small changes to the Java library.
Montgomery multiplication works on whole words. Given that this is a 64-bit implementation I had to force the size of the arrays in oddModPow to be a multiple of 64 bits, i.e. the length of the arrays must be even. Given that RSA and Diffie-Hellman keys are usually a multiple of 64 bits in length I don't think this will be a real performance issue in practice.
I needed a 64-bit inverse rather than a 32-bit inverse. This is a single computation, done once for each modular exponentiation, so it makes an immeasurably small difference to the total runtime.
I fused squaring and multiplication into a single montgomeryMultiply method. This has two advantages. Firstly, we only need a single intrinsic, and secondly the decision whether to use multiply or squaring can be made in the runtime library. If the target does not support the montgomeryMultiply intrinsic there is no slowdown when using C2 because it removes the if (a == b) test in
if (a == b) { product = squareToLen(a, len, product); } else { product = multiplyToLen(a, len, b, len, product); }
Andrew.
- Previous message: RFR: 8077279: assert(ic->is_clean()) failed: IC should be clean
- Next message: RFR: 8046943: RSA Acceleration
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the hotspot-compiler-dev mailing list