RFR: 8071571: Move substring of same string to slow path
Vitaly Davidovich vitalyd at gmail.com
Wed May 13 23:26:48 UTC 2015
Need the JIT-generated assembly, not bytecode :). That will at least tell you which optimizations the JIT applied, how it allocated registers, etc. If nothing obvious shows up there, see my other reply regarding CPU event-based profiling. I'm sure Aleksey Shipilev could help out if you're really inclined to figure this out.
sent from my phone

On May 13, 2015 7:23 PM, "Ivan Gerasimov" <ivan.gerasimov at oracle.com> wrote:
On 14.05.2015 2:06, Vitaly Davidovich wrote:

Why not look at the generated asm and not guess? :) The branch avoiding versions may cause data dependence hazards whereas the branchy one just has branches but assuming perfectly predicted (and microbenchmarks typically are) can pipeline through.

Ivan, could you please post the asm here? Assuming you guys are interested in investigating this further.

Sure, here they are:

  void substring1(int, int, char[]);
    Code:
       0: iload_1
       1: iflt          15
       4: iload_2
       5: aload_3
       6: arraylength
       7: if_icmpgt     15
      10: iload_1
      11: iload_2
      12: if_icmple     23
      15: new           #4    // class java/lang/Error
      18: dup
      19: invokespecial #5    // Method java/lang/Error."<init>":()V
      22: athrow
      23: return

  void substring2(int, int, char[]);
    Code:
       0: iload_1
       1: aload_3
       2: arraylength
       3: iload_2
       4: isub
       5: ior
       6: iload_2
       7: iload_1
       8: isub
       9: ior
      10: ifge          21
      13: new           #4    // class java/lang/Error
      16: dup
      17: invokespecial #5    // Method java/lang/Error."<init>":()V
      20: athrow
      21: return

  void substring3(int, int, char[]);
    Code:
       0: iload_1
       1: aload_3
       2: arraylength
       3: iload_2
       4: isub
       5: ior
       6: iflt          18
       9: iload_2
      10: iload_1
      11: isub
      12: dup
      13: istore        4
      15: ifge          26
      18: new           #4    // class java/lang/Error
      21: dup
      22: invokespecial #5    // Method java/lang/Error."<init>":()V
      25: athrow
      26: return

Sincerely yours,
Ivan

sent from my phone

On May 13, 2015 6:51 PM, "Martin Buchholz" <martinrb at google.com> wrote:

On Wed, May 13, 2015 at 2:25 PM, Ivan Gerasimov <ivan.gerasimov at oracle.com> wrote:
>
> Benchmark                 Mode  Cnt           Score          Error  Units
> MyBenchmark.testMethod1  thrpt   60  1132911599.680 ± 42375177.640  ops/s
> MyBenchmark.testMethod2  thrpt   60   813737659.576 ± 14226427.823  ops/s
> MyBenchmark.testMethod3  thrpt   60   810406621.145 ± 12316864.045  ops/s
>
> The plain old ||-combined check was faster in this round.
> Some other tests showed different results.
> The speed seems to depend on the scope of the checked variables and
> complexity of the expressions to calculate.
> However, I still don't have a clear understanding of all the aspects we
> need to pay attention to when doing such optimizations.
>

I'm not sure, but the only thing that could explain such a huge performance gap is that HotSpot was able to determine at JIT time that some of the comparisons did not need to be performed at all. If true, is this cheating or not? (You could retry with -Xint.)

One of the ideas is to separate hot and cold code (HotSpot does not yet split code inside a single method) so that HotSpot is more likely to inline, and therefore more likely to optimize, and optimizing beginIndex < 0 away entirely is much easier than my more complex expression. So yeah, I could be persuaded that keeping beginIndex < 0 as an independent expression makes it likely to be eliminated.

Micro-optimizing is hard, but for the very core of the platform, important (more than readability).

One of these days I have to learn how to write a JMH benchmark.
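
For reference, the three bytecode listings quoted earlier in this message correspond to roughly the following Java. This is a reconstruction from the javap output, not the source that was actually benchmarked: the class name SubstringChecks, the parameter names, the local subLen, and the helpers rejects and allAgree are all made up for illustration.

```java
public class SubstringChecks {
    // Variant 1: plain ||-combined comparisons (up to three branches).
    static void substring1(int beginIndex, int endIndex, char[] value) {
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
            throw new Error();
        }
    }

    // Variant 2: OR the three quantities together and test the sign once.
    static void substring2(int beginIndex, int endIndex, char[] value) {
        if ((beginIndex | (value.length - endIndex) | (endIndex - beginIndex)) < 0) {
            throw new Error();
        }
    }

    // Variant 3: hybrid, one combined sign test plus one separate comparison
    // whose result is also stored in a local (the dup/istore 4 in the listing).
    static void substring3(int beginIndex, int endIndex, char[] value) {
        int subLen;
        if ((beginIndex | (value.length - endIndex)) < 0
                || (subLen = endIndex - beginIndex) < 0) {
            throw new Error();
        }
    }

    // Hypothetical helper: true if the chosen variant throws for these arguments.
    static boolean rejects(int variant, int beginIndex, int endIndex, char[] value) {
        try {
            switch (variant) {
                case 1: substring1(beginIndex, endIndex, value); break;
                case 2: substring2(beginIndex, endIndex, value); break;
                case 3: substring3(beginIndex, endIndex, value); break;
                default: throw new IllegalArgumentException("unknown variant " + variant);
            }
            return false;
        } catch (Error e) {
            return true;
        }
    }

    // Hypothetical helper: compares the three variants over a small range only.
    static boolean allAgree(int maxLen) {
        for (int len = 0; len <= maxLen; len++) {
            char[] value = new char[len];
            for (int begin = -2; begin <= maxLen + 2; begin++) {
                for (int end = -2; end <= maxLen + 2; end++) {
                    boolean r1 = rejects(1, begin, end, value);
                    if (r1 != rejects(2, begin, end, value)
                            || r1 != rejects(3, begin, end, value)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("variants agree on small inputs: " + allAgree(8));
    }
}
```

The agreement check deliberately covers only small values. The subtraction-based forms can wrap around for extreme int arguments, so this is a sanity check of the reconstruction, not a proof that the three variants are equivalent, and it says nothing about which one the JIT compiles to faster code.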