(original) (raw)
On Oct 9, 2014, at 12:34 PM, Andrew Trick <atrick@apple.com> wrote:
On Oct 9, 2014, at 9:17 AM, Hal Finkel <hfinkel@anl.gov> wrote:----- Original Message -----From: "Vaidas Gasiunas" <vaidas.gasiunas@sap.com>
To: "LLVM Dev (llvmdev@cs.uiuc.edu)" <llvmdev@cs.uiuc.edu>
Sent: Thursday, October 9, 2014 4:37:39 AM
Subject: \[LLVMdev\] Performance regression in the LiveIntevals phase
Some time ago we reported a compile-time performance regression in
the LiveIntervals analysis pass (see
http://llvm.org/bugs/show\_bug.cgi?id=18580 ). We detected it at
first after migrating from LLVM 3.1 to 3.3, but the problem persists
also in 3.5\. This regression is especially critical when compiling
long functions. In one of our benchmarks compile time goes from 200s
(in 3.1) up to 1500s (in 3.3).
We also managed to reproduce the regression with a C++ example using
clang. Here are the instructions:
Generate the example C++ file with the Perl script attached to the
Bug (5000 is a good size parameter to clearly see the regression):
perl create.pl example.cpp 5000
Generate LLVM IR code for the C++ file (the regression is in the
backend functionality, so we don't want to measure frontend
compilation)
llvm31/clang++ example.cpp -c -emit-llvm -o example.ir-31
llvm35/clang++ example.cpp -c -emit-llvm -o example.ir-35
Now you can run the actual performance measurements using llc:
llvm31/llc -cppgen=program -o=5000.asm example.ir-31
llvm35/llc -cppgen=program -o=5000.asm example.ir-35
or you can also compile example.ir-31 with llvm3.5:
llvm35/llc -cppgen=program -o=5000.asm example.ir-31
Our analysis had shown that the most of time is taken for generation
of live intervals for physical registers. In the biggest example,
the live interval of one of the physical registers consists of about
500.000 live ranges. Insertions in the middle of the array of live
ranges cause quadratic algorithmic complexity, which is apparently
the main reason for the slow-down. I changed the implementation of
computeRegUnitInterval() so that it uses a set instead of the array,
and in this way managed to reduce the execution time of the
computeRegUnitInterval from 1200s down to about 1s. The fix is a bit
ugly, however, because I cannot completely switch to the set, since
further analyses are more efficient on the array. For that reason, I
flush the contents of the set into the array at the end of
computeRegUnitInterval. I also had to rewrite various operations on
the live interval so that they use the set structure if it is
available and the array otherwise.
If there is an interest, I can port the patch to the dev version of
llvm, but maybe someone has a better idea of how to deal with the
regression.
I think this would be interesting, please do. There are a few places in LLVM where we currently keep dual data structures, and needing another one is not necessarily bad (we have SetVector, for example).Yes, it would be good to get a fix for this behavior in LLVM trunk even if it isn�t pretty. I�ve also seen extendToUses showing up on profiles. Your description of the fix sounds reasonable.
+1\. Please send the patch for review.
Thanks,
-Quentin
-Andy\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev