I have 3 issues with bloaty:
1) It's designed with a "give all the information possible" philosophy, not a "give only actionable information (preferably with reasons)" one.
2) There is (currently) no way to restrict the information you get to the allocated case.
3) It hasn't been scaling well on hundreds of binaries.
Point 2 is easily fixable. Point 3 is most likely fixable, and I suppose the features to drive bloaty exist. Currently we're driving individual instances of bloaty ourselves, and this is both faster and gives us more information than we get using config files and CSV files. Point 1 seems to be the real killer, and it is a major point this proposal seeks to address. If we can work out a good proposal to add the functionality mentioned in point 1, I'd be happy to support that as well.
I think an external tool could do this just as well. My bias is for it to be in LLVM, however, because that makes it easier to distribute: we already have testing, CI, and a distribution figured out for that. That's not a good technical reason, I grant you.
On Mon, Oct 1, 2018 at 3:26 PM David Blaikie via llvm-dev <llvm-dev@lists.llvm.org> wrote:
On Mon, Oct 1, 2018 at 3:24 PM JF Bastien <jfbastien@apple.com> wrote:
On Oct 1, 2018, at 3:16 PM, David Blaikie <dblaikie@gmail.com> wrote:
(my vote, somewhat biased - is that I'd love to see more investment in Bloaty (to keep all these sort of size analysis tools and tricks in one place), but sort of accept folks are probably going to keep building more infrastructure for this sort of thing in LLVM directly)
I get where that comes from, but it seems a bit like a Valgrind versus sanitizer argument: integrating with the toolchain gives you things you can’t really get otherwise. Valgrind is still great as a self-standing thing.
Not sure that's quite the same though - with sanitizer integrating with the optimizers is the key here.
With bloaty - it could, at worst, use LLVM's libDebugInfo as a library to implement the more advanced debug-using features without being less functional than an in-LLVM implementation.
- Dave
_______________________________________________
On Wed, Sep 26, 2018 at 12:03 PM Vedant Kumar <vsk@apple.com> wrote:
Hello,
I worked on a code size analysis tool for a 'week of code' project and think
that it might be useful enough to upstream.
The tool is inspired by bloaty (https://github.com/google/bloaty), but tries to
do more to attribute code size in actionable ways.
For example, it can calculate how many bytes inlined instances of a function
added to a binary. In its diff mode, it can show how much more aggressively a
function was inlined compared to a baseline. This can be useful when you're,
say, trying to figure out why firmware compiled by a new compiler is just a few
bytes over the size limit imposed by your embedded device :). In this case,
extra information about inlining can help inform a decision to either tweak the
inliner's cost model or to judiciously add a few `noinline` attributes. (Note
that if you're willing to recompile & write a few SQL queries, optimization
remarks can give you similar information, albeit at the IR level.)
As another example, this code size tool can attribute code size to semantically
interesting groups of code, like C++/Swift classes, or files. In the diff mode,
you can see how the code size of a class/file grew compared to a baseline. The
tool understands inheritance, so you can also see interesting high-level trends.
E.g., `clang::Sema` grew more than `llvm::Pass` between clang-6 and clang-7.
Unlike bloaty, this tool focuses exclusively on the text segment. Also unlike
bloaty, it uses LLVM's DWARF parser instead of rolling its own. The tool is
currently implemented as a sub-tool of llvm-dwarfdump.
To get size information about a program, you do:
llvm-dwarfdump size-info -baseline -stats-dir
This emits four *.stats files into the directory given by -stats-dir, each
containing a distinct 'view' into the code groups in the program given by
-baseline. There's a file view, a function view, a class view,
and an inlining view. Each view is sorted by code size, so you can see the
largest functions/classes/etc immediately.
The *.stats files are just human-readable text files. As it happens, they use
the flamegraph format (http://brendangregg.com/flamegraphs.html). This makes it
easy to visualize any view as a flamegraph. (If you haven't seen one before,
it's a hierarchical visualization where the width of each entry corresponds to
its frequency (or in this case size).)
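
As a rough illustration of what that format makes possible, here is a minimal Python sketch that reads folded-stack lines of the shape shown in the examples further down (semicolon-separated frames, then a space and a size) and totals the bytes per top-level group. The helper names and the sample data are hypothetical and not part of the tool; the sketch also assumes the size is always the last space-separated token on the line.

```python
from collections import defaultdict

def parse_folded_line(line):
    """Split 'frame1;frame2;... SIZE' into (list of frames, size in bytes)."""
    # Frames may contain spaces, so split on the LAST space only.
    stack, _, size = line.rstrip().rpartition(" ")
    return stack.split(";"), int(size)

def total_by_root(lines):
    """Sum sizes per outermost frame (e.g. per file or per base class)."""
    totals = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        frames, size = parse_folded_line(line)
        totals[frames[0]] += size
    return dict(totals)

# Hypothetical sample data, not output from the tool.
sample = [
    "a.cpp [file];foo() [function] 120",
    "a.cpp [file];bar() [function] 30",
    "b.cpp [file];baz() [function] 50",
]
print(total_by_root(sample))  # {'a.cpp [file]': 150, 'b.cpp [file]': 50}
```

Because the files are plain text, the same lines can also be handed unchanged to a flamegraph renderer.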
To look at code growth between two programs, you'd do:
llvm-dwarfdump size-info -baseline -target -stats-dir
Similarly, this emits four 'view' files into the -stats-dir directory, but with a *.diffstats
suffix. The format is the same.
Pending Work
------------
I think the main piece of work the tool needs is better testing. Currently
there's just a single end-to-end test in clang. It might be better to check in
a few binaries so we can check that the tool reports sizes correctly.
Also, it may turn out that folks are interested in different ways of visualizing
size data. While the textual format of flamegraphs is really convenient for
humans to read, the graphs themselves do make more sense when the underlying
data have a frequentist interpretation. If there's enough interest I can explore
using an alternative format for visualization, e.g.:
http://neugierig.org/software/chromium/bloat/
https://github.com/evmar/webtreemap
(Thanks JF for pointing these out!)
Here's a link to the source code:
https://github.com/vedantk/llvm-project/tree/sizeinfo
Selected Examples
-----------------
Here are a few interesting snippets from a comparison of clang-6 vs. clang-7.
First, let's take a look at the function view diffstat. Here are the 10
functions which grew in size the most. On the left hand side, you'll see the
demangled function name. The *change* in code size in bytes is reported on the
right hand side (only positive changes are reported).
clang::Sema::CheckHexagonBuiltinCpu([snip]) [function] 170316
ProcessDeclAttribute([snip]) [function] 125893
llvm::AArch64InstPrinter::printAliasInstr([snip]) [function] 105133
llvm::AArch64AppleInstPrinter::printAliasInstr([snip]) [function] 105133
ParseCodeGenArgs([snip]) [function] 64692
unswitchNontrivialInvariants([snip]) [function] 40180
getAttrKind([snip]) [function] 35811
clang::DumpCompilerOptionsAction::ExecuteAction() [function] 32417
llvm::UpgradeIntrinsicCall([snip]) [function] 30239
bool llvm::InstructionSelector::executeMatchTable<(anonymous namespace)::ARMInstructionSelector const, [snip]) const [function] 29352
Next, let's look at the file view diffstat. This can be useful because it goes
beyond simply identifying the files which grew the most. It actually describes
which *functions* grew the most in those files, creating more opportunities to
do something about the code growth.
lib/Target/X86/X86ISelLowering.cpp [file];combineX86ShuffleChain([snip]) [function] 24864
lib/Target/X86/X86ISelLowering.cpp [file];combineMul([snip]) [function] 14907
lib/Target/X86/X86ISelLowering.cpp [file];combineStore([snip]) [function] 12220
...
tools/clang/lib/Sema/SemaExpr.cpp [file];clang::Sema::CheckCompareOperands([snip]) [function] 16024
tools/clang/lib/Sema/SemaExpr.cpp [file];diagnoseTautologicalComparison([snip]) [function] 1740
tools/clang/lib/Sema/SemaExpr.cpp [file];clang::Sema::ActOnNumericConstant([snip]) [function] 1436
tools/clang/lib/Sema/SemaExpr.cpp [file];checkThreeWayNarrowingConversion([snip]) [function] 1356
tools/clang/lib/Sema/SemaExpr.cpp [file];CheckIdentityFieldAssignment([snip]) [function] 1280
The class view diffstat is a bit different because it has more levels of
nesting than the other views, due to inheritance. This might help give a sense
for the high-level changes in a program, but may also be less actionable.
clang::Sema [class];clang::Sema::CheckHexagonBuiltinCpu([snip]) [function] 170316
clang::Sema [class];clang::Sema::CheckHexagonBuiltinArgument([snip]) [function] 24156
clang::Sema [class];clang::Sema::ActOnTag([snip]) [function] 22373
...
llvm::AArch64InstPrinter [class];llvm::AArch64AppleInstPrinter [class];llvm::AArch64AppleInstPrinter::printAliasInstr([snip]) [function] 105133
llvm::AArch64InstPrinter [class];llvm::AArch64AppleInstPrinter [class];llvm::AArch64AppleInstPrinter::printInstruction([snip]) [function] 5824
...
llvm::Pass [class];llvm::FunctionPass [class];llvm::MachineFunctionPass [class];(anon)::X86SpeculativeLoadHardeningPass [class];(anonymous namespace)::X86SpeculativeLoadHardeningPass::checkAllLoads(llvm::MachineFunction&) [function] 19287
...
llvm::Pass [class];llvm::FunctionPass [class];llvm::MachineFunctionPass [class];(anon)::MachineLICMBase [class];(anonymous namespace)::MachineLICMBase::runOnMachineFunction(llvm::MachineFunction&) [function] 20343
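
To make the nesting concrete, here is a small Python sketch of how one might roll such lines up into an inclusive total for every class in the chain, so a base class like `llvm::Pass` is credited with the growth of everything derived from it. This is an assumption about how a reader could post-process the files, not a description of what the tool does internally. The function name is hypothetical, and the two input lines are copied from the listing above.

```python
from collections import Counter

def inclusive_class_sizes(lines):
    """Credit each line's size to every '[class]' frame on its stack."""
    totals = Counter()
    for line in lines:
        # The size is the last space-separated token; frames are ';'-separated.
        stack, _, size = line.rstrip().rpartition(" ")
        for frame in stack.split(";"):
            if frame.endswith("[class]"):
                totals[frame] += int(size)
    return totals

lines = [
    "llvm::Pass [class];llvm::FunctionPass [class];llvm::MachineFunctionPass [class];"
    "(anon)::X86SpeculativeLoadHardeningPass [class];"
    "(anonymous namespace)::X86SpeculativeLoadHardeningPass::checkAllLoads(llvm::MachineFunction&) [function] 19287",
    "llvm::Pass [class];llvm::FunctionPass [class];llvm::MachineFunctionPass [class];"
    "(anon)::MachineLICMBase [class];"
    "(anonymous namespace)::MachineLICMBase::runOnMachineFunction(llvm::MachineFunction&) [function] 20343",
]
sizes = inclusive_class_sizes(lines)
print(sizes["llvm::Pass [class]"])  # 39630, i.e. 19287 + 20343
```

This is the same aggregation a flamegraph performs visually: the width of a base-class frame is the sum of everything stacked on top of it.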
Here's a link to a flamegraph of the class view diffstat (warning: it's big):
http://net.vedantk.com/static/llvm/swift-clang-4.2-vs-5.0.class-view.diffstats.svg
Finally, here are a few interesting entries from the inlining view diffstat. As
with all of the other views, the right hand side still shows code growth in
bytes. For a given inlining target, this size is computed by diffing the sum of
PC range lengths from all DW_TAG_inlined_subroutines referring to that target.
This allows the size tool to attribute code size to an inlining target even
when the inlined code is not contiguous in the caller.
llvm::raw_ostream::operator<<(char const*) [inlining-target] 66720
llvm::MCRegisterClass::contains(unsigned int) const [inlining-target] 64161
llvm::StringRef::StringRef(char const*) [inlining-target] 39262
llvm::MCInst::getOperand(unsigned int) const [inlining-target] 33268
clang::CodeCompletionResult::~CodeCompletionResult() [inlining-target] 25763
llvm::operator+(llvm::Twine const&, llvm::Twine const&) [inlining-target] 25525
clang::ASTImporter::Import(clang::SourceLocation) [inlining-target] 21096
clang::Sema::Diag(clang::SourceLocation, unsigned int) [inlining-target] 20898
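
The computation described above (sum the PC range lengths of every inlined instance of a target, then diff against the baseline) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inlined instances are taken to be already extracted from the DWARF as `(target_name, [(low_pc, high_pc), ...])` tuples with half-open ranges, and the function names and sample addresses are hypothetical rather than taken from the tool.

```python
from collections import defaultdict

def inlined_bytes(instances):
    """Total bytes attributed to each inlining target.

    instances: iterable of (target_name, [(low_pc, high_pc), ...]), one entry
    per inlined instance. An instance may have several ranges when the
    inlined code is not contiguous in the caller.
    """
    totals = defaultdict(int)
    for target, ranges in instances:
        totals[target] += sum(high - low for low, high in ranges)
    return totals

def inlining_growth(baseline, target):
    """Per-target change in inlined bytes between two programs."""
    before, after = inlined_bytes(baseline), inlined_bytes(target)
    names = set(before) | set(after)
    return {n: after.get(n, 0) - before.get(n, 0) for n in names}

# Hypothetical data: 'f' is inlined once in the baseline, twice in the new
# binary, and one of the new instances is split across two ranges.
old = [("f", [(0x10, 0x20)])]
new = [("f", [(0x10, 0x20), (0x40, 0x48)]), ("f", [(0x100, 0x110)])]
print(inlining_growth(old, new))  # {'f': 24}
```

Summing per range rather than per instance is what lets non-contiguous inlined code be counted in full.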
Feedback & questions welcome!
thanks,
vedant
LLVM Developers mailing list
llvm-dev@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev