Switching chrome on Linux from all ICF to safe ICF increases its size only slightly, by about 5.2 MB (roughly 2.9%).
All ICF:
     text    data     bss       dec     hex filename
169314343 8472660 2368965 180155968 abcf640 chrome
Safe ICF:
     text    data     bss       dec     hex filename
174521550 8497604 2368965 185388119 b0ccc57 chrome

On Windows, chrome.dll grows by around 14 MB (about 12 MB of that is in the .text section).
All ICF:
Size of out\\Default\\chrome.dll is 170.715648 MB
name: mem size , disk size
.text: 141.701417 MB
.rdata: 22.458476 MB
.data: 3.093948 MB, 0.523264 MB
.pdata: 4.412364 MB
.00cfg: 0.000040 MB
.gehcont: 0.000132 MB
.retplne: 0.000108 MB
.rodata: 0.004544 MB
.tls: 0.000561 MB
CPADinfo: 0.000056 MB
\_RDATA: 0.000244 MB
.rsrc: 0.285232 MB
.reloc: 1.324196 MB
Safe ICF:
Size of out\\icf-safe\\chrome.dll is 184.499712 MB
name: mem size , disk size
.text: 153.809529 MB
.rdata: 23.123628 MB
.data: 3.093948 MB, 0.523264 MB
.pdata: 5.367396 MB
.00cfg: 0.000040 MB
.gehcont: 0.000132 MB
.retplne: 0.000108 MB
.rodata: 0.004544 MB
.tls: 0.000561 MB
CPADinfo: 0.000056 MB
\_RDATA: 0.000244 MB
.rsrc: 0.285232 MB
.reloc: 1.379364 MB

If an attribute is used and it affects the unnamed\_addr property of a symbol, it only determines whether the symbol shows up in the .addrsig table. All-ICF mode in both ld.lld and lld-link ignores symbols in the .addrsig table if they belong to code sections, so such an attribute would not disable ICF for code in that mode.
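
The rule described above can be sketched as a small predicate. This is only an illustration of the behaviour stated in this thread, not code taken from ld.lld or lld-link; `IcfMode`, `SectionInfo`, `mustKeepUnique` and `checkFoldingRule` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical model, not LLD's real data structures.
enum class IcfMode { None, Safe, All };

struct SectionInfo {
  bool isCode;              // section holds executable code
  bool addressSignificant;  // one of its symbols is listed in .addrsig
};

// Returns true when the section has to stay unique, i.e. must not be folded.
bool mustKeepUnique(IcfMode mode, const SectionInfo &s) {
  if (mode == IcfMode::None)
    return true;  // ICF disabled: nothing is folded
  if (!s.addressSignificant)
    return false;  // the address is not observed, so folding is allowed
  // Safe ICF always honors the table; all ICF honors it only for
  // sections that are not code.
  return mode == IcfMode::Safe || !s.isCode;
}

// Usage: an address-significant function is protected by safe ICF only.
void checkFoldingRule() {
  SectionInfo function{true, true};
  assert(mustKeepUnique(IcfMode::Safe, function));
  assert(!mustKeepUnique(IcfMode::All, function));
}
```

Under this model, an attribute that only changes `addressSignificant` makes no difference for code sections once the mode is all ICF.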

On Mon, Mar 22, 2021 at 10:19 PM Fangrui Song <maskray@google.com> wrote:

On 2021-03-22, David Blaikie via llvm-dev wrote:
\>ICF: Identical Code Folding
\>
\>Linker deduplicates functions by collapsing any identical functions
\>together - with icf=safe, the linker looks at a .addrsig section in the
\>object file and any functions listed in that section are not treated as
\>collapsible (eg: because they need to meet C++'s "distinct functions have
\>distinct addresses" guarantee)

The name originated from MSVC link.exe where icf stands for "identical COMDAT folding".
gold named it "identical code folding" - which makes some sense because gold does not fold readonly data.

In LLD, the name is not accurate for two reasons: (1) the feature can
apply to readonly data as well; (2) the folding is by section, not by function.

We define two sections as identical if they have identical content and
their outgoing relocation sets cannot be distinguished: they need to have
the same number of relocations, at the same relative locations, with
indistinguishable referenced symbols.
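
A minimal sketch of that definition, assuming a simplified model in which "indistinguishable symbols" is represented by an equal class id. `Reloc`, `Section`, `targetClass` and `identicalSections` are hypothetical names; a real linker compares more relocation fields than shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of a relocation: where it applies and what it refers to.
struct Reloc {
  std::uint64_t offset;  // location relative to the start of the section
  int targetClass;       // assumed id; equal ids mean indistinguishable symbols
};

struct Section {
  std::string content;        // raw bytes of the section
  std::vector<Reloc> relocs;  // outgoing relocations, in offset order
};

// Two sections are identical if the bytes match and the relocations match
// pairwise in count, relative location and referenced symbol class.
bool identicalSections(const Section &a, const Section &b) {
  if (a.content != b.content)
    return false;
  if (a.relocs.size() != b.relocs.size())
    return false;
  for (std::size_t i = 0; i < a.relocs.size(); ++i) {
    if (a.relocs[i].offset != b.relocs[i].offset)
      return false;
    if (a.relocs[i].targetClass != b.relocs[i].targetClass)
      return false;
  }
  return true;
}

// Usage: same bytes and same relocation target are identical; a different
// relocation target makes the sections distinguishable.
void checkIdenticalSections() {
  Section f{"call;ret", {Reloc{1, 0}}};
  Section g{"call;ret", {Reloc{1, 0}}};
  Section h{"call;ret", {Reloc{1, 1}}};
  assert(identicalSections(f, g));
  assert(!identicalSections(f, h));
}
```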

Then, ld.lld --icf={safe,all} works like this:

For a set of identical sections, the linker picks one representative and
drops the rest, then redirects references to the representative.
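
As a rough sketch of this step, assume each section has already been assigned a class id such that identical sections share an id. The function below keeps the first section of each class as the representative and records where every other section is redirected. `foldSections` and `classOf` are hypothetical names, and picking the first section is an arbitrary choice made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// classOf[i] is the assumed equivalence-class id of section i.
// Returns redirect, where redirect[i] is the index of the section that
// references to section i should point to after folding.
std::vector<std::size_t> foldSections(const std::vector<int> &classOf) {
  std::unordered_map<int, std::size_t> representative;
  std::vector<std::size_t> redirect(classOf.size());
  for (std::size_t i = 0; i < classOf.size(); ++i) {
    // emplace keeps the existing entry if the class was seen before,
    // so the first section of each class stays the representative.
    auto it = representative.emplace(classOf[i], i).first;
    redirect[i] = it->second;
  }
  return redirect;
}

// Usage: sections 0, 2 and 3 are identical, as are sections 1 and 4.
void checkFoldSections() {
  std::vector<std::size_t> redirect = foldSections({7, 3, 7, 7, 3});
  assert(redirect[2] == 0 && redirect[3] == 0);  // folded into section 0
  assert(redirect[4] == 1);                      // folded into section 1
}
```

A section whose redirect entry is not its own index is one of the dropped copies, which is why a symbolizer may report the representative's name for an address that was reached through a different function.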

Note: this can confuse debuggers/symbolizers/profilers easily.

lld-link /opt:icf is different from ld.lld --icf but I haven't looked
into it closely.


I find that the feature's saving is small given its downsides
(including increased link time; LLD's current implementation is inferior:
it performs a quadratic number of comparisons within an equivalence class):

These are the size differences for the 'lld' executable:

% size lld.{none,safe,all}
    text    data    bss       dec     hex filename
96821040 7210504 550810 104582354 63bccd2 lld.none
95217624 7167656 550810 102936090 622ae1a lld.safe
94038808 7167144 550810 101756762 610af5a lld.all
% size gold.{none,safe,all}
    text    data    bss       dec     hex filename
96857302 7174792 550825 104582919 63bcf07 gold.none
94469390 7174792 550825 102195007 6175f3f gold.safe
94184430 7174792 550825 101910047 613061f gold.all

Note that the --icf=all result caps the potential saving of the proposed annotation.
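
To put the lld rows above in relative terms, the "dec" totals can be compared directly. The helpers below are a small sketch with hypothetical names (`savedBytes`, `savedPercent`, `checkLldSavings`); the input values are the lld.none, lld.safe and lld.all totals from the table.

```cpp
#include <cassert>
#include <cstdint>

// Bytes saved when the total size drops from `before` to `after`.
std::int64_t savedBytes(std::int64_t before, std::int64_t after) {
  return before - after;
}

// The same saving as a percentage of the baseline size.
double savedPercent(std::int64_t before, std::int64_t after) {
  assert(before > 0);
  return 100.0 * static_cast<double>(before - after) /
         static_cast<double>(before);
}

// Usage with the lld totals: none = 104582354, safe = 102936090,
// all = 101756762.
void checkLldSavings() {
  assert(savedBytes(104582354, 102936090) == 1646264);  // --icf=safe
  assert(savedBytes(104582354, 101756762) == 2825592);  // --icf=all
}
```

For lld this works out to about 1.6% saved with --icf=safe and about 2.7% with --icf=all.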

Actually with some large internal targets I get even smaller savings.


ld.lld --icf=safe is safer than gold --icf=safe but probably misses some opportunities.
It may be that the clang codegen/optimizer fails to mark some cases as {,local\_}unnamed\_addr.

I know Chromium and the Windows world can be different:) But I'd still want to
get some numbers first.


Last, I have seen that Chromium has some code like
https://source.chromium.org/chromium/chromium/src/+/master:skia/ext/SkMemory\_new\_handler.cpp

void sk\_abort\_no\_print() {
  // Linker's ICF feature may merge this function with other functions with
  // the same definition (e.g. any function whose sole job is to call abort())
  // and it may confuse the crash report processing system.
  // http://crbug.com/860850
  static int static\_variable\_to\_make\_this\_function\_unique = 0x736b;  // "sk"
  base::debug::Alias(&static\_variable\_to\_make\_this\_function\_unique);

  abort();
}

If we want an approach that works with link.exe, I don't know what we can do...
If there is no desire for link.exe compatibility, I can see that having a proper way of marking the function
can be useful... but in any case, if an attribute is used, it probably should affect
unnamed\_addr directly instead of being called \*icf\*.



\>On Mon, Mar 22, 2021 at 6:16 PM Philip Reames via llvm-dev <
\>llvm-dev@lists.llvm.org> wrote:
\>
\>> Can you define ICF please? And give a bit of context?
\>>
\>> Philip
\>> On 3/22/21 5:27 PM, Zequan Wu via llvm-dev wrote:
\>>
\>> Hi all,
\>>
\>> Background:
\>> Debugging with ICF has been a longstanding difficulty. Programmers
\>> don't have control over which sections should be folded by ICF and which
\>> shouldn't. The existing address-significance table has no effect on
\>> code sections in all ICF mode in both ld.lld and lld-link.
\>> Switching to safe ICF would keep those code sections unique, but at the
\>> cost of an uncontrolled increase in binary size. So, it would be good if
\>> programmers could selectively disable ICF in source code by annotating
\>> global functions/variables with an attribute, to improve the debugging
\>> experience while keeping control over the binary size increase.
\>>
\>> My plan is to add a new section table (\`.no\_icf\`) to object files. Sections
\>> of all symbols in the table should not be folded in all ICF mode. And
\>> symbols can only be added to the table by annotating global
\>> functions/variables with a new attribute (\`no\_icf\`) in source code.
\>>
\>> What do you think about this approach?
\>>
\>> Thanks,
\>> Zequan
\>>
\>>
\>> \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_
\>> LLVM Developers mailing list
\>> llvm-dev@lists.llvm.org
\>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev