[llvm-dev] (RFC) Encoding code duplication factor in discriminator (original) (raw)
Dehao Chen via llvm-dev llvm-dev at lists.llvm.org
Thu Oct 27 13:09:00 PDT 2016
- Previous message: [llvm-dev] (RFC) Encoding code duplication factor in discriminator
- Next message: [llvm-dev] (RFC) Encoding code duplication factor in discriminator
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
You are right, this problem also exists in GCC. And it was not solved there, we just left it open. As now we are tuning aggressively for LLVM, I think it's a good time to raise this issue and have a thorough solution so that we have a good chance of gaining more better performance than GCC afdo.
Dehao
On Thu, Oct 27, 2016 at 11:44 AM, David Blaikie <dblaikie at gmail.com> wrote:
Is there prior art for this sort of thing (in GCC, for example) - I take it this isn't the first time this has come up as a problem for profile accuracy? (so it'd be helpful to know prior solutions to this (& if we're not doing whatever was done before, what it is about our situation that's different, etc), or why it hasn't been a problem, etc)
On Thu, Oct 27, 2016 at 11:39 AM Dehao Chen <dehao at google.com> wrote:
Motivation: Many optimizations duplicate code. E.g. loop unroller duplicates the loop body, GVN duplicates computation, etc. The duplicated code will share the same debug info with the original code. For SamplePGO, the debug info is used to present the profile. Code duplication will affect profile accuracy. Taking loop unrolling for example:
#1 foo(); _#2 for (i = 0; i < N; i++) {_ _#3 bar();_ _#4 }_ _If N is 8 during runtime, a reasonable profile will look like:_ _#1: 10_ _#3: 80_ _But if the compiler unrolls the loop by a factor of 4, the callsite to_ _bar() is duplicated 4 times and the profile will look like:_ _#1: 10_ _#3: 20_ _The sample count for #3 is 20 because all 4 callsites to bar() are_ _sampled 20 times each, and they shared the same debug loc (#3) so that 20_ _will be attributed to #3 (If one debugloc is mapped by multiple_ _instructions, the max sample count of these instructions is used as_ _debugloc's sample count)._ _When loading this profile into compiler, it will think the loop trip_ _count is 2 instead of 8._ _Proposal:_ _When compiler duplicates code, it encodes the duplication info in the_ _debug info. As the duplication is not interesting to debugger, I propose to_ _encode this as part of the discriminator._ _There are 2 types of code duplication:_ _1. duplicated code are guaranteed to have the same execution count (e.g._ _loop unroll and loop vectorize). We can record the duplication factor, for_ _the above example "4" is recorded in the discriminator._ _2. duplicated code that may have different execution count (e.g. loop_ _peel and gvn). For a same debugloc, a unique number is assigned to each_ _copy and encoded in the discriminator._ _Assume that the discriminator is uint32. The traditional discriminator is_ _less than 256, let's take 8 bit for it. For duplication factor (type 1_ _duplication), we assume the maximum unrollfactor * vectorizefactor is_ _less than 256, thus 8 bit for it. For unique number(type 2 duplication), we_ _assume code is at most duplicated 32 times, thus 5 bit for it. Overall, we_ _still have 11 free bits left in the discriminator encoding._ _Let's take the original source as an example, after loop unrolling and_ _peeling, the code may looks like:_ _for (i = 0; i < N & 3; i+= 4) {_ _foo(); // discriminator: 0x40_ _foo(); // discriminator: 0x40_ _foo(); // discriminator: 0x40_ _foo(); // discriminator: 0x40_ _}_ _if (i++ < N) {_ _foo(); // discriminator: 0x100_ _if (i++ < N) {_ _foo(); // discriminator: 0x200_ _if (i++ < N) {_ _foo(); // discriminator: 0x300_ _}_ _}_ _}_ _The cost of this change would be increased debugline size because: 1. we_ _are recording more discriminators 2. discriminators become larger and will_ _take more ULEB128 encoding._ _The benefit is that the sample pgo profile can accurately represent the_ _code execution frequency. And we do not need to introduce new building_ _blocks to debug info._ _Comments?_ _Thanks,_ _Dehao_ -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161027/70afb11a/attachment.html>
- Previous message: [llvm-dev] (RFC) Encoding code duplication factor in discriminator
- Next message: [llvm-dev] (RFC) Encoding code duplication factor in discriminator
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]