Always generate GEP i8 / ptradd for struct offsets by erikdesjardins · Pull Request #121665 · rust-lang/rust

The regression is all in backtrace-related code:

_ZN3std12backtrace_rs9symbolize5gimli7Context3new                                                              +288 bytes (17760 -> 18048) +047 insts (3525 -> 3572), +0 funcs ( 1 ->  1)
_ZN9addr2line16ResUnit$LT$R$GT$25find_function_or_location28_$u7b$$u7b$closure$u7d$$u7d$                       +256 bytes (11792 -> 12048) +023 insts (2507 -> 2530), +0 funcs ( 1 ->  1)
_ZN3std12backtrace_rs9symbolize5gimli7resolve                                                                  +240 bytes (14800 -> 15040) +025 insts (2709 -> 2734), +0 funcs ( 1 ->  1)
_ZN4core6option19Option$LT$$RF$T$GT$6cloned                                                                    +176 bytes (    0 ->   176) +056 insts (   0 ->   56), +1 funcs ( 0 ->  1)
_ZN5gimli4read8rnglists20RngListIter$LT$R$GT$4next                                                             +160 bytes ( 3248 ->  3408) +043 insts ( 813 ->  856), +0 funcs ( 1 ->  1)
_ZN9addr2line11render_file                                                                                     +112 bytes (  848 ->   960) +034 insts ( 201 ->  235), +0 funcs ( 1 ->  1)
_ZN93_$LT$gimli..read..line..LineProgramHeader$LT$R$C$Offset$GT$$u20$as$u20$core..clone..Clone$GT$5clone       +096 bytes ( 1120 ->  1216) +033 insts ( 244 ->  277), +0 funcs ( 1 ->  1)
_ZN5gimli4read5dwarf13Unit$LT$R$GT$3new                                                                        +064 bytes (10224 -> 10288) +016 insts (2063 -> 2079), +0 funcs ( 1 ->  1)
_ZN5gimli4read4unit18Attribute$LT$R$GT$5value                                                                  +032 bytes ( 1328 ->  1360) +006 insts ( 382 ->  388), +0 funcs ( 1 ->  1)
_ZN9addr2line30LoopingLookup$LT$T$C$L$C$F$GT$10new_lookup                                                      +032 bytes ( 1536 ->  1568) -011 insts ( 352 ->  341), +0 funcs ( 1 ->  1)
_ZN9addr2line5Lines5parse                                                                                      +016 bytes ( 9920 ->  9936) +012 insts (2144 -> 2156), +0 funcs ( 1 ->  1)
_ZN5gimli4read5dwarf14Dwarf$LT$R$GT$11attr_string                                                              +016 bytes (  592 ->   608) +004 insts ( 165 ->  169), +0 funcs ( 1 ->  1)
_ZN5alloc7raw_vec11finish_grow                                                                                 +016 bytes (  576 ->   592) +003 insts ( 188 ->  191), +0 funcs ( 4 ->  4)
_ZN3std2io5Write18write_all_vectored                                                                           +016 bytes ( 1328 ->  1344) +003 insts ( 347 ->  350), +0 funcs ( 2 ->  2)
_ZN9addr2line16ResUnit$LT$R$GT$18dwarf_and_unit_dwo                                                            +016 bytes ( 1376 ->  1392) +002 insts ( 289 ->  291), +0 funcs ( 1 ->  1)
_ZN11miniz_oxide7inflate4core10decompress                                                                      +016 bytes ( 7888 ->  7904) +001 insts (1913 -> 1914), +0 funcs ( 1 ->  1)
_ZN5alloc7raw_vec19RawVec$LT$T$C$A$GT$7reserve21do_reserve_and_handle                                          -016 bytes (  944 ->   928) +008 insts ( 232 ->  240), +0 funcs ( 5 ->  5)
_ZN5gimli4read6abbrev13Abbreviations6insert                                                                    -016 bytes ( 4864 ->  4848) +006 insts ( 918 ->  924), +0 funcs ( 1 ->  1)
_ZN9addr2line8function17Function$LT$R$GT$14parse_children                                                      -016 bytes ( 5376 ->  5360) +003 insts (1088 -> 1091), +0 funcs ( 1 ->  1)
_ZN3std4path7PathBuf14_set_extension                                                                           -016 bytes (  624 ->   608) +000 insts ( 167 ->  167), +0 funcs ( 1 ->  1)
_ZN5alloc3ffi5c_str7CString19_from_vec_unchecked                                                               -016 bytes (  352 ->   336) +000 insts (  97 ->   97), +0 funcs ( 1 ->  1)
_ZN91_$LT$std..sys_common..backtrace.._print..DisplayBacktrace$u20$as$u20$core..fmt..Display$GT$3fmt           -016 bytes (  592 ->   576) -002 insts ( 123 ->  121), +0 funcs ( 1 ->  1)
_ZN69_$LT$std..sys..pal..unix..stdio..Stderr$u20$as$u20$std..io..Write$GT$14write_vectored                     -016 bytes (   96 ->    80) -004 insts (  29 ->   25), +0 funcs ( 1 ->  1)
_ZN5alloc7raw_vec19RawVec$LT$T$C$A$GT$16reserve_for_push                                                       -032 bytes ( 3296 ->  3264) +027 insts ( 834 ->  861), +0 funcs (17 -> 17)

(Option::cloned no longer gets inlined in _ZN9addr2line11render_file, which is why it shows up as a new function.)

(Generated with this ad-hoc script. There's probably some existing way to do this though, isn't there...)
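The linked script isn't reproduced here. As a rough sketch of how a per-symbol table like the one above could be produced, the following assumes GNU `nm -S --defined-only` output for the old and new binaries, and assumes legacy-mangled Rust symbols whose trailing `17h<hash>E` is stripped so that monomorphized copies are grouped under one name. The function names (`parse_nm`, `diff_sizes`, `format_row`, `main`) are made up for illustration, and instruction counts are left out because they would need a disassembler.

```python
import re
import subprocess
from collections import defaultdict

# Assumption: legacy Rust mangling, where every symbol ends in
# "17h<16 hex digits>E". Stripping it merges copies of the same function.
HASH_SUFFIX = re.compile(r"17h[0-9a-f]{16}E$")
# nm symbol types that live in the text section (global, local, weak).
TEXT_TYPES = {"T", "t", "W", "w"}


def parse_nm(text):
    """Map hash-stripped symbol name -> (total bytes, function count)."""
    totals = defaultdict(lambda: [0, 0])
    for line in text.splitlines():
        fields = line.split()
        # Expect "<address> <size> <type> <name>"; skip symbols without a size.
        if len(fields) != 4 or fields[2] not in TEXT_TYPES:
            continue
        name = HASH_SUFFIX.sub("", fields[3])
        totals[name][0] += int(fields[1], 16)
        totals[name][1] += 1
    return {name: tuple(v) for name, v in totals.items()}


def diff_sizes(before, after):
    """Rows of (name, byte delta, old bytes, new bytes, old funcs, new funcs).

    Unchanged symbols are dropped; the largest growth comes first.
    """
    rows = []
    for name in before.keys() | after.keys():
        old_bytes, old_funcs = before.get(name, (0, 0))
        new_bytes, new_funcs = after.get(name, (0, 0))
        if (old_bytes, old_funcs) != (new_bytes, new_funcs):
            rows.append((name, new_bytes - old_bytes,
                         old_bytes, new_bytes, old_funcs, new_funcs))
    rows.sort(key=lambda r: (-r[1], r[0]))
    return rows


def format_row(row):
    name, delta, old_b, new_b, old_f, new_f = row
    return (f"{name}  {delta:+d} bytes ({old_b} -> {new_b}), "
            f"{new_f - old_f:+d} funcs ({old_f} -> {new_f})")


def main(old_path, new_path):
    """Print the size diff between two binaries (requires `nm` on PATH)."""
    dumps = []
    for path in (old_path, new_path):
        out = subprocess.run(["nm", "-S", "--defined-only", path],
                             capture_output=True, text=True, check=True)
        dumps.append(parse_nm(out.stdout))
    for row in diff_sizes(*dumps):
        print(format_row(row))
```

Calling `main("old_binary", "new_binary")` with two real paths would print one line per changed symbol. Grouping by hash-stripped name is what would make a row report more than one function, as in the `(17 -> 17)` entry above.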

It's kind of interesting that the regression is spread across a bunch of different functions but all in backtrace-related code. I guess that's just because one function was optimized differently and the change cascaded to everything else in its call graph.

Looking at gimli7Context3new, it's very hard to see anything meaningful in the asm diff. Even with all constants and registers normalized, hundreds of instructions are moved around. It doesn't look obviously worse, just...perturbed. Same with find_function_or_location.

So I don't really see anything actionable here.

It does make me wonder if we should be compiling backtrace-related crates with opt-level=s though--there are a ton of enormous functions, and just scanning through (not necessarily in the changed functions) I see some unrolled loops. Size is probably more of a priority than runtime performance for backtrace code.