wip tcmalloc by 0xdeafbeef · Pull Request #147333 · rust-lang/rust

hyperfine --prepare 'cargo clean' 'cargo +stock build -r' 'cargo +stage2 build -r' --runs 3 --warmup 0

Benchmark 1: cargo +stock build -r
  Time (mean ± σ):     269.238 s ±  0.906 s    [User: 2343.174 s, System: 198.200 s]
  Range (min … max):   268.623 s … 270.278 s    3 runs

Benchmark 2: cargo +stage2 build -r
  Time (mean ± σ):     246.692 s ±  0.094 s    [User: 2300.234 s, System: 165.301 s]
  Range (min … max):   246.617 s … 246.798 s    3 runs

Summary
  cargo +stage2 build -r ran
    1.09 ± 0.00 times faster than cargo +stock build -r

Command being timed: "cargo +stage2 build -r"
User time (seconds): 2255.06
System time (seconds): 167.08
Percent of CPU this job got: 1007%
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:00.39
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 5841344
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 272
Minor (reclaiming a frame) page faults: 45702476
Voluntary context switches: 177243
Involuntary context switches: 531296
Swaps: 0
File system inputs: 26184
File system outputs: 33975408
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

Command being timed: "cargo +stock build -r"
User time (seconds): 2288.81
System time (seconds): 196.81
Percent of CPU this job got: 950%
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:21.50
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 9643212
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 268
Minor (reclaiming a frame) page faults: 58525292
Voluntary context switches: 207169
Involuntary context switches: 525091
Swaps: 0
File system inputs: 9272
File system outputs: 33879056
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
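
For a quick read of the two `time -v` dumps above, the main deltas work out roughly as follows. This is plain arithmetic on the numbers shown, from a single run of each toolchain, so treat it as indicative only.

```latex
\begin{align*}
\text{wall-clock speedup} &= \frac{261.50\ \text{s}}{240.39\ \text{s}} \approx 1.088 \\
\text{peak RSS reduction} &= 1 - \frac{5841344}{9643212} \approx 39.4\% \\
\text{minor page fault reduction} &= 1 - \frac{45702476}{58525292} \approx 21.9\% \\
\text{system time reduction} &= 1 - \frac{167.08}{196.81} \approx 15.1\%
\end{align*}
```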

I compared against a compiler built from master.

        discovered for pointer 0xe507ffb15c0: this pointer was recently freed with a size argument in the range [1, 8], but the associated span of allocated memory is for allocations with sizes [9, 16]

I kept hitting the error above until I patched operator delete to use regular delete. All of the errors came from LLVM.
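
To illustrate what "use regular delete" can mean, here is a minimal sketch of a sized `operator delete` that drops the caller-supplied size and forwards to the unsized overload. It is not the actual patch from this PR. It assumes `std::malloc`/`std::free` as a stand-in for tcmalloc, and the counters and the `sized_delete_forwards` helper are hypothetical names added only so the forwarding can be observed.

```cpp
#include <atomic>
#include <cassert>   // for the sanity check in sized_delete_forwards
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters, only here to make the forwarding observable.
static std::atomic<std::size_t> g_unsized_deletes{0};
static std::atomic<std::size_t> g_sized_deletes{0};

// Stand-in allocator: malloc/free instead of tcmalloc (assumption).
void* operator new(std::size_t size) {
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc();
}

// Regular (unsized) delete: the allocator determines the block size itself.
void operator delete(void* p) noexcept {
    ++g_unsized_deletes;
    std::free(p);
}

// Sized delete: ignore the size hint and forward to the unsized overload,
// so a wrong hint from the caller never reaches the allocator.
void operator delete(void* p, std::size_t /*size*/) noexcept {
    ++g_sized_deletes;
    operator delete(p);
}

// Hypothetical helper: returns true if one sized delete resulted in exactly
// one call to the unsized overload.
bool sized_delete_forwards() {
    const std::size_t sized_before = g_sized_deletes.load();
    const std::size_t unsized_before = g_unsized_deletes.load();
    void* p = ::operator new(16);
    assert(p != nullptr);  // operator new above throws instead of returning null
    ::operator delete(p, std::size_t{16});
    return g_sized_deletes.load() == sized_before + 1 &&
           g_unsized_deletes.load() == unsized_before + 1;
}
```

The trade-off is that the allocator loses the size hint, so it has to look up the size class on every free. That hides the mismatch rather than fixing the code that passes the wrong size.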

Thread 17 "lto cgu.00" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe17f56c0 (LWP 393688)]
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
    at pthread_kill.c:44
44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6,
    no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007fffe8881f63 in __pthread_kill_internal (threadid=<optimized out>, signo=6)
    at pthread_kill.c:89
#2  0x00007fffe8827f3e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007fffe880f6d0 in __GI_abort () at abort.c:77
#4  0x0000555555587930 in tcmalloc::tcmalloc_internal::ReportMismatchedSizeClass(tcmalloc::tcmalloc_int
#5  0x0000555555584702 in tcmalloc::tcmalloc_internal::central_freelist_internal::StaticForwarder::MapO
#6  0x0000555555558892 in tcmalloc::tcmalloc_internal::central_freelist_internal::CentralFreeList<tcmal
#7  0x00005555555b5321 in tcmalloc::tcmalloc_internal::ThreadCache::ReleaseToTransferCache(tcmalloc::tc
#8  0x00005555555b5611 in tcmalloc::tcmalloc_internal::ThreadCache::ListTooLong(tcmalloc::tcmalloc_inte
#9  0x00005555555b572f in tcmalloc::tcmalloc_internal::ThreadCache::DeallocateSlow(void*, tcmalloc::tcmalloc_internal::ThreadCache::FreeList*, unsigned long) ()
#10 0x0000555555560bc2 in tcmalloc::tcmalloc_internal::FreeWithHooksOrPerThread(void*, unsigned long)
    ()
#11 0x00005555555be97b in operator delete(void*, unsigned long) ()
#12 0x00007ffff30cfeab in std::_Rb_tree<unsigned long, std::pair<unsigned long const, llvm::GlobalValueSummaryInfo>, std::_Select1st<std::pair<unsigned long const, llvm::GlobalValueSummaryInfo> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, llvm::GlobalValueSummaryInfo> > >::_M_erase(std::_Rb_tree_node<std::pair<unsigned long const, llvm::GlobalValueSummaryInfo> >*) [clone .isra.0] ()
   from /home/odm3n/dev/oss/rust/build/x86_64-unknown-linux-gnu/stage2/bin/../lib/librustc_driver-9c7769cfa025b531.so
#13 0x00007ffff30d30bd in LLVMRustThinLTOData::~LLVMRustThinLTOData() ()
   from /home/odm3n/dev/oss/rust/build/x86_64-unknown-linux-gnu/stage2/bin/../lib/librustc_driver-9c7769cfa025b531.so
#14 0x00007ffff30d3163 in LLVMRustFreeThinLTOData ()
   from /home/odm3n/dev/oss/rust/build/x86_64-unknown-linux-gnu/stage2/bin/../lib/librustc_driver-9c7769cfa025b531.so
#15 0x00007ffff30bebf6 in <alloc::sync::Arc<rustc_codegen_ssa::back::lto::ThinShared<rustc_codegen_llvm::LlvmCodegenBackend>>>::drop_slow ()
   from /home/odm3n/dev/oss/rust/build/x86_64-unknown-linux-gnu/stage2/bin/../lib/librustc_driver-9c7769cfa025b531.so
#16 0x00007ffff2f7a647 in <rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_ssa::traits::write::WriteBackendMethods>::optimize_thin ()
   from /home/odm3n/dev/oss/rust/build/x86_64-unknown-linux-gnu/stage2/bin/../lib/librustc_driver-9c7769cfa025b531.so
#17 0x00007ffff307d20e in std::sys::backtrace::__rust_begin_short_backtrace::<<rustc_codegen_llvm::LlvmCodegenBackend as rustc_codegen_ssa::traits::backend::ExtraBackendMethods>::spawn_named_thread<rustc_codegen_ssa::back::write::spawn_work<rustc_codegen_llvm::LlvmCodegenBackend>::{closure#0}, ()>::{closure#0}, ()> ()
   from /home/odm3n/dev/oss/rust/build/x86_64-unknown-linux-gnu/stage2/bin/../lib/librustc_driver-9c7769cfa025b531.so

If this is of interest, I can clean it all up, or maybe we should run the rustc-perf test suite first.