> If I use clang with -fsanitize=address to build my program, and then run my program, what difference does it make for the execution of my program whether the compiler itself was instrumented or not?

Yes, it doesn't make a difference to your final executable whether the compiler was built with ASan or not.

> Do you mean that ASAN runtime itself should be instrumented, since your program loads that at runtime?

Sanitizer runtimes aren't instrumented with sanitizers :).

-------

To be clear, we're talking about replacing the runtime allocator for clang/LLD/etc., right? We're not talking about replacing the default allocator for -O0 executables?

In either instance, using the ASan allocator (for either clang or executables) is possible, but won't provide any of the bug detection capabilities you describe without also ensuring that clang/your executable is built with ASan instrumentation (-fsanitize=address implies both "replace my allocator" and "instrument my code").

On Tue, Jul 7, 2020 at 2:53 PM Zachary Turner <zturner@roblox.com> wrote:
I hadn't heard this before. If I use clang with -fsanitize=address to build my program, and then run my program, what difference does it make for the execution of my program whether the compiler itself was instrumented or not? Do you mean that ASAN runtime itself should be instrumented, since your program loads that at runtime?

On Tue, Jul 7, 2020 at 2:04 PM Mitch Phillips <mitchp@google.com> wrote:
Bear in mind that the ASan allocator isn't particularly suited to detecting memory corruption unless you compile LLVM/Clang with ASan instrumentation as well. I don't imagine anybody would be proposing making the debug build for Windows be ASan-ified by default.

On Tue, Jul 7, 2020 at 1:49 PM Adrian McCarthy via llvm-dev <llvm-dev@lists.llvm.org> wrote:
Asan and the Debug CRT take different approaches, but the problems they cover largely overlap.

Both help with detection of errors like buffer overrun, double free, use after free, etc. Asan generally gives you more immediate feedback on those, but you pay a higher price in performance. Debug CRT lets you do some trade off between the performance hit and how soon it detects problems.

Asan documentation says leak detection is experimental on Windows, while the Debug CRT leak detection is mature and robust (and can be nearly automatic in debug builds). By adding a couple of calls, you can do finer-grained leak detection than checking what remains when the program exits.
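
For readers unfamiliar with the "couple of calls" mentioned above, here is a minimal sketch of scoped leak checking with the Debug CRT. The helper name `leaksDuring` is hypothetical and not part of LLVM or the CRT; the `_CrtMem*` functions come from `<crtdbg.h>` and only do anything in an MSVC debug build, so the sketch falls back to simply running the work elsewhere.

```cpp
#include <cassert>
#if defined(_MSC_VER) && defined(_DEBUG)
#include <crtdbg.h>
#endif

// Hypothetical helper: runs `work` and reports whether the Debug CRT saw heap
// blocks that were allocated during the call and are still live afterwards.
// Outside an MSVC debug build there is no Debug CRT, so it always reports false.
template <typename Fn>
bool leaksDuring(Fn work) {
#if defined(_MSC_VER) && defined(_DEBUG)
  _CrtMemState before, after, diff;
  _CrtMemCheckpoint(&before);          // snapshot the debug heap
  work();
  _CrtMemCheckpoint(&after);           // snapshot it again
  if (_CrtMemDifference(&diff, &before, &after)) {
    _CrtMemDumpStatistics(&diff);      // print a summary of what changed
    return true;
  }
  return false;
#else
  work();
  return false;
#endif
}
```

Wrapping a single phase of a tool in such a helper narrows a leak down to that phase, rather than relying only on the report produced at process exit.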

Debug CRT lets you hook all of the malloc calls if you want, so you can extend it for your own types of tracking and bug detection. But I don't think that feature is often used.

Windows's Appverifier is cool and powerful. I cannot remember for sure, but I think some of its features might depend on the Debug CRT. One thing it can do is simulate allocation failures so you can test your program's recovery code, but most programs nowadays assume memory allocation never fails and will just crash if it ever does.

On Tue, Jul 7, 2020 at 10:25 AM Zachary Turner via llvm-dev <llvm-dev@lists.llvm.org> wrote:
Note that ASAN support is present on Windows now. Does the Debug CRT provide any features that are not better served by ASAN?

On Tue, Jul 7, 2020 at 9:44 AM Chris Tetreault via llvm-dev <llvm-dev@lists.llvm.org> wrote:

For release builds, I think this is fine. However, for debug builds, the Windows allocator provides a lot of built-in functionality for debugging memory issues that I would be very sad to lose. Therefore, I would request that:

  1. This be added as a configuration option to select either the new allocator or the Windows allocator
  2. The Windows allocator be used by default in debug builds

Ideally, since you’re doing this work, you’d implement it in such a way that it’s fairly easy for anybody to use whatever allocator they want when building LLVM (on any platform, not just Windows), and it’s not just hardcoded to the system allocator vs whatever allocator ends up getting added. However, as long as I can use the Windows debug allocator I’m happy.
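
One way to picture the request above is a build-time switch that defaults to the system allocator. The sketch below is only an illustration under stated assumptions: the macro `EXAMPLE_USE_CUSTOM_ALLOC`, the `example` namespace, and the `example_custom_*` functions are all hypothetical, and this is not a description of how the demo patch works.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical build-time switch. A build system would pass
// -DEXAMPLE_USE_CUSTOM_ALLOC=1 to opt in; the default keeps the system
// allocator (the Debug CRT heap in an MSVC debug build).
#ifndef EXAMPLE_USE_CUSTOM_ALLOC
#define EXAMPLE_USE_CUSTOM_ALLOC 0
#endif

#if EXAMPLE_USE_CUSTOM_ALLOC
// Hypothetical entry points supplied by whichever allocator library the
// builder chooses to link in.
extern "C" void *example_custom_malloc(std::size_t);
extern "C" void example_custom_free(void *);
#endif

namespace example {

constexpr bool usesCustomAllocator = EXAMPLE_USE_CUSTOM_ALLOC != 0;

inline void *allocate(std::size_t n) {
#if EXAMPLE_USE_CUSTOM_ALLOC
  return example_custom_malloc(n);
#else
  return std::malloc(n);
#endif
}

inline void deallocate(void *p) {
#if EXAMPLE_USE_CUSTOM_ALLOC
  example_custom_free(p);
#else
  std::free(p);
#endif
}

} // namespace example
```

A wrapper like this only covers code that calls it. Replacing the allocator for every `malloc` and `new` in a tool would instead need the symbols to be overridden at link time, which is a different mechanism from the one sketched here.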

Thanks,

Christopher Tetreault

From: cfe-dev <cfe-dev-bounces@lists.llvm.org> On Behalf Of Alexandre Ganea via cfe-dev
Sent: Wednesday, July 1, 2020 9:20 PM
To: cfe-dev@lists.llvm.org; LLVM Dev <llvm-dev@lists.llvm.org>
Subject: [EXT] [cfe-dev] RFC: Replacing the default CRT allocator on Windows

Hello,

I was wondering how folks were feeling about replacing the default Windows CRT allocator in Clang, LLD, and possibly other LLVM tools.

The CRT heap allocator on Windows doesn’t scale well on large core count machines. Any multi-threaded workload in LLVM that allocates often is impacted by this. As a result, link times with ThinLTO are extremely slow on Windows. We’re observing performance inversely proportional to the number of cores. The more cores the machine has, the slower ThinLTO linking gets.

We’ve replaced the CRT heap allocator with modern lock-free thread-cache allocators such as rpmalloc (unlicence), mimalloc (MIT licence) or snmalloc (MIT licence). The runtime performance is an order of magnitude faster.
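
To get a rough feel for the kind of contention described above, one could time many threads doing small allocations against whichever allocator the binary is linked with. The sketch below is a hypothetical micro-benchmark and not the measurement used in this RFC; the function name `timeAllocations` and its parameters are assumptions for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical micro-benchmark: each of `threads` threads performs
// `iterations` malloc/free pairs of `size` bytes. Returns the elapsed
// wall-clock time in seconds.
double timeAllocations(unsigned threads, std::size_t iterations,
                       std::size_t size) {
  assert(threads > 0 && size > 0);
  auto start = std::chrono::steady_clock::now();

  std::vector<std::thread> pool;
  for (unsigned t = 0; t < threads; ++t) {
    pool.emplace_back([=] {
      for (std::size_t i = 0; i < iterations; ++i) {
        void *p = std::malloc(size);
        // Touch the block so the compiler cannot elide the allocation.
        if (p)
          static_cast<volatile char *>(p)[0] = 1;
        std::free(p);
      }
    });
  }
  for (auto &th : pool)
    th.join();

  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

Comparing the result for 1 thread against the result for as many threads as the machine has cores, with the per-thread iteration count held constant, gives a crude indication of how well an allocator scales. A real ThinLTO link has a very different allocation pattern, so this is no substitute for the link-time numbers in this message.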

Time to link clang.exe with LLD and -flto on a 36-core machine:

Windows CRT heap allocator: 38 min 47 sec

mimalloc: 2 min 22 sec

rpmalloc: 2 min 15 sec

snmalloc: 2 min 19 sec

We’ve been running a downstream fork of LLVM + rpmalloc in production for more than a year. However, when cross-compiling for some specific game platforms, we’re using other downstream forks of LLVM that we can’t change.

Two questions arise:

  1. The licencing. Should we embed one of these allocators into the LLVM tree, or keep them separate, out of tree?
  2. If the answer to the above question is “yes”, given the tremendous performance speedup, should we embed one of these allocators into Clang/LLD builds by default (on Windows only), considering that Windows doesn’t have an LD_PRELOAD mechanism?

Please see demo patch here: https://reviews.llvm.org/D71786

Thank you in advance for the feedback!

Alex.

_______________________________________________

LLVM Developers mailing list

llvm-dev@lists.llvm.org

https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
