gen0_bricks_cleared flag needs to be propagated when we change heap count by Maoni0 · Pull Request #90457 · dotnet/runtime (original) (raw)
Fixes #90452
@cshung discovered that we are hitting the debug break in check_gen0_bricks -
00 (Inline Function) --------`-------- KERNELBASE!DebugBreak [minkernel\kernelbase\debug.c @ 143]
01 00000015`38c7ef18 00007ff8`4080baa3 KERNELBASE!wil::details::DebugBreak+0x2 [onecore\internal\sdk\inc\wil\opensource\wil\result_macros.h @ 1737]
02 (Inline Function) --------`-------- coreclr!GCToOSInterface::DebugBreak+0x6 [C:\Dev\runtime\src\coreclr\gc\windows\gcenv.windows.cpp @ 652]
03 00000015`38c7ef20 00007ff8`4087cb33 coreclr!SVR::gc_heap::check_gen0_bricks+0x8f [C:\Dev\runtime\src\coreclr\gc\gc.cpp @ 12475]
04 00000015`38c7ef50 00007ff8`4087bd00 coreclr!SVR::gc_heap::garbage_collect+0x57 [C:\Dev\runtime\src\coreclr\gc\gc.cpp @ 23687]
05 00000015`38c7f290 00007ff8`4087ba1c coreclr!SVR::gc_heap::gc_thread_function+0x2e0 [C:\Dev\runtime\src\coreclr\gc\gc.cpp @ 7119]
06 00000015`38c7f6e0 00007ff8`408cc5c4 coreclr!SVR::gc_heap::gc_thread_stub+0x14c [C:\Dev\runtime\src\coreclr\gc\gc.cpp @ 36813]
07 (Inline Function) --------`-------- coreclr!`anonymous-namespace'::CreateNonSuspendableThread::__l2::<lambda_510aaaafbe1c92975f9e29788eb5579d>::operator()+0x65 [C:\Dev\runtime\src\coreclr\vm\gcenv.ee.cpp @ 1490]
08 00000015`38c7f8d0 00007ff8`f5f17614 coreclr!<lambda_510aaaafbe1c92975f9e29788eb5579d>::<lambda_invoker_cdecl>+0x74 [C:\Dev\runtime\src\coreclr\vm\gcenv.ee.cpp @ 1492]
09 00000015`38c7f900 00007ff8`f68826b1 KERNEL32!BaseThreadInitThunk+0x14 [clientcore\base\win32\client\thread.c @ 64]
0a 00000015`38c7f930 00000000`00000000 ntdll!RtlUserThreadStart+0x21 [minkernel\ntdll\rtlstrt.c @ 1153]
What happened was when we change the heap count, in heap X we get a region from heap Y and the gen0_bricks_cleared flag from Y says false but heap X says true. So when we check the bricks on heap X, we assume it’s true but it’s not. I can see this because the region check_gen0_bricks complains about is
0:013> .frame 3
03 00000070`6597f430 00007ffc`d8026aee clrgc!SVR::gc_heap::check_gen0_bricks+0x65 [D:\runtime-datas\src\coreclr\gc\gc.cpp @ 12632]
0:013> ?? gen0_region
class SVR::heap_segment * 0x000001b1`5340bed8
I added some dprintfs to print the gen0 regions on the old heaps with this flag as false and this is a region coming from heap#2 whose gen0_bricks_cleared flag was false -
[68756]trying to change heap count 24 -> 8
[68756]h2 g0 1b15340bed8 // print out gen0 regions from heaps whose gen0_bricks_cleared was false
[68756]g0: total budget: 62898032 budget per heap: 2620752
[68756]g1: total budget: 6283176 budget per heap: 261800
[68756]g2: total budget: 6283224 budget per heap: 261808
[68756]g3: total budget: 75497472 budget per heap: 3145728
[68756]g4: total budget: 75497472 budget per heap: 3145728
the fix is to detect if any heap has this flag as false and if so make all heaps’ flag false (tracking which region is moved from which other heap is something we need additional recording for and it’s not really worth doing just for this)