Optimize GC.AllocateUninitializedArray and use it in StringBuilder by adamsitnik · Pull Request #27364 · dotnet/coreclr

I wanted to use GC.AllocateUninitializedArray in StringBuilder, but it was initially too slow. Calling it for small buffers caused a noticeable performance regression.

I've tuned it so that it does not slow down StringBuilder on the "unlucky path" (small arrays) and does improve performance on the "lucky path" (big arrays). This should make the API more profitable to use in other places in the future.
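
For readers unfamiliar with the API: skipping the zeroing is only safe when the caller overwrites every element before reading it, which is how a builder-style buffer is normally used. Below is a minimal sketch of that usage pattern, not the StringBuilder change itself. It assumes .NET 5 or later, where `GC.AllocateUninitializedArray<T>` is public; the class name `UninitializedBufferExample` and the method `Repeat` are hypothetical and exist only for illustration.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class UninitializedBufferExample
{
    public static string Repeat(char c, int count)
    {
        // The returned array may contain arbitrary leftover data,
        // so nothing is read from it before it has been written.
        char[] buffer = GC.AllocateUninitializedArray<char>(count);

        // Overwrite every element; after this the contents are well defined.
        buffer.AsSpan().Fill(c);

        return new string(buffer);
    }
}
```

Calling `UninitializedBufferExample.Repeat('x', 3)` returns `"xxx"`. The array contents before the `Fill` call are unspecified and should never be observed.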

Changes:

Micro benchmarks for the GC API:

using System;
using System.Reflection;
using BenchmarkDotNet.Attributes;

[GenericTypeArguments(typeof(byte))]
[GenericTypeArguments(typeof(char))]
[GenericTypeArguments(typeof(object))]
public class Perf_GC<T>
{
    private readonly Func<int, T[]> _allocateUninitializedArrayDelegate = CreateDelegate<int>(typeof(GC), "AllocateUninitializedArray");
    private readonly Func<int, T[]> _allocateArrayDelegate = CreateDelegate<int>(typeof(Mimic), "AllocateArray");

    [Params(256, 256 * 2, 256 * 3, 256 * 4, 256 * 6, 256 * 8)]
    public int Length;

    [Benchmark]
    public T[] AllocateUninitializedArray() => _allocateUninitializedArrayDelegate(Length);

    [Benchmark]
    public T[] AllocateArray() => _allocateArrayDelegate(Length); // using delegate for apples to apples comparison

    private static Func<N, T[]> CreateDelegate<N>(Type type, string methodName)
    {
        // this method is not a part of .NET Standard so we need to use reflection
        // (?. keeps the null check below meaningful when the method is not found)
        var method = type
            .GetMethod(methodName, BindingFlags.NonPublic | BindingFlags.Static)
            ?.MakeGenericMethod(typeof(T));

        return method != null ? (Func<N, T[]>)method.CreateDelegate(typeof(Func<N, T[]>)) : null;
    }
}

public static class Mimic
{
    // plain "new T[]" behind the same delegate shape, used as the baseline
    internal static T[] AllocateArray<T>(int size) => new T[size];
}

I've simplified the default BenchmarkDotNet (BDN) output to make the results easier to compare. In the table below, "Before" is the execution time of GC.AllocateUninitializedArray before my changes and "After" is the execution time with my changes. The "new T[]" column is the time for calling the new operator directly, as a baseline.

| Type   | Length | Before    | After     | new T[]   |
|--------|--------|-----------|-----------|-----------|
| Byte   | 256    | 78.63 ns  | 18.31 ns  | 18.17 ns  |
| Char   | 256    | 79.33 ns  | 31.95 ns  | 31.66 ns  |
| Object | 256    | 113.34 ns | 113.34 ns | 113.03 ns |
| Byte   | 512    | 79.37 ns  | 31.38 ns  | 31.75 ns  |
| Char   | 512    | 87.60 ns  | 58.12 ns  | 57.71 ns  |
| Object | 512    | 229.02 ns | 229.30 ns | 227.78 ns |
| Byte   | 768    | 83.24 ns  | 45.51 ns  | 45.85 ns  |
| Char   | 768    | 95.92 ns  | 85.20 ns  | 84.34 ns  |
| Object | 768    | 353.66 ns | 347.39 ns | 349.48 ns |
| Byte   | 1024   | 85.99 ns  | 58.31 ns  | 57.58 ns  |
| Char   | 1024   | 99.46 ns  | 100.62 ns | 112.01 ns |
| Object | 1024   | 457.07 ns | 455.94 ns | 457.47 ns |
| Byte   | 1536   | 92.40 ns  | 84.84 ns  | 84.44 ns  |
| Char   | 1536   | 111.75 ns | 112.97 ns | 168.02 ns |
| Object | 1536   | 653.64 ns | 649.47 ns | 643.37 ns |
| Byte   | 2048   | 100.61 ns | 101.04 ns | 111.81 ns |
| Char   | 2048   | 126.52 ns | 125.31 ns | 226.94 ns |
| Object | 2048   | 830.92 ns | 838.90 ns | 836.48 ns |
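
Reading the table above, the "After" column tracks "new T[]" for byte and char arrays smaller than roughly 2048 bytes, beats it at 2048 bytes and above (Byte 2048, Char 1024 and up), and matches it for object arrays at every length. Those numbers are consistent with a size-based fallback to the regular allocator. The sketch below illustrates that idea under stated assumptions: the 2048-byte threshold is inferred from the table rather than quoted from the PR, the names `UninitializedArraySketch`, `ShouldSkipZeroing` and `Allocate` are hypothetical, and it targets .NET 5 or later, where `GC.AllocateUninitializedArray<T>` is public.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical wrapper, for illustration only; not the code from this PR.
public static class UninitializedArraySketch
{
    // Assumption: threshold in bytes, inferred from the benchmark table.
    private const int ThresholdBytes = 2048;

    public static bool ShouldSkipZeroing<T>(int length)
    {
        // Arrays that hold references must always be zeroed, otherwise the GC
        // could see garbage pointers.
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
            return false;

        // For small arrays zeroing is cheap, so the plain allocation is used.
        return (long)length * Unsafe.SizeOf<T>() >= ThresholdBytes;
    }

    public static T[] Allocate<T>(int length) =>
        ShouldSkipZeroing<T>(length)
            ? GC.AllocateUninitializedArray<T>(length)
            : new T[length];
}
```

With this sketch, `ShouldSkipZeroing<byte>(1536)` is false and `ShouldSkipZeroing<char>(1024)` is true, which lines up with the rows where "After" starts to diverge from "new T[]".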