Improve HTTP/2 scaling performance by stephentoub · Pull Request #35694 · dotnet/runtime (original) (raw)

Contributes to #35184
cc: @scalablecory, @wfurt, @JamesNK

I used James' repro from the linked issue, but focused on the "r" mode, which is the one that uses HttpClient directly rather than going through higher layers. I also made some tweaks to the code, in JamesNK/Http2Perf#1, to remove some of the overhead from that layer.

Net net, at 100 concurrent requests in this repro this PR improves RPS by ~2x.

These numbers were gathered on top of #35575.


Before, "r 1 false", Kestrel, client GC

Successfully processed 13424; RPS 7723; Errors 0; Last elapsed 0.1111ms
Successfully processed 21233; RPS 7807; Errors 0; Last elapsed 0.1194ms
Successfully processed 29050; RPS 7815; Errors 0; Last elapsed 0.1277ms
Successfully processed 36905; RPS 7853; Errors 0; Last elapsed 0.1207ms
Successfully processed 44841; RPS 7935; Errors 0; Last elapsed 0.1261ms

After, "r 1 false", Kestrel, client GC

Successfully processed 13891; RPS 8071; Errors 0; Last elapsed 0.1182ms
Successfully processed 21925; RPS 8032; Errors 0; Last elapsed 0.122ms
Successfully processed 29912; RPS 7985; Errors 0; Last elapsed 0.1411ms
Successfully processed 37975; RPS 8062; Errors 0; Last elapsed 0.1402ms
Successfully processed 46108; RPS 8131; Errors 0; Last elapsed 0.1257ms


Before, "r 100 false", Kestrel, client GC

Successfully processed 79559; RPS 47748; Errors 0; Last elapsed 2.2332ms
Successfully processed 127500; RPS 47927; Errors 0; Last elapsed 2.0807ms
Successfully processed 175485; RPS 47975; Errors 0; Last elapsed 1.9676ms
Successfully processed 223111; RPS 47607; Errors 0; Last elapsed 2.0324ms
Successfully processed 271552; RPS 48428; Errors 0; Last elapsed 1.7378ms

After, "r 100 false", Kestrel, client GC

Successfully processed 157536; RPS 93753; Errors 0; Last elapsed 1.0717ms
Successfully processed 249660; RPS 92102; Errors 0; Last elapsed 0.8804ms
Successfully processed 339840; RPS 90051; Errors 0; Last elapsed 0.5867ms
Successfully processed 432333; RPS 92486; Errors 0; Last elapsed 0.3346ms
Successfully processed 525563; RPS 93227; Errors 0; Last elapsed 1.3646ms


Before, "r 100 false", Kestrel, server GC

Successfully processed 110621; RPS 42381; Errors 0; Last elapsed 2.6166ms
Successfully processed 152748; RPS 42110; Errors 0; Last elapsed 2.2373ms
Successfully processed 195372; RPS 42606; Errors 0; Last elapsed 2.2102ms
Successfully processed 237832; RPS 42451; Errors 0; Last elapsed 2.1775ms
Successfully processed 280273; RPS 42424; Errors 0; Last elapsed 2.2575ms

After, "r 100 false", Kestrel, server GC

Successfully processed 232393; RPS 101467; Errors 0; Last elapsed 1.4772ms
Successfully processed 333288; RPS 100866; Errors 0; Last elapsed 0.4939ms
Successfully processed 434999; RPS 101684; Errors 0; Last elapsed 0.325ms
Successfully processed 537804; RPS 102797; Errors 0; Last elapsed 0.4198ms
Successfully processed 638216; RPS 100403; Errors 0; Last elapsed 1.3467ms


There's still a gap with the C gRPC client, but it's much smaller now. On this benchmark HttpClient is ~18% faster with one concurrent request:

After "r 1 false", Kestrel, server GC

Successfully processed 54093; RPS 8102; Errors 0; Last elapsed 0.116ms
Successfully processed 62136; RPS 8041; Errors 0; Last elapsed 0.1253ms
Successfully processed 70204; RPS 8066; Errors 0; Last elapsed 0.124ms
Successfully processed 78212; RPS 8007; Errors 0; Last elapsed 0.1398ms
Successfully processed 86269; RPS 8055; Errors 0; Last elapsed 0.1541ms

"c 1 false", Kestrel

Successfully processed 12240; RPS 6480; Errors 0; Last elapsed 0.1659ms
Successfully processed 18691; RPS 6449; Errors 0; Last elapsed 0.149ms
Successfully processed 25202; RPS 6510; Errors 0; Last elapsed 0.1602ms
Successfully processed 31615; RPS 6411; Errors 0; Last elapsed 0.148ms
Successfully processed 38051; RPS 6435; Errors 0; Last elapsed 0.1745ms

whereas it's ~18% slower with 100 concurrent requests:

After "r 100 false", Kestrel, server GC

Successfully processed 333318; RPS 103495; Errors 0; Last elapsed 0.4614ms
Successfully processed 438239; RPS 104921; Errors 0; Last elapsed 1.3381ms
Successfully processed 541034; RPS 102781; Errors 0; Last elapsed 0.7129ms
Successfully processed 645850; RPS 104676; Errors 0; Last elapsed 0.5231ms
Successfully processed 749534; RPS 103589; Errors 0; Last elapsed 0.3373ms

"c 100 false", Kestrel

Successfully processed 234346; RPS 126174; Errors 0; Last elapsed 0.8313ms
Successfully processed 361244; RPS 126825; Errors 0; Last elapsed 0.6312ms
Successfully processed 487486; RPS 126207; Errors 0; Last elapsed 0.993ms
Successfully processed 612871; RPS 125318; Errors 0; Last elapsed 0.6118ms
Successfully processed 740618; RPS 127681; Errors 0; Last elapsed 0.5951ms


I expect there's still room for improvements; I only fixed the most egregious issues showing up in traces, and a few smaller things I noticed along the way.

The changes are broken out in individual commits. Quick summaries: