Optimizations in notify-one by casualwind · Pull Request #12545 · facebook/rocksdb

@casualwind changed the title from "Thread opt" to "Optimizations in notify-one to improve the performance"

Apr 16, 2024

@casualwind changed the title from "Optimizations in notify-one to improve the performance" to "Optimizations in notify-one to"

Apr 16, 2024

@casualwind changed the title from "Optimizations in notify-one to" to "Optimizations in notify-one"

Apr 16, 2024

cbi42

@casualwind

We found that each writer in STATE_LOCKED_WAITING has to be woken with a notify-one call, and that this call is expensive, especially when many writers need to be woken. So we parallelize this process.

To wake each writer so that it can write its own memtable, the leader writer first wakes (n^0.5-1) "caller" writers; those callers and the leader then each wake about n/x of the remaining writers (x appears to be the number of wakers, roughly n^0.5), which go on to write to the memtable. This cuts the number of writers the leader has to SetState in sequence from n-1 to about 2*(n^0.5).
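The two-level wake-up described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the RocksDB code: `wake_plan` is a hypothetical helper, the number of wakers is assumed to be the integer square root of n, and the remaining writers are assumed to be dealt out round-robin among the wakers.

```python
import math


def wake_plan(n):
    """Map each waker's index to the list of writer indices it wakes.

    Index 0 is the leader. Hypothetical model of the scheme in this PR;
    it assumes x = floor(sqrt(n)) wakers and a round-robin split.
    """
    x = max(1, math.isqrt(n))        # wakers: the leader plus x-1 callers
    plan = {0: list(range(1, x))}    # step 1: the leader wakes the callers
    for caller in range(1, x):
        plan[caller] = []
    # Step 2: the leader and the callers share the remaining n-x writers.
    for k, writer in enumerate(range(x, n)):
        plan[k % x].append(writer)
    return plan


plan = wake_plan(160)
print(len(plan[0]))                            # 24 wake-ups done by the leader
print(max(len(ws) for ws in plan.values()))    # 24: no thread does more
```

In this model, a group of 160 writers needs 24 sequential wake-ups from the leader instead of 159, which is in line with the 2*(n^0.5) estimate.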

vcpu=160, benchmark=db_bench. Scores are normalized to the base (base = 100%):

| case name | optimized/base |
| --- | --- |
| fillrandom | 182% |
| fillseq | 184% |
| fillsync | 136% |
| overwrite | 179% |
| randomreplacekeys | 180% |
| randomtransaction | 161% |
| updaterandom | 163% |
| xorupdaterandom | 165% |

@cbi42

ybtsdst pushed a commit to ybtsdst/rocksdb that referenced this pull request

Apr 27, 2025

@casualwind

Summary: We tested on an Ice Lake server (vcpu=160) with the default configuration, allow_concurrent_memtable_write=1, and the thread count set to the number of active cores. With our optimizations, throughput reaches up to 184% of the original in the fillseq case. The performance indicator is op/s as reported by db_bench; the optimized/original ratios for several db_bench cases are listed below.

| case name | optimized/original |
| --- | --- |
| fillrandom | 182% |
| fillseq | 184% |
| fillsync | 136% |
| overwrite | 179% |
| randomreplacekeys | 180% |
| randomtransaction | 161% |
| updaterandom | 163% |
| xorupdaterandom | 165% |

Our analysis shows that although the memtable writes themselves run in parallel, waking up the writers does not: a single writer is responsible for waking all the other writers one after another. The following is our method for optimizing this process.

Assume there are n threads in total. We parallelize the SetState calls in LaunchParallelMemTableWriters. To wake each writer so that it can write its own memtable, the leader writer first wakes (n^0.5-1) "caller" writers; those callers and the leader then each wake about n/x of the remaining writers (x appears to be the number of wakers, roughly n^0.5), which go on to write to the memtable. This cuts the number of writers the leader has to SetState in sequence from n-1 to about 2*(n^0.5).
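The 2*(n^0.5) figure can be checked with a short count. This is a sketch under two assumptions that the description does not spell out: x denotes the number of wakers (the leader plus x-1 callers), and the remaining n-x writers are split evenly among those x wakers.

$$
C_{\text{leader}}(x) = \underbrace{(x - 1)}_{\text{wake the callers}} + \underbrace{\frac{n - x}{x}}_{\text{leader's own share}} = x + \frac{n}{x} - 2
$$

$$
\frac{dC_{\text{leader}}}{dx} = 1 - \frac{n}{x^{2}} = 0 \;\Rightarrow\; x = \sqrt{n}, \qquad C_{\text{leader}}(\sqrt{n}) = 2\sqrt{n} - 2
$$

Under these assumptions, n = 160 gives roughly 23 sequential SetState calls for the leader before integer rounding, compared with 159 when the leader wakes every writer itself.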

A reproduction script: ./db_bench --benchmarks="fillrandom" --threads ${number of all active vcpu} --seed 1708494134896523 --duration 60

Pull Request resolved: facebook#12545

Reviewed By: ajkr

Differential Revision: D57422827

Pulled By: cbi42

fbshipit-source-id: 94127937c0c61e4241720bd902c82c607b7b2431
