Optimizations in notify-one by casualwind · Pull Request #12545 · facebook/rocksdb
We found that every writer in STATE_LOCKED_WAITING has to be woken with a notify-one call, and that this call is very expensive, especially when many writers need to be woken. So we parallelize the wake-up process.
To wake up each writer so that it can perform its own memtable write, the leader writer first wakes up (n^0.5-1) caller writers. Those callers and the leader then each wake up their own share of the remaining writers (n/x each, where x is presumably the number of wakers, about n^0.5). This reduces the number of sequential SetState calls on the leader's wake-up path from n-1 to about 2*(n^0.5).
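The two-level scheme above can be sketched as a small standalone model. This is an illustrative sketch, not the RocksDB implementation: `WakePlan`, `BuildWakePlan`, and `CriticalPath` are hypothetical names, the strided assignment of writers to wakers is an assumption (the description does not say how writers are divided), and each wake-up is assumed to cost one unit of time.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical model (not RocksDB code): who wakes whom among writers
// 0..n-1, where writer 0 is the leader.
struct WakePlan {
  std::size_t stride = 0;  // x: number of wakers (leader + callers)
  // wakes[i] lists the writers that writer i wakes, in order.
  std::vector<std::vector<std::size_t>> wakes;
};

// Builds a two-level plan with x = floor(sqrt(n)) wakers.
// Assumption: waker w is responsible for writers w+x, w+2x, ...
inline WakePlan BuildWakePlan(std::size_t n) {
  WakePlan plan;
  plan.wakes.resize(n);
  if (n == 0) return plan;
  std::size_t x =
      static_cast<std::size_t>(std::sqrt(static_cast<double>(n)));
  if (x == 0) x = 1;
  plan.stride = x;
  // Phase 1: the leader wakes the x-1 callers.
  for (std::size_t c = 1; c < x; ++c) plan.wakes[0].push_back(c);
  // Phase 2: leader and callers each wake their own strided share.
  for (std::size_t w = 0; w < x; ++w) {
    for (std::size_t t = w + x; t < n; t += x) plan.wakes[w].push_back(t);
  }
  return plan;
}

// Returns the step at which the last writer is woken, assuming each
// wake-up costs one step and every waker works through its list in order.
inline std::size_t CriticalPath(const WakePlan& plan) {
  const std::size_t n = plan.wakes.size();
  std::vector<std::size_t> woken_at(n, 0);  // the leader is awake at step 0
  std::size_t longest = 0;
  // Wakers have the lowest indices, so visiting them in index order means
  // each caller's own wake-up step is known before its list is processed.
  for (std::size_t w = 0; w < plan.stride && w < n; ++w) {
    std::size_t step = woken_at[w];
    for (std::size_t t : plan.wakes[w]) {
      assert(t < n);
      woken_at[t] = ++step;
      longest = std::max(longest, step);
    }
  }
  return longest;
}
```

Under these assumptions, `CriticalPath(BuildWakePlan(160))` is 24 steps, compared with the 159 sequential wake-ups a single leader would perform. Real wake-up latency is not uniform, so the model only illustrates the shape of the saving.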
vcpu=160, benchmark=db_bench. Scores are normalized to the baseline:
| case name | optimized/base |
|---|---|
| fillrandom | 182% |
| fillseq | 184% |
| fillsync | 136% |
| overwrite | 179% |
| randomreplacekeys | 180% |
| randomtransaction | 161% |
| updaterandom | 163% |
| xorupdaterandom | 165% |
ybtsdst pushed a commit to ybtsdst/rocksdb that referenced this pull request
Summary: We tested on an Ice Lake server (vcpu=160). The default configuration is allow_concurrent_memtable_write=1, with the thread count equal to the number of active cores. With our optimizations, throughput reaches up to 184% of the original in the fillseq case. We use op/s as reported by db_bench as the performance indicator; the table below shows the results for several db_bench cases.
| case name | optimized/original |
|---|---|
| fillrandom | 182% |
| fillseq | 184% |
| fillsync | 136% |
| overwrite | 179% |
| randomreplacekeys | 180% |
| randomtransaction | 161% |
| updaterandom | 163% |
| xorupdaterandom | 165% |
Our analysis shows that although the memtable writes themselves are processed in parallel, waking up the writers is not: a single writer is responsible for waking up all the other writers sequentially. The following is our method for optimizing this process.
Assume that there are n threads in total. We parallelize SetState in LaunchParallelMemTableWriters. To wake up each writer so that it can perform its own memtable write, the leader writer first wakes up (n^0.5-1) caller writers. Those callers and the leader then each wake up their own share of the remaining writers (n/x each, where x is presumably the number of wakers, about n^0.5). This reduces the number of sequential SetState calls on the leader's wake-up path from n-1 to about 2*(n^0.5).
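A rough derivation of the 2*(n^0.5) figure, under a simplified cost model that is an assumption rather than something stated in the commit message: every SetState call costs one unit, x writers (the leader plus x-1 callers) act as wakers, and the n writers are split evenly among them. The leader's sequential work is then

$$
f(x) = \underbrace{(x - 1)}_{\text{wake the callers}} + \underbrace{\left(\frac{n}{x} - 1\right)}_{\text{wake its own share}} = x + \frac{n}{x} - 2 .
$$

Setting the derivative to zero gives the optimal number of wakers:

$$
f'(x) = 1 - \frac{n}{x^{2}} = 0 \;\Longrightarrow\; x = \sqrt{n}, \qquad f(\sqrt{n}) = 2\sqrt{n} - 2 \approx 2\sqrt{n} .
$$

For n = 160 this is about $2\sqrt{160} - 2 \approx 23.3$ sequential calls, compared with $n - 1 = 159$ when the leader wakes every writer itself.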
A reproduction script: `./db_bench --benchmarks="fillrandom" --threads ${number of all active vCPUs} --seed 1708494134896523 --duration 60`
Pull Request resolved: facebook#12545
Reviewed By: ajkr
Differential Revision: D57422827
Pulled By: cbi42
fbshipit-source-id: 94127937c0c61e4241720bd902c82c607b7b2431