Optimizations in notify-one by casualwind · Pull Request #12545 · facebook/rocksdb

changed the title ~~Thread opt~~ Optimizations in notify-one to improve the performance

Apr 16, 2024

casualwind changed the title ~~Optimizations in notify-one to improve the performance~~ Optimizations in notify-one to

Apr 16, 2024

casualwind changed the title ~~Optimizations in notify-one to~~ Optimizations in notify-one

Apr 16, 2024

We found that for writers in STATE_LOCKED_WAITING, the notify-one function must be called to wake them, and this call is expensive, especially when many writers need to be awakened. So we parallelize this process.

To wake up each writer to write its own memtable, the leader writer first wakes up (n^0.5-1) caller writers. The callers and the leader then each wake up about n/x of the remaining writers, which write to the memtable; here x is the number of wakers (the leader plus the callers, about n^0.5). This reduces the number of writers the leader must call SetState on, one after another, from n-1 to about 2*(n^0.5).
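The two-level wake-up described above can be sketched as a small standalone model. This is not the code from the pull request: `WakePlan`, `BuildTwoLevelPlan`, `LeaderCalls`, and `EveryWriterWokenOnce` are hypothetical names, the stride is assumed to be floor(n^0.5), and the remaining writers are assumed to be dealt round-robin among the wakers. The model only counts who would call SetState on whom; it starts no threads.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical model of a write group: writer 0 is the leader and
// writers 1..n-1 are waiting to be woken.
struct WakePlan {
  // wakes[k] lists, in order, the writers that waker k would call SetState on.
  // Waker 0 is the leader; wakers 1..stride-1 are the callers.
  std::vector<std::vector<std::size_t>> wakes;
};

// Build a two-level plan for n >= 1 writers (assumption: stride = floor(sqrt(n))).
WakePlan BuildTwoLevelPlan(std::size_t n) {
  assert(n >= 1);
  std::size_t stride =
      static_cast<std::size_t>(std::sqrt(static_cast<double>(n)));
  if (stride < 1) stride = 1;

  WakePlan plan;
  plan.wakes.resize(stride);

  // Phase 1: the leader wakes the callers 1..stride-1 first.
  for (std::size_t i = 1; i < stride; ++i) {
    plan.wakes[0].push_back(i);
  }
  // Phase 2: the leader and the callers share the remaining writers,
  // dealt round-robin so every waker gets about n/stride of them.
  for (std::size_t w = stride; w < n; ++w) {
    plan.wakes[(w - stride) % stride].push_back(w);
  }
  return plan;
}

// Number of sequential SetState calls left on the leader.
std::size_t LeaderCalls(const WakePlan& plan) { return plan.wakes[0].size(); }

// Check that every non-leader writer is woken exactly once
// and that nobody wakes the leader.
bool EveryWriterWokenOnce(const WakePlan& plan, std::size_t n) {
  std::vector<int> count(n, 0);
  for (const auto& list : plan.wakes) {
    for (std::size_t w : list) ++count[w];
  }
  for (std::size_t i = 1; i < n; ++i) {
    if (count[i] != 1) return false;
  }
  return count[0] == 0;
}
```

Under these assumptions, `LeaderCalls(BuildTwoLevelPlan(100))` is 18 (9 callers plus 9 of the remaining 90 writers), compared with 99 calls when the leader wakes everyone itself.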

vcpu=160, benchmark=db_bench. Scores are normalized to the base (base = 100%):

case name	optimized/base
fillrandom	182%
fillseq	184%
fillsync	136%
overwrite	179%
randomreplacekeys	180%
randomtransaction	161%
updaterandom	163%
xorupdaterandom	165%

ybtsdst pushed a commit to ybtsdst/rocksdb that referenced this pull request

Apr 27, 2025

Summary: We tested on an Ice Lake server (vcpu=160). The default configuration is allow_concurrent_memtable_write=1, with the thread count equal to the number of active cores. With our optimizations, throughput reaches up to 184% of the original in the fillseq case. ops/s is the performance metric in db_bench, and the following are the results for some db_bench cases.

case name	optimized/original
fillrandom	182%
fillseq	184%
fillsync	136%
overwrite	179%
randomreplacekeys	180%
randomtransaction	161%
updaterandom	163%
xorupdaterandom	165%

Our analysis shows that although the memtable writes themselves run in parallel, waking up the writers does not: a single writer is responsible for waking all the other writers one after another. The following is our method to optimize this process.

Assume there are n threads in total. We parallelize SetState in LaunchParallelMemTableWriters. To wake up each writer to write its own memtable, the leader writer first wakes up (n^0.5-1) caller writers. The callers and the leader then each wake up about n/x of the remaining writers, which write to the memtable; here x is the number of wakers (the leader plus the callers, about n^0.5). This reduces the number of writers the leader must call SetState on, one after another, from n-1 to about 2*(n^0.5).
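The 2*(n^0.5) figure can be checked with a short calculation. This sketch assumes that x counts the wakers (the leader plus x-1 callers) and that the n-x remaining writers are split evenly among them; the symbol C is introduced here only for illustration.

```latex
% Sequential SetState calls made by the leader with x wakers:
% (x - 1) calls to wake the callers, plus its share of the remaining n - x writers.
C(x) = (x - 1) + \frac{n - x}{x} = x + \frac{n}{x} - 2
% By the AM-GM inequality, x + n/x \ge 2\sqrt{n}, with equality at x = \sqrt{n}:
C(\sqrt{n}) = 2\sqrt{n} - 2 \le 2\sqrt{n}
% Example: n = 100 gives C(10) = 18, versus n - 1 = 99 calls without callers.
```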

A reproduction script: ./db_bench --benchmarks="fillrandom" --threads ${number of all active vCPUs} --seed 1708494134896523 --duration 60

Pull Request resolved: facebook#12545

Reviewed By: ajkr

Differential Revision: D57422827

Pulled By: cbi42

fbshipit-source-id: 94127937c0c61e4241720bd902c82c607b7b2431
