GitHub - quantumbrake/cayenne-benchmarks: Benchmarks for cayenne (original) (raw)

Goal

The goal of this repository is to compare the different softwares used for stochastic / Gillespie type simulations (see below) on two fronts

Table of contents

Background

Stochastic simulations (see Wikipedia) are used to model biological processes or chemical reactions when the corresponding differential equations cannot be applied. This may be the case when the number of species being modeled is very small (such as 10s of molecules/biological species), and the randomness becomes important. A simple example is a starting with a small number of bacteria (say 5) in a dish. They may either all die out, or they may start dividing and growing rapidly. The outcome itself is random, and you would use a Gillespie simulator to model such a process.

Libraries compared

We compare the 4 libraries across Python, R and Julia. These are listed below along with the algorithms in each library.

Library Language Algorithm name in library Algorithm name in our comparison Standard name of algorithm
cayenne (v0.9.1) Python (v3.6.9) direct direct Gillespie's Direct method (Gillespie1973)
tau_leaping tau_leaping Standard tau leaping (Gillespie2001 also see Wikipedia)
tau_adaptive tau_adaptive Tau leaping with efficient step size selection (Cao. et al. 2006)
Tellurium (v2.1.5) Python (v3.6.9) gillespie direct Unknown (see here), likely similar to Gillespie's Direct Method
GillespieSSA (v0.6.1) R (v3.6.1) ssa.d direct Gillespie's Direct method (Gillespie1973)
ssa.etl tau_leaping Standard tau leaping (Gillespie2001 also see Wikipedia)
ssa.otl tau_adaptive Tau leaping with efficient step size selection (Cao. et al. 2006)
BioSimulator.jl (v0.9.3) Julia (v1.4.0) Direct direct Gillespie's Direct method (Gillespie1973)
TauLeapingDGLP2003 tau_leaping Improved leap-size selection for accelerated stochastic simulation (Gillespie and Petzold 2003)
HybridSAL tau_adaptive Step anticipation tau-leaping (SAL) algorithm that defaults to Direct depending on cumulative density (Sehl et. al 2009)

Quick results : what algorithm to use?

direct tau_leaping tau_adaptive
cayenne ✔️ Most accurate yet ✔️ Very fast but may need manual tuning Less accurate than GillespieSSA's version
Tellurium ❗ Inaccurate for 2nd order N/A N/A
GillespieSSA Very slow ❗ Inaccurate for initial zero counts ❗ Inaccurate for initial zero counts
BioSimulator.jl ❗ Inaccurate interpolation ❗ Inaccurate for initial zero counts ❗ Inaccurate for initial zero counts

Methods

Accuracy comparison

To compare the accuracies of the algorithms, we used a subset of the models recommended in the SBML test suite's stochastic component. They consist of 4 systems with different parameter combinations, resulting in a total of 14 models or 14 system-parameter combinations.

These decisions are in line with what is recommended in the SBML test suite.

Interpolating with cayenne's backend

Stochastic simulation algorithms return the states of the model at random time points (e.g. at time points t=9.6t=9.6t=9.6 and t=10.2t=10.2t=10.2 seconds). However, the accuracy tests demand values at specific time points (e.g at t=10t=10t=10 seconds). To get the values at the time points needed for accuracy tests, we used cayenne's backend which implements this functionality.

While GillespieSSA and Tellurium do not interpolate in their backends to provide values at specific time points, BioSimulator does possess this functionality. To investigate the effect of different interpolation techniques, we compared BioSimulator's internal interpolation with the raw BioSimulator results interpolated in cayenne. We call the former BioSimulatorIntp (results are interpolated within BioSimulator.jl) in our discussion.

Speed comparison

To compare the speed of the different algorithms, we used a subset of models (5 of the 14) used for the accuracy comparisons. These were picked to represent the general breadth of models, with at least one coming from each of the 4 different systems.

Results

Here we present some example results from our analysis, followed by key take-homes. The details are presented in the notebook available in the notebooks folder above or just viewed here.

Example results from comparisons

image info

Accuracy vs. speed

The plot above shows accuracy on the X axis and speed on the Y axis, for one of the 5 models used in the speed comparison. Good algorithms belong on the top right corner of the plot.

Yet this comparison is limited because it only explores a single model. A comparison of accuracy for different models, followed by speed for different models are presented below.

Accuracy

Speed

How to run the code in this repository

Accuracy tests

The accuracy tests can be run from the base directory of this repository (which contains run_simulations.py). Just run:

Usage: run_simulations.py [OPTIONS]

Run stochastic simulations for the library (lib), model IDs (models) and algorithms (algos).

Examples:

python run_simulations.py --lib cayenne --models 00001 --models 00003 --algos direct --algos tau_leaping --nrep 10000 --nprocs 4 --save

python run_simulations.py -l cayenne -m 00001 -m 00003 -a direct -a tau_leaping -n 10000 -p 4 --save

Options: -l, --lib TEXT The stochastic simulation library. Supported libraries: cayenne, BioSimulator, BioSimulatorIntp, GillespieSSA, Tellurium. -m, --models TEXT The DSMTS ID of the model to simulate. Specify multiple with additional -m tags (see example above). -a, --algos TEXT The stochastic algorithms to be used. Specify multiple with additional -a tags (see example above). Supported algorithms: direct, tau_leaping, tau_adaptive. -n, --nrep INTEGER The number of repetitions in the simulation -p, --nprocs INTEGER The number of CPU processes to use for accuracy test. --save / --no-save Save results of the simulation --help Show this message and exit.

For example, we can run a simulation as follows:

python run_simulations.py --lib cayenne --models 00001 --models 00003 --algos direct --algos tau_leaping --nrep 10000 --nprocs 4 --save

This will run the accuracy tests for the models "00001" and "00003" using the library cayenne's tau_leaping algorithm on 4 CPU cores and will save the time steps of the simulations.

Speed tests

The accuracy tests can be run from the base directory of this repository (which contains run_benchmarks.py). Just run:

Usage: run_benchmarks.py [OPTIONS]

Benchmark a stochastic simulation for a given library (lib), model ID (model) and algorithm (algo).

NOTE: You need hyperfine installed to run this script. It can be found here: https://github.com/sharkdp/hyperfine .

Examples:

python run_benchmarks --lib cayenne --model 00001 --algo direct --nrep 10000

python run_benchmarks -l cayenne -m 00001 -a direct -n 10000

Options: -l, --lib TEXT The stochastic simulation library. Supported libraries: cayenne, BioSimulator, BioSimulatorIntp, GillespieSSA, Tellurium. -m, --model TEXT The DSMTS ID of the model to benchmark -a, --algo TEXT The stochastic algorithm to benchmark. Supported algorithms: direct, tau_leaping, tau_adaptive. -n, --nrep INTEGER The number of repetitions in the stochastic simulation (typically ~10000) -t, --timeout INTEGER Seconds to wait until timeout --help Show this message and exit.

For example, we can invoke a speed benchmark as follows:

python run_benchmarks --lib cayenne --model 00001 --algo direct --nrep 10000

This will run the speed benchmarks for the cayenne library, 00001 model and direct algorithm for 10,000 repetitions. This will be run 7 times to get summary statistics.