GitHub - quantumbrake/cayenne-benchmarks: Benchmarks for cayenne (original) (raw)

Goal

The goal of this repository is to compare the different softwares used for stochastic / Gillespie type simulations (see below) on two fronts

Accuracy of simulation
Speed of simulation

Background
Libraries compared
Quick results : what algorithm to use?
Methods
- Accuracy comparison
- Speed comparison
Results
- Example results from comparisons
How to run the code in this repository
- Accuracy tests
- Speed tests

Background

Stochastic simulations (see Wikipedia) are used to model biological processes or chemical reactions when the corresponding differential equations cannot be applied. This may be the case when the number of species being modeled is very small (such as 10s of molecules/biological species), and the randomness becomes important. A simple example is a starting with a small number of bacteria (say 5) in a dish. They may either all die out, or they may start dividing and growing rapidly. The outcome itself is random, and you would use a Gillespie simulator to model such a process.

Libraries compared

We compare the 4 libraries across Python, R and Julia. These are listed below along with the algorithms in each library.

Library	Language	Algorithm name in library	Algorithm name in our comparison	Standard name of algorithm
cayenne (v0.9.1)	Python (v3.6.9)	direct	direct	Gillespie's Direct method (Gillespie1973)
tau_leaping	tau_leaping	Standard tau leaping (Gillespie2001 also see Wikipedia)
tau_adaptive	tau_adaptive	Tau leaping with efficient step size selection (Cao. et al. 2006)
Tellurium (v2.1.5)	Python (v3.6.9)	gillespie	direct	Unknown (see here), likely similar to Gillespie's Direct Method
GillespieSSA (v0.6.1)	R (v3.6.1)	ssa.d	direct	Gillespie's Direct method (Gillespie1973)
ssa.etl	tau_leaping	Standard tau leaping (Gillespie2001 also see Wikipedia)
ssa.otl	tau_adaptive	Tau leaping with efficient step size selection (Cao. et al. 2006)
BioSimulator.jl (v0.9.3)	Julia (v1.4.0)	Direct	direct	Gillespie's Direct method (Gillespie1973)
TauLeapingDGLP2003	tau_leaping	Improved leap-size selection for accelerated stochastic simulation (Gillespie and Petzold 2003)
HybridSAL	tau_adaptive	Step anticipation tau-leaping (SAL) algorithm that defaults to Direct depending on cumulative density (Sehl et. al 2009)

Quick results : what algorithm to use?

direct	tau_leaping	tau_adaptive
cayenne	✔️ Most accurate yet	✔️ Very fast but may need manual tuning	Less accurate than GillespieSSA's version
Tellurium	❗ Inaccurate for 2nd order	N/A	N/A
GillespieSSA	Very slow	❗ Inaccurate for initial zero counts	❗ Inaccurate for initial zero counts
BioSimulator.jl	❗ Inaccurate interpolation	❗ Inaccurate for initial zero counts	❗ Inaccurate for initial zero counts

From this table above, a user is best off starting with cayenne's direct algorithm. It is accurate for several different model configurations.
If direct is too slow, cayenne's tau_leaping may be considered. This may require some hand-tuning of the tau parameter depending on the system. But we found that fixing the value to 0.1 sufficed for most of the accuracy tests.
Other algorithms and packages may be considered if the system under consideration does not begin with initial amounts set to zero or if there aren't higher order reactions.

Methods

Accuracy comparison

To compare the accuracies of the algorithms, we used a subset of the models recommended in the SBML test suite's stochastic component. They consist of 4 systems with different parameter combinations, resulting in a total of 14 models or 14 system-parameter combinations.

For the exact solvers (labeled direct in our table above), we used the Z and Y statistics mentioned in the SBML test suite.
For the approximate solvers (labeled tau_leaping and tau_adaptive in our table above), we checked the ratio of means and standard deviations.
We used 10000 repetitions for each model, for a given library and algorithm.

These decisions are in line with what is recommended in the SBML test suite.

Interpolating with `cayenne`'s backend

Stochastic simulation algorithms return the states of the model at random time points (e.g. at time points t=9.6t=9.6t=9.6 and t=10.2t=10.2t=10.2 seconds). However, the accuracy tests demand values at specific time points (e.g at t=10t=10t=10 seconds). To get the values at the time points needed for accuracy tests, we used cayenne's backend which implements this functionality.

While GillespieSSA and Tellurium do not interpolate in their backends to provide values at specific time points, BioSimulator does possess this functionality. To investigate the effect of different interpolation techniques, we compared BioSimulator's internal interpolation with the raw BioSimulator results interpolated in cayenne. We call the former BioSimulatorIntp (results are interpolated within BioSimulator.jl) in our discussion.

Speed comparison

To compare the speed of the different algorithms, we used a subset of models (5 of the 14) used for the accuracy comparisons. These were picked to represent the general breadth of models, with at least one coming from each of the 4 different systems.

A given model, for a given algorithm from a library, was run 10000 times and the time taken was noted.
This was repeated 7 times, to get an idea of the variance in the simulation time.
All simulations were run on single cores.
Speed was calculated as the inverse of the time taken in seconds for the algorithm to run to completion.

Results

Here we present some example results from our analysis, followed by key take-homes. The details are presented in the notebook available in the notebooks folder above or just viewed here.

Example results from comparisons

Accuracy vs. speed

The plot above shows accuracy on the X axis and speed on the Y axis, for one of the 5 models used in the speed comparison. Good algorithms belong on the top right corner of the plot.

It appears that the direct algorithms (circles) are usually more accurate than the approximate algorithms (crosses and squares).
The speed of the direct algorithms is not very different from the approximate algorithms. Interestingly, in some cases the direct algorithm is faster than its approximate counterparts.
Library-wise, we see the following trends:
- BioSimulator.jl's approximate algorithms are inaccurate.
- GillespieSSA is accurate but slow.
- The naive tau_leaping implemented in cayenne is both fast and accurate.

Yet this comparison is limited because it only explores a single model. A comparison of accuracy for different models, followed by speed for different models are presented below.

Accuracy

The direct algorithm was generally accurate across the packages tested. The exceptions are
- BioSimulator.jl appears to interpolate values poorly. When BioSimulator's raw results were interpolated with cayenne's backend, the accuracy was restored.
- Tellurium's appears to be inaccurate when reactions are second order (usually occurs when there are two reactants).
The approximate algorithms on average performed poorly compared to their direct counterparts across different models. Some systemic bugs/defects we identified are:
- BioSimulator.jl and GillespieSSA fail to simulate the system if the number of molecules is initially zero. They don't account for zero order reactions which can result in valid simulations, such as migration into the system.

Speed

cayenne, BioSimulator and Tellurium were similar in speed.
GillespieSSA was at least an order of magnitude slower than the rest of the packages. This was observed across different models and algorithms.

How to run the code in this repository

Accuracy tests

The accuracy tests can be run from the base directory of this repository (which contains run_simulations.py). Just run:

Usage: run_simulations.py [OPTIONS]

Run stochastic simulations for the library (lib), model IDs (models) and algorithms (algos).

Examples:

python run_simulations.py --lib cayenne --models 00001 --models 00003 --algos direct --algos tau_leaping --nrep 10000 --nprocs 4 --save

python run_simulations.py -l cayenne -m 00001 -m 00003 -a direct -a tau_leaping -n 10000 -p 4 --save

Options: -l, --lib TEXT The stochastic simulation library. Supported libraries: cayenne, BioSimulator, BioSimulatorIntp, GillespieSSA, Tellurium. -m, --models TEXT The DSMTS ID of the model to simulate. Specify multiple with additional -m tags (see example above). -a, --algos TEXT The stochastic algorithms to be used. Specify multiple with additional -a tags (see example above). Supported algorithms: direct, tau_leaping, tau_adaptive. -n, --nrep INTEGER The number of repetitions in the simulation -p, --nprocs INTEGER The number of CPU processes to use for accuracy test. --save / --no-save Save results of the simulation --help Show this message and exit.

For example, we can run a simulation as follows:

python run_simulations.py --lib cayenne --models 00001 --models 00003 --algos direct --algos tau_leaping --nrep 10000 --nprocs 4 --save

This will run the accuracy tests for the models "00001" and "00003" using the library cayenne's tau_leaping algorithm on 4 CPU cores and will save the time steps of the simulations.

Speed tests

The accuracy tests can be run from the base directory of this repository (which contains run_benchmarks.py). Just run:

Usage: run_benchmarks.py [OPTIONS]

Benchmark a stochastic simulation for a given library (lib), model ID (model) and algorithm (algo).

NOTE: You need hyperfine installed to run this script. It can be found here: https://github.com/sharkdp/hyperfine .

Examples:

python run_benchmarks --lib cayenne --model 00001 --algo direct --nrep 10000

python run_benchmarks -l cayenne -m 00001 -a direct -n 10000

Options: -l, --lib TEXT The stochastic simulation library. Supported libraries: cayenne, BioSimulator, BioSimulatorIntp, GillespieSSA, Tellurium. -m, --model TEXT The DSMTS ID of the model to benchmark -a, --algo TEXT The stochastic algorithm to benchmark. Supported algorithms: direct, tau_leaping, tau_adaptive. -n, --nrep INTEGER The number of repetitions in the stochastic simulation (typically ~10000) -t, --timeout INTEGER Seconds to wait until timeout --help Show this message and exit.

For example, we can invoke a speed benchmark as follows:

python run_benchmarks --lib cayenne --model 00001 --algo direct --nrep 10000

This will run the speed benchmarks for the cayenne library, 00001 model and direct algorithm for 10,000 repetitions. This will be run 7 times to get summary statistics.

GitHub - quantumbrake/cayenne-benchmarks: Benchmarks for cayenne (original) (raw)

Goal

Table of contents

Background

Libraries compared

Quick results : what algorithm to use?

Methods

Accuracy comparison

Interpolating with cayenne's backend

Speed comparison

Results

Example results from comparisons

Accuracy vs. speed

Accuracy

Speed

How to run the code in this repository

Accuracy tests

Speed tests

Interpolating with `cayenne`'s backend