Bayesian workflow for disease transmission modeling in Stan - PubMed (original) (raw)

. 2021 Nov 30;40(27):6209-6234.

doi: 10.1002/sim.9164. Epub 2021 Sep 8.

Affiliations

Bayesian workflow for disease transmission modeling in Stan

Léo Grinsztajn et al. Stat Med. 2021.

Abstract

This tutorial shows how to build, fit, and criticize disease transmission models in Stan, and should be useful to researchers interested in modeling the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic and other infectious diseases in a Bayesian framework. Bayesian modeling provides a principled way to quantify uncertainty and incorporate both data and prior knowledge into the model estimates. Stan is an expressive probabilistic programming language that abstracts the inference and allows users to focus on the modeling. As a result, Stan code is readable and easily extensible, which makes the modeler's work more transparent. Furthermore, Stan's main inference engine, Hamiltonian Monte Carlo sampling, is amiable to diagnostics, which means the user can verify whether the obtained inference is reliable. In this tutorial, we demonstrate how to formulate, fit, and diagnose a compartmental transmission model in Stan, first with a simple susceptible-infected-recovered model, then with a more elaborate transmission model used during the SARS-CoV-2 pandemic. We also cover advanced topics which can further help practitioners fit sophisticated models; notably, how to use simulations to probe the model and priors, and computational techniques to scale-up models based on ordinary differential equations.

Keywords: Bayesian workflow; compartmental models; epidemiology; infectious diseases.

© 2021 John Wiley & Sons Ltd.

PubMed Disclaimer

Conflict of interest statement

The authors declare no conflict of interests.

Figures

FIGURE 1

FIGURE 1

Coding blocks in a

Stan

file. The operations in certain blocks are performed multiple times and in some cases differentiated; as a result, the computational cost of fitting the model is dominated by the transformed parameters and model blocks. Not shown is the

functions

block, which defines functions that can be called in any of the operative blocks

FIGURE 2

FIGURE 2

Model development as an iterative process

FIGURE 3

FIGURE 3

Number of students in bed each day during an influenza A (H1N1) outbreak at a British boarding school between January 22 and February 4, 1978

FIGURE 4

FIGURE 4

Diagram of the classic susceptible‐infectious‐recovered compartmental model

FIGURE 5

FIGURE 5

Prior predictive checks for (A) the recovery time (1/γ), (B) the basic reproduction number ℛ0 (β/γ), (C) the dispersion parameter (ϕ), (D) a set of 1000 epidemic trajectories (each line is a unique simulated trajectory), and (E) the range in the numbers of students in bed each day (the line is the median and the light teal area is the 95% central range). All these quantities are sampled from the prior distributions of the parameters. The dashed red lines correspond to weak bounds from our domain knowledge where available: The recovery time is expected to last between 0.5 and 30 days, ℛ0 cannot be lower than 1 or higher than 10. The plain horizontal red line shows the population size (763) [Colour figure can be viewed at

wileyonlinelibrary.com

]

FIGURE 6

FIGURE 6

A, Trace plot showing the value of each chain at each iteration (excluding warm‐up) and B, marginal posterior densities for the transmission rate (

beta

or β), the recovery rate (

gamma

or γ) and the inverse dispersion parameter (

phi_inv

or 1/ϕ), separately for each of the four Markov chains [Colour figure can be viewed at

wileyonlinelibrary.com

]

FIGURE 7

FIGURE 7

Marginal posterior densities for the transmission rate (

beta

or β), the recovery rate (

gamma

or γ) and the inverse dispersion parameter (

phi_inv

or 1/ϕ) obtained when fitting the model to simulated data. The red dashed lines show the fixed parameter values used for simulating the data [Colour figure can be viewed at

wileyonlinelibrary.com

]

FIGURE 8

FIGURE 8

A, Posterior predictive check of the number of students in bed each day during an influenza A (H1N1) outbreak at a British boarding school. The line shows the median and the orange area the 90% prediction interval. B, Prior and posterior predictive checks of the basic reproduction number ℛ0 and of the recovery time (both truncated at 8). The dot shows the median posterior and the line shows the 95% credible interval [Colour figure can be viewed at

wileyonlinelibrary.com

]

FIGURE 9

FIGURE 9

Daily number of reported cases of severe acute respiratory syndrome coronavirus 2 infection in Switzerland between February and June, 2020

FIGURE 10

FIGURE 10

A, Posterior distributions of the model parameters (the transmission rate β, the recovery rate γ and the inverse dispersion parameter 1/ϕ) and B, chain‐by‐chain posterior predictive check of the number of reported cases for the simple SIR model (model iteration #1) applied to data on the SARS‐CoV‐2 epidemic in Switzerland (white circles). SARS‐CoV‐2, severe acute respiratory syndrome coronavirus 2; SIR, susceptible‐infected‐recovered [Colour figure can be viewed at

wileyonlinelibrary.com

]

FIGURE 11

FIGURE 11

Diagram of a SEIR model

FIGURE 12

FIGURE 12

A, Trace plot for two of the model parameters (the incubation rate a and the reporting rate pr—the other parameters are not shown) and B, posterior predictive check of the number of reported cases for the SEIR model with underreporting (model iteration #3) applied to data on the SARS‐CoV‐2 epidemic in Switzerland (white circles). SARS‐CoV‐2, severe acute respiratory syndrome coronavirus 2 [Colour figure can be viewed at

wileyonlinelibrary.com

]

FIGURE 13

FIGURE 13

Pairs plot of all model parameters for the model incorporating control measures (model iteration #5) [Colour figure can be viewed at

wileyonlinelibrary.com

]

FIGURE 14

FIGURE 14

(A) Posterior predictive check of the number of reported cases and (B) of the cumulative incidence for the SEIR model including the effect of control measures and fitted to both reported cases and seroprevalence data (model iteration #7; white circles show data on reported cases in panel A and seroprevalence data in panel B). (C) Posterior distribution of the forcing function f that models the reduction in transmission after the introduction of lockdown measures. (D) Prior and posterior distributions of the parameters of model iteration #7 [Colour figure can be viewed at

wileyonlinelibrary.com

]

Similar articles

Cited by

References

    1. Flaxman S, Mishra S, Gandy A, et al. Estimating the effects of non‐pharmaceutical interventions on COVID‐19 in Europe. Nature. 2020;584:257‐261. - PubMed
    1. Salje H, Kiem CT, Lefrancq N, et al. Estimating the burden of SARS‐CoV‐2 in France. Science. 2020;369(6500):208‐211. 10.1126/science.abc3517 - DOI - PMC - PubMed
    1. Hauser A, Counotte MJ, Margossian CC, et al. Estimation of SARS‐CoV‐2 mortality during the early stages of an epidemic: a modeling study in Hubei, China, and six regions in Europe. PLoS Med. 2020;17(7):e1003189. - PMC - PubMed
    1. Keeling M, Danon L. Mathematical modelling of infectious diseases. Br Med Bull. 2009;92:33–42. - PubMed
    1. Carpenter B, Gelman A, Hoffman MD, et al. Stan: a probabilistic programming language. J Stat Softw. 2017;76:1–32. - PMC - PubMed

Publication types

MeSH terms

LinkOut - more resources