ABCtoolbox: a versatile toolkit for approximate Bayesian computations

Daniel Wegmann et al. BMC Bioinformatics. 2010.

Abstract

Background: The estimation of demographic parameters from genetic data often requires the computation of likelihoods. However, the likelihood function is computationally intractable for many realistic evolutionary models, and the use of Bayesian inference has therefore been limited to very simple models. The situation changed recently with the advent of Approximate Bayesian Computation (ABC) algorithms allowing one to obtain parameter posterior distributions based on simulations not requiring likelihood computations.

Results: Here we present ABCtoolbox, a series of open source programs to perform Approximate Bayesian Computations (ABC). It implements various ABC algorithms, including rejection sampling, MCMC without likelihood, a particle-based sampler and ABC-GLM. ABCtoolbox is bundled with, but not limited to, a program that allows parameter inference in a population genetics context and the simultaneous use of different types of markers with different ploidy levels. In addition, ABCtoolbox can interact with most simulation and summary statistics computation programs. The usability of ABCtoolbox is demonstrated by inferring the evolutionary history of two evolutionary lineages of Microtus arvalis. Using nuclear microsatellites and mitochondrial sequence data in the same estimation procedure enabled us to infer sex-specific population sizes and migration rates, and to find that males show smaller population sizes but much higher levels of migration than females.
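Rejection sampling, the simplest of the ABC algorithms named above, can be illustrated on a toy problem. The sketch below is not ABCtoolbox code and does not reflect its interface: the Gaussian model, the uniform prior bounds, the choice of the sample mean as summary statistic, and all function names are assumptions made purely for illustration. Parameters are drawn from the prior, data are simulated, and only the draws whose simulated summary statistic lies closest to the observed one are retained as an approximate posterior sample.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Hypothetical simulator: n Gaussian observations with mean theta, sd 1."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Summary statistic (here simply the sample mean)."""
    return statistics.fmean(data)


def abc_rejection(observed, n_sims, keep_fraction, rng):
    """Keep the prior draws whose simulated summary is closest to the observed one."""
    s_obs = summary(observed)
    draws = []
    for _ in range(n_sims):
        theta = rng.uniform(-5.0, 5.0)  # assumed uniform prior on [-5, 5]
        s_sim = summary(simulate(theta, len(observed), rng))
        draws.append((abs(s_sim - s_obs), theta))
    draws.sort()  # smallest distance first
    n_keep = max(1, int(n_sims * keep_fraction))
    return [theta for _, theta in draws[:n_keep]]


rng = random.Random(1)
observed = simulate(2.0, 50, rng)  # pseudo-observed data, true mean 2.0
posterior = abc_rejection(observed, n_sims=5000, keep_fraction=0.02, rng=rng)
print("retained draws:", len(posterior))
print("approximate posterior mean:", statistics.fmean(posterior))
```

No likelihood is evaluated anywhere in this loop; the only model-specific ingredient is the simulator, which is why the approach extends to models whose likelihood is intractable.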

Conclusion: ABCtoolbox allows a user to perform all the necessary steps of a full ABC analysis: parameter sampling from prior distributions, data simulation, computation of summary statistics, estimation of posterior distributions, model choice, validation of the estimation procedure, and visualization of the results.

Figures

Figure 1

Flowchart describing the individual steps of an ABC estimation by ABCtoolbox. Black arrows indicate the standard approach. Some alternative paths are shown with dotted lines. For instance, the output of a simulation program can be modified to take specific characteristics of the observed data into account, such as a given level of missing data. Additionally, ABCtoolbox can call several simulation programs per iteration, each of which can be launched with the same parameter values. Thus, different data types can be conveniently combined in a single analysis.

Figure 2

Evolutionary model of the demographic history of two groups of populations corresponding to the Central and Eastern mtDNA evolutionary lineages of the common vole Microtus arvalis. An ancestral population of size N_A diverges into the two population groups T_DIV generations ago. We assumed a continent-island model for each of the two population groups. Islands (numbered subscripts) represent populations from which genetic data is available and the continents (subscripts "Eastern" and "Central") represent collectively all unsampled populations from a given evolutionary lineage. We further assumed the population sizes of the continents (N_Central and N_Eastern) to be large (10^7 individuals) and the population sizes of the islands (N_1, N_2, N_3, etc.) to follow a Normal distribution with mean N and standard deviation σ_N. Note that only four out of the 11 islands are shown. Backward in time, migration was only allowed from islands to the continent at rates Nm. While the same demographic model was assumed for both marker types, population sizes and migration rates were scaled differently (see text).

Figure 3

Distributions of the quantiles (x-axis) of the known parameter values as inferred from the posterior distributions obtained with ABCestimator for 1000 pseudo-observed data sets. These distributions are expected to be uniform if posterior densities have appropriate coverage properties [9]. We show these distributions for all model parameters (see text). The p-value reported above each histogram is the result of a Kolmogorov-Smirnov test for departure from uniformity.
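The logic of this coverage check can be sketched on a toy model. The example below is an illustration under stated assumptions, not the ABCestimator procedure: it uses a hypothetical Normal prior and Normal data model whose exact posterior is known in closed form, so no ABC step is involved. For each pseudo-observed data set, it records the posterior quantile at which the true parameter value falls, then computes the Kolmogorov-Smirnov distance between those quantiles and the uniform distribution. The constant 1.36 is the usual large-sample approximation of the 5% critical value; all names and settings are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean

PRIOR_MEAN, PRIOR_SD = 0.0, 2.0  # hypothetical Normal prior on the parameter
NOISE_SD, N_OBS = 1.0, 10        # hypothetical data model: N_OBS Normal draws


def posterior_quantile_of_truth(rng):
    """Draw a parameter from the prior, simulate a pseudo-observed data set,
    and return the posterior quantile at which the true value falls."""
    theta_true = rng.gauss(PRIOR_MEAN, PRIOR_SD)
    data = [rng.gauss(theta_true, NOISE_SD) for _ in range(N_OBS)]
    # Exact conjugate Normal posterior (stands in for an estimated posterior).
    prior_prec = 1.0 / PRIOR_SD**2
    data_prec = N_OBS / NOISE_SD**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * PRIOR_MEAN + data_prec * fmean(data))
    return NormalDist(post_mean, math.sqrt(post_var)).cdf(theta_true)


def ks_uniform_statistic(values):
    """Kolmogorov-Smirnov distance between the sample and Uniform(0, 1)."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


rng = random.Random(42)
quantiles = [posterior_quantile_of_truth(rng) for _ in range(1000)]
d_stat = ks_uniform_statistic(quantiles)
critical_5pct = 1.36 / math.sqrt(len(quantiles))  # asymptotic approximation
print("KS statistic:", d_stat)
print("approximate 5% critical value:", critical_5pct)
```

When the posterior is correct, as it is by construction here, the quantiles are uniformly distributed and the statistic should usually fall below the critical value. A posterior that is too narrow would pile quantiles up near 0 and 1, while one that is too wide would concentrate them around 0.5.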

Figure 4

Posterior distributions obtained with ABCestimator based on a likelihood-free MCMC chain of 10^6 steps performed by ABCsampler. Additional characteristics of the posterior distributions, along with the prior distributions, are given in Table 1. See Figure 2 and text for parameter description.

References

    1. Beaumont MA, Rannala B. The Bayesian revolution in genetics. Nat Rev Genet. 2004;5(4):251–261. doi: 10.1038/nrg1318.
    2. Tavaré S, Balding DJ, Griffiths RC, Donnelly P. Inferring coalescence times from DNA sequence data. Genetics. 1997;145(2):505–518.
    3. Weiss G, von Haeseler A. Inference of population history using a likelihood approach. Genetics. 1998;149:1539–1546.
    4. Pritchard JK, Seielstad MT, Perez-Lezaun A, Feldman MW. Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol Biol Evol. 1999;16(12):1791–1798.
    5. Beaumont MA, Zhang W, Balding DJ. Approximate Bayesian computation in population genetics. Genetics. 2002;162(4):2025–2035.
