Approximate Bayesian computation without summary statistics: the case of admixture

Vitor C Sousa et al. Genetics. 2009 Apr.

Abstract

In recent years approximate Bayesian computation (ABC) methods have become popular in population genetics as an alternative to full-likelihood methods to make inferences under complex demographic models. Most ABC methods rely on the choice of a set of summary statistics to extract information from the data. In this article we tested the use of the full allelic distribution directly in an ABC framework. Although the ABC techniques are becoming more widely used, there is still uncertainty over how they perform in comparison with full-likelihood methods. We thus conducted a simulation study and provide a detailed examination of ABC in comparison with full likelihood in the case of a model of admixture. This model assumes that two parental populations mixed at a certain time in the past, creating a hybrid population, and that the three populations then evolve under pure drift. Several aspects of ABC methodology were investigated, such as the effect of the distance metric chosen to measure the similarity between simulated and observed data sets. Results show that in general ABC provides good approximations to the posterior distributions obtained with the full-likelihood method. This suggests that it is possible to apply ABC using allele frequencies to make inferences in cases where it is difficult to select a set of suitable summary statistics and when the complexity of the model or the size of the data set makes it computationally prohibitive to use full-likelihood methods.
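
The rejection idea described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes known parental allele frequencies, biallelic loci, binomial sampling, and no drift, and every function name and value below is hypothetical. Simulated hybrid allele frequencies are compared directly with the observed ones through a Euclidean distance, and only the closest fraction of simulations (the tolerance fraction, playing the role of P_δ in the figure captions) is kept.

```python
import math
import random


def simulate_hybrid_freqs(p1, freqs1, freqs2, n_copies, rng):
    """Sample hybrid allele frequencies at biallelic loci (toy model, no drift)."""
    sim = []
    for x1, x2 in zip(freqs1, freqs2):
        x_h = p1 * x1 + (1.0 - p1) * x2  # expected hybrid frequency after admixture
        count = sum(1 for _ in range(n_copies) if rng.random() < x_h)
        sim.append(count / n_copies)
    return sim


def euclidean(a, b):
    """Euclidean distance between two allele-frequency vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def abc_rejection(observed, freqs1, freqs2, n_copies, n_sims, tolerance_fraction, rng):
    """Keep the prior draws of p1 whose simulated data lie closest to the observed data."""
    draws = []
    for _ in range(n_sims):
        p1 = rng.random()  # assumed uniform(0, 1) prior on the admixture proportion
        sim = simulate_hybrid_freqs(p1, freqs1, freqs2, n_copies, rng)
        draws.append((euclidean(observed, sim), p1))
    draws.sort()
    n_keep = max(1, int(round(n_sims * tolerance_fraction)))
    return [p1 for _, p1 in draws[:n_keep]]


# Hypothetical example: 10 loci, 50 sampled gene copies, true p1 = 0.7.
rng = random.Random(1)
freqs1 = [0.9] * 5 + [0.8] * 5
freqs2 = [0.1] * 5 + [0.2] * 5
observed = simulate_hybrid_freqs(0.7, freqs1, freqs2, 50, rng)
accepted = abc_rejection(observed, freqs1, freqs2, 50, 5000, 0.01, rng)
posterior_mean = sum(accepted) / len(accepted)
```

The accepted values of `p1` form a sample from an approximate posterior. Shrinking `tolerance_fraction` tightens the approximation but requires more simulations to retain the same number of points.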

PubMed Disclaimer

Figures

Figure 1.—

The admixture model described in the text. We assume a single admixture event, T generations ago. The three populations are allowed to have different sizes _N_1, _N_2, and _N_h.
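
As a rough illustration of this model, the sketch below simulates one biallelic locus forward in time: the hybrid population starts at a mixture of the two parental frequencies, and the three populations then drift independently. The haploid Wright-Fisher sampling scheme, the function names, and the numbers are assumptions made for illustration and are not taken from the article.

```python
import random


def wright_fisher_step(freq, n_copies, rng):
    """One generation of pure drift: resample n_copies gene copies binomially."""
    count = sum(1 for _ in range(n_copies) if rng.random() < freq)
    return count / n_copies


def simulate_admixture(x1, x2, p1, generations, n1, n2, nh, rng):
    """Return allele frequencies in P1, P2 and the hybrid after `generations` of drift.

    p1 is the contribution of parental population 1 to the hybrid population.
    """
    xh = p1 * x1 + (1.0 - p1) * x2  # single admixture event
    for _ in range(generations):
        x1 = wright_fisher_step(x1, n1, rng)
        x2 = wright_fisher_step(x2, n2, rng)
        xh = wright_fisher_step(xh, nh, rng)
    return x1, x2, xh


# Hypothetical values: T = 10 generations, population sizes 1000, 1000 and 500.
rng = random.Random(42)
no_drift = simulate_admixture(0.8, 0.2, 0.7, 0, 1000, 1000, 500, rng)
drifted = simulate_admixture(0.8, 0.2, 0.7, 10, 1000, 1000, 500, rng)
```

With zero generations the hybrid frequency is exactly the mixture `p1 * x1 + (1 - p1) * x2`. If drift is measured on a scale such as T divided by population size (an assumption here), a smaller population accumulates more drift over the same T.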

Figure 2.—

Example of posterior distributions of three runs. Results obtained for _t_h and _p_1 in three single-locus analyses, varying the number of alleles, are shown. The different lines correspond to the posteriors obtained with the different methods compared (key is shown in the top left plot). For the ABC methods the densities were obtained with the regression step. The prior distributions are shown as horizontal dotted lines and the true parameter value as dotted vertical lines.
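
The caption does not spell out the regression step. A common form of ABC regression adjustment fits a linear relationship between accepted parameter values and the distance of their simulated statistics from the observed value, then shifts each accepted value to where it would sit if its simulation had matched the observation exactly. The sketch below assumes a single scalar statistic and an unweighted least-squares fit, which is simpler than kernel-weighted versions. The function name and data are hypothetical.

```python
def regression_adjust(thetas, stats, stat_obs):
    """Linearly adjust accepted parameter values toward the observed statistic.

    thetas   : accepted parameter draws
    stats    : simulated statistic for each accepted draw
    stat_obs : statistic computed on the observed data
    """
    n = len(thetas)
    centered = [s - stat_obs for s in stats]
    mean_c = sum(centered) / n
    mean_t = sum(thetas) / n
    sxx = sum((c - mean_c) ** 2 for c in centered)
    sxy = sum((c - mean_c) * (t - mean_t) for c, t in zip(centered, thetas))
    slope = sxy / sxx if sxx > 0 else 0.0  # fall back to no adjustment
    return [t - slope * c for t, c in zip(thetas, centered)]


# Hypothetical accepted draws lying exactly on a line with slope 2.
stats = [0.8, 0.9, 1.0, 1.1, 1.2]
thetas = [0.5 + 2.0 * (s - 1.0) for s in stats]
adjusted = regression_adjust(thetas, stats, stat_obs=1.0)
```

In this idealized example every adjusted value collapses to 0.5, the parameter value implied at the observed statistic. With real simulations the adjustment only removes the linear part of the dependence.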

Figure 3.—

Comparison of the posterior distributions obtained for _p_1 with the different methods for the multiple biallelic loci case, with drift _t_i = 0.01. The results obtained with 5 and 10 loci are shown in the top and bottom rows, respectively. Each solid line corresponds to the posterior obtained for 1 of the 50 repetitions. For the ABC methods, the densities were obtained with the regression step. The prior distributions are shown as dotted horizontal lines and the true parameter values as dashed vertical lines. ABC results were obtained with 10^8 simulations and _P_δ = 10^-5.

Figure 4.—

Effect of the tolerance level _P_δ and the regression step on the RRMSE and MISE of _p_1 and _t_h. Error values were estimated using 10 biallelic loci, with drift _t_i = 0.001 and _p_1 = 0.7. For the ABC methods, 10^8 simulations were performed. Solid lines correspond to the error of the regression step and dashed lines to the error of the rejection step. LEA results are shown as a solid diamond at _P_δ = 0. Error bars for the MISE correspond to the standard deviation across repetitions, and error bars for the RRMSE correspond to the 95% C.I., obtained with 1000 nonparametric bootstrap iterations.
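
The caption does not define the RRMSE. Assuming the usual definition (the root mean square error of the point estimates across repetitions, divided by the true parameter value), it can be computed as in the sketch below. The function name and the example values are hypothetical.

```python
import math


def rrmse(estimates, true_value):
    """Relative root mean square error of point estimates around a known true value."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / true_value


# Hypothetical point estimates of p1 from two repetitions, true p1 = 0.7.
error = rrmse([0.6, 0.8], 0.7)
```

Dividing by the true value puts parameters on different scales, such as an admixture proportion and a drift time, on a comparable footing.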

Figure 5.—

Posterior distributions obtained with the different methods for the analysis of the human data set to estimate admixture in Jamaica. European and African samples were assumed to come from the parental populations _P_1 and _P_2, respectively. The ABC posteriors were based on the closest 1000 points from 10 million simulations (_P_δ = 10^-4). The corresponding tolerance distances were 1.73, 1.05, and 75.00 for ABC_SUMSTAT, ABC_ALL_FREQ with _G_ST, and Euclidean, respectively. The upper limit for the drift priors was equal to one (upper limit _t_i = 1.0).

Figure 6.—

Effect of the drift prior on the human data set results. Posterior distributions obtained for _p_1 with the different ABC methods and LEA, varying the upper limit for _t_i, are shown. The ABC posteriors were based on the closest 1000 points from 10 million simulations (_P_δ = 10^-4).
