Approximate Bayesian Computation Research Papers

If archaeology is to take a leading role in the social sciences, new theoretical and methodological advances emerging from the natural sciences cannot be ignored. This requires considerable retooling for archaeology as a discipline at a population scale of analysis. Such an approach is not easy to carry through, especially owing to historically contingent regional traditions; however, the knowledge gained by directly addressing these problems head-on is well worth the effort. This paper shows how population-level processes driving cultural evolution can be better understood if mathematical and computational methods, often with a strong element of simulation, are applied to archaeological datasets. We use computational methods to study patterns and processes of temporal variation in the frequency of cultural variants. More specifically, we explore how lineages of lithic technologies are transmitted over time using a well-analysed and chronologically fine-grained assemblage of Central European Neolithic armatures from the French Jura. We look for sharp cultural transitions in the frequency of armature types by trying to detect significant mismatches between the predictions of an unbiased transmission model and the observed empirical data. A simple armature classification scheme based on morphology is introduced. The results have considerable implications for analysing and understanding cultural transmission pathways not only for Neolithic armatures, but also for the evolution of lithic technology more generally in different spatiotemporal contexts.
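
To make the comparison concrete, here is a minimal sketch of the kind of unbiased (neutral) transmission model referred to above: each generation, agents copy a variant at random from the previous generation, with a small innovation rate, and variant frequencies drift over time. The function name and parameter values are illustrative stand-ins, not taken from the paper.

```python
import numpy as np

def unbiased_transmission(n_agents=500, n_generations=200, n_variants=6,
                          innovation_rate=0.01, seed=0):
    """Neutral (unbiased) cultural transmission.

    Each generation, every agent copies the variant of a randomly chosen
    member of the previous generation; with probability `innovation_rate`
    an agent instead invents a brand-new variant. Returns a
    (generations x variants) array of relative frequencies for the
    variants present at the start.
    """
    rng = np.random.default_rng(seed)
    population = rng.integers(0, n_variants, size=n_agents)
    next_variant = n_variants
    freqs = []
    for _ in range(n_generations):
        freqs.append([(population == v).mean() for v in range(n_variants)])
        # Unbiased copying: sample variants uniformly from the previous generation.
        population = rng.choice(population, size=n_agents, replace=True)
        # Innovation introduces previously unseen variants.
        innovators = rng.random(n_agents) < innovation_rate
        n_new = int(innovators.sum())
        population[innovators] = np.arange(next_variant, next_variant + n_new)
        next_variant += n_new
    return np.array(freqs)

frequencies = unbiased_transmission()
print(frequencies[-1])   # frequencies of the initial variant classes at the end
# Sharp departures of observed armature-type frequencies from trajectories
# like these would indicate a mismatch with unbiased transmission.
```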

Risk aggregation is a popular method used to estimate the sum of a collection of financial assets or events, where each asset or event is modelled as a random variable. Applications in the financial services industry include insurance, operational risk, stress testing, and sensitivity analysis, but the problem is widely encountered in many other application domains. This thesis contributes two algorithms for performing Bayesian risk aggregation when models exhibit hybrid dependency and high-dimensional inter-dependency. The first algorithm addresses a subset of the general problem, with an emphasis on convolution problems involving continuous and discrete variables (so-called hybrid models), and the second offers a universal method for general-purpose inference over much wider classes of Bayesian network models.
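
The thesis's algorithms operate on Bayesian network representations; as background, the sketch below only illustrates the underlying aggregation (convolution) problem the abstract describes, combining a discrete frequency variable with continuous severity variables by plain Monte Carlo. The distributions and parameter values are illustrative assumptions, not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(42)

def aggregate_loss_samples(n_samples=100_000):
    """Monte Carlo approximation of an aggregate (compound) loss distribution.

    A hybrid model in the abstract's sense: a discrete frequency variable
    (Poisson event count) convolved with continuous severity variables
    (lognormal loss sizes). All distributions and parameters are illustrative.
    """
    counts = rng.poisson(lam=3.0, size=n_samples)
    totals = np.zeros(n_samples)
    for i, n_events in enumerate(counts):
        if n_events > 0:
            totals[i] = rng.lognormal(mean=10.0, sigma=1.2, size=n_events).sum()
    return totals

losses = aggregate_loss_samples()
print("mean aggregate loss:", losses.mean())
print("99.5% quantile (a typical capital measure):", np.quantile(losses, 0.995))
```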

Characterization of nanoparticle aggregates from observed scattered light leads to a highly complex inverse problem. Even the forward model is so complex that it prohibits the use of classical likelihood-based inference methods. In this study, we compare four so-called likelihood-free methods based on approximate Bayesian computation (ABC) that require only numerical simulation of the forward model without the need to evaluate a likelihood. In particular, rejection, Markov chain Monte Carlo, population Monte Carlo, and adaptive population Monte Carlo (APMC) are compared in terms of accuracy. In the current model, we assume that the nanoparticle aggregates are mutually well separated and made up of particles of the same size. Filippov's particle-cluster algorithm is used to generate aggregates, and the discrete dipole approximation is used to estimate scattering behavior. It is found that the APMC algorithm is superior to the others in terms of time and acceptance rates, although all algorithms produce similar posterior distributions. Using ABC techniques and unpolarized light experiments at 266 nm wavelength, characterization of soot aggregates is performed with less than 2 nm deviation in nanoparticle radius and a deviation of 3–4 in the number of nanoparticles forming the monodisperse aggregates. Promising results are also observed for the polydisperse aggregate with a log-normal particle size distribution.
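
Of the four samplers compared, the population Monte Carlo family is the least standard, so here is a minimal sketch of an ABC population Monte Carlo sampler in the style of Beaumont et al. (2009), with a cheap toy simulator standing in for the Filippov aggregate generator and the discrete dipole approximation. The simulator, prior, distance function, and tolerance schedule are all illustrative assumptions, not the paper's.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Cheap stand-in forward model: in the paper this would be the Filippov
# particle-cluster generator plus the discrete dipole approximation.
def forward_model(theta, n_obs=50):
    return rng.normal(loc=theta, scale=1.0, size=n_obs)

def distance(sim, obs):
    return abs(sim.mean() - obs.mean())

observed = forward_model(theta=2.5)

prior = stats.uniform(loc=-10, scale=20)      # theta ~ U(-10, 10), illustrative
epsilons = [2.0, 1.0, 0.5, 0.25]              # decreasing tolerance schedule
n_particles = 500

# Generation 0: plain rejection sampling from the prior.
particles, weights = [], []
while len(particles) < n_particles:
    theta = prior.rvs(random_state=rng)
    if distance(forward_model(theta), observed) < epsilons[0]:
        particles.append(theta)
        weights.append(1.0)
particles = np.array(particles)
weights = np.array(weights) / n_particles

# Later generations: resample, perturb, reweight (ABC-PMC; APMC additionally
# adapts the tolerance schedule, which is omitted here).
for eps in epsilons[1:]:
    kernel_sd = np.sqrt(2.0 * np.cov(particles, aweights=weights))
    new_particles, new_weights = [], []
    while len(new_particles) < n_particles:
        theta_star = rng.choice(particles, p=weights)
        theta_new = rng.normal(theta_star, kernel_sd)
        if prior.pdf(theta_new) == 0:
            continue
        if distance(forward_model(theta_new), observed) < eps:
            denom = np.sum(weights * stats.norm.pdf(theta_new, particles, kernel_sd))
            new_particles.append(theta_new)
            new_weights.append(prior.pdf(theta_new) / denom)
    particles = np.array(new_particles)
    weights = np.array(new_weights)
    weights /= weights.sum()

print("ABC-PMC posterior mean for theta:", np.sum(weights * particles))
```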

Since the creep compliance predicted by the Eurocode 2 does not take the effect of admixtures into consideration, a study of the influence of admixtures on creep is performed. Based on a large database established from international laboratories and research centers, a comparison between the experimental creep and the creep predicted by the Eurocode 2 is performed using statistical methods. An inaccurate estimation is detected, depending on the type of admixture used. The use of a combination of water reducer and silica fume as admixtures leads to an underestimation of the creep compliance.
To overcome this discrepancy, a calibration is performed by adding corrective coefficients to the Eurocode 2 equations, taking into consideration the type and percentage of admixtures. The Approximate Bayesian Computation method based on the rejection algorithm is applied in order to calculate the corrective coefficients.
After implementing the corrective coefficients in the Eurocode 2 compliance formula, statistical methods are used to evaluate the updated Eurocode 2 creep model, and a clear improvement in the results is shown. Using the large experimental database, the present study demonstrates the importance of Bayesian model assessment for updating the Eurocode 2 creep model, taking into account the effect of admixtures. The adoption of such a design approach would improve the long-term serviceability of structures subject to creep.
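
As a minimal sketch of the rejection-ABC calibration step described above, the code below draws a single multiplicative corrective coefficient from a prior, applies it to a stand-in "code prediction", and keeps the draws whose corrected predictions lie within a tolerance of synthetic experimental values. The compliance formula, data, prior range, and tolerance are all hypothetical; the paper's actual coefficients depend on admixture type and percentage.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical stand-ins: `code_prediction` plays the role of the Eurocode 2
# creep compliance prediction and `experimental` of the laboratory database.
# Neither the formula nor the data come from the paper.
code_prediction = np.linspace(40, 120, 30)          # predicted compliance values
true_alpha = 1.25                                   # "true" corrective coefficient
experimental = true_alpha * code_prediction + rng.normal(0, 4, size=30)

def distance(alpha):
    """Root-mean-square mismatch between the corrected prediction
    (alpha * code prediction) and the experimental values."""
    return np.sqrt(np.mean((alpha * code_prediction - experimental) ** 2))

# Rejection ABC: draw coefficients from the prior and keep those whose
# corrected predictions fall within a tolerance of the experimental data.
n_draws, epsilon = 50_000, 6.0
prior_draws = rng.uniform(0.5, 2.0, size=n_draws)   # prior on the coefficient
accepted = np.array([a for a in prior_draws if distance(a) < epsilon])

print(f"accepted {len(accepted)} of {n_draws} prior draws")
print("posterior mean of the corrective coefficient:", accepted.mean())
print("95% credible interval:", np.quantile(accepted, [0.025, 0.975]))
```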

Practitioners and construction management researchers lack believable and practical methods to assess the value proposition of emerging methods such as Virtual Design and Construction (VDC), including understanding how different levels of implementation affect its benefits. Furthermore, current methods of understanding VDC implementation and benefits cannot be updated easily to incorporate new data. This paper presents a Bayesian framework to predict the benefits of applying VDC given data about its implementation. We analyzed data from 40 projects that performed some formal modeling of the project scope and/or the construction process. The analysis suggests that more extensive or higher levels of VDC implementation lead to higher project benefits. We explain the use of a Bayesian framework as an alternative to classical probability theory in construction management research, how we used it to interpret data about VDC practice and outcomes, our finding that benefits have a strong positive contingent correlation with the level of VDC implemented on projects, and our suggestion to use the method to update conclusions about benefits as data about implementation and outcomes change.
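
The easy-updating property mentioned above can be illustrated with the simplest possible conjugate model: a Beta-Binomial update of the probability that a project reports high benefits, computed separately for two implementation levels. The categories, counts, and priors below are illustrative assumptions, not the paper's 40-project dataset or its actual model.

```python
from scipy import stats

# Illustrative counts only (not the paper's data): number of projects
# reporting high benefits, out of the total, at each implementation level.
data = {
    "low VDC implementation":  {"high_benefit": 4,  "n_projects": 15},
    "high VDC implementation": {"high_benefit": 18, "n_projects": 25},
}

prior_a, prior_b = 1.0, 1.0   # uniform Beta(1, 1) prior on the benefit probability

for level, d in data.items():
    # Conjugate update: Beta(a + successes, b + failures).
    posterior = stats.beta(prior_a + d["high_benefit"],
                           prior_b + d["n_projects"] - d["high_benefit"])
    lo, hi = posterior.ppf([0.05, 0.95])
    print(f"{level}: posterior mean P(high benefit) = {posterior.mean():.2f}, "
          f"90% credible interval = ({lo:.2f}, {hi:.2f})")

# Folding in new projects only requires adding them to the counts and
# recomputing the posterior, which is the easy-updating property the
# abstract emphasises.
```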

In the following article we consider approximate Bayesian parameter inference for observation-driven time series models. Such statistical models appear in a wide variety of applications, including econometrics and applied mathematics. This article considers the scenario where the likelihood function cannot be evaluated point-wise; in such cases, one cannot perform exact statistical inference, including parameter estimation, which often requires advanced computational algorithms, such as Markov chain Monte Carlo (MCMC). We introduce a new approximation based upon approximate Bayesian computation (ABC). Under some conditions, we show that as n → ∞, with n the length of the time series, the ABC posterior has, almost surely, a maximum a posteriori (MAP) estimator of the parameters which is different from the true parameter. However, a noisy ABC MAP, which perturbs the original data, asymptotically converges to the true parameter, almost surely. In order to draw statistical inference for the ABC approximation adopted, standard MCMC algorithms can have acceptance probabilities that fall at an exponential rate in n, and slightly more advanced algorithms can mix poorly. We develop a new and improved MCMC kernel, based upon an exact approximation of a marginal algorithm, whose per-iteration cost is random but whose expected cost, for good performance, is shown to be O(n^2) per iteration. We implement our new MCMC kernel for parameter inference from models in econometrics.
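
For orientation, the sketch below implements the basic ABC-MCMC kernel (Marjoram-style) whose poor scaling motivates the paper's improved kernel, applied to a toy observation-driven count model; it is not the paper's marginal-algorithm-based kernel. The model, summary statistics, prior bounds, tolerance, and step sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_ingarch(omega, alpha, beta, n=300):
    """Toy observation-driven count model (INGARCH(1,1)-type):
    lambda_t = omega + alpha*y_{t-1} + beta*lambda_{t-1},  y_t ~ Poisson(lambda_t)."""
    y = np.zeros(n, dtype=int)
    lam = omega / max(1e-9, 1.0 - alpha - beta)   # start near the stationary mean
    for t in range(n):
        y[t] = rng.poisson(lam)
        lam = omega + alpha * y[t] + beta * lam
    return y

def summaries(y):
    lag1 = np.corrcoef(y[:-1], y[1:])[0, 1]
    return np.array([y.mean(), y.var(), lag1])

observed = simulate_ingarch(omega=1.0, alpha=0.3, beta=0.4)
s_obs = summaries(observed)

def in_prior(theta):
    omega, alpha, beta = theta
    return (0 < omega < 5) and (alpha >= 0) and (beta >= 0) and (alpha + beta < 0.95)

def distance(theta):
    sim = simulate_ingarch(*theta)
    return np.linalg.norm((summaries(sim) - s_obs) / (np.abs(s_obs) + 1e-9))

# Basic ABC-MCMC (Marjoram-style): a random-walk proposal is accepted only if
# it lies in the prior support AND its freshly simulated data land within the
# tolerance. With a symmetric proposal and a flat prior on its support, the
# Metropolis-Hastings ratio reduces to exactly this check.
epsilon, n_iter, step = 0.5, 5_000, np.array([0.10, 0.05, 0.05])
theta = np.array([1.0, 0.3, 0.4])
chain = []
for _ in range(n_iter):
    proposal = theta + step * rng.normal(size=3)
    if in_prior(proposal) and distance(proposal) < epsilon:
        theta = proposal
    chain.append(theta.copy())

chain = np.array(chain)
print("approximate posterior means (omega, alpha, beta):", chain.mean(axis=0))
```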

Few problems in statistics are as perplexing as variable selection in the presence of very many redundant covariates. The variable selection problem is most familiar in parametric environments such as the linear model or additive variants thereof. In this work, we abandon the linear model framework, which can be quite detrimental when the covariates impact the outcome in a non-linear way, and turn to tree-based methods for variable selection. Such variable screening is traditionally done by pruning down large trees or by ranking variables based on some importance measure. Despite being heavily used in practice, these ad hoc selection rules are not yet well understood from a theoretical point of view. In this work, we devise a Bayesian tree-based probabilistic method and show that it is consistent for variable selection when the regression surface is a smooth mix of p > n covariates. These results are the first model selection consistency results for Bayesian forest priors. Probabilistic assessment of variable importance is made feasible by a spike-and-slab wrapper around sum-of-trees priors. Sampling from posterior distributions over trees is inherently very difficult. As an alternative to MCMC, we propose ABC Bayesian Forests, a new ABC sampling method based on data-splitting that achieves a higher ABC acceptance rate while retaining probabilistic coherence. We show that the method is robust and successful at finding variables with high marginal inclusion probabilities. Our ABC algorithm provides a new avenue towards approximating the median probability model in non-parametric setups where the marginal likelihood is intractable.
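
The code below is a loose, schematic sketch of the data-splitting idea only: sample candidate variable subsets from a Bernoulli (spike-and-slab-like) prior, fit a regression model on one half of the data restricted to each subset, score its predictions on the held-out half, and estimate inclusion probabilities from the best-scoring subsets. It substitutes an off-the-shelf random forest for the paper's Bayesian sum-of-trees prior and a quantile-based acceptance rule for the actual algorithm, so treat it as an illustration of the general mechanism rather than ABC Bayesian Forests itself.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)

# Synthetic illustration: 100 observations, 30 covariates, only the first
# three of which influence the (non-linear) outcome.
n, p = 100, 30
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.5 * X[:, 2] + 0.3 * rng.normal(size=n)

def abc_variable_selection(X, y, n_draws=300, q_prior=0.1, accept_quantile=0.1):
    """Schematic ABC-style variable selection with data splitting.

    Each draw: sample a candidate subset of variables (Bernoulli,
    spike-and-slab-like prior), fit a forest on one half of the data using
    only that subset, and score its predictions on the held-out half. Draws
    whose error falls in the best `accept_quantile` are treated as accepted;
    inclusion frequencies among accepted draws approximate marginal
    inclusion probabilities.
    """
    n, p = X.shape
    perm = rng.permutation(n)
    train, test = perm[: n // 2], perm[n // 2:]
    subsets, errors = [], []
    for _ in range(n_draws):
        active = rng.random(p) < q_prior
        if not active.any():
            continue
        cols = np.where(active)[0]
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(X[np.ix_(train, cols)], y[train])
        pred = model.predict(X[np.ix_(test, cols)])
        subsets.append(active)
        errors.append(np.mean((pred - y[test]) ** 2))
    subsets, errors = np.array(subsets), np.array(errors)
    accepted = subsets[errors <= np.quantile(errors, accept_quantile)]
    return accepted.mean(axis=0)   # per-variable inclusion frequency

inclusion = abc_variable_selection(X, y)
print("top variables by estimated inclusion frequency:", np.argsort(inclusion)[::-1][:5])
```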

The stratified series of Iron Age radiocarbon dates from Tel Reḥov, based on short-lived samples, measured in Groningen, is the most detailed and dense chronometric record currently available for the Levant in this period. The more detailed IntCal98 calibration curve was used, though some comparisons were made with the smoothed IntCal04 curve. The current Bayesian stratigraphic model for Tel Reḥov gave a number of significant results. The data strongly favour an early Iron Age IB–IIA transition, as the statistically sampled boundary in the 1σ range is 992–961 BCE (68.2%). Considering the 2σ range, the older time option, 998–957 BCE, further increases in probability to 75.2%, but a second option also appears, 953–921 BCE, albeit with a significantly lower relative probability of 20.2%. Our Bayesian model was also tested with the IntCal04 calibration curve, which gave similar but slightly older results: the 1σ range is 993–961 BCE (68.2%) and the 2σ range is 1001–927 BCE (95.4%). Th...

What was driving the migrations of the first farmers across Europe? How were demography, society, and environment interconnected to give rise to the macroregional expansion pattern that archaeology is revealing? We simulate the demography and spatial behavior of the first farming communities in the Central Balkans in order to infer the parameters and mechanisms of the Neolithic expansion in this part of Europe. We compare the simulation output to the empirical record of radiocarbon dates in order to systematically evaluate which expansion scenarios were the most probable. Our results suggest that if the expansion of the Neolithic unfolded in accord with the wave-of-advance model presented in this paper, the expansion was driven by very high fertility and by community fission to avoid social tensions. The simulation suggests that the number of children borne by an average Neolithic woman who lived through her entire fertile period was around eight or more, which is at the high end of the ethnographically recorded human total fertility rate spectrum. The most plausible simulated fission threshold values are between 50 and 100 people, which is usually smaller than the estimated environmental carrying capacity. This would suggest that the primary reason for community fission and for seeking out new land was social rather than ecological.
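
A heavily simplified sketch of the simulation logic described above: villages grow, fission once they exceed a threshold, and daughter villages settle further along a one-dimensional transect, producing an expansion front whose speed (or first-arrival dates) can be compared against the radiocarbon record. The growth rate, fission threshold, founder size, and hop distance are illustrative values, not the paper's calibrated parameters, and the real model is spatially explicit and demographically far richer.

```python
import numpy as np

rng = np.random.default_rng(11)

def fission_expansion(growth_rate=0.025, fission_threshold=75,
                      founder_size=30, hop_km=20, years=400):
    """Fission-driven expansion along a 1-D transect (a crude sketch).

    Villages grow exponentially (growth_rate stands in for the net effect of
    high fertility); when a village exceeds `fission_threshold` people, a
    founder group of `founder_size` splits off and settles roughly `hop_km`
    further along. Returns the founding year of each settled location.
    """
    villages = {0.0: float(founder_size)}   # distance (km) -> population
    arrival = {0.0: 0}                      # distance (km) -> founding year
    for year in range(1, years + 1):
        for dist in list(villages):         # snapshot: new villages join from next year
            villages[dist] *= 1 + growth_rate
            if villages[dist] > fission_threshold:
                villages[dist] -= founder_size
                new_dist = dist + hop_km * (0.5 + rng.random())  # noisy hop outward
                if new_dist in villages:
                    villages[new_dist] += founder_size
                else:
                    villages[new_dist] = float(founder_size)
                    arrival[new_dist] = year
    return arrival

arrival = fission_expansion()
furthest = max(arrival)
print(f"settlements founded: {len(arrival)}")
print(f"rough front speed: {furthest / arrival[furthest]:.2f} km/year")
# In a simulation-based inference setting, summaries like the front speed or
# simulated first-arrival dates would be compared against the radiocarbon record.
```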

The deformation due to creep has an important effect on the behavior of concrete structures, especially on their long-term integrity. Undesirable consequences may appear in structures due to incorrect or inaccurate prediction of creep deformation. A large database coming from international laboratories and research centers is used in order to compare the experimental results with the Eurocode 2 creep prediction. This study shows that the Eurocode 2 underestimates large creep compliances and overestimates small ones. In order to overcome this inaccuracy, new correction coefficients are introduced into the formulas of the Eurocode 2 using an Approximate Bayesian Computation method based on the rejection algorithm.

We consider a Bayesian analysis method for paired survival data using a bivariate exponential model proposed by Moran (1967, Biometrika 54:385–394). Important features of Moran’s model include that the marginal distributions are exponential and the range of the correlation coefficient is between 0 and 1. These contrast with the popular exponential model with gamma frailty. Despite these nice properties, statistical analysis with Moran’s model has been hampered by the lack of a closed-form likelihood function. In this paper, we introduce a latent variable to circumvent the difficulty in the Bayesian computation. We also consider a model checking procedure using the predictive Bayesian P-value.

Secondary contact between long isolated populations has several possible outcomes. These include the strengthening of preexisting reproductive isolating mechanisms via reinforcement, the emergence of a hybrid lineage that is distinct from its extant parental lineages and which occupies a spatially restricted zone between them, or the complete merging of two populations such that parental lineages are no longer extant ("lineage fusion" herein). The latter scenario has rarely been explicitly considered in single-species and comparative phylogeographic studies, yet it has the potential to impact inferences about population history and levels of congruence. In this paper, we explore the idea that insights into past lineage fusion may now be possible, owing to the advent of next-generation sequencing. Using simulated DNA sequence haplotype datasets (i.e., loci with alleles composed of a set of linked nucleotide polymorphisms), we examined the basic requirements (number of loci and individuals sampled) for identifying cases where a present-day panmictic population is the product of lineage fusion, using an exemplar statistical framework, approximate Bayesian computation. We found that with approximately 100 phased haplotype loci (each 400 bp long) and modest sample sizes of individuals (10 per population), lineage fusion can be detected under rather challenging scenarios. This included some scenarios where reticulation was fully contained within a Last Glacial Maximum timeframe, provided that mixing was symmetrical, ancestral gene pools were moderately to deeply diverged, and the lag time between the fusion event and gene pool sampling was relatively short. However, the more realistic case of asymmetrical mixing is not prohibitive if additional genetic data (e.g., 400 loci) are available. Notwithstanding some simplifying assumptions of our simulations and the knowledge gaps that remain about the circumstances under which lineage fusion is potentially detectable, we suggest that the recent release from data limitation allows phylogeographers to expand the scope of inferences about long-term population history.

The logic of uncertainty is not the logic of experience, nor is it the logic of chance. It is the logic of experience and chance. Experience and chance are two inseparable poles. These are two dual reflections of one essence, which is called a co∼event. The theory of experience and chance is the theory of co∼events. To study co∼events, it is not enough to study the experience and to study the chance. For this, it is necessary to study the experience and chance as a single whole, a co∼event. In other words, it is necessary to study their interaction within a co∼event. The new co∼event axiomatics and the theory of co∼events following from it were created precisely for these purposes. In this work, I am going to demonstrate the effectiveness of the new theory of co∼events in studying the logic of uncertainty. I will do this by the example of a co∼event splitting of the logic of the Bayesian scheme, which has a long history of fierce debates between Bayesians and frequentists. I hope the logic of the theory of experience and chance will make its modest contribution to this old dual debate.
Keywords: theory of experience and chance, co∼event dualism, co∼event axiomatics, logic of uncertainty, logic of experience and chance, logic of cause and consequence, logic of the past and the future, Bayesian scheme.

Understanding the evolutionary history of contemporary animal groups is essential for conservation and management of endangered species like caribou (Rangifer tarandus). In central Canada, the ranges of two caribou subspecies (barren-ground/woodland caribou) and two woodland caribou ecotypes (boreal/eastern migratory) overlap. Our objectives were to reconstruct the evolutionary history of the eastern migratory ecotype and to assess the potential role of introgression in ecotype evolution. STRUCTURE analyses identified five higher-order groups (three boreal caribou populations, the eastern migratory ecotype, and barren-ground caribou). The evolutionary history of the eastern migratory ecotype was best explained by an early genetic introgression from barren-ground into a woodland caribou lineage during the Late Pleistocene and subsequent divergence of the eastern migratory ecotype during the Holocene. These results are consistent with the retreat of the Laurentide ice sheet and the colonization of the Hudson Bay coastal areas subsequent to the establishment of forest tundra vegetation approximately 7000 years ago. This historical reconstruction of the eastern migratory ecotype further supports its current classification as a conservation unit, specifically a Designatable Unit, under Canada’s Species at Risk Act. These findings have implications for other sub-specific contact zones for caribou and other North American species in conservation unit delineation.

We propose a noniterative sampling approach by combining the inverse Bayes formulae (IBF), sampling/importance resampling, and posterior mode estimates from the Expectation/Maximization (EM) algorithm to obtain an i.i.d. sample approximately from the posterior distribution for problems where EM-type algorithms apply. The IBF shows that the posterior is proportional to the ratio of two conditional distributions, and its numerator provides a natural class of built-in importance sampling functions (ISFs) directly from the model specification. Given that the posterior mode from an EM-type algorithm is relatively easy to obtain, a best ISF can be identified by using that posterior mode, which results in a large overlap area between the target density and the ISF. We show theoretically why this procedure works. Therefore, the proposed method provides a novel alternative to perfect sampling and eliminates the convergence problems of Markov chain Monte Carlo methods. We first illustrate...
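
The resampling step the abstract builds on is ordinary sampling/importance resampling (SIR). The sketch below shows that step in isolation: draw from an importance sampling function, weight by the ratio of the unnormalized target to the ISF, and resample. The target and ISF here are toy stand-ins; in the IBF approach the target would be the observed-data posterior and the ISF a complete-data posterior anchored at the EM posterior mode.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)

# Toy stand-ins: the target plays the role of pi(theta | y_obs), known here
# only through its (unnormalized) log-density; the ISF plays the role of the
# complete-data posterior anchored at the EM posterior mode.
def log_target(theta):
    return stats.gamma(a=3.0, scale=1.5).logpdf(theta)   # exact mean = 4.5

isf = stats.t(df=5, loc=3.0, scale=2.5)   # heavy-tailed importance function

def sir_sample(n_proposals=50_000, n_resample=5_000):
    """Sampling/importance resampling: weight ISF draws by target/ISF and
    resample in proportion to the weights, yielding an approximately i.i.d.
    sample from the target without running a Markov chain."""
    draws = isf.rvs(size=n_proposals, random_state=rng)
    log_w = log_target(draws) - isf.logpdf(draws)
    log_w -= log_w.max()                    # numerical stability
    weights = np.exp(log_w)
    weights /= weights.sum()
    idx = rng.choice(n_proposals, size=n_resample, replace=True, p=weights)
    return draws[idx]

sample = sir_sample()
print("SIR estimate of the posterior mean:", sample.mean(), "(exact: 4.5)")
```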

A wide range of theories and methods inspired by evolutionary biology have recently been used to investigate temporal changes in the frequency of archaeological material. Here we follow this research agenda and present a novel approach based on Approximate Bayesian Computation (ABC), which enables the evaluation of multiple competing evolutionary models formulated as computer simulations. This approach offers the opportunity to: 1) flexibly integrate archaeological biases derived from sampling and time averaging; 2) estimate model parameters in a probabilistic fashion, taking into account both prior knowledge and empirical data; and 3) shift from a hypothesis-testing to a model-selection approach. We applied ABC to a chronologically fine-grained Western European Neolithic armature assemblage, comparing three candidate models of evolutionary change: 1) unbiased transmission; 2) conformist bias; and 3) anti-conformist bias. Results showed that the unbiased and anti-conformist transmission models provide equally good explanations of the observed data, suggesting high levels of equifinality. We also examined whether the appearance of the Bell Beaker culture was correlated with marked changes in the frequency of different armature types. Comparisons between the empirical data and expectations generated from the simulation model did not show any evidence in support of this hypothesis and instead indicated lower than expected dissimilarity between assemblages dated before and after the emergence of the Bell Beaker culture.
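
As a schematic illustration of the model-selection workflow described above, the sketch below simulates assemblages under unbiased, conformist, and anti-conformist transmission and reads approximate posterior model probabilities off rejection-ABC acceptance counts. The transmission simulator, summary statistic, "observed" frequencies, and tolerance are simplified stand-ins; the paper's models additionally incorporate sampling and time-averaging biases, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(13)

def transmit(bias, n_agents=200, n_generations=50, n_variants=5):
    """Frequency-biased cultural transmission: copying probabilities are
    proportional to f_v ** (1 + bias); bias = 0 is unbiased, bias > 0
    conformist, bias < 0 anti-conformist. Returns sorted final frequencies."""
    freqs = np.full(n_variants, 1.0 / n_variants)
    for _ in range(n_generations):
        weights = freqs ** (1.0 + bias)
        counts = rng.multinomial(n_agents, weights / weights.sum())
        freqs = counts / n_agents
    return np.sort(freqs)[::-1]   # order-invariant summary statistic

# Stand-in 'observed' assemblage summary (sorted type frequencies).
observed = np.array([0.45, 0.30, 0.15, 0.07, 0.03])

models = {"unbiased": 0.0, "conformist": 0.3, "anti-conformist": -0.3}
n_sims, epsilon = 3000, 0.15
accepted = {name: 0 for name in models}

# Rejection-ABC model choice with equal prior weight on each model: count how
# often each model's simulations land within epsilon of the observed summary.
for name, bias in models.items():
    for _ in range(n_sims):
        if np.linalg.norm(transmit(bias) - observed) < epsilon:
            accepted[name] += 1

total = sum(accepted.values())
if total == 0:
    print("no acceptances; widen epsilon or increase n_sims")
else:
    for name, count in accepted.items():
        print(f"approximate P({name} | data) = {count / total:.2f}")
```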