Model Misspecification in Approximate Bayesian Computation: Consequences and Diagnostics

Model Misspecification in ABC: Consequences and Diagnostics

arXiv (Cornell University), 2017

We analyze the behavior of approximate Bayesian computation (ABC) when the model generating the simulated data differs from the actual data generating process; i.e., when the data simulator in ABC is misspecified. We demonstrate both theoretically and in simple, but practically relevant, examples that when the model is misspecified different versions of ABC can yield substantially different results. Our theoretical results demonstrate that even though the model is misspecified, under regularity conditions, the accept/reject ABC approach concentrates posterior mass on an appropriately defined pseudo-true parameter value. However, under model misspecification the ABC posterior does not yield credible sets with valid frequentist coverage and has non-standard asymptotic behavior. In addition, we examine the theoretical behavior of the popular local regression adjustment to ABC under model misspecification and demonstrate that this approach concentrates posterior mass on a completely different pseudo-true value than accept/reject ABC. Using our theoretical results, we suggest two approaches to diagnose model misspecification in ABC. All theoretical results and diagnostics are illustrated in a simple running example.
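For readers unfamiliar with the local regression adjustment mentioned in this abstract, its usual linear form can be written as below. This is a sketch of the standard construction, not taken from the paper; the notation ($\theta_i$ for accepted draws, $s_i$ for their simulated summaries, $s_{\mathrm{obs}}$ for the observed summary, $w_i$ for kernel weights) is assumed here for illustration.

```latex
% Weighted least-squares fit over the accepted draws (theta_i, s_i):
(\hat{\alpha}, \hat{\beta})
  = \arg\min_{\alpha,\beta} \sum_{i} w_i
    \left\{ \theta_i - \alpha - \beta^{\top} (s_i - s_{\mathrm{obs}}) \right\}^2
% Each accepted draw is then shifted toward the observed summary:
\theta_i^{*} = \theta_i - \hat{\beta}^{\top} (s_i - s_{\mathrm{obs}})
```

Accept/reject ABC reports the $\theta_i$ directly, whereas the adjusted version reports the $\theta_i^{*}$, which is why the two can behave differently when no parameter value reproduces $s_{\mathrm{obs}}$.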

A new approach to choose acceptance cutoff for approximate Bayesian computation

Journal of Applied Statistics, 2013

Approximate Bayesian computation (ABC) is an approach for sampling from an approximate posterior distribution in the presence of a computationally intractable likelihood function. A common implementation is based on simulating model, parameter and dataset triples, (m, θ, y), from the prior, and then accepting as samples from the approximate posterior, those pairs (m, θ) for which y, or a summary of y, is "close" to the observed data. Closeness is typically determined through a distance measure and a kernel scale parameter, ε. Appropriate choice of ε is important to producing a good quality approximation. This paper proposes diagnostic tools for the choice of ε based on assessing the coverage property, which asserts that credible intervals have the correct coverage levels. We provide theoretical results on coverage for both model and parameter inference, and adapt these into diagnostics for the ABC context. We re-analyse a study on human demographic history to determine whether the adopted posterior approximation was appropriate. R code implementing the proposed methodology is freely available in the package abc.
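The accept/reject scheme described in this abstract can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the paper's method or the abc package: it uses a single model (so the model index m is dropped), a toy normal-mean simulator, a uniform prior, the sample mean as summary, absolute difference as the distance, and a hard acceptance threshold `eps` standing in for the kernel scale parameter. All function names are hypothetical.

```python
import random
import statistics


def rejection_abc(observed, simulate, prior_sample, summary, eps, n_draws, seed=0):
    """Keep prior draws whose simulated summary lies within eps of the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)          # draw a parameter from the prior
        y = simulate(theta, rng)           # simulate a dataset given that parameter
        if abs(summary(y) - s_obs) <= eps:  # accept if the summaries are "close"
            accepted.append(theta)
    return accepted


# Toy setup (assumed for illustration): data are N(theta, 1), prior is U(-5, 5).
def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)


def simulate(theta, rng, n=50):
    return [rng.gauss(theta, 1.0) for _ in range(n)]


data_rng = random.Random(1)
observed = [data_rng.gauss(2.0, 1.0) for _ in range(50)]

posterior = rejection_abc(
    observed, simulate, prior_sample, statistics.fmean, eps=0.1, n_draws=5000
)
```

Shrinking `eps` makes the accepted draws a closer approximation to the posterior given the summary, at the cost of a lower acceptance rate; that trade-off is what makes the choice of the threshold matter.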

Approximating the likelihood in approximate Bayesian computation

arXiv (Cornell University), 2018

The conceptual and methodological framework that underpins approximate Bayesian computation (ABC) is targeted primarily towards problems in which the likelihood is either challenging or missing. ABC uses a simulation-based non-parametric estimate of the likelihood of a summary statistic and assumes that the generation of data from the model is computationally cheap. This chapter reviews two alternative approaches for estimating the intractable likelihood, with the goal of reducing the necessary model simulations to produce an approximate posterior. The first of these is a Bayesian version of the synthetic likelihood (SL), initially developed by Wood (2010), which uses a multivariate normal approximation to the summary statistic likelihood. Using the parametric approximation as opposed to the nonparametric approximation of ABC, it is possible to reduce the number of model simulations required. The second likelihood approximation method we consider in this chapter is based on the empirical likelihood (EL), which is a non-parametric technique and involves maximising a likelihood constructed empirically under a set of moment constraints. Mengersen et al. (2013) adapt the EL framework so that it can be used to form an approximate posterior for problems where ABC can be applied, that is, for models with intractable likelihoods. However, unlike ABC and the Bayesian SL (BSL), the Bayesian EL (BCel) approach can be used to completely avoid model simulations in some cases. The BSL and BCel methods are illustrated on models of varying complexity.
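To make the synthetic likelihood idea concrete, the sketch below estimates a Gaussian log-likelihood for an observed summary from repeated model simulations at a fixed parameter value. It is a deliberately reduced illustration: it uses a scalar summary, which is the one-dimensional special case of the multivariate normal approximation described above, and the toy simulator and all names are assumptions rather than anything from the chapter.

```python
import math
import random
import statistics


def synthetic_loglik(theta, s_obs, simulate_summary, n_sims, rng):
    """Gaussian log-density of s_obs, with mean and variance estimated by simulation."""
    sims = [simulate_summary(theta, rng) for _ in range(n_sims)]
    mu = statistics.fmean(sims)        # estimated mean of the summary at theta
    var = statistics.variance(sims)    # estimated (sample) variance of the summary
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (s_obs - mu) ** 2 / var


# Toy simulator (assumed): the summary is the mean of 30 draws from N(theta, 1).
def simulate_summary(theta, rng, n=30):
    return statistics.fmean([rng.gauss(theta, 1.0) for _ in range(n)])


rng = random.Random(0)
s_obs = 1.0
ll_near = synthetic_loglik(1.0, s_obs, simulate_summary, 200, rng)
ll_far = synthetic_loglik(3.0, s_obs, simulate_summary, 200, rng)
```

In a Bayesian synthetic likelihood scheme, an estimate like this would typically replace the exact log-likelihood inside an MCMC sampler; here `ll_near` should exceed `ll_far` because the observed summary is far more plausible under theta = 1 than under theta = 3.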

Robust Approximate Bayesian Computation: An Adjustment Approach

arXiv: Methodology, 2020

We propose a novel approach to approximate Bayesian computation (ABC) that seeks to cater for possible misspecification of the assumed model. This new approach can be equally applied to rejection-based ABC and to popular regression adjustment ABC. We demonstrate that this new approach mitigates the poor performance of regression adjusted ABC that can eventuate when the model is misspecified. In addition, this new adjustment approach allows us to detect which features of the observed data cannot be reliably reproduced by the assumed model. A series of simulated and empirical examples illustrate this new approach.

Synthetic Likelihood in Misspecified Models: Consequences and Corrections

2021

We analyse the behaviour of the synthetic likelihood (SL) method when the model generating the simulated data differs from the actual data generating process. One of the most common methods to obtain SL-based inferences is via the Bayesian posterior distribution, with this method often referred to as Bayesian synthetic likelihood (BSL). We demonstrate that when the model is misspecified, the BSL posterior can be poorly behaved, placing significant posterior mass on values of the model parameters that do not represent the true features observed in the data. Theoretical results demonstrate that in misspecified models the BSL posterior can display a wide range of behaviours depending on the level of model misspecification, including being asymptotically non-Gaussian. Our results suggest that a recently proposed robust BSL approach can ameliorate this behavior and deliver reliable posterior inference under model misspecification. We document all theoretical results using a simple runnin...

Correcting Approximate Bayesian Computation

Trends in Ecology & Evolution, 2010

In their review of approximate Bayesian computation (ABC), Csilléry et al. [pg. 411, 1] stated that my [2] "main" objections to ABC are that inference is limited to a finite set of models, and that these models are generally complex, although they failed to state the reasons for my objections. Csilléry et al. further state that my criticisms were "general criticisms of model-based approaches and are not specific to ABC" [pg. 411, 1]. However, my main objection to ABC was that it can produce posterior "probabilities" that are not true probabilities. The source for Csilléry et al.'s claim that my objections were not specific to ABC is Beaumont et al. [3], who did acknowledge my main objection but without addressing the underlying reasons. Instead, Beaumont et al. [pg. 438, 3] state "… Templeton is in effect claiming that standard Bayesian inferences are invalid, and that Bayesian posterior probabilities are mathematically incapable of being probabilities." The words "in effect" indicate that I actually never made this statement, and indeed I do not believe it. Contrary to the statement in Csilléry et al. [1], my objections were very specific to ABC. By misrepresenting my views, both Csilléry et al. [1] and Beaumont et al. [3] avoid addressing my specific criticisms and instead mount a general, but irrelevant, defense of Bayesian statistics and model-based inference. Rather than reiterate these published objections [2, 4], I will give a specific numerical example to illustrate my main objection.

Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for Repairing It

We empirically show that Bayesian inference can be inconsistent under misspecification in simple linear regression problems, both in a model averaging/selection and in a Bayesian ridge regression setting. We use the standard linear model, which assumes homoskedasticity, whereas the data are heteroskedastic, and observe that the posterior puts its mass on ever more high-dimensional models as the sample size increases. To remedy the problem, we equip the likelihood in Bayes' theorem with an exponent called the learning rate, and we propose the Safe Bayesian method to learn the learning rate from the data. SafeBayes tends to select small learning rates as soon as the standard posterior is not 'cumulatively concentrated', and its results on our data are quite encouraging. * Also affiliated with Leiden University.
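The idea of equipping the likelihood with a learning-rate exponent can be written as a tempered (generalized) posterior. The notation below is assumed for illustration ($\eta$ for the learning rate, $\pi$ for the prior, $p(y \mid \theta)$ for the likelihood) and may differ from the paper's own.

```latex
% Posterior with the likelihood raised to a learning rate \eta > 0:
\pi_{\eta}(\theta \mid y) \;\propto\; p(y \mid \theta)^{\eta}\, \pi(\theta)
% \eta = 1 recovers the standard Bayesian posterior;
% \eta < 1 downweights the likelihood relative to the prior.
```

Under this form, choosing a small $\eta$ flattens the likelihood's contribution, which is one way to read the abstract's remark that small learning rates are selected when the standard posterior behaves poorly.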

Excursions in the Bayesian treatment of model error

PLOS ONE, 2023

Advances in observational and computational assets have led to revolutions in the range and quality of results in many science and engineering settings. However, those advances have led to needs for new research in treating model errors and assessing their impacts. We consider two settings. The first involves physically-based statistical models that are sufficiently manageable to allow incorporation of a stochastic "model error process". In the second case we consider large-scale models in which incorporation of a model error process and updating its distribution is impractical. Our suggestion is to treat dimension-reduced model output as if it is observational data, with a data model that incorporates a bias component to represent the impacts of model error. We believe that our suggestions are valuable quantitative, yet relatively simple, ways to extract useful information from models while including adjustment for model error. These ideas are illustrated and assessed using an application inspired by a classical oceanographic problem.

A comparison of emulation methods for Approximate Bayesian Computation

Approximate Bayesian Computation (ABC) is a family of statistical inference techniques, which is increasingly used in biology and other scientific fields. Its main benefit is to be applicable to models for which the computation of the model likelihood is intractable. The basic idea of ABC is to empirically approximate the model likelihood by using intensive realizations of model runs. Due to computing time limitations, ABC has thus been mainly applied to models that are relatively quick to simulate. Here we aim to briefly introduce the field of statistical emulation of computer code outputs and to demonstrate its potential for ABC applications. Emulation consists of replacing the costly-to-simulate model with another (quick-to-simulate) statistical model called an emulator or metamodel. This emulator is fitted to a small number of outputs of the original model, and is subsequently used as a surrogate during the inference procedure. In this contribution, we first detail the principles of model emulation, with a special reference to the ABC context in which the description of the stochasticity of model realizations is as important as the description of the trends linking model parameters and outputs. We then compare several emulation strategies in an ABC context, using as a case study a stochastic ecological model of community dynamics. We finally describe a novel emulation-based sequential ABC algorithm which is shown to decrease computing time by a factor of two on the studied example, compared to previous sequential ABC algorithms.

Notes to Robert et al.: Model criticism informs model choice and model comparison

2009

In their letter to PNAS and a comprehensive set of notes on arXiv, Christian Robert, Kerrie Mengersen and Carla Chen (RMC) represent our approach to model criticism in situations when the likelihood cannot be computed as a way to "contrast several models with each other". In addition, guided by an analysis of scalar error terms on simple examples, RMC argue that model assessment with Approximate Bayesian Computation under model uncertainty (ABCµ) is unduly challenging and question its Bayesian foundations. We thank RMC for their interest and their detailed comments on our work, which give us an opportunity to clarify the construction of ABCµ and to explain further the utility of ABCµ for the purpose of model criticism. Here, we provide a comprehensive set of answers to RMC's comments, which go beyond our short response. For the sake of clarity, we re-state RMC's main points in italics before we answer each of them in turn.