A Fast, Scalable, and Calibrated Computer Model Emulator: An Empirical Bayes Approach

arXiv: Computation, 2020

Mathematical models implemented on a computer have become the driving force behind the acceleration of the cycle of scientific processes. This is because computer models are typically much faster and more economical to run than physical experiments. In this work, we develop an empirical Bayes approach to predictions of physical quantities using a computer model, where we assume that the computer model under consideration needs to be calibrated and is computationally expensive. We propose a Gaussian process emulator and a Gaussian process model for the systematic discrepancy between the computer model and the underlying physical process. This allows for closed-form and easy-to-compute predictions given by a conditional distribution induced by the Gaussian processes. We provide a rigorous theoretical justification of the proposed approach by establishing posterior consistency of the estimated physical process. The computational efficiency of the methods is demonstrated in an extensive simu...
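The closed-form conditional prediction mentioned in this abstract can be illustrated with a generic Gaussian process sketch. This is not the paper's method: it omits calibration, the discrepancy process, and empirical Bayes estimation of hyperparameters. The squared-exponential kernel, its fixed `length_scale`, the jitter `noise_var`, and the toy `simulator` are all assumptions made for illustration.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(x_train, y_train, x_new, noise_var=1e-8):
    """Conditional mean and variance of a zero-mean GP at x_new."""
    K = sq_exp_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_star = sq_exp_kernel(x_new, x_train)
    L = np.linalg.cholesky(K)                      # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_star @ alpha                          # K_star @ K^{-1} @ y
    v = np.linalg.solve(L, K_star.T)
    var = sq_exp_kernel(x_new, x_new).diagonal() - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)              # clip tiny negative round-off

# Hypothetical "expensive" simulator, evaluated at only eight design points.
def simulator(x):
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0.0, 1.0, 8)
y_train = simulator(x_train)
x_new = np.array([0.25, 0.6])
mean, var = gp_predict(x_train, y_train, x_new)
```

Once the Cholesky factor is available, each new prediction costs only a kernel evaluation and a triangular solve, which is what makes an emulator cheap relative to the simulator it replaces.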

Development and Implementation of Bayesian Computer Model Emulators

Our interest is the risk assessment of rare natural hazards, such as large volcanic pyroclastic flows. Since catastrophic consequences of volcanic flows are rare events, our analysis benefits from the use of a computer model to provide information about these events under natural conditions that may not have been observed in reality.

Gaussian process emulation of computer models with massive output

2016

Often computer models yield massive output; e.g., a weather model will yield the predicted temperature over a huge grid of points in space and time. Emulation of a computer model is the process of finding an approximation to the computer model that is much faster to run than the computer model itself (which can often take hours or days for a single run). Most successful emulation approaches are statistical in nature, but these have only rarely attempted to deal with massive computer model output; some approaches that have been tried include utilization of multivariate emulators, modeling of the output (e.g., through some basis representation, including PCA), and construction of parallel emulators at each grid point, with the methodology typically based on use of Gaussian processes to construct the approximations. These approaches will be reviewed, with the startling computational simplicity with which the last approach can be implemented being highlighted and its success illustrated...
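One way to see why emulating every grid point separately can stay cheap is sketched below. It assumes, as a simplification not stated in the abstract, that all grid points share a single correlation function over the simulator inputs, so the covariance matrix is factorised once and reused for every output column. The kernel, `length_scale`, and the toy output matrix `Y` are hypothetical.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=0.15):
    """Squared-exponential correlation between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

n_runs, n_grid = 10, 5000
theta = np.linspace(0.0, 1.0, n_runs)      # simulator inputs (design points)
grid = np.linspace(0.0, 1.0, n_grid)       # output grid, e.g. space-time points

# Hypothetical simulator output: one row per run, one column per grid point.
Y = np.sin(2 * np.pi * (theta[:, None] + grid[None, :]))

# The correlation matrix depends only on the inputs, so factorise it once.
K = sq_exp_kernel(theta, theta) + 1e-8 * np.eye(n_runs)
L = np.linalg.cholesky(K)

theta_new = np.array([0.33, 0.71])
K_star = sq_exp_kernel(theta_new, theta)

# Prediction weights K_star @ K^{-1}, shared by every grid point.
weights = np.linalg.solve(L.T, np.linalg.solve(L, K_star.T)).T

# Predictive means for all grid points in a single matrix product.
Y_pred = weights @ Y                       # shape (2, n_grid)
```

Under this assumption the cost of the factorisation depends on the number of runs only, and the cost of prediction grows linearly with the number of grid points.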

Bayesian calibration of computer models

Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2001

We consider prediction and uncertainty analysis for systems which are approximated using complex mathematical models. Such models, implemented as computer codes, are often generic in the sense that by suitable choice of some of the model's input parameters the code can be used to predict behaviour of the system in a variety of specific applications. However, in any specific application the values of necessary parameters may be unknown. In this case, physical observations of the system in the specific context are used to learn about the unknown parameters. The process of fitting the model to the observed data by adjusting the parameters is known as calibration. Calibration is typically effected by ad hoc fitting, and after calibration the model is used, with the fitted input values, to predict future behaviour of the system. We present a Bayesian calibration technique which improves on this traditional approach in two respects. First, the predictions allow for all sources of uncertainty, including the remaining uncertainty over the fitted parameters. Second, they attempt to correct for any inadequacy of the model which is revealed by discrepancy between the observed data and the model predictions from even the best-fitting parameter values. The method is illustrated using data from a nuclear radiation release at Tomsk, and from a more complex simulated nuclear accident exercise.
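The two ingredients described in this abstract, unknown calibration parameters and a correction for model inadequacy, are commonly written as a single observation equation. The notation below is the form usually associated with this framework and is given here as an assumption, since the abstract itself states no formula: $z_i$ is the $i$-th physical observation at known inputs $x_i$, $\eta$ is the computer code, $\theta$ is the vector of calibration parameters, $\rho$ is a regression (scale) parameter, $\delta$ is the discrepancy function, and $e_i$ is observation error.

```latex
z_i = \rho\,\eta(x_i, \theta) + \delta(x_i) + e_i,
\qquad e_i \sim \mathcal{N}(0, \lambda), \quad i = 1, \dots, n .
```

In this form, Gaussian process priors are typically placed on $\eta$ and $\delta$, so that uncertainty about $\theta$ and about the discrepancy both carry through to the predictions.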

Gaussian process emulation for second-order Monte Carlo simulations

Journal of Statistical Planning and Inference, 2011

We consider the use of emulator technology as an alternative method to second-order Monte Carlo (2DMC) in the uncertainty analysis for a percentile from the output of a stochastic model. 2DMC is a technique that uses repeated sampling in order to make inferences on the uncertainty and variability in a model output. The conventional 2DMC approach can often be computationally demanding, making methods for uncertainty and sensitivity analysis infeasible. We explore the adequacy and efficiency of the emulation approach, and we find that emulation provides a viable alternative in this situation. We demonstrate these methods using two examples with different input dimensions, including an application that considers contamination in pre-pasteurised milk.
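The nested sampling structure of 2DMC can be sketched as follows. The outer loop draws an uncertain parameter, the inner loop draws the variability of the output given that parameter, and a percentile is recorded for each outer draw. The lognormal output model, the normal distribution on `mu`, and the function name `two_dim_monte_carlo` are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def two_dim_monte_carlo(n_outer=200, n_inner=2000, q=95, seed=0):
    """Return one q-th percentile of the output per outer (uncertainty) draw."""
    rng = np.random.default_rng(seed)
    percentiles = np.empty(n_outer)
    for i in range(n_outer):
        # Outer loop: draw the uncertain (epistemic) parameter.
        mu = rng.normal(0.0, 0.2)
        # Inner loop: draw output variability given that parameter.
        outputs = rng.lognormal(mean=mu, sigma=0.5, size=n_inner)
        percentiles[i] = np.percentile(outputs, q)
    return percentiles

p95 = two_dim_monte_carlo()
# Interval expressing uncertainty about the 95th percentile itself.
lower, upper = np.percentile(p95, [2.5, 97.5])
```

The total cost is `n_outer * n_inner` model evaluations, which is the expense that motivates replacing the model with an emulator.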

Π4U: A high performance computing framework for Bayesian uncertainty quantification of complex models

Journal of Computational Physics, 2015

We present Π4U, an extensible framework for non-intrusive Bayesian Uncertainty Quantification and Propagation (UQ+P) of complex and computationally demanding physical models, that can exploit massively parallel computer architectures. The framework incorporates Laplace asymptotic approximations as well as stochastic algorithms, along with distributed numerical differentiation and task-based parallelism for heterogeneous clusters. Sampling is based on the Transitional Markov Chain Monte Carlo (TMCMC) algorithm and its variants. The optimization tasks associated with the asymptotic approximations are treated via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). A modified subset simulation method is used for posterior reliability measurements of rare events. The framework accommodates scheduling of multiple physical model evaluations based on an adaptive load balancing library and shows excellent scalability. In addition to the software framework, we also provide guidelines as to the applicability and efficiency of Bayesian tools when applied to computationally demanding physical models. Theoretical and computational developments are demonstrated with applications drawn from molecular dynamics, structural dynamics and granular flow.

Variable Selection for Gaussian Process Models in Computer Experiments

Technometrics, 2006

In many situations, simulation of complex phenomena requires a large number of inputs and is computationally expensive. Identifying the inputs which most impact the system so that these factors can be further investigated can be a critical step in the scientific endeavor. In computer experiments, it is common to use a Gaussian spatial process to model the output of the simulator. In this article, we introduce a new, simple method for identifying active factors in computer screening experiments. The approach is Bayesian and only requires the generation of a new inert variable in the analysis; however, in the spirit of frequentist hypothesis testing, the posterior distribution of the inert factor is used as a reference distribution against which the importance of the experimental factors can be assessed. The methodology is demonstrated on an application in material science, a computer experiment from the literature, and simulated examples.

A comparison of emulation methods for Approximate Bayesian Computation

Approximate Bayesian Computation (ABC) is a family of statistical inference techniques, which is increasingly used in biology and other scientific fields. Its main benefit is to be applicable to models for which the computation of the model likelihood is intractable. The basic idea of ABC is to empirically approximate the model likelihood by using a large number of model runs. Due to computing time limitations, ABC has thus been mainly applied to models that are relatively quick to simulate. We aim here to briefly introduce the field of statistical emulation of computer code outputs and to demonstrate its potential for ABC applications. Emulation consists in replacing the costly-to-simulate model by another (quick-to-simulate) statistical model called an emulator or metamodel. This emulator is fitted to a small number of outputs of the original model, and is subsequently used as a surrogate during the inference procedure. In this contribution, we first detail the principles of model emulation, with a special reference to the ABC context in which the description of the stochasticity of model realizations is as important as the description of the trends linking model parameters and outputs. We then compare several emulation strategies in an ABC context, using as a case study a stochastic ecological model of community dynamics. We finally describe a novel emulation-based sequential ABC algorithm which is shown to decrease computing time by a factor of two on the studied example, compared to previous sequential ABC algorithms.
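The basic ABC idea described above, approximating the likelihood through many model runs, can be sketched with plain rejection sampling. This is not the emulation-based sequential algorithm of the paper: the simulator is called directly, and the Gaussian toy model, the uniform prior, the tolerance `tol`, and the function name `abc_rejection` are assumptions for illustration.

```python
import numpy as np

def abc_rejection(observed_mean, n_obs, n_draws=20000, tol=0.05, seed=0):
    """Keep prior draws whose simulated summary lies within tol of the data."""
    rng = np.random.default_rng(seed)
    # Draw candidate parameters from a uniform prior on [-5, 5].
    theta = rng.uniform(-5.0, 5.0, size=n_draws)
    # Run the toy simulator once per candidate: n_obs draws from N(theta, 1),
    # summarised by their sample mean.
    sims = rng.normal(theta[:, None], 1.0, size=(n_draws, n_obs)).mean(axis=1)
    # Accept candidates whose summary statistic is close to the observed one.
    return theta[np.abs(sims - observed_mean) < tol]

accepted = abc_rejection(observed_mean=1.5, n_obs=50)
posterior_mean = accepted.mean()
```

Every candidate requires a full simulator run and most are rejected, which is why a fast surrogate for the simulator is attractive in this setting.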