Approximate Bayesian Computation Research Papers
2025
Let (X_t, Y_t)_{t≥1} be a homogeneous discrete-time bivariate stochastic process where (X_t)_{t≥1} is a Markov chain and (Y_t | X_t)_{t≥1} is a conditionally independent sequence such that each Y_t is determined almost surely by X_t. If we also assume that only (Y_t)_{t≥1} is available for inference, i.e. that the Markov chain (X_t)_{t≥1} is unobservable, then (X_t, Y_t)_{t≥1} is usually called a state space or hidden Markov model (HMM). Usually, the law of (X_t, Y_t)_{t≥1} is also taken to be indexed by a d-dimensional parameter θ taking values in Θ.
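As a concrete illustration of this setup, the minimal Python sketch below simulates a bivariate process in which the latent Markov chain X_t = (Z_t, W_t) carries both an AR(1) signal and an i.i.d. noise component, so that the observation Y_t = Z_t + W_t is indeed determined almost surely by X_t. The AR(1) form, the Gaussian noise and the parameter values are illustrative assumptions, not the model of the paper.

```python
import numpy as np

def simulate_state_space(T, phi, sigma_z, sigma_w, rng=None):
    """Simulate (X_t, Y_t) with X_t = (Z_t, W_t) a Markov chain and Y_t = Z_t + W_t.

    Folding the observation noise W_t into the latent state is one way the usual
    noisy-observation HMM fits the 'Y_t determined almost surely by X_t' formulation.
    theta = (phi, sigma_z, sigma_w) plays the role of the d-dimensional parameter.
    """
    rng = np.random.default_rng(rng)
    z = np.empty(T)
    w = rng.normal(0.0, sigma_w, size=T)                       # i.i.d. component of the latent state
    z[0] = rng.normal(0.0, sigma_z)
    for t in range(1, T):
        z[t] = phi * z[t - 1] + rng.normal(0.0, sigma_z)       # AR(1) component of the latent state
    y = z + w                                                  # observation: a deterministic function of X_t
    return np.column_stack([z, w]), y

x, y = simulate_state_space(200, phi=0.9, sigma_z=0.5, sigma_w=0.3, rng=0)
print(y[:5])
```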
2025, Molecular Ecology
Drosophila subobscura is a Palearctic species that was first observed in South and North America in the early 1980s, and that rapidly invaded broad latitudinal ranges on both continents. To trace the source and history of this invasion, we obtained genotypic data on nine microsatellite loci from two South American, two North American and five European populations of D. subobscura. We analysed these data with traditional statistics as well as with an approximate Bayesian computation (ABC) framework. ABC methods yielded the strongest support for the scenario involving a serial introduction with founder events from Europe into South America, and then from South America into North America. The stable effective population size of the source population was very large (around one million individuals), and the propagule size was notably smaller for the introduction into South America (i.e. high bottleneck severity index with only a few effective founders) but considerably larger for the subsequent introduction into North America (i.e. low bottleneck severity index with around 100–150 effective founders). Finally, the Mediterranean region of Europe (and most likely Barcelona among the localities so far analysed) is proposed as the source of the New World flies, based on mean individual assignment statistics.
2025, Journal of Biogeography
Aim: We investigated the phylogeographical history of a clonal‐sexual orchid, to test the hypothesis that current patterns of genetic diversity and differentiation retain the traces of climatic fluctuations and of the species reproductive system. Location: Europe, Siberia and Russian Far East. Taxon: Cypripedium calceolus L. (Orchidaceae). Methods: Samples (>900, from 56 locations) were genotyped at 11 nuclear microsatellite loci and plastid sequences were obtained for a subset of them. Analysis of genetic structure and approximate Bayesian computations were performed. Species distribution modelling was used to explore the effects of past climatic fluctuations on the species range. Results: Analysis of genetic diversity reveals high heterozygosity and allele diversity, with no geographical trend. Three genetic clusters are identified with extant gene pools derived from ancestral demes in glacial refugia. Siberian populations exhibit different plastid haplotypes, supporting an early divergence ...
2025, Biometrics
Infectious diseases that can be spread directly or indirectly from one person to another are caused by pathogenic microorganisms such as bacteria, viruses, parasites, or fungi. Infectious diseases remain one of the greatest threats to human health, and the analysis of infectious disease data is among the most important applications of statistics. In this article, we develop Bayesian methodology using a parametric bivariate accelerated lifetime model to study the dependency between the colonization and infection times for Acinetobacter baumannii, a leading cause of infection among hospital infection agents. We also study their associations with covariates such as age, gender, APACHE score, antibiotic use in the 3 months before admission, and invasive mechanical ventilation use. To account for singularity, we use the singular bivariate extreme value distribution to model residuals in the bivariate accelerated lifetime model under a fully Bayesian framework. We analyze censored data on colonization and infection collected in five major hospitals in Turkey using our methodology. The data analysis in this article illustrates the proposed method, which can be applied to any situation in which our model can be used.
2025, Research Square (Research Square)
Automatic differentiation (AD) is a general method of computing exact derivatives in complex sensitivity analyses and optimisation routines in settings that lack closed-form solutions, thus posing challenges for analytical and numerical alternatives. This paper introduces a vectorised version of AD that builds on matrix calculus. This more transparent and efficient version of AD promotes its use in a wider range of statistical and econometric applications that require accurate and fast algorithms for the computation of derivatives when performing frequentist and Bayesian inferences. Numerical studies are presented to demonstrate the efficacy and speed of the proposed AD method compared with the numerical derivative scheme by exploiting, for example, sparse matrix representations and high-level optimisation techniques.
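The abstract does not reproduce the vectorised matrix-calculus formulation itself; as background, the sketch below shows the basic idea behind forward-mode automatic differentiation using dual numbers, which propagates exact derivative values alongside function values. The Dual class and the test function are hypothetical illustrations, not the paper's algorithm.

```python
import numpy as np

class Dual:
    """Minimal forward-mode AD: carry (value, derivative) pairs through arithmetic."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)   # product rule
    __rmul__ = __mul__
    def exp(self):
        return Dual(np.exp(self.val), np.exp(self.val) * self.der)

def f(x):
    # hypothetical test function: f(x) = x^2 * exp(x) + 3x
    return x * x * (x.exp() if isinstance(x, Dual) else np.exp(x)) + 3 * x

x0 = 1.5
exact = f(Dual(x0, 1.0)).der                  # seed the input derivative with 1.0
numeric = (f(x0 + 1e-6) - f(x0)) / 1e-6       # finite-difference comparison
print(exact, numeric)
```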
2025
Automatic differentiation (AD) is a general method of computing exact derivatives in complex sensitivity analyses and optimisation routines in settings that lack closed-form solutions, thus posing challenges for analytical and numerical alternatives. This paper introduces a vectorised version of AD that builds on matrix calculus. This more transparent and efficient version of AD promotes its use in a wider range of statistical and econometric applications that require accurate and fast algorithms for the computation of derivatives when performing frequentist and Bayesian inferences. Numerical studies are presented to demonstrate the efficacy and speed of the proposed AD method compared with the numerical derivative scheme by exploiting, for example, sparse matrix representations and high-level optimisation techniques. JEL classifications: C11, C53, E37
2025, Diversity and Distributions
Aim: Cold-adapted biotas from mid-latitudes often show small population sizes, harbour low levels of local genetic diversity and are highly vulnerable to extinction due to ongoing climate warming and the progressive shrinking of montane and alpine ecosystems. In this study, we use a suite of analytical approaches to infer the demographic processes that have shaped contemporary patterns of genomic variation in Omocestus bolivari and Omocestus femoralis, two narrow-endemic and red-listed Iberian grasshoppers forming highly fragmented populations in the sky island archipelago of the Baetic System. Location: South-eastern Iberia. Methods: We quantified genomic variation in the two focal taxa and coupled ecological niche models and a spatiotemporally explicit simulation approach based on coalescent theory to determine the relative statistical support of a suite of competing demographic scenarios representing contemporary population isolation (i.e. a predominant role of genetic drift) versus historical connectivity and post-glacial colonization of sky islands (i.e. pulses of gene flow and genetic drift linked to Pleistocene glacial cycles). Results: Inference of spatial patterns of genetic structure, environmental niche modelling and statistical evaluation of alternative species-specific demographic models within an approximate Bayesian computation framework collectively supported genetic admixture during glacial periods and post-glacial colonization of sky islands, rather than long-term population isolation, as the scenario best explaining the current distribution of genomic variation in the two focal taxa. Moreover, our analyses revealed that isolation in sky islands has also led to extraordinary genetic fragmentation and contributed to reducing local levels of genetic diversity. This study exemplifies the potential of integrating genomic and environmental niche modelling data across biological and spatial replicates to determine whether organisms with similar habitat requirements have experienced
2025, Engineering with Computers
This research paper presents a comprehensive study on modeling the failure behavior of advanced ceramics by integrating phenomenological and physics-based approaches. The proposed methodology utilizes the bivariate Weibull distribution to capture the complex failure mechanisms in advanced ceramics, considering the impact of Subcritical Crack Growth (SCG). Approximate Bayesian Computation (ABC) is employed for parameter estimation, leveraging Metropolis-Hastings (MH) and Hamiltonian Monte Carlo (HMC) algorithms to enhance computational efficiency. The study validates the proposed models against a physics-based Batdorf theory approach using NASA's CARES/Life. Results demonstrate the robustness of the ABC MH and ABC HMC models, highlighting the capability of the statistical approach to predict failure dynamics in advanced ceramics under varying conditions. This research contributes to a deeper understanding of advanced ceramic failure mechanisms, paving the way for further advancements in material science and engineering applications of ceramics.
2025, Journal of Materials Science
Despite the unique advantages of natural fibers as a reinforcement in polymer composites, they have high natural variability in their mechanical properties, resulting in significant uncertainties in the properties of natural fiber composites. This study aims to propose a multilevel framework based on Approximate Bayesian Computation (ABC) to analyze the uncertainty of fitting the Weibull distribution to the strength data of date palm fibers. Two computationally efficient ABC algorithms, namely Metropolis-Hastings, a member of the Markov chain Monte Carlo family, and Sequential Monte Carlo (SMC), are employed for estimating the highest density interval of the fitting parameters of the modified 3-parameter Weibull distribution, and their performances are evaluated. Moreover, appropriate probability distributions that best fit the estimated parameters are determined based on goodness of fit to describe their characteristics. It is found that the SMC algorithm leads to a higher scatter in the posterior predictive distribution of the fitting parameters. The results suggest that the uncertainty of the fitting parameters should be considered to obtain a reliable model for the probability of natural fiber failure.
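The study's priors, distance function and 3-parameter Weibull modification are not reproduced here; the sketch below only illustrates the generic ABC Metropolis-Hastings idea for a two-parameter Weibull fitted to synthetic strength data. The summary statistics, tolerance, starting point and priors are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "observed" fibre-strength data (MPa), standing in for real measurements.
obs = rng.weibull(4.0, size=200) * 300.0

def summaries(x):
    return np.array([np.mean(x), np.std(x), np.median(x)])

def distance(sim, data):
    return np.linalg.norm(summaries(sim) - summaries(data))

def abc_mh(n_iter=5000, eps=20.0, step=np.array([0.10, 0.05])):
    """ABC Metropolis-Hastings: with a symmetric log-scale random walk and flat priors
    on the log-parameters, a move is accepted whenever the data simulated at the
    proposal fall within eps of the observations."""
    shape, scale = 3.5, 280.0          # start near a crude pilot estimate to avoid a long burn-in
    chain = []
    for _ in range(n_iter):
        prop_shape, prop_scale = np.exp(np.log([shape, scale]) + rng.normal(0.0, step))
        sim = rng.weibull(prop_shape, size=len(obs)) * prop_scale
        if distance(sim, obs) < eps:
            shape, scale = prop_shape, prop_scale
        chain.append((shape, scale))
    return np.array(chain)

posterior = abc_mh()
print(posterior[1000:].mean(axis=0))   # rough posterior means for (shape, scale) after burn-in
```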
2025, Computational Statistics & Data Analysis
The problem of statistical inference from a Bayesian outlook is studied for the multitype Galton-Watson branching process, considering a non-parametric framework. The only data assumed to be available are each generation's population size vectors. The Gibbs sampler is used in estimating the posterior distributions of the main parameters of the model, and the predictive distributions for as yet unobserved generations. The algorithm provided is independent of whether the process becomes extinct or not. The method is illustrated with simulated examples.
2025, Statistics and Computing
2025, Molecular Ecology
Understanding the evolutionary histories of invasive species is critical to adopt appropriate management strategies, but this process can be exceedingly complex to unravel. As illustrated in this study of the worldwide invasion of the woodwasp Sirex noctilio, population genetic analyses using coalescent‐based scenario testing together with Bayesian clustering and historical records provide opportunities to address this problem. The pest spread from its native Eurasian range to the Southern Hemisphere in the 1900s and recently to Northern America, where it poses economic and potentially ecological threats to planted and native Pinus spp. To investigate the origins and pathways of invasion, samples from five continents were analysed using microsatellite and sequence data. The results of clustering analysis and scenario testing suggest that the invasion history is much more complex than previously believed, with most of the populations being admixtures resulting from independent introd...
2025, Bayesian Analysis
Statistical methods of inference typically require the likelihood function to be computable in a reasonable amount of time. The class of "likelihood-free" methods termed Approximate Bayesian Computation (ABC) is able to eliminate this requirement, replacing the evaluation of the likelihood with simulation from it. Likelihood-free methods have gained in efficiency and popularity in the past few years, following their integration with Markov Chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) in order to better explore the parameter space. They have been applied primarily to estimating the parameters of a given model, but can also be used to compare models. Here we present novel likelihood-free approaches to model comparison, based upon the independent estimation of the evidence of each model under study. Key advantages of these approaches over previous techniques are that they allow the exploitation of MCMC or SMC algorithms for exploring the parameter space, and that they do not require a sampler able to mix between models. We validate the proposed methods using a simple exponential family problem before providing a realistic problem from human population genetics: the comparison of different demographic models based upon genetic data from the Y chromosome.
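The authors' estimators are built on MCMC and SMC samplers; the sketch below shows only the simplest rejection-sampling version of the same idea, in which each model's evidence is estimated independently as the fraction of prior-predictive simulations whose summary falls within a tolerance of the observed summary. The Poisson-versus-geometric example, priors and tolerance are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
obs = rng.poisson(3.0, size=50)             # hypothetical observed counts
s_obs = obs.sum()                           # sufficient statistic for both models below

def evidence_abc(simulate_summary, n_sims=100_000, eps=2):
    """Rejection-ABC evidence estimate: the proportion of prior-predictive simulations
    whose summary statistic lands within eps of the observed summary."""
    hits = 0
    for _ in range(n_sims):
        hits += abs(simulate_summary() - s_obs) <= eps
    return hits / n_sims

def sim_poisson():
    lam = rng.exponential(5.0)              # Exponential(mean 5) prior on the Poisson rate
    return rng.poisson(lam, size=len(obs)).sum()

def sim_geometric():
    p = rng.uniform(0.001, 1.0)             # (near-)uniform prior on the success probability
    return rng.geometric(p, size=len(obs)).sum()

z_pois = evidence_abc(sim_poisson)
z_geom = evidence_abc(sim_geometric)
print("evidence estimates:", z_pois, z_geom)
print("approximate Bayes factor (Poisson vs geometric):", z_pois / max(z_geom, 1e-12))
```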
2025, Statistics Group Technical Report
The Auxiliary Particle Filter (APF) introduced by Pitt and Shephard (1999) is a very popular alternative to Sequential Importance Sampling / Resampling (SISR) algorithms for performing inference in state-space models. We propose a novel interpretation of the APF as an SISR algorithm. This interpretation allows us to present simple guidelines to ensure good performance of the APF, and the first convergence results for this algorithm. Additionally, we show that, contrary to popular belief, the asymptotic variance of APF-based estimators is not always smaller than that of the corresponding SISR estimators, even in the 'perfect adaptation' scenario. We also explain how similar concepts can be applied to general Sequential Monte Carlo Samplers and provide similar results in this context.
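For readers unfamiliar with the SISR baseline against which the APF is compared, the sketch below implements a plain bootstrap (SISR) particle filter for a hypothetical AR(1)-plus-noise model; the APF would differ by resampling with weights that look ahead to the next observation. The model and parameter values are illustrative assumptions, not taken from the report.

```python
import numpy as np

def bootstrap_filter(y, n_particles, phi, sigma_x, sigma_y, rng=None):
    """Plain SISR (bootstrap) particle filter for X_t = phi X_{t-1} + N(0, sigma_x^2),
    Y_t = X_t + N(0, sigma_y^2). Returns filtering means and a log-likelihood estimate."""
    rng = np.random.default_rng(rng)
    T = len(y)
    x = rng.normal(0.0, sigma_x, n_particles)
    loglik = 0.0
    filt_means = np.empty(T)
    for t in range(T):
        x = phi * x + rng.normal(0.0, sigma_x, n_particles)      # propagate with the prior kernel
        logw = -0.5 * ((y[t] - x) / sigma_y) ** 2                # Gaussian log-weights (up to a constant)
        shift = logw.max()
        w = np.exp(logw - shift)
        loglik += np.log(w.mean()) + shift - 0.5 * np.log(2 * np.pi * sigma_y**2)
        w /= w.sum()
        filt_means[t] = np.sum(w * x)
        x = x[rng.choice(n_particles, n_particles, p=w)]         # multinomial resampling
    return filt_means, loglik

# Hypothetical data simulated from the same model.
rng = np.random.default_rng(3)
xs = np.zeros(100)
for t in range(1, 100):
    xs[t] = 0.9 * xs[t - 1] + rng.normal(0.0, 0.5)
y = xs + rng.normal(0.0, 0.3, 100)
means, ll = bootstrap_filter(y, 500, phi=0.9, sigma_x=0.5, sigma_y=0.3, rng=4)
print("log-likelihood estimate:", ll)
```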
2025, Nature Precedings
Over the past ten years, Approximate Bayesian Computation (ABC) has become hugely popular to estimate the parameters of a model when the likelihood function cannot be computed in a reasonable amount of time. ABC can in principle be used also to perform Bayesian model comparison, but this raises the question of which summary statistic should be used for such applications. Here we present a general method for constructing a summary statistic that is sufficient for the model choice problem. We apply this construction to models from the exponential family. Unfortunately, in more complex models, our construct often results in statistics with too high dimensionality to use in ABC. We therefore discuss the possibility of applying ABC with non-sufficient statistics.
2025, Statistics and Its Interface
Bayesian computation crucially relies on Markov chain Monte Carlo (MCMC) algorithms. In the case of massive data sets, running the Metropolis-Hastings sampler to draw from the posterior distribution becomes prohibitive due to the large number of likelihood terms that need to be calculated at each iteration. In order to perform Bayesian inference for a large set of time series, we consider an algorithm that combines "divide and conquer" ideas previously used to design MCMC algorithms for big data with a sequential MCMC strategy. The performance of the method is illustrated using a large set of financial data.
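The paper's algorithm combines divide-and-conquer MCMC with a sequential MCMC strategy for many time series; that combination is not reproduced here. As a simpler illustration of the divide-and-conquer ingredient alone, the sketch below runs independent Metropolis-Hastings samplers on data shards and combines the draws by precision weighting, in the spirit of consensus Monte Carlo. The Gaussian model, prior and settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(9)
data = rng.normal(1.0, 2.0, size=20_000)          # a large data set; model: y_i ~ N(theta, 2^2)
n_shards = 10
shards = np.array_split(data, n_shards)

def shard_mh(y, n_iter=3000, step=0.05):
    """Random-walk MH targeting the subposterior p(theta)^(1/n_shards) * p(y_shard | theta)."""
    def log_sub(t):
        loglik = -0.5 * np.sum(((y - t) / 2.0) ** 2)      # Gaussian log-likelihood up to a constant
        logprior = -0.5 * (t / 10.0) ** 2                 # N(0, 10^2) prior, fractionated across shards
        return loglik + logprior / n_shards
    theta = 0.0
    lp = log_sub(theta)
    chain = []
    for _ in range(n_iter):
        prop = theta + rng.normal(0.0, step)
        lp_prop = log_sub(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain.append(theta)
    return np.array(chain[1000:])

sub_chains = [shard_mh(s) for s in shards]        # embarrassingly parallel in practice
weights = np.array([1.0 / c.var() for c in sub_chains])
combined = sum(w * c for w, c in zip(weights, sub_chains)) / weights.sum()
print("combined posterior mean and sd:", combined.mean(), combined.std())
```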
2025, Heredity
In contrast with the classical population genetics theory that models population structure as discrete panmictic units connected by migration, many populations exhibit heterogeneous spatial gradients in population connectivity across semi-continuous habitats. The historical dynamics of such spatially structured populations can be captured by a spatially explicit coalescent model recently proposed by Barton et al. (2010a, b), whereby allelic lineages are distributed in a two-dimensional spatial continuum and move within this continuum based on extinction and coalescent events. Though theoretically rigorous, this model, which we here refer to as the continuum model, has not yet been implemented for demographic inference. To this end, here we introduce and demonstrate a statistical pipeline that couples the coalescent simulator of Kelleher et al. (2014), which simulates genealogies under the continuum model, with an approximate Bayesian computation (ABC) framework for parameter estimation of neighborhood size (that is, the number of locally breeding individuals) and dispersal ability (that is, the distance an offspring can travel within a generation). Using empirically informed simulations and simulation-based ABC cross-validation, we first show that neighborhood size can be accurately estimated. We then apply our pipeline to the South African endemic shrub species Berkheya cuneata to use the resulting estimates of dispersal ability and neighborhood size to infer the average population density of the species. More generally, we show that spatially explicit coalescent models can be successfully integrated into model-based demographic inference.
2025, Bayesian Analysis
Determining the marginal likelihood from a simulated posterior distribution is central to Bayesian model selection but is computationally challenging. The often-used harmonic mean approximation (HMA) makes no prior assumptions about the character of the distribution but tends to be inconsistent. The Laplace approximation is stable but makes strong, and often inappropriate, assumptions about the shape of the posterior distribution. Here, I argue that the marginal likelihood can be reliably computed from a posterior sample using Lebesgue integration theory in one of two ways: 1) when the HMA integral exists, compute the measure function numerically and analyze the resulting quadrature to control error; 2) compute the measure function numerically for the marginal likelihood integral itself using a space-partitioning tree, followed by quadrature. The first algorithm automatically eliminates the part of the sample that contributes large truncation error in the HMA. Moreover, it provides a simple graphical test for the existence of the HMA integral. The second algorithm uses the posterior sample to assign probability to a partition of the sample space and performs the marginal likelihood integral directly. It uses the posterior sample to discover and tessellate the subset of the sample space that was explored and uses quantiles to compute a representative field value. When integrating directly, this space may be trimmed to remove regions with low probability density and thereby improve accuracy. This second algorithm is consistent for all proper distributions. Error analysis provides some diagnostics on the numerical condition of the results in both cases.
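The tessellation-based algorithms described above are not reproduced here; for orientation, the sketch below computes the harmonic mean approximation (HMA) of the marginal likelihood from posterior draws in a conjugate Gaussian model, where the exact value is available for comparison. The model, prior and sample sizes are hypothetical.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

rng = np.random.default_rng(1)
y = rng.normal(1.0, 1.0, size=30)            # data; likelihood y_i ~ N(mu, 1), prior mu ~ N(0, 1)
n = len(y)

# Exact log marginal likelihood: y ~ N(0, I + 11') under this conjugate model.
exact_log_Z = stats.multivariate_normal.logpdf(y, mean=np.zeros(n), cov=np.eye(n) + np.ones((n, n)))

# Sample the posterior directly: mu | y ~ N(n*ybar/(n+1), 1/(n+1)).
post = rng.normal(n * y.mean() / (n + 1), np.sqrt(1.0 / (n + 1)), size=20_000)
loglik = np.array([stats.norm.logpdf(y, mu, 1.0).sum() for mu in post])

# HMA: 1/Z is estimated by the posterior average of 1/likelihood (computed in log space).
log_Z_hma = -(logsumexp(-loglik) - np.log(len(loglik)))
print("exact:", exact_log_Z, "harmonic mean approximation:", log_Z_hma)
```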
2025, arXiv (Cornell University)
Weinberg (2012) described a constructive algorithm for computing the marginal likelihood, Z, from a Markov chain simulation of the posterior distribution. Its key point is the choice of an integration subdomain that eliminates subvolumes with poor sampling owing to low tail-values of posterior probability. Conversely, this same idea may be used to choose the subdomain that optimizes the accuracy of Z. Here, we explore using the simulated distribution to define a small region of high posterior probability, followed by a numerical integration of the sample in the selected region using the volume tessellation algorithm described in Weinberg (2012). Even more promising is the resampling of this small region followed by a naive Monte Carlo integration. The new enhanced algorithm is computationally trivial and leads to a dramatic improvement in accuracy. For example, applying the new algorithm to a four-component mixture with random locations in 16 dimensions yields accurate evaluation of Z with 5% errors. This enables Bayes-factor model selection for real-world problems that have been infeasible with previous methods.
2025, bioRxiv (Cold Spring Harbor Laboratory)
The process of making inference on networks of spiking neurons is crucial to decipher the underlying mechanisms of neural computation. Mean-field theory simplifies the interactions between neurons to produce macroscopic network behavior, facilitating the study of information processing and computation within the brain. In this study, we perform inference on a mean-field model of spiking neurons to gain insight into likely parameter values, uniqueness and degeneracies, and also to explore how well the statistical relationship between parameters is maintained by traversing across scales. We benchmark against state-of-the-art optimization and Bayesian estimation algorithms to identify their strengths and weaknesses in our analysis. We show that when confronted with dynamical noise or in the case of missing data in the presence of bistability, generating probability distributions using deep neural density estimators outperforms other algorithms, such as adaptive Monte Carlo sampling. However, this class of deep generative models may result in an overestimation of uncertainty and correlation between parameters. Nevertheless, this issue can be improved by incorporating time-delay embedding. Moreover, we show that training deep Neural ODEs on spiking neurons enables the inference of system dynamics from microscopic states. In summary, this work demonstrates the enhanced accuracy and efficiency of inference on networks of spiking neurons when deep learning is harnessed to solve inverse problems in neural computation.
2025, Genetics Selection Evolution
The analysis of nonlinear function-valued characters is very important in genetic studies, especially for growth traits of agricultural and laboratory species. Inference in nonlinear mixed effects models is, however, quite complex and is usually based on likelihood approximations or Bayesian methods. The aim of this paper was to present an efficient stochastic EM procedure, namely the SAEM algorithm, which is much faster to converge than the classical Monte Carlo EM algorithm and Bayesian estimation procedures, does not require specification of prior distributions and is quite robust to the choice of starting values. The key idea is to recycle the simulated values from one iteration to the next in the EM algorithm, which considerably accelerates the convergence. A simulation study is presented which confirms the advantages of this estimation procedure in the case of a genetic analysis. The SAEM algorithm was applied to real data sets on growth measurements in beef cattle and in chickens. The proposed estimation procedure, as the classical Monte Carlo EM algorithm, provides significance tests on the parameters and likelihood based model comparison criteria to compare the nonlinear models with other longitudinal methods. Keywords: genetic analysis, growth curves, longitudinal data, stochastic approximation EM algorithm.
2025, Evolution
Comparative phylogeographic studies often reveal disparate levels of sequence divergence between lineages spanning a common geographic barrier, leading to the conclusion that isolation was nonsynchronous. However, only rarely do researchers account for the expected variance associated with ancestral coalescence and among-taxon variation in demographic history. We introduce a flexible approximate Bayesian computational (ABC) framework that can test for simultaneous divergence (TSD) using a hierarchical model that incorporates idiosyncratic differences in demographic history across taxon pairs. The method is tested across a range of conditions and is shown to be accurate even with single-locus mitochondrial DNA (mtDNA) data. We apply this method to a landmark dataset of putative simultaneous vicariance, eight geminate echinoid taxon pairs thought to have been split by the Isthmus of Panama 3.1 million years ago. The ABC posterior estimates are not consistent with a history of simultaneous vicariance given these data. Subsequent ABC estimates under a constrained model that assumes two divergence times across the eight taxon pairs suggests simultaneous divergence 3.1 million years ago in seven of the taxon pairs and a more recent divergence in the remaining taxon pair. These ABC estimates on the simultaneous divergence of the seven taxon pairs correspond to a DNA substitution rate of approximately 1.59% per lineage per million years at the mtDNA cytochrome oxidase I gene. This ABC framework can easily be modified to analyze single taxon-pair datasets and/or be expanded to include multiple loci, migration, recombination, and other idiosyncratic demographic histories. The flexible aspect of ABC and its built-in evaluation of estimator bias and statistical power has the potential to greatly enhance statistical rigor in phylogeographic studies.
2025, G3 (Bethesda, Md.)
Epigenetics has become one of the major areas of biological research. However, the degree of phenotypic variability that is explained by epigenetic processes still remains unclear. From a quantitative genetics perspective, the estimation of variance components is achieved by means of the information provided by the resemblance between relatives. In a previous study, this resemblance was described as a function of the epigenetic variance component and a reset coefficient that indicates the rate of dissipation of epigenetic marks across generations. Given these assumptions, we propose a Bayesian mixed model methodology that allows the estimation of epigenetic variance from a genealogical and phenotypic database. The methodology is based on the development of a matrix T of epigenetic relationships that depends on the reset coefficient. In addition, we present a simple procedure for the calculation of the inverse of this matrix (T^-1) and a Gibbs sampler algorithm that obtains poster...
2025, Blood
Introduction: Multiple Myeloma (MM) is characterized by heterogeneous clinical outcomes to existing therapies, which reflects the diverse genetic and molecular properties of tumor clones among patients. This intra-clonal heterogeneity may affect distinct molecular pathways within individual patients, contributing to reduced treatment efficacy over time and eventual relapse. In this work we investigate this problem by applying Bayesian network inference to develop high-dimensional network models of MM based on the Interim Analysis 9 (IA9) CoMMpass trial dataset (NCT0145429), an effort by the Multiple Myeloma Research Foundation (MMRF) to collect longitudinal data of newly-diagnosed patients from the United States, Canada and Europe. We demonstrate that our approach finds a number of known drug targets and identifies potentially novel ones. These targets, in our simulations, affect a number of treatment efficacy outcomes. Methods: The IA9 dataset encompasses 645 patients with complete cli...
2025, 14th WCCM-ECCOMAS Congress
Approximate Bayesian Computation is used in this work for the selection and calibration of cell proliferation models. Four competing models based on ordinary differential equations are analyzed by using the measurements of the proliferation of DU-145 prostate cancer viable cells during seven days. The selection criterion of the ABC algorithm is based on the Euclidean distance between the model prediction and the experimental observations. The Richards Model and the Generalized Logistic Model were selected by the ABC algorithm used in this work, providing accurate estimates of the evolution of the number of viable cells. The Bayes factor revealed that there was no evidence in favor of either of these two selected models. Cell proliferation is numerically given by the difference between the numbers of newly divided and dying cells. In order to predict the number of viable cells, several mathematical
2025, International Journal of Thermal Sciences
This paper deals with the solution of an inverse bioheat transfer problem, by using Approximate Bayesian Computation (ABC). A Sequential Monte Carlo (SMC) method is applied for simultaneous model selection and model calibration (estimation of the model parameters) by using synthetic measurements. Two competing models are considered in the analysis of the thermal damage of biological tissues. The results show that the ABC-SMC algorithm provides accurate results for the model selection and estimation of the thermal damage model parameters.
2025, Computational and Applied Mathematics
Cancer is one of the most fatal diseases in the world. Governments and researchers from various areas have continuously concentrated efforts to better understand the disease and propose diagnostic and treatment techniques. The use of mathematical models of tumor growth is of great importance for the development of such techniques. Due to the variety of models nowadays available in the literature, the problems of model selection and parameter estimation come into the picture, aiming at suitably predicting the status of the patient's disease. As the available data on dependent variables of existing models might not justify the use of common likelihood functions, approximate Bayesian computation (ABC) becomes a very attractive tool for model selection and model calibration (parameter estimation) in tumor growth models. In the present study, a Monte Carlo approximate Bayesian computation (ABC) algorithm is applied to select among competing models of tumor growth, with and without chemotherapy treatment. Simulated measurements are used in this work. The results obtained show that the algorithm correctly selects the model and estimates the parameters used to generate the simulated measurements. Keywords: model selection; parameter estimation; approximate Bayesian computation; tumor growth.
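The specific growth models and the Monte Carlo ABC algorithm of the paper are not reproduced here; the sketch below shows a plain rejection-ABC version of model selection between two hypothetical growth models (logistic and Gompertz), using a Euclidean distance on simulated measurements. The observation times, noise level, priors and tolerance are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.linspace(0, 7, 15)                        # days; hypothetical observation times

def logistic(theta, t):
    n0, r, k = theta
    return k / (1 + (k / n0 - 1) * np.exp(-r * t))

def gompertz(theta, t):
    n0, r, k = theta
    return k * np.exp(np.log(n0 / k) * np.exp(-r * t))

# Synthetic "measurements": logistic growth plus noise, standing in for viable-cell counts.
obs = logistic((1e4, 1.2, 5e5), t) + rng.normal(0, 1e4, t.size)

def prior():
    return (10 ** rng.uniform(3, 5),             # initial population n0
            rng.uniform(0.1, 3.0),               # growth rate r
            10 ** rng.uniform(5, 6.5))           # carrying capacity k

def abc_model_choice(models, n_sims=100_000, eps=1.2e5):
    counts = np.zeros(len(models), dtype=int)
    for _ in range(n_sims):
        m = rng.integers(len(models))            # uniform prior over the candidate models
        sim = models[m](prior(), t)
        if np.linalg.norm(sim - obs) < eps:      # Euclidean distance criterion
            counts[m] += 1
    return counts / max(counts.sum(), 1)

print(abc_model_choice([logistic, gompertz]))    # approximate posterior model probabilities
```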
2025, 2012 IEEE I2MTC - International Instrumentation and Measurement Technology Conference, Proceedings
The Bayesian inversion of measured data forms an attractive approach to gain statistical knowledge, such as confidence intervals, about the unknown variables given measured data and a model. Out of the class of Markov chain Monte Carlo (MCMC) methods, the Metropolis-Hastings (MH) algorithm is commonly used to generate samples from the posterior distribution for computational inference. Though easy to implement, the MH algorithm has drawbacks in terms of computation time and greater modeling costs. In this paper we present an acceleration approach to speed up MCMC with the MH algorithm for an inverse heat transfer problem using two different types of approximations. We demonstrate the possibility of decreasing computation times while maintaining the same estimation accuracy.
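The paper's two approximation types for the heat transfer problem are not described in the abstract; as one common way of accelerating MH with a cheap approximation, the sketch below implements a two-stage (delayed-acceptance) Metropolis-Hastings scheme in which proposals are screened with an inexpensive surrogate before the expensive posterior is evaluated. The toy Gaussian model and the surrogate are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Hypothetical setting: scalar theta, the "expensive" forward model is exact here,
# and the "cheap" surrogate is built from the sample mean only.
data = rng.normal(2.0, 1.0, size=50)

def log_post_expensive(theta):
    return stats.norm.logpdf(data, theta, 1.0).sum() + stats.norm.logpdf(theta, 0.0, 10.0)

def log_post_cheap(theta):
    return stats.norm.logpdf(data.mean(), theta, 1.0 / np.sqrt(len(data))) + stats.norm.logpdf(theta, 0.0, 10.0)

def delayed_acceptance_mh(n_iter=5000, step=0.3):
    """Two-stage MH: a proposal must first pass a cheap screening test before the
    expensive posterior is evaluated; the second-stage correction preserves the
    exact stationary distribution while reducing expensive evaluations."""
    theta = 0.0
    lp_exp, lp_cheap = log_post_expensive(theta), log_post_cheap(theta)
    chain, expensive_calls = [], 0
    for _ in range(n_iter):
        prop = theta + rng.normal(0, step)
        lp_cheap_prop = log_post_cheap(prop)
        # Stage 1: screen with the surrogate.
        if np.log(rng.uniform()) < lp_cheap_prop - lp_cheap:
            # Stage 2: correct with the expensive model.
            lp_exp_prop = log_post_expensive(prop)
            expensive_calls += 1
            if np.log(rng.uniform()) < (lp_exp_prop - lp_exp) - (lp_cheap_prop - lp_cheap):
                theta, lp_exp, lp_cheap = prop, lp_exp_prop, lp_cheap_prop
        chain.append(theta)
    return np.array(chain), expensive_calls

chain, calls = delayed_acceptance_mh()
print(chain[1000:].mean(), calls, "expensive evaluations out of 5000 proposals")
```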
2025, Journal of Biogeography
Aim: Moose, Alces alces (Linnaeus, 1758), survived the European Pleistocene glaciations in multiple southern refugia, in a northern refugium near the Carpathians and possibly in other locations. During the second millennium AD, moose were nearly extirpated in Europe and only recolonized their current range after World War II. The number and location of refugia during the Pleistocene and recent population lows may have affected the current genetic diversity. We sought to characterize the genetic diversity in European moose in order to determine its genetic structure and the location of genetic hotspots as a way of inferring its population history and the number of Last Glacial Maximum (LGM) refugia. Methods: We sequenced 538 nucleotides from the mitochondrial control region of 657 moose from throughout the species' European range. We estimated diversity within and among 16 sampling localities, and used SAMOVA to cluster sampling locations into subpopulations. We constructed phylogenetic trees and median-joining networks to examine systematic relationships, and conducted Bayesian analysis of the coalescent and used mismatch distributions and approximate Bayesian computation to infer demographic history. Results: Estonia had the highest nucleotide diversity, and western Belarus had the highest haplotype diversity. We observed four regional populations from the SAMOVA analysis. We found three haplogroups in European moose, probably representing lineages conserved in different refugia during the Pleistocene. European moose underwent spatial expansion after the LGM, but did not undergo demographic expansion. The effective population size has declined markedly within the last 2000 years. The current levels and distribution of genetic diversity in European moose indicate the effects both of Pleistocene glaciations and of a recent bottleneck, probably associated with anthropogenic influences such as pastoralization and hunting, and a very recent re-expansion. We show that both historical and recent events can influence the diversity and distribution of a large mammal on a large scale.
2025
Computational modeling is a remarkable and common tool to quantitatively describe a biological process. However, most model parameters, such as kinetics parameters, initial conditions and scale factors, are usually unknown because they cannot be directly measured. Therefore, key issues in Systems Biology are model calibration and identifiability analysis, i.e. estimating parameters from experimental data and assessing how well those parameters are determined by the dimension and quality of the data. Currently in the Systems Biology and Computational Biology communities, the existing methodologies for parameter estimation are divided into two classes: frequentist methods and Bayesian methods. The first are based on the optimization of a cost function while the second estimate the posterior distribution of model parameters through different sampling techniques. In this work, we present an innovative Bayesian method, called Conditional Robust Calibration (CRC), for model calibration and identifiability analysis. The algorithm is an iterative procedure based on parameter space sampling and on the definition of multiple objective functions, one related to each output variable. The method estimates, step by step, the probability density function (pdf) of parameters conditioned on the experimental measures and it returns as output a subset of the parameter space that best reproduces the dataset. We apply CRC to six Ordinary Differential Equation (ODE) models with different characteristics and complexity to test its performance compared with profile likelihood (PL) and Approximate Bayesian Computation Sequential Monte Carlo (ABC-SMC) approaches. The datasets selected for calibration are time course measurements of different nature: noisy or noiseless, real or in silico. Compared with PL, our approach finds a more robust solution because parameter identifiability is inferred from the conditional pdfs of estimated parameters. Compared with ABC-SMC, we have found a more precise solution with a reduced computational cost.
2025
Computational modeling is a common tool to quantitatively describe biological processes. However, most model parameters are usually unknown because they cannot be directly measured. Therefore, a key issue in Systems Biology is model calibration, i.e. estimating parameters from experimental data. Existing methodologies for parameter estimation are divided into two classes: frequentist and Bayesian methods. The first optimize a cost function while the second estimate the parameter posterior distribution through different sampling techniques. Here, we present an innovative Bayesian method, called Conditional Robust Calibration (CRC), for nonlinear model calibration and robustness analysis using omics data. CRC is an iterative algorithm based on the sampling of a proposal distribution and on the definition of multiple objective functions, one for each observable. CRC estimates the probability density function (pdf) of parameters conditioned on the experimental measures and it perf...
2025, IEEE Transactions on Control Systems Technology
In computational mathematical modeling of biological systems, most model parameters, such as initial conditions, kinetics, and scale factors, are usually unknown because they cannot be directly measured. Therefore, key issues in system identification of nonlinear systems are model calibration and identifiability analysis. Currently, existing methodologies for parameter estimation are divided into two classes: frequentist and Bayesian methods. The first class optimizes a cost function, while the second estimates the posterior distribution of parameters through different sampling techniques. However, when dealing with high-dimensional models, these methodologies suffer from an increasing computational cost due to the large volume of -omic data necessary to obtain reliable and robust solutions. Here, we present an innovative Bayesian method, called conditional robust calibration (CRC), for model calibration and identifiability analysis. The algorithm is an iterative procedure based on a uniform and joint perturbation of the parameter space. At each step the algorithm returns the probability density functions of all parameters that progressively shrink toward specific points in the parameter space. These distributions are estimated on parameter samples that guarantee a certain level of agreement between each observable and the corresponding in silico measure. We apply CRC to a nonlinear high-dimensional ordinary differential equation model representing the signaling pathway of p38MAPK in multiple myeloma. The available data set consists of time courses of proteomic cancerous data. We test CRC's performance in comparison with profile-likelihood and approximate Bayesian computation sequential Monte Carlo. We obtain a more precise and robust solution with a reduced computational cost.
2025, BMC Public Health
The global impact of COVID-19 and the country-specific responses to the pandemic provide an unparalleled opportunity to learn about different patterns of the outbreak and interventions. We model the global pattern of reported COVID-19 cases during the primary response period, with the aim of learning from the past to prepare for the future. Methods: Using Bayesian methods, we analyse the response to the COVID-19 outbreak for 158 countries for the period 22 January to 9 June 2020. This encompasses the period in which many countries imposed a variety of response measures and initial relaxation strategies. Instead of modelling specific intervention types and timings for each country explicitly, we adopt a stochastic epidemiological model including a feedback mechanism on virus transmission to capture complex nonlinear dynamics arising from continuous changes in community behaviour in response to rising case numbers. We analyse the overall effect of interventions and community responses across diverse regions. This approach mitigates explicit consideration of issues such as period of infectivity and public adherence to government restrictions. Results: Countries with the largest cumulative case tallies are characterised by a delayed response, whereas countries that avoid substantial community transmission during the period of study responded quickly. Countries that recovered rapidly also have a higher case identification rate and small numbers of undocumented community transmission at the early stages of the outbreak. We also demonstrate that uncertainty in numbers of undocumented infections dramatically impacts the risk of multiple waves. Our approach is also effective at pre-empting potential flare-ups. Conclusions: We demonstrate the utility of modelling to interpret community behaviour in the early epidemic stages. Two lessons learnt that are important for the future are: i) countries that imposed strict containment measures early in the epidemic fared better with respect to numbers of reported cases; and ii) broader testing is required early
2025, arXiv (Cornell University)
ABCpy is a highly modular scientific library for approximate Bayesian computation (ABC) written in Python. The main contribution of this paper is to document a software engineering effort that enables domain scientists to easily apply ABC to their research without being ABC experts; using ABCpy they can easily run large parallel simulations without much knowledge about parallelization. Further, ABCpy enables ABC experts to easily develop new inference schemes and evaluate them in a standardized environment and to extend the library with new algorithms. These benefits come mainly from the modularity of ABCpy. We give an overview of the design of ABCpy and provide a performance evaluation concentrating on parallelization. This points us towards the inherent imbalance in some of the ABC algorithms. We develop a dynamic scheduling MPI implementation to mitigate this issue and evaluate the various ABC algorithms according to their adaptability towards high-performance computing.
2025, arXiv (Cornell University)
Dynamic queueing networks (DQN) model queueing systems where demand varies strongly with time, such as airport terminals. With rapidly rising global air passenger traffic placing increasing pressure on airport terminals, efficient allocation of resources is more important than ever. Parameter inference and quantification of uncertainty are key challenges for developing decision support tools. The DQN likelihood function is, in general, intractable and current approaches to simulation make likelihood-free parameter inference methods, such as approximate Bayesian computation (ABC), infeasible since simulating from these models is computationally expensive. By leveraging a recent advance in computationally efficient queueing simulation, we develop the first parameter inference approach for DQNs. We demonstrate our approach with data of passenger flows in a real airport terminal, and we show that our model accurately recreates the behaviour of the system and is useful for decision support. Special care must be taken in developing the distance for ABC since any useful output must vary with time. We use maximum mean discrepancy, a metric on probability measures, as the distance function for ABC. Prediction intervals of performance measures for decision support tools are easily constructed using draws from posterior samples, which we demonstrate with a scenario of a delayed flight.
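The DQN simulator and time-stamped outputs used in the paper are not reproduced here; the sketch below only shows how a kernel maximum mean discrepancy between an observed and a simulated sample can serve as the ABC distance, using a toy exponential service-time model with hypothetical priors and tolerance.

```python
import numpy as np

def mmd2(x, y, bandwidth=1.0):
    """Plug-in (biased) estimate of squared maximum mean discrepancy between two
    one-dimensional samples, using a Gaussian kernel."""
    def gram(a, b):
        return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * bandwidth ** 2))
    n, m = len(x), len(y)
    return gram(x, x).sum() / n**2 + gram(y, y).sum() / m**2 - 2 * gram(x, y).sum() / (n * m)

# Toy rejection ABC for the rate of an exponential service-time model.
rng = np.random.default_rng(2)
obs = rng.exponential(1 / 1.5, size=100)        # hypothetical observed service times (rate 1.5)
accepted = []
for _ in range(5_000):
    rate = rng.uniform(0.1, 5.0)                # prior on the service rate
    sim = rng.exponential(1 / rate, size=100)
    if mmd2(obs, sim) < 0.02:                   # tolerance on the squared MMD distance
        accepted.append(rate)
accepted = np.array(accepted)
print(len(accepted), "accepted; posterior mean rate:", accepted.mean())
```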
2025
Structural geomodeling is a key technology for the visualization and quantification of subsurface systems. Given the limited data and the resulting necessity for geological interpretation to construct these geomodels, uncertainty is pervasive and traditionally unquantified. Probabilistic geomodeling allows for the simulation of uncertainties by automatically constructing geomodels from perturbed input data sampled from probability distributions. But random sampling of input parameters can lead to construction of geomodels that are unrealistic, either due to modeling artefacts or because they do not match known information about the regional geology of the modeled system. We present here a method to incorporate geological information in the form of geomodel topology into stochastic simulations to constrain resulting probabilistic geomodel ensembles. Simulated geomodel realisations are checked against topology information using a likelihood-free Approximate Bayesian Computation approach. We demonstrate how we can condition our input data parameter (prior) distributions on topology information in two experiments: (1) a synthetic geomodel using a rejection sampling scheme (ABC-REJ) to demonstrate the approach; (2) a geomodel of a subset of the Gullfaks field in the North Sea, comparing both rejection sampling and a Sequential Monte Carlo sampler (ABC-SMC). We also discuss possible speed-ups from using more advanced sampling techniques to avoid simulation of unfeasible geomodels in the first place. Results demonstrate the feasibility of using topology as a summary statistic, to restrict the generation of model ensembles with additional geological information and to obtain improved ensembles of probable geomodels using stochastic simulation methods.
2025, Geoscientific Model Development
Structural geomodeling is a key technology for the visualization and quantification of subsurface systems. Given the limited data and the resulting necessity for geological interpretation to construct these geomodels, uncertainty is... more
Structural geomodeling is a key technology for the visualization and quantification of subsurface systems. Given the limited data and the resulting necessity for geological interpretation to construct these geomodels, uncertainty is pervasive and traditionally unquantified. Probabilistic geomodeling allows for the simulation of uncertainties by automatically constructing geomodel ensembles from perturbed input data sampled from probability distributions. But random sampling of input parameters can lead to construction of geomodels that are unrealistic, either due to modeling artifacts or by not matching known information about the regional geology of the modeled system. We present a method to incorporate geological information in the form of known geomodel topology into stochastic simulations to constrain resulting probabilistic geomodel ensembles using the open-source geomodeling software GemPy. Simulated geomodel realizations are checked against topology information using an approximate Bayesian computation approach to avoid the specification of a likelihood function. We demonstrate how we can infer the posterior distributions of the model parameters using topology information in two experiments: (1) a synthetic geomodel using a rejection sampling scheme (ABC-REJ) to demonstrate the approach and (2) a geomodel of a subset of the Gullfaks field in the North Sea comparing both rejection sampling and a sequential Monte Carlo sampler (ABC-SMC). Possible improvements to processing speed of up to 10.1 times are discussed, focusing on the use of more advanced sampling techniques to avoid the simulation of unfeasible geomodels in the first place. Results demonstrate the feasibility of using topology graphs as a summary statistic to restrict the generation of geomodel ensembles with known geological information and to obtain improved ensembles of probable geomodels which respect the known topology information and exhibit reduced uncertainty using stochastic simulation methods.
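A minimal sketch of the idea of using a topology graph as a summary statistic in ABC rejection is given below. The toy layered model with a single vertical fault, the adjacency-set representation and the Jaccard distance are assumptions made for illustration; the papers above use full 3-D geomodels built in GemPy.

```python
# Illustrative sketch of ABC rejection with a topology-graph summary statistic.
# The toy layered model with one vertical fault and the Jaccard distance on edge
# sets are assumptions for demonstration; not the 3-D GemPy workflow of the paper.
import numpy as np

def topology_edges(layer_depths, fault_offset):
    """Return the set of unit adjacencies for a layered stack cut by a vertical fault.
    Units on either side of the fault are labelled (side, layer_index)."""
    edges = set()
    n = len(layer_depths)
    for side in ("left", "right"):
        for i in range(n - 1):                      # vertical neighbours within a side
            edges.add(frozenset({(side, i), (side, i + 1)}))
    # across-fault neighbours: layer i on the left touches layer j on the right
    # when their depth intervals overlap after applying the fault offset
    bounds = np.concatenate(([0.0], np.cumsum(layer_depths)))
    for i in range(n):
        for j in range(n):
            lo_i, hi_i = bounds[i], bounds[i + 1]
            lo_j, hi_j = bounds[j] + fault_offset, bounds[j + 1] + fault_offset
            if max(lo_i, lo_j) < min(hi_i, hi_j):
                edges.add(frozenset({("left", i), ("right", j)}))
    return edges

def jaccard_distance(a, b):
    return 1.0 - len(a & b) / len(a | b)

rng = np.random.default_rng(1)
true_depths, true_offset = np.array([10.0, 20.0, 15.0]), 12.0
observed_topology = topology_edges(true_depths, true_offset)

accepted = []
for _ in range(5000):
    depths = rng.uniform(5.0, 25.0, size=3)         # prior on layer thicknesses
    offset = rng.uniform(0.0, 30.0)                 # prior on fault offset
    if jaccard_distance(topology_edges(depths, offset), observed_topology) <= 0.0:
        accepted.append(np.append(depths, offset))  # exact topology match (ABC-REJ)

posterior = np.asarray(accepted)
print(len(posterior), "accepted;", posterior.mean(axis=0) if len(posterior) else "none")
```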
2025
Evolutionary computation is a discipline that has been emerging for at least 40 or 50 years. All methods within this discipline are characterized by maintaining a set of possible solutions (individuals) to make them successively evolve to... more
Evolutionary computation is a discipline that has been developing for at least 40 or 50 years. All methods within this discipline are characterized by maintaining a set of possible solutions (individuals) and making them evolve toward fitter solutions generation after generation. Examples of evolutionary computation paradigms are the broadly known Genetic Algorithms (GAs) and Estimation of Distribution Algorithms (EDAs). This paper contributes to the further development of this discipline by introducing a new evolutionary computation method based on learning and then simulating a Bayesian classifier in every generation. In the method we propose, at each iteration the selected group of individuals of the population is divided into different classes depending on their respective fitness values. A Bayesian classifier (naive Bayes, semi-naive Bayes, tree-augmented naive Bayes, or a similar model) is then learned to model the corresponding supervised classification problem. Simulating this Bayesian classifier provides the individuals that form the next generation. Experimental results are presented comparing the performance of this new method with different types of EDAs and GAs. The problems chosen for this purpose are combinatorial optimization problems commonly used in the literature.
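A minimal sketch of one such generation loop is shown below, using a naive Bayes model over binary variables on the OneMax problem. The split into a "fitter" and a "less fit" class, the Laplace smoothing and all settings are illustrative assumptions rather than the exact method of the paper.

```python
# Illustrative sketch of an evolutionary loop that learns a naive Bayes classifier
# on fitness classes and samples the next generation from the "fit" class.
# OneMax fitness, the two-class split and Laplace smoothing are assumed choices.
import numpy as np

def onemax(pop):
    return pop.sum(axis=1)

rng = np.random.default_rng(2)
n_bits, pop_size, n_generations = 40, 200, 30
pop = rng.integers(0, 2, size=(pop_size, n_bits))

for gen in range(n_generations):
    fitness = onemax(pop)
    # label individuals: class 1 = fitter half, class 0 = rest
    labels = (fitness > np.median(fitness)).astype(int)
    if labels.sum() == 0:                           # degenerate split, keep best quarter
        labels = (fitness >= np.quantile(fitness, 0.75)).astype(int)
    # learn the naive Bayes class-conditional model P(bit_i = 1 | fit), with Laplace smoothing
    fit_rows = pop[labels == 1]
    p_one_given_fit = (fit_rows.sum(axis=0) + 1.0) / (len(fit_rows) + 2.0)
    # simulate the classifier's "fit" class to produce the next generation
    pop = (rng.random((pop_size, n_bits)) < p_one_given_fit).astype(int)
    print(f"gen {gen:2d}  best fitness {fitness.max()}")
```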
2025, Frontiers in Physics
In this perspective, we examine three key aspects of an end-to-end pipeline for realistic cellular simulations: reconstruction and segmentation of cellular structures; generation of cellular structures; and mesh generation, simulation,... more
In this perspective, we examine three key aspects of an end-to-end pipeline for realistic cellular simulations: reconstruction and segmentation of cellular structures; generation of cellular structures; and mesh generation, simulation, and data analysis. We highlight some of the relevant prior work in these distinct but overlapping areas, with a particular emphasis on current use of machine learning technologies, as well as on future opportunities.
2025
This thesis presents the development of a new numerical algorithm for statistical inference problems that require sampling from distributions which are intractable. We propose to develop our sampling algorithm based on a class of Monte... more
This thesis presents the development of a new numerical algorithm for statistical inference problems that require sampling from intractable distributions. We develop our sampling algorithm within a class of Monte Carlo methods, approximate Bayesian computation (ABC), which is specifically designed for this type of likelihood-free inference. ABC has become a fundamental tool for the analysis of complex models whose likelihood function is computationally intractable or challenging to specify mathematically. The central theme of our approach is to enhance current ABC algorithms by exploiting the structure of the mathematical models via derivative information. We introduce Progressive Correction of Gaussian Components (PCGC) as a computationally efficient algorithm for generating proposal distributions in our ABC sampler. We demonstrate on two examples that our new ABC algorithm has an acceptance rate one to two orders of magnitude better than that of basic ABC rejection sampling.
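The PCGC proposal construction itself is not reproduced here; the sketch below only shows the baseline ABC rejection sampler against which such acceptance rates are typically compared, on an assumed toy Gaussian-mean problem.

```python
# Baseline sketch only: plain ABC rejection sampling on a toy Gaussian-mean problem.
# The PCGC proposal construction from the thesis is not reproduced; this is the
# reference algorithm whose acceptance rate it reports improving on.
import numpy as np

rng = np.random.default_rng(3)
true_mu = 1.5
observed = rng.normal(true_mu, 1.0, size=50)
obs_mean = observed.mean()                          # summary statistic

n_draws, tolerance = 50_000, 0.05
prior_draws = rng.uniform(-5.0, 5.0, size=n_draws)  # flat prior on mu
accepted = []
for mu in prior_draws:
    sim_mean = rng.normal(mu, 1.0, size=50).mean()
    if abs(sim_mean - obs_mean) < tolerance:
        accepted.append(mu)

rate = len(accepted) / n_draws
print(f"acceptance rate {rate:.4%}, posterior mean {np.mean(accepted):.3f}")
```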
2025, Computational Statistics & Data Analysis
2025, Entropy
Regression analysis using line equations has been broadly applied in studying the evolutionary relationship between the response trait and its covariates. However, the characteristics among closely related species in nature present... more
Regression analysis using linear equations has been widely applied to study the evolutionary relationship between a response trait and its covariates. However, closely related species in nature exhibit abundant diversity, and nonlinear relationships between traits are frequently observed. By treating the evolution of quantitative traits along a phylogenetic tree as a set of continuous stochastic variables, we build statistical models describing the dynamics of the optimum of the response trait and its covariates. Analytical representations for the response trait variables, as well as their optima among a group of related species, are derived. Because the models lack a tractable likelihood, a procedure based on the approximate Bayesian computation (ABC) technique is applied for statistical inference. Simulation results show that the new models perform well, with posterior means of the parameters close to the true values. Empirical analysis supports the new models when analyzing the trait relationship among kangaroo species.
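As a rough sketch of the modelling ingredients described above, the code below simulates a trait under an Ornstein-Uhlenbeck process along a small hard-coded phylogeny and runs plain ABC rejection on the tip values. The tree, priors, distance and tolerance are all assumptions for illustration, not the models of the paper.

```python
# Illustrative sketch: simulating a trait under an Ornstein-Uhlenbeck (OU) process
# along the branches of a small, hard-coded phylogeny, followed by ABC rejection.
# Tree shape, branch lengths, priors and tolerance are assumptions for demonstration.
import numpy as np

def ou_step(x0, t, alpha, theta, sigma, rng):
    """Exact OU transition over a branch of length t."""
    mean = theta + (x0 - theta) * np.exp(-alpha * t)
    var = sigma**2 / (2.0 * alpha) * (1.0 - np.exp(-2.0 * alpha * t))
    return rng.normal(mean, np.sqrt(var))

def simulate_tips(alpha, theta, sigma, rng):
    # tree: root -> internal node (length 1.0) -> tips A, B (length 0.5 each)
    #       root -> tip C (length 1.5)
    root = theta                                    # start at the optimum
    internal = ou_step(root, 1.0, alpha, theta, sigma, rng)
    tip_a = ou_step(internal, 0.5, alpha, theta, sigma, rng)
    tip_b = ou_step(internal, 0.5, alpha, theta, sigma, rng)
    tip_c = ou_step(root, 1.5, alpha, theta, sigma, rng)
    return np.array([tip_a, tip_b, tip_c])

rng = np.random.default_rng(4)
observed = simulate_tips(alpha=1.0, theta=2.0, sigma=0.5, rng=rng)

# ABC rejection over (alpha, theta, sigma) with a Euclidean distance on tip values
accepted = []
for _ in range(50_000):
    alpha, theta, sigma = rng.uniform(0.1, 3.0), rng.uniform(-5, 5), rng.uniform(0.1, 2.0)
    sim = simulate_tips(alpha, theta, sigma, rng)
    if np.linalg.norm(sim - observed) < 0.5:
        accepted.append((alpha, theta, sigma))

print(len(accepted), "accepted;", np.mean(accepted, axis=0) if accepted else "none")
```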
2025
In nature, closely related species often exhibit diverse characteristics, challenging simplistic line interpretations of trait evolution. For these species, the evolutionary dynamics of one trait may differ markedly from another, with... more
In nature, closely related species often exhibit diverse characteristics, challenging simplistic linear interpretations of trait evolution. For these species, the evolutionary dynamics of one trait may differ markedly from another, with some traits evolving at a slower pace and others rapidly diversifying. In light of this complexity, and for trait relationships that escape linear measurement, we introduce a novel general adaptive optimal regression model grounded in polynomial relationships. This approach seeks to capture intricate patterns in trait evolution by modelling traits as continuous stochastic variables along a phylogenetic tree. Using polynomial functions, the model offers a holistic and comprehensive description of the traits of the studied species, accounting for both decreasing and increasing trends over evolutionary time. We propose two sets of optimal adaptive evolutionary polynomial regression models of k-th order, named the Ornstein-Uhlenbeck Brownian Motion Polynomial (OUBMP_k) model and the Ornstein-Uhlenbeck Ornstein-Uhlenbeck Polynomial (OUOUP_k) model, respectively. We assume that the main trait value y_t is a random variable following an Ornstein-Uhlenbeck (OU) process and that its optimal adaptive value θ_{y_t} has a polynomial relationship with other traits x_t, where x_t can follow a Brownian motion (BM) or an OU process. As analytical representations of the model likelihoods are not feasible, we implement an approximate Bayesian computation (ABC) technique to assess performance through simulation. We also plan to apply the models in empirical studies using two datasets: longevity vs. fecundity in a Mediterranean nekton group, and trophic niche breadth vs. body mass in carnivores of a European forest region.
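A minimal sketch of the generative structure behind the OUBMP idea is given below: the response trait follows an OU process whose optimum is a polynomial in a covariate that evolves as Brownian motion, discretized with an Euler scheme. The step size, horizon and coefficients are assumed values, and the exact models of the paper are not reproduced.

```python
# Illustrative sketch of the generative structure: a response trait y follows an OU
# process whose time-varying optimum is a polynomial in a covariate trait x evolving
# as Brownian motion (the OUBMP idea, sketched with an Euler discretization).
# Step size, horizon and coefficients are assumed values for demonstration.
import numpy as np

def simulate_oubmp(poly_coeffs, alpha, sigma_y, sigma_x, dt=0.01, n_steps=2000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(n_steps)
    y = np.empty(n_steps)
    x[0], y[0] = 0.0, 0.0
    for t in range(1, n_steps):
        # covariate: Brownian motion increment
        x[t] = x[t - 1] + sigma_x * np.sqrt(dt) * rng.standard_normal()
        # optimum of y is a polynomial in the current covariate value
        optimum = np.polyval(poly_coeffs, x[t - 1])
        # response: OU pull toward the polynomial optimum
        y[t] = y[t - 1] + alpha * (optimum - y[t - 1]) * dt \
               + sigma_y * np.sqrt(dt) * rng.standard_normal()
    return x, y

rng = np.random.default_rng(5)
# quadratic optimum: theta_y(x) = 0.5 x^2 - x + 1  (coefficients highest order first)
x, y = simulate_oubmp(poly_coeffs=[0.5, -1.0, 1.0], alpha=2.0,
                      sigma_y=0.3, sigma_x=0.5, rng=rng)
print("final covariate %.3f, final trait %.3f" % (x[-1], y[-1]))
```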
2025, International Journal of Approximate Reasoning
We propose a method for computing the range of the optimal decisions when the utility function runs through a class U. The class U has constraints on the values and the shape of the utility functions. A discretization method enables to... more
We propose a method for computing the range of optimal decisions as the utility function runs through a class U. The class U places constraints on the values and the shape of the utility functions. A discretization method makes it easy to approximate the optimal decision associated with a particular utility function u ∈ U. The range of optimal decisions is computed by a Monte Carlo optimization method. An example is provided with numerical results.
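A minimal sketch of this procedure, under assumed choices, is shown below: the decision space is discretized, expected utility is approximated by Monte Carlo for each member of a small finite subset of the class U, and the range of the resulting optimal decisions is reported. The CRRA-style utility family and the outcome model are illustrative assumptions, not those of the paper.

```python
# Illustrative sketch: range of optimal decisions over a class of utility functions.
# Expected utility is approximated by Monte Carlo on a discretized decision grid;
# the CRRA-style utility family and the outcome model are assumed for demonstration.
import numpy as np

rng = np.random.default_rng(6)
decisions = np.linspace(0.0, 1.0, 101)            # discretized decision space (e.g. share invested)
outcomes = rng.normal(0.05, 0.2, size=10_000)     # Monte Carlo draws of an uncertain return

def crra_utility(wealth, risk_aversion):
    """A simple concave utility family; members of the class U differ by risk aversion."""
    if abs(risk_aversion - 1.0) < 1e-12:
        return np.log(wealth)
    return (wealth ** (1.0 - risk_aversion) - 1.0) / (1.0 - risk_aversion)

optimal_decisions = []
for risk_aversion in np.linspace(0.5, 5.0, 10):   # finite subset of the utility class U
    expected_utility = []
    for d in decisions:
        wealth = np.maximum(1.0 + d * outcomes, 1e-6)   # guard against rare negative-wealth draws
        expected_utility.append(crra_utility(wealth, risk_aversion).mean())
    optimal_decisions.append(decisions[int(np.argmax(expected_utility))])

print("range of optimal decisions: [%.2f, %.2f]" % (min(optimal_decisions), max(optimal_decisions)))
```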
2025, Bioinformatics
Summary: Genetic data obtained on population samples convey information about their evolutionary history. Inference methods can extract part of this information but they require sophisticated statistical techniques that have been made... more
Summary: Genetic data obtained on population samples convey information about their evolutionary history. Inference methods can extract part of this information but they require sophisticated statistical techniques that have been made available to the biologist community (through computer programs) only for simple and standard situations typically involving a small number of samples. We propose here a computer program (DIY ABC) for inference based on approximate Bayesian computation (ABC), in which scenarios can be customized by the user to fit many complex situations involving any number of populations and samples. Such scenarios involve any combination of population divergences, admixtures and population size changes. DIY ABC can be used to compare competing scenarios, estimate parameters for one or more scenarios and compute bias and precision measures for a given scenario and known values of parameters (the current version applies to unlinked microsatellite data). This article d...
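DIY ABC is an existing program and is not reimplemented here; the sketch below only illustrates, on a deliberately simplified toy with no coalescent simulation, how ABC can compare two scenarios by counting acceptances under each and reading off approximate posterior scenario probabilities.

```python
# Illustrative sketch of ABC model choice by acceptance counting. This is not DIYABC
# and uses no coalescent simulation: two toy "scenarios" generate a single summary
# statistic from different distributions, and the posterior probability of each
# scenario is approximated by its share of accepted simulations.
import numpy as np

rng = np.random.default_rng(7)
observed_summary = 4.2                            # assumed observed summary statistic

def simulate_summary(scenario, rng):
    theta = rng.uniform(0.0, 10.0)                # prior on the scenario's parameter
    if scenario == 1:                             # toy stand-in for, e.g., constant size
        return rng.normal(theta, 1.0)
    return rng.normal(theta, 3.0)                 # toy stand-in for, e.g., recent expansion

counts = {1: 0, 2: 0}
tolerance, n_sims = 0.1, 100_000
for _ in range(n_sims):
    scenario = int(rng.integers(1, 3))            # uniform prior over the two scenarios
    if abs(simulate_summary(scenario, rng) - observed_summary) < tolerance:
        counts[scenario] += 1

total = counts[1] + counts[2]
for s in (1, 2):
    print(f"scenario {s}: posterior probability ~ {counts[s] / total:.3f}")
```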