Variational Bayes Research Papers - Academia.edu

Mean field variational Bayes (MFVB) is a popular posterior approximation method due to its fast runtime on large-scale data sets. However, it is well known that a major failing of MFVB is that it underestimates the uncertainty of model variables (sometimes severely) and provides no information about model variable covariance. We generalize linear response methods from statistical physics to deliver accurate uncertainty estimates for model variables, both for individual variables and coherently across variables. We call our method linear response variational Bayes (LRVB). When the MFVB posterior approximation is in the exponential family, LRVB has a simple, analytic form, even for non-conjugate models. Indeed, we make no assumptions about the form of the true posterior. We demonstrate the accuracy and scalability of our method on a range of models for both simulated and real data.
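
To make the linear response idea concrete, here is a toy sketch assuming a bivariate Gaussian target. It is not the authors' general LRVB formula. We tilt the log posterior by t·θ, rerun coordinate-ascent MFVB, and take the derivative of the mean-field means with respect to t. The functions `mfvb_means` and `lrvb_covariance` are hypothetical names. The target values are illustrative only.

```python
import numpy as np

def mfvb_means(mu, prec, t, n_sweeps=500):
    """Coordinate-ascent MFVB means for the tilted Gaussian target
    p_t(theta) proportional to N(theta; mu, prec^{-1}) * exp(t @ theta)."""
    b = prec @ mu + t                 # linear coefficient of the tilted log density
    m = np.array(mu, dtype=float)     # start from the unperturbed mean
    for _ in range(n_sweeps):
        for i in range(len(m)):
            # optimal Gaussian factor q_i has precision prec[i, i] and this mean
            others = prec[i] @ m - prec[i, i] * m[i]
            m[i] = (b[i] - others) / prec[i, i]
    return m

def lrvb_covariance(mu, prec, eps=1e-4):
    """Linear response estimate: Cov[theta] is approximated by dE_q[theta]/dt at t = 0,
    computed here with central finite differences."""
    d = len(mu)
    cov = np.empty((d, d))
    for j in range(d):
        t = np.zeros(d)
        t[j] = eps
        cov[:, j] = (mfvb_means(mu, prec, t) - mfvb_means(mu, prec, -t)) / (2 * eps)
    return cov

mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])  # strongly correlated target
prec = np.linalg.inv(Sigma)
mfvb_var = 1.0 / np.diag(prec)    # mean-field marginal variances (underestimated)
lrvb_cov = lrvb_covariance(mu, prec)
print(mfvb_var)
print(lrvb_cov)
```

In this Gaussian case the MFVB variances are 0.19, far below the true value of 1. The linear response estimate recovers the full covariance, including the 0.9 off-diagonal term that MFVB cannot represent.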

Friston, K., Fortier, M. & Friedman, D. A. (2018). Of woodlice and men: A Bayesian account of cognition, life and consciousness. An interview with Karl Friston. ALIUS Bulletin, 2, 17-43. The entire issue of the ALIUS Bulletin can be...

Generalized autoregressive conditional heteroscedasticity (GARCH) models have long been considered one of the most successful families of approaches for volatility modeling in financial return series. In this paper, we propose an alternative approach based on methodologies widely used in the field of statistical machine learning. Specifically, we propose a novel nonparametric Bayesian mixture of Gaussian process regression models, each component of which models the noise variance process that contaminates the observed data as a separate latent Gaussian process driven by the observed data. In this way, we obtain a mixture Gaussian process conditional heteroscedasticity (MGPCH) model for volatility modeling in financial return series. We impose a nonparametric prior with power-law behavior, namely the Pitman-Yor process prior, over the distribution of the model mixture components, so as to better capture data distributions with heavy tails and skewness. Finally, we provide a copula-based approach for obtaining a predictive posterior for the covariances over the asset returns modeled by means of a postulated MGPCH model. We evaluate the efficacy of our approach in a number of benchmark scenarios and compare its performance to state-of-the-art methodologies.

We tackle the general linear instantaneous model (possibly underdetermined and noisy) where we model the source prior with a Student t distribution. The conjugate-exponential characterisation of the t distribution as an infinite mixture of scaled Gaussians enables us to do efficient inference. We study two well-known inference methods for Bayesian source separation, the Gibbs sampler and variational Bayes. We derive both techniques as local message passing algorithms to highlight their algorithmic similarities and to contrast their different convergence characteristics and computational requirements. Our simulation results suggest that typical posterior distributions in source separation have multiple local maxima. Therefore we propose a hybrid approach where we explore the state space with a Gibbs sampler and then switch to a deterministic algorithm. This approach appears to combine the speed of the variational approach with the robustness of the Gibbs sampler.
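
The scale-mixture representation mentioned above can be checked with a short sketch. A Gamma-distributed precision followed by a Gaussian draw yields a Student t variable. The conditional of the precision is again Gamma, which is what makes Gibbs and VB updates tractable. The function names are hypothetical, and ν = 6 is an arbitrary choice.

```python
import numpy as np

def sample_student_t_via_scale_mixture(nu, size, rng):
    """Draw Student-t(nu) samples as Gaussians with Gamma-distributed precision:
    lam ~ Gamma(shape=nu/2, rate=nu/2), x | lam ~ N(0, 1/lam)."""
    lam = rng.gamma(shape=nu / 2, scale=2 / nu, size=size)  # NumPy uses scale = 1/rate
    return rng.normal(0.0, 1.0 / np.sqrt(lam))

def precision_conditional(x, nu):
    """(shape, rate) of the Gamma conditional of the latent precision given x.
    A Gibbs sampler draws from it; a VB update uses the same form with x**2
    replaced by its expectation under the variational posterior."""
    return (nu + 1) / 2, (nu + x**2) / 2

rng = np.random.default_rng(0)
nu = 6.0
x = sample_student_t_via_scale_mixture(nu, 200_000, rng)
print(x.var(), nu / (nu - 2))  # sample variance vs. the Student-t variance
```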

Variational Bayes (VB) has been proposed as a method to facilitate calculations of the posterior distributions for linear models, by providing a fast method for Bayesian inference by estimating the parameters of a factorized approximation to the posterior distribution. Here a VB method for nonlinear forward models with Gaussian additive noise is presented. In the case of noninformative priors the parameter estimates obtained from this VB approach are identical to those found via nonlinear least squares. However, the advantage of the VB method lies in its Bayesian formulation, which permits prior information to be included in a hierarchical structure and measures of uncertainty for all parameter estimates to be obtained via the posterior distribution. Unlike sampling-based Bayesian inference with MCMC, VB is only approximate. However, the VB results are found to be comparable to those of MCMC, and the assumptions made about the form of the posterior distribution are found to be reasonable. Practically, the VB approach is substantially faster than MCMC because fewer calculations are required. Some of the advantages of the fully Bayesian nature of the method are demonstrated through the extension of the noise model and the inclusion of Automatic Relevance Determination (ARD) within the VB algorithm.
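
The claim that VB coincides with least squares under noninformative priors can be illustrated in the simplest linear-Gaussian case. This is a sketch under stated assumptions, not the paper's nonlinear scheme, which linearises the forward model. It uses a Gaussian factor for the weights and a Gamma factor for the noise precision. The function name and prior settings are hypothetical.

```python
import numpy as np

def vb_linear_regression(X, y, prior_prec=1e-6, a0=1e-3, b0=1e-3, n_iter=100):
    """Mean-field VB for y = X w + noise, with w ~ N(0, I/prior_prec) and
    noise precision beta ~ Gamma(a0, rate=b0). Returns q(w)=N(m, S), q(beta)=Gamma(a, b)."""
    n, d = X.shape
    E_beta = 1.0
    for _ in range(n_iter):
        # q(w): Gaussian with precision prior_prec*I + E[beta] X^T X
        S = np.linalg.inv(prior_prec * np.eye(d) + E_beta * X.T @ X)
        m = E_beta * S @ X.T @ y
        # q(beta): Gamma; E_q[||y - X w||^2] = ||y - X m||^2 + tr(X S X^T)
        resid = y - X @ m
        a = a0 + n / 2
        b = b0 + 0.5 * (resid @ resid + np.trace(X @ S @ X.T))
        E_beta = a / b
    return m, S, a, b

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([0.5, -1.0, 2.0])
y = X @ w_true + 0.1 * rng.normal(size=200)
m, S, a, b = vb_linear_regression(X, y)
w_ls = np.linalg.lstsq(X, y, rcond=None)[0]
print(m, w_ls)  # nearly identical under the vague prior
```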

In Bayesian analysis, the posterior follows from the data and a choice of a prior and a likelihood. One hopes that the posterior is robust to reasonable variation in the choice of prior and likelihood, since this choice is made by the modeler and is necessarily somewhat subjective. Despite the fundamental importance of the problem and a considerable body of literature, the tools of robust Bayes are not commonly used in practice. This is in large part due to the difficulty of calculating robustness measures from MCMC draws. Although methods for computing robustness measures from MCMC draws exist, they lack generality and often require additional coding or computation.
In contrast to MCMC, variational Bayes (VB) techniques are readily amenable to robustness analysis. The derivative of a posterior expectation with respect to a prior or data perturbation is a measure of local robustness to the prior or likelihood. Because VB casts posterior inference as an optimization problem, its methodology is built on the ability to calculate derivatives of posterior quantities with respect to model parameters, even in very complex models. In the present work, we develop local prior robustness measures for mean-field variational Bayes (MFVB), a VB technique which imposes a particular factorization assumption on the variational posterior approximation. We start by outlining existing local prior measures of robustness. Next, we use these results to derive closed-form measures of the sensitivity of the mean-field variational posterior approximation to prior specification. We demonstrate our method on a meta-analysis of randomized controlled interventions in access to microcredit in developing countries.
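
Because the VB optimum is defined by a stationarity condition, its sensitivity to a prior parameter follows from the implicit function theorem. The sketch below applies this to a toy conjugate normal model in which the optimum happens to be exact. It checks the result against finite differences. It is an illustration, not the paper's closed-form MFVB measures, and the function names are hypothetical.

```python
import numpy as np

def vb_posterior_mean(y, tau, mu0, tau0):
    """ELBO-optimal mean m* for theta, with y_i ~ N(theta, 1/tau) and theta ~ N(mu0, 1/tau0)."""
    return (tau0 * mu0 + tau * np.sum(y)) / (tau0 + tau * len(y))

def prior_mean_sensitivity(y, tau, tau0):
    """Implicit-function form: dm*/dmu0 = -(d2 ELBO / dm dmu0) / (d2 ELBO / dm2)."""
    hess = -(tau0 + tau * len(y))  # second derivative of the ELBO in m
    cross = tau0                   # mixed derivative in m and mu0
    return -cross / hess

rng = np.random.default_rng(2)
y = rng.normal(1.0, 1.0, size=20)
tau, mu0, tau0 = 1.0, 0.0, 0.5
eps = 1e-6
fd = (vb_posterior_mean(y, tau, mu0 + eps, tau0)
      - vb_posterior_mean(y, tau, mu0 - eps, tau0)) / (2 * eps)
ift = prior_mean_sensitivity(y, tau, tau0)
print(fd, ift)  # both equal tau0 / (tau0 + n * tau)
```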

He has authored over 50 papers in the most prestigious journals and conferences of the research field in his first seven years as a researcher. His current research interests include machine learning theory and methodologies, in particular, hierarchical Bayesian models, Bayesian nonparametrics, and deep hierarchical feature extractors...

We address the problem of unusual-event detection in a video sequence. Invariant subspace analysis (ISA) is used to extract features from the video, and the time-evolving properties of these features are modeled via an infinite hidden Markov model (iHMM), which is trained using "normal"/"typical" video. The iHMM retains a full posterior density function on all model parameters, including the number of underlying HMM states. Anomalies (unusual events) are detected subsequently if a low likelihood is observed when associated sequential features are submitted to the trained iHMM. A hierarchical Dirichlet process framework is employed in the formulation of the iHMM. The evaluation of posterior distributions for the iHMM is achieved in two ways: via Markov chain Monte Carlo and using a variational Bayes formulation. Comparisons are made to modeling based on conventional maximum-likelihood-based HMMs, as well as to Dirichlet-process-based Gaussian-mixture models. Index Terms: Hidden Markov model (HMM), Dirichlet process, variational Bayes (VB).
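
The detection rule, which flags a sequence when its likelihood under the trained model is low, rests on computing a sequence log-likelihood. The sketch below makes a simplifying assumption: it uses a finite two-state HMM with fixed, hand-picked parameters and discrete symbols, in place of the iHMM posterior. It runs the forward algorithm in log space. `log_forward` and all parameter values are hypothetical.

```python
import numpy as np

def log_forward(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under an HMM
    (forward algorithm in log space to avoid underflow)."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # alpha_new[j] = logsumexp_i(alpha[i] + log_A[i, j]) + log_B[j, o]
        M = alpha[:, None] + log_A
        mx = M.max(axis=0)
        alpha = mx + np.log(np.exp(M - mx).sum(axis=0)) + log_B[:, o]
    mx = alpha.max()
    return mx + np.log(np.exp(alpha - mx).sum())

# Hypothetical "trained" model: sticky states, each mostly emitting its own symbol
log_pi = np.log([0.5, 0.5])
log_A = np.log([[0.95, 0.05], [0.05, 0.95]])
log_B = np.log([[0.9, 0.1], [0.1, 0.9]])

normal = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
unusual = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
ll_normal = log_forward(normal, log_pi, log_A, log_B)
ll_unusual = log_forward(unusual, log_pi, log_A, log_B)
print(ll_normal, ll_unusual)  # the rapidly switching sequence scores much lower
```

In practice a threshold on the (per-symbol) log-likelihood, chosen on held-out typical data, would decide what counts as "low".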

We consider the problem of inferring and modeling topics in a sequence of documents with known publication dates. The documents at a given time are each characterized by a topic, and the topics are drawn from a mixture model. The proposed model infers the change in the topic mixture weights as a function of time. The details of this general framework may take different forms, depending on the specifics of the model. For the examples considered here we examine base measures based on independent multinomial-Dirichlet measures for representation of topic-dependent word counts. The form of the hierarchical model allows efficient variational Bayesian (VB) inference, of interest for large-scale problems. We demonstrate results and make comparisons to the model when the dynamic character is removed, and also compare to latent Dirichlet allocation (LDA) and topics over time (TOT). We consider a database of NIPS papers as well as the United States presidential State of the Union addresses from 1790 to 2008.

This note derives the variational free energy under the Laplace approximation, with a focus on accounting for additional model complexity induced by increasing the number of model parameters. This is relevant when using the free energy as an approximation to the log-evidence in Bayesian model averaging and selection. By setting restricted maximum likelihood (ReML) in the larger context of variational learning and expectation maximisation (EM), we show how the ReML objective function can be adjusted to provide an approximation to the log-evidence for a particular model. This means ReML can be used for model selection, specifically to select or compare models with different covariance components. This is useful in the context of hierarchical models because it enables a principled selection of priors that, under simple hyperpriors, can be used for automatic model selection and relevance determination (ARD). Deriving the ReML objective function from basic variational principles discloses the simple relationships among Variational Bayes, EM and ReML. Furthermore, we show that EM is formally identical to a full variational treatment when the precisions are linear in the hyperparameters. Finally, we also consider, briefly, dynamic models and how these inform the regularisation of free energy ascent schemes, like EM and ReML.
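
For orientation, here is the generic Laplace form of the free energy. The derivation assumes a Gaussian $q(\theta)=\mathcal{N}(\mu,\Sigma)$ over $p$ parameters, with $\Sigma$ set from the curvature of $L(\theta)=\ln p(y,\theta)$ at $\mu$. The notation is ours, not necessarily the note's.

```latex
\begin{aligned}
F &= \mathbb{E}_q[L(\theta)] + \mathcal{H}[q],
  \qquad \Sigma^{-1} = -\,\partial_{\theta\theta} L(\mu) \\
\mathbb{E}_q[L(\theta)] &\approx L(\mu) + \tfrac{1}{2}\operatorname{tr}\!\big(\Sigma\,\partial_{\theta\theta} L(\mu)\big)
  = L(\mu) - \tfrac{p}{2} \\
\mathcal{H}[q] &= \tfrac{1}{2}\ln|\Sigma| + \tfrac{p}{2}\ln 2\pi e \\
F &\approx L(\mu) + \tfrac{1}{2}\ln|\Sigma| + \tfrac{p}{2}\ln 2\pi
\end{aligned}
```

Writing $L(\mu)=\ln p(y\mid\mu)+\ln p(\mu)$ separates an accuracy term, $\ln p(y\mid\mu)$, from the remaining terms. Those remaining terms depend on the number of parameters $p$ through $\ln p(\mu)$, $\tfrac{1}{2}\ln|\Sigma|$ and $\tfrac{p}{2}\ln 2\pi$.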

Regression density estimation is the problem of flexibly estimating a response distribution as a function of covariates. An important approach to regression density estimation uses mixtures of experts models and our article considers flexible mixtures of heteroscedastic experts (MHE) regression models where the response distribution is a normal mixture, with the component means, variances and mixture weights all varying as a function of covariates. Our article develops fast variational approximation methods for inference. Our motivation is that alternative computationally intensive MCMC methods for fitting mixture models are difficult to apply when it is desired to fit models repeatedly in exploratory analysis and model choice. Our article makes three contributions. First, a variational approximation for MHE models is described where the variational lower bound is in closed form. Second, the basic approximation can be improved by using stochastic approximation methods to perturb the initial solution to attain higher accuracy. Third, the advantages of our approach for model choice and evaluation compared to MCMC based approaches are illustrated. These advantages are particularly compelling for time series data where repeated refitting for one step ahead prediction in model choice and diagnostics and in rolling window computations is very common.
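
The model described above can be made concrete by evaluating the conditional density of an MHE model at a single covariate vector. This is a sketch with an assumed parameterisation, chosen for illustration: softmax gating weights, linear component means, and log-linear component variances. It shows only the model's density, not the variational fitting. The function name and all parameter values are hypothetical.

```python
import numpy as np

def mhe_density(y, x, beta, delta, gate):
    """p(y | x) for a mixture of heteroscedastic experts (assumed parameterisation):
    weights softmax(gate @ x), means beta @ x, variances exp(delta @ x)."""
    logits = gate @ x
    w = np.exp(logits - logits.max())
    w /= w.sum()                                # covariate-dependent mixture weights
    means = beta @ x                            # covariate-dependent component means
    variances = np.exp(delta @ x)               # covariate-dependent component variances
    y = np.asarray(y, dtype=float)[..., None]   # broadcast y over the K components
    comp = np.exp(-0.5 * (y - means) ** 2 / variances) / np.sqrt(2 * np.pi * variances)
    return comp @ w

x = np.array([1.0, 0.3])                        # intercept plus one covariate
beta = np.array([[0.0, 1.0], [2.0, -1.0]])
delta = np.array([[-1.0, 0.5], [0.0, 0.2]])
gate = np.array([[0.0, 0.0], [0.5, 1.0]])

grid = np.linspace(-10.0, 10.0, 4001)
dens = mhe_density(grid, x, beta, delta, gate)
total = dens.sum() * (grid[1] - grid[0])        # numerical integral over y
print(total)                                    # close to 1
```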

The integration of diverse forms of informative data by learning an optimal combination of base kernels in classification or regression problems can provide enhanced performance when compared to that obtained from any single data source. We present a Bayesian hierarchical model which enables kernel learning and present effective variational Bayes estimators for regression and classification. Illustrative experiments demonstrate the utility of the proposed method. Matlab code replicating results reported is available at
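
The basic object in kernel learning is a combination of base kernels, each built from a different data source. The sketch below makes two simplifying assumptions: the combination weights are fixed by hand, whereas the paper learns them within its Bayesian hierarchy, and the two "sources" are simply disjoint column blocks of one feature matrix. With nonnegative weights the combination remains a valid positive semidefinite kernel. The helper names are hypothetical.

```python
import numpy as np

def rbf_kernel(X, lengthscale):
    """Squared-exponential kernel matrix for the rows of X."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)

def combine_kernels(kernels, weights):
    """Weighted sum of base kernels; nonnegative weights keep the result PSD."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be nonnegative")
    return np.tensordot(weights, np.stack(kernels), axes=1)

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 4))
K1 = rbf_kernel(X[:, :2], lengthscale=1.0)   # "source" 1: first two features
K2 = rbf_kernel(X[:, 2:], lengthscale=2.0)   # "source" 2: last two features
K = combine_kernels([K1, K2], [0.7, 0.3])

# Kernel ridge / GP posterior mean at the training inputs with noise variance 0.1
y = rng.normal(size=30)
fitted = K @ np.linalg.solve(K + 0.1 * np.eye(30), y)
```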

In this paper, we present a variational Bayesian (VB) approach to computing the interval estimates for nonhomogeneous Poisson process (NHPP) software reliability models. This approach is an approximate method that can produce analytically tractable posterior distributions. We present simple iterative algorithms to compute the approximate posterior distributions for the parameters of the gamma-type NHPP-based software reliability model using either individual failure time data or grouped data. In numerical examples, the accuracy of this VB approach is compared with the interval estimates based on conventional Bayesian approaches, i.e., Laplace approximation, Markov chain Monte Carlo (MCMC) method, and numerical integration. The proposed VB approach provides almost the same accuracy as MCMC, while its computational burden is much lower.
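
Any of the posterior approaches compared above, VB included, starts from the NHPP likelihood, whether the data are failure times or grouped counts. The sketch below restricts to the shape-1 (exponential, Goel-Okumoto) member of the gamma-type family, with mean value function ω(1 − e^(−βt)). It computes only the likelihoods, not the VB posterior. The function names and data are hypothetical.

```python
import math

def nhpp_loglik_failure_times(times, T, omega, beta):
    """Log-likelihood of failure times observed on (0, T] under the NHPP with
    mean value function omega * (1 - exp(-beta * t)) (gamma-type with shape 1)."""
    log_intensities = sum(math.log(omega * beta) - beta * t for t in times)
    return log_intensities - omega * (1.0 - math.exp(-beta * T))

def nhpp_loglik_grouped(counts, boundaries, omega, beta):
    """Log-likelihood of counts[k] failures in (boundaries[k], boundaries[k+1]]:
    independent Poisson counts with means given by increments of the mean value function."""
    def mean_value(s):
        return omega * (1.0 - math.exp(-beta * s))
    ll = 0.0
    for k, x in enumerate(counts):
        mu_k = mean_value(boundaries[k + 1]) - mean_value(boundaries[k])
        ll += x * math.log(mu_k) - mu_k - math.lgamma(x + 1)
    return ll

times = [10, 25, 40, 70, 100, 150, 210, 300]
T, beta = 350.0, 0.005
# For fixed beta, the likelihood in omega is maximised at n / (1 - exp(-beta T))
omega_hat = len(times) / (1.0 - math.exp(-beta * T))
counts, boundaries = [3, 2, 2, 1], [0.0, 50.0, 100.0, 200.0, 350.0]
```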

The Variational Bayes (VB) approach is used as a one-step approximation for Bayesian filtering. It requires the availability of moments of the free-form distributional optimizers. The latter may have intractable functional forms. In this contribution, we replace these by appropriate fixed-form distributions yielding the required moments. We address two scenarios of this Restricted VB (RVB) approximation. For the first scenario,

In the past years, many authors have considered application of machine learning methodologies to effect robot learning by demonstration. Gaussian mixture regression (GMR) is one of the most successful methodologies used for this purpose. A major limitation of GMR models concerns automatic selection of the proper number of model states, i.e., the number of model component densities. Existing methods, including likelihood- or entropy-based criteria, usually tend to yield noisy model size estimates while imposing heavy computational requirements. Recently, Dirichlet process (infinite) mixture models, a cornerstone of nonparametric Bayesian statistics, have emerged as promising candidates for clustering applications where the number of clusters is unknown a priori. Under this motivation, to resolve the aforementioned issues of GMR-based methods for robot learning by demonstration, in this paper we introduce a nonparametric Bayesian formulation for the GMR model, the Dirichlet process GMR model. We derive an efficient variational Bayesian inference algorithm for the proposed model, and we experimentally investigate its efficacy as a robot learning by demonstration methodology, considering a number of demanding robot learning by demonstration scenarios.
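
Variational inference for Dirichlet process mixtures usually relies on a truncated stick-breaking representation. The sketch below follows the standard truncated-DP updates in the style of Blei and Jordan. It is not necessarily the exact algorithm of this paper. Given expected cluster counts, it updates the Beta factors of the stick proportions and derives the expected mixture weights. The function names and counts are hypothetical.

```python
import numpy as np

def stick_breaking_update(counts, alpha):
    """Beta(g1, g2) variational factors for the stick proportions v_k of a
    truncated DP mixture, given expected cluster counts N_k:
    g1_k = 1 + N_k,  g2_k = alpha + sum_{j>k} N_j."""
    counts = np.asarray(counts, dtype=float)
    tail = np.cumsum(counts[::-1])[::-1] - counts   # sum_{j>k} N_j
    return 1.0 + counts, alpha + tail

def expected_weights(g1, g2):
    """E[pi_k] = E[v_k] * prod_{j<k} (1 - E[v_j]), using independence of the v_k under q."""
    ev = g1 / (g1 + g2)
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - ev)[:-1]))
    return ev * leftover

g1, g2 = stick_breaking_update([50, 30, 20, 0, 0], alpha=1.0)
pi = expected_weights(g1, g2)
print(pi)  # unused components keep only a small share of the mass
```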

We describe a Bayesian estimation and inference procedure for fMRI time series based on the use of General Linear Models (GLMs). Importantly, we use a spatial prior on regression coefficients which embodies our prior knowledge that evoked responses are spatially contiguous and locally homogeneous. Further, using a computationally efficient Variational Bayes framework, we are able to let the data determine the optimal amount of smoothing. We assume an arbitrary order Auto-Regressive (AR) model for the errors. Our model generalizes earlier work on voxel-wise estimation of GLM-AR models and inference in GLMs using Posterior Probability Maps (PPMs). Results are shown on simulated data and on data from an event-related fMRI experiment. © 2004 Elsevier Inc. All rights reserved.

In this paper, we present an incremental method for model selection and learning of Gaussian mixtures based on the recently proposed variational Bayes approach. The method adds components to the mixture using a Bayesian splitting test procedure: a component is split into two components and then variational update equations are applied only to the parameters of the two

Functional MRI provides a unique perspective of neuronal organization; however, these data include many complex sources of spatiotemporal variability, which require spatial preprocessing and statistical analysis. For the latter, Bayesian models provide a promising alternative to classical inference, which uses results from Gaussian random field theory to assess the significance of spatially correlated statistic images. A Bayesian approach generalizes the application of these ideas in that (1) random fields are used to model all spatial parameters, not solely observation error, (2) their smoothness is optimized, and (3) a broader class of models can be compared. The main problem, however, is computational, due to the large number of voxels in a brain volume. Sampling methods are time-consuming; however, approximate inference using variational Bayes (VB) offers a principled and transparent way to specify assumptions necessary for computational tractability. Penny et al. (2005b) described such a scheme using a joint spatial prior and approximated the joint posterior density with one that factorized over voxels. However, a further computational bottleneck is encountered when evaluating the log model evidence used to compare models. This has led to dividing a brain volume into slices and treating each independently. This amounts to approximating the spatial prior over a full volume with stacked 2D priors. That is, smoothness along the z-axis is not included in the model. Here we describe a VB scheme that approximates the zero-mean joint spatial prior with a non-zero-mean empirical prior that factors over voxels, thereby overcoming this problem. We do this by modifying the original VB algorithm of Penny et al. using the conditional form of a so-called conditional autoregressive (CAR) prior to update a marginal prior over voxels.
We refer to this as a spatially-informed voxel-wise prior (SVP) and use it to spatially regularise general linear model (GLM) and autoregressive (AR) coefficients (the latter modelling serial correlations over time). This algorithm scales more favourably with the number of voxels, providing a truly 3D spatiotemporal model over volumes containing tens of thousands of voxels. We compare the scaling of compute times with the number of voxels, and performance with a joint prior, applied to synthetic and single-subject data.
(L.M. Harrison)
Footnote 1: This is an example of a nested model; however, the Bayesian approach generalizes this to non-nested models, e.g., different parametric designs which cannot be combined in one design matrix.

The logic of uncertainty is neither the logic of experience alone nor the logic of chance alone; it is the logic of experience and chance. Experience and chance are two inseparable poles. They are two dual reflections of one essence, which is called a co∼event. The theory of experience and chance is the theory of co∼events. To study co∼events, it is not enough to study experience and chance separately; it is necessary to study experience and chance as a single whole, a co∼event. In other words, it is necessary to study their interaction within a co∼event. The new co∼event axiomatics and the theory of co∼events that follows from it were created precisely for these purposes. In this work, I am going to demonstrate the effectiveness of the new theory of co∼events in studying the logic of uncertainty. I will do this using the example of a co∼event splitting of the logic of the Bayesian scheme, which has a long history of fierce debates between Bayesians and frequentists. I hope the logic of the theory of experience and chance will make a modest contribution to reconciling these old dual debaters.
Keywords: theory of experience and chance, co∼event dualism, co∼event axiomatics, logic of uncertainty, logic of experience and chance, logic of cause and consequence, logic of the past and the future, Bayesian scheme.

We propose a new reconstruction procedure for X-ray computed tomography (CT) based on Bayesian modeling. We utilize the knowledge that the human body is composed of only a limited number of materials whose CT values are roughly known in advance. Although the exact Bayesian inference of our model is intractable, we propose an efficient algorithm based on the variational Bayes technique. Experiments show that the proposed method performs better than the existing methods in severe situations where samples are limited or metal is inserted into the body.

This paper presents a variational treatment of dynamic models that furnishes time-dependent conditional densities on the path or trajectory of a system's states and the time-independent densities of its parameters. These are obtained by maximising a variational action with respect to conditional densities, under a fixed-form assumption. The action or path-integral of free-energy represents a lower bound on the model's log-evidence or marginal likelihood required for model selection and averaging. This approach rests on formulating the optimisation dynamically, in generalised coordinates of motion. The resulting scheme can be used for online Bayesian inversion of nonlinear dynamic causal models and is shown to outperform existing approaches, such as Kalman and particle filtering. Furthermore, it provides for dual and triple inferences on a system's states, parameters and hyperparameters using exactly the same principles. We refer to this approach as dynamic expectation maximisation (DEM).

We report on work on speaker diarization of telephone conversations which was begun at the Robust Speaker Recognition Workshop held at Johns Hopkins University in 2008. Three diarization systems were developed and experiments were conducted using the summed-channel telephone data from the 2008 NIST speaker recognition evaluation. The systems are a Baseline agglomerative clustering system, a Streaming system which uses speaker factors for speaker change point detection and traditional methods for speaker clustering, and a Variational Bayes system designed to exploit a large number of speaker factors as in state of the art speaker recognition systems. The Variational Bayes system proved to be the most effective, achieving a diarization error rate of 1.0% on the summed-channel data. This represents an 85% reduction in errors compared with the Baseline agglomerative clustering system. An interesting aspect of the Variational Bayes approach is that it implicitly performs speaker clustering in a way which avoids making premature hard decisions. This type of soft speaker clustering can be incorporated into other diarization systems (although causality has to be sacrificed in the case of the Streaming system). With this modification, the Baseline system achieved a diarization error rate of 3.5% (a 50% reduction in errors).

Magnetoencephalography (MEG) provides millisecond-scale temporal resolution for noninvasive mapping of human brain functions, but the problem of reconstructing the underlying source currents from the extracranial data has no unique solution. Several distributed source estimation methods based on different prior assumptions have been suggested for the resolution of this inverse problem. Recently, a hierarchical Bayesian generalization of the traditional minimum norm estimate (MNE) was proposed, in which the variance of distributed current at each cortical location is considered as a random variable and estimated from the data using the variational Bayesian (VB) framework. Here, we introduce an alternative scheme for performing Bayesian inference in the context of this hierarchical model by using Markov chain Monte Carlo (MCMC) strategies. In principle, the MCMC method is capable of numerically representing the true posterior distribution of the currents whereas the VB approach is inherently approximative. We point out some potential problems related to hyperprior selection in the previous work and study some possible solutions. A hyperprior sensitivity analysis is then performed, and the structure of the posterior distribution as revealed by the MCMC method is investigated. We show that the structure of the true posterior is rather complex with multiple modes corresponding to different possible solutions to the source reconstruction problem. We compare the results from the VB algorithm to those obtained from the MCMC simulation under different hyperparameter settings. The difficulties in using a unimodal variational distribution as a proxy for a truly multimodal distribution are also discussed. Simulated MEG data with realistic sensor and source geometries are used in performing the analyses.

Nummenmaa, A., Auranen, T., Hämäläinen, M. S., Jääskeläinen, I. P., Sams, M., Vehtari, A., and Lampinen, J. (2007). Automatic relevance determination based hierarchical Bayesian MEG inversion in practice. NeuroImage, 37 (3): 876-889.

Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and eff...

In this paper, we discuss how image segmentation can be handled by using Bayesian learning and inference. In particular, variational techniques relying on free energy minimization will be introduced. It will be shown how to embed a spatial diffusion process on segmentation labels within the Variational Bayes learning procedure so as to enforce spatial constraints among labels.
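One common way to make VB label updates spatially aware is a mean-field approximation to a Potts Markov random field prior on the labels: each pixel's label distribution is repeatedly recomputed from its own data term plus the current label probabilities of its neighbours, so label evidence spreads across the image. The sketch below shows that generic update on a 2-D grid. It is not necessarily the diffusion scheme used in the paper, and the coupling strength `beta`, the 4-neighbourhood, and the synchronous update order are assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # stabilise before exponentiating
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_field_potts(log_lik, beta=1.0, n_iter=30):
    """Mean-field label posteriors q[i, j, k] under a Potts prior with coupling beta.

    log_lik: array (H, W, K) of per-pixel class log-likelihoods.
    """
    q = softmax(log_lik)  # start from the data term alone
    for _ in range(n_iter):
        # Sum of the neighbours' label probabilities (4-neighbourhood, no wrap-around).
        nb = np.zeros_like(q)
        nb[1:, :] += q[:-1, :]
        nb[:-1, :] += q[1:, :]
        nb[:, 1:] += q[:, :-1]
        nb[:, :-1] += q[:, 1:]
        # Synchronous update: q_i(k) proportional to exp(log_lik_i(k) + beta * sum_j q_j(k)).
        q = softmax(log_lik + beta * nb)
    return q

# Toy two-class image: the left half favours class 0, the right half class 1,
# and one pixel in the left half weakly (and wrongly) favours class 1.
log_lik = np.zeros((8, 8, 2))
log_lik[:, :4, 0] = 1.0
log_lik[:, 4:, 1] = 1.0
log_lik[2, 1] = [0.0, 0.5]

labels = mean_field_potts(log_lik, beta=1.0).argmax(axis=-1)
```

On its own data term the pixel at (2, 1) would be labelled 1; after the mean-field smoothing its neighbours pull it back to class 0, while both halves of the image keep their labels.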

In recent years, many authors have considered applying machine learning methodologies to effect robot learning by demonstration. Gaussian mixture regression (GMR) is one of the most successful methodologies used for this purpose. A major limitation of GMR models concerns automatic selection of the proper number of model states, i.e., the number of model component densities. Existing methods, including likelihood- or entropy-based criteria, usually tend to yield noisy model size estimates while imposing heavy computational requirements. Recently, Dirichlet process (infinite) mixture models have emerged as a cornerstone of nonparametric Bayesian statistics and as promising candidates for clustering applications where the number of clusters is unknown a priori. Motivated by this, and to resolve the aforementioned issues of GMR-based methods for robot learning by demonstration, in this paper we introduce a nonparametric Bayesian formulation for the GMR model, the Dirichlet process GMR model. We derive an efficient variational Bayesian inference algorithm for the proposed model, and we experimentally investigate its efficacy as a robot learning by demonstration methodology, considering a number of demanding robot learning by demonstration scenarios.
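The regression step of GMR is standard: a Gaussian mixture fitted to the joint vector [input; output] is conditioned on the input, which gives a responsibility-weighted average of the per-component conditional means. The sketch below shows only that prediction step and assumes the mixture parameters are already available; in the paper they would come from the variational Dirichlet process fit, which is not reproduced here. The function name `gmr_predict` and its argument layout are hypothetical.

```python
import numpy as np

def gmr_predict(x, weights, means, covs, d_in):
    """E[y | x] under a joint Gaussian mixture over z = [x; y].

    weights: (K,), means: (K, D), covs: (K, D, D); the first d_in dims are inputs.
    """
    weights, means, covs = (np.asarray(a, dtype=float) for a in (weights, means, covs))
    x = np.atleast_1d(np.asarray(x, dtype=float))
    log_h, cond_means = [], []
    for w, mu, S in zip(weights, means, covs):
        mu_x, mu_y = mu[:d_in], mu[d_in:]
        S_xx, S_yx = S[:d_in, :d_in], S[d_in:, :d_in]
        diff = x - mu_x
        sol = np.linalg.solve(S_xx, diff)             # S_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(S_xx)
        # log of w * N(x; mu_x, S_xx): unnormalised responsibility of this component
        log_h.append(np.log(w) - 0.5 * (d_in * np.log(2 * np.pi) + logdet + diff @ sol))
        cond_means.append(mu_y + S_yx @ sol)          # conditional mean of y given x
    log_h = np.array(log_h)
    h = np.exp(log_h - log_h.max())                   # normalise in a numerically safe way
    h /= h.sum()
    return h @ np.array(cond_means)

# Two hypothetical components: around x = -5 with y = x + 5 (slope 1),
# and around x = +5 with y flat at 10.
weights = np.array([0.5, 0.5])
means = np.array([[-5.0, 0.0], [5.0, 10.0]])
covs = np.array([[[1.0, 1.0], [1.0, 2.0]],
                 [[1.0, 0.0], [0.0, 1.0]]])
print(gmr_predict(-4.0, weights, means, covs, d_in=1))  # close to [1.0]
print(gmr_predict(5.0, weights, means, covs, d_in=1))   # close to [10.0]
```

Working with log responsibilities avoids underflow when a query point lies far from some components, which is common once a Dirichlet process fit leaves several small, distant components.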

Laboratoire des Signaux et Systèmes (UMR8506 CNRS-SUPELEC-UNIV PARIS SUD 11), 3 rue Joliot-Curie, F-91192 Gif-sur-Yvette cedex, France ... $\mathrm{KL}(q\|p) = \left\langle -\ln \frac{p}{q} \right\rangle_q = -\int q(x) \ln \frac{p(x)}{q(x)}\,dx$, ... $H(q) = \langle -\ln q \rangle_q = -\int q(x) \ln q(x)\,dx$ (5) ... $F(q) = -\ln Z + \mathrm{KL}(q\|p) = F_{\mathrm{Helmholtz}}$ ...
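Read with the usual convention that p(x) = exp(-E(x))/Z is a Gibbs distribution, the last identity says that the variational free energy F(q) = <E>_q - H(q) equals -ln Z + KL(q||p). Minimizing F over q is therefore the same as minimizing the KL divergence to p, and F is always an upper bound on -ln Z. The check below verifies this numerically on a small discrete example; the energies and the approximating q are arbitrary, hypothetical values.

```python
import numpy as np

rng = np.random.default_rng(1)
energy = rng.normal(size=6)             # hypothetical energies E(x) of 6 discrete states
Z = np.sum(np.exp(-energy))             # partition function
p = np.exp(-energy) / Z                 # Gibbs distribution p(x) = exp(-E(x)) / Z
q = rng.dirichlet(np.ones(6))           # arbitrary approximating distribution

expected_energy = np.sum(q * energy)    # <E>_q
entropy = -np.sum(q * np.log(q))        # H(q) = <-ln q>_q
kl = np.sum(q * np.log(q / p))          # KL(q||p) = <-ln(p/q)>_q

free_energy = expected_energy - entropy
print(free_energy, -np.log(Z) + kl)     # the two numbers agree
```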

In this paper, we propose a Bayesian methodology for receiver function analysis, a key tool in determining the deep structure of the Earth's crust. We exploit the assumption of sparsity for receiver functions to develop a Bayesian deconvolution method as an alternative to the widely used iterative deconvolution. We model samples of a sparse signal as i.i.d. Student-t random variables. Gibbs sampling and variational Bayes techniques are investigated for our specific posterior inference problem. We use these techniques within the expectation-maximization (EM) algorithm to estimate our unknown model parameters. The superiority of the Bayesian deconvolution is demonstrated by experiments on both simulated and real earthquake data.
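A Student-t prior can be written as a Gaussian scale mixture, x_i | lam_i ~ N(0, 1/lam_i) with lam_i ~ Gamma(a, b) (shape a, rate b), which makes a mean-field VB treatment straightforward: alternate a Gaussian update for the signal with Gamma updates for the per-sample precisions. The sketch below assumes a known convolution kernel and a known noise precision (the paper instead estimates unknown parameters inside EM); the kernel, spike positions, hyperparameters and function names are all hypothetical.

```python
import numpy as np

def conv_matrix(kernel, n):
    """n x n lower-triangular Toeplitz H with (H @ x)[t] = sum_i kernel[i] * x[t - i]."""
    H = np.zeros((n, n))
    for i, h in enumerate(kernel):
        H += h * np.eye(n, k=-i)
    return H

def vb_student_t_deconv(y, H, noise_prec, a=1e-3, b=1e-3, n_iter=50):
    """Mean-field VB for y = H x + noise, x_i | lam_i ~ N(0, 1/lam_i), lam_i ~ Gamma(a, b).

    Integrating lam_i out gives a Student-t prior on x_i with 2a degrees of freedom.
    Returns the mean and covariance of q(x) and E[lam] under q(lam).
    """
    n = H.shape[1]
    HtH, Hty = H.T @ H, H.T @ y
    e_lam = np.ones(n)  # initial guess for E[lam_i]
    for _ in range(n_iter):
        # q(x) = N(mu, Sigma)
        Sigma = np.linalg.inv(noise_prec * HtH + np.diag(e_lam))
        mu = noise_prec * Sigma @ Hty
        # q(lam_i) = Gamma(a + 1/2, b + (mu_i^2 + Sigma_ii) / 2); keep its mean
        e_lam = (a + 0.5) / (b + 0.5 * (mu ** 2 + np.diag(Sigma)))
    return mu, Sigma, e_lam

# Hypothetical test signal: three spikes blurred by a short kernel plus small noise.
rng = np.random.default_rng(0)
n = 60
spikes = [10, 25, 40]
x_true = np.zeros(n)
x_true[spikes] = [3.0, -2.0, 2.5]
H = conv_matrix([1.0, 0.6, 0.2], n)
y = H @ x_true + 0.01 * rng.standard_normal(n)
mu, Sigma, e_lam = vb_student_t_deconv(y, H, noise_prec=1.0 / 0.01 ** 2)
```

Samples whose posterior mean stays near zero receive large E[lam_i] and are shrunk further, while the spikes keep small precisions; this per-sample adaptivity is what the heavy-tailed Student-t prior buys over a single Gaussian prior.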

In this paper, we propose an efficient variational Bayesian (VB) solver for a robust variant of low-rank subspace clustering (LRSC). VB learning offers automatic model selection without parameter tuning. However, it is typically performed by local search with update rules derived from conditional conjugacy, and is therefore prone to local minima. Instead, we use an approximate global solver for LRSC with an element-wise sparse term to make it robust against spiky noise. In experiments, our method (a mean update solver for robust LRSC) outperforms the original LRSC, as well as the robust LRSC with the standard VB solver.

In this paper, we propose a novel algorithm dedicated to online motion averaging for large-scale problems. To this end, we design a filter that continuously approximates the posterior distribution of the estimated transformations. In order to deal with large-scale problems, we associate a variational Bayesian approach with a relative parametrization of the absolute transformations. This association allows our algorithm to simultaneously possess two features that are essential for an algorithm dedicated to large-scale online motion averaging: 1) a low computational time and 2) the ability to detect wrong loop closure measurements. We extensively demonstrate on several applications (binocular SLAM, monocular SLAM and video mosaicking) that our approach not only exhibits a low computational time and detects wrong loop closures but also significantly outperforms the state-of-the-art algorithm in terms of RMSE.

Bayesian model selection (BMS) is a powerful method for determining the most likely among a set of competing hypotheses about the mechanisms that generated observed data. BMS has recently found widespread application in neuroimaging, particularly in the context of dynamic causal modelling (DCM). However, so far, combining BMS results from several subjects has relied on simple (fixed effects) metrics, e.g. the group Bayes factor (GBF), that do not account for group heterogeneity or outliers. In this paper, we compare the GBF with two random effects methods for BMS at the between-subject or group level. These methods provide inference on model-space using a classical and Bayesian perspective respectively. First, a classical (frequentist) approach uses the log model evidence as a subject-specific summary statistic. This enables one to use analysis of variance to test for differences in log-evidences over models, relative to intersubject differences. We then consider the same problem in Bayesian terms and describe a novel hierarchical model, which is optimised to furnish a probability density on the models themselves. This new variational Bayes method rests on treating the model as a random variable and estimating the parameters of a Dirichlet distribution which describes the probabilities for all models considered. These probabilities then define a multinomial distribution over model space, allowing one to compute how likely it is that a specific model generated the data of a randomly chosen subject as well as the exceedance probability of one model being more likely than any other model. Using empirical and synthetic data, we show that optimising a conditional density of the model probabilities, given the log-evidences for each model over subjects, is more informative and appropriate than both the GBF and frequentist tests of the log-evidences. 
In particular, we found that the hierarchical Bayesian approach is considerably more robust than either of the other approaches in the presence of outliers. We expect that this new random effects method will prove useful for a wide range of group studies, not only in the context of DCM, but also for other modelling endeavours, e.g. comparing different source reconstruction methods for EEG/MEG or selecting among competing computational models of learning and decision-making.
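The variational random-effects scheme described above can be sketched in a few lines. Given a subjects-by-models matrix of log-evidences, it alternates between per-subject posterior model assignments and the Dirichlet parameters alpha, then estimates exceedance probabilities by sampling from the resulting Dirichlet. This is a sketch of the standard update under stated assumptions: a flat prior alpha0 = 1, a hand-rolled digamma so that only NumPy is needed (scipy.special.digamma would do the same job), and hypothetical function names and toy data.

```python
import numpy as np

def digamma(x):
    """Digamma function via recurrence plus asymptotic series (valid for x > 0)."""
    x = np.asarray(x, dtype=float)
    result = np.zeros_like(x)
    # psi(x) = psi(x + 1) - 1/x: shift arguments upward until x >= 6
    while np.any(x < 6.0):
        result = result - np.where(x < 6.0, 1.0 / x, 0.0)
        x = np.where(x < 6.0, x + 1.0, x)
    inv2 = 1.0 / (x * x)
    return result + np.log(x) - 0.5 / x - inv2 * (1/12 - inv2 * (1/120 - inv2 / 252))

def vb_random_effects_bms(log_evidence, alpha0=1.0, tol=1e-10, max_iter=1000):
    """Dirichlet parameters alpha over model frequencies from an (n_subjects, n_models) log-evidence matrix."""
    n_models = log_evidence.shape[1]
    prior = np.full(n_models, alpha0)
    alpha = prior.copy()
    for _ in range(max_iter):
        # u_nk proportional to exp(log p(y_n | m_k) + psi(alpha_k) - psi(sum_k alpha_k))
        log_u = log_evidence + digamma(alpha) - digamma(alpha.sum())
        log_u -= log_u.max(axis=1, keepdims=True)    # avoid overflow in exp
        g = np.exp(log_u)
        g /= g.sum(axis=1, keepdims=True)             # per-subject posterior over models
        new_alpha = prior + g.sum(axis=0)
        converged = np.max(np.abs(new_alpha - alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    return alpha, g

def exceedance_probs(alpha, n_samples=100_000, seed=0):
    """Monte Carlo estimate of P(model k has the largest frequency) under Dirichlet(alpha)."""
    r = np.random.default_rng(seed).dirichlet(alpha, size=n_samples)
    return np.bincount(r.argmax(axis=1), minlength=len(alpha)) / n_samples

# Hypothetical data: 10 subjects, 2 models; 8 subjects favour model 0 by 5 nats, 2 favour model 1.
log_evidence = np.zeros((10, 2))
log_evidence[:8, 0] = 5.0
log_evidence[8:, 1] = 5.0
alpha, g = vb_random_effects_bms(log_evidence)
xp = exceedance_probs(alpha)
print(alpha, xp)   # alpha is roughly [9, 3]; xp[0] is well above 0.9
```

Unlike a group Bayes factor, which simply sums log-evidences, the two dissenting subjects here shift alpha toward model 1 in proportion to their number rather than to the size of their evidence, which is the robustness to outliers the abstract reports.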

Text line segmentation in unconstrained handwritten documents remains a challenge because handwritten text lines are multi-skewed and not obviously separated. This paper presents a new approach based on the variational Bayes (VB) framework for text line segmentation. Viewing the document image as a mixture density model, with each text line approximated by a Gaussian component, the VB method can automatically determine the number of components. We extend the VB method so that it can eliminate and split components as well as control the orientation of text lines. Experiments on Chinese handwritten documents demonstrate the effectiveness of the approach.

Anticipatory skin conductance responses [SCRs] are a widely used measure of aversive conditioning in humans. Here, we describe a dynamic causal model [DCM] of how anticipatory, evoked, and spontaneous skin conductance changes are generated by sudomotor nerve activity. Inversion of this model, using variational Bayes, provides a means of inferring the most likely sympathetic nerve activity, given observed skin conductance responses. In two fear conditioning experiments, we demonstrate the predictive validity of the DCM by showing it has greater sensitivity to the effects of conditioning, relative to alternative (conventional) response estimates. Furthermore, we establish face validity by showing that trial-by-trial estimates of anticipatory sudomotor activity are better predicted by formal learning models, relative to response estimates from peak-scoring approaches. The model furnishes a potentially powerful approach to characterising SCR that exploits knowledge about how these signals are generated.

This article proposes a Bayesian spatio-temporal model for source reconstruction of M/EEG data. The usual two-level probabilistic model implicit in most distributed source solutions is extended by adding a third level which describes the temporal evolution of neuronal current sources using time-domain General Linear Models (GLMs). These comprise a set of temporal basis functions which are used to describe event-related M/EEG responses. This places M/EEG analysis in a statistical framework that is very similar to that used for PET and fMRI. The experimental design can be coded in a design matrix, effects of interest characterized using contrasts and inferences made using posterior probability maps. Importantly, as is the case for single-subject fMRI analysis, trials are treated as fixed effects and the approach takes into account between-trial variance, allowing valid inferences to be made on single-subject data. The proposed probabilistic model is efficiently inverted by using the Variational Bayes framework under a convenient mean-field approximation (VB-GLM). The new method is tested with biophysically realistic simulated data and the results are compared to those obtained with traditional spatial approaches like the popular Low Resolution Electromagnetic TomogrAphy (LORETA) and minimum variance Beamformer. Finally, the VB-GLM approach is used to analyze an EEG data set from a face processing experiment.
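The GLM part of this formulation can be illustrated for a single time series: temporal basis functions form the design matrix, a Gaussian prior on the weights gives a closed-form Gaussian posterior, and a contrast c yields a posterior probability P(c^T w > 0) of the kind summarized in posterior probability maps. This is a simplified, hypothetical sketch: it assumes Gaussian-bump basis functions and fixed, known prior and noise precisions, whereas VB-GLM estimates such quantities within its three-level model over many sources.

```python
import math
import numpy as np

def gaussian_basis(n_times, n_bases, width):
    """Design matrix whose columns are Gaussian bumps evenly spaced in time (assumed basis set)."""
    t = np.arange(n_times)[:, None]
    centres = np.linspace(0, n_times - 1, n_bases)[None, :]
    return np.exp(-0.5 * ((t - centres) / width) ** 2)

def bayes_glm(X, y, prior_prec, noise_prec):
    """Posterior N(m, S) for w in y = X w + e, with w ~ N(0, I / prior_prec), e ~ N(0, I / noise_prec)."""
    S = np.linalg.inv(prior_prec * np.eye(X.shape[1]) + noise_prec * X.T @ X)
    m = noise_prec * S @ X.T @ y
    return m, S

def prob_contrast_positive(c, m, S, threshold=0.0):
    """P(c^T w > threshold) under the Gaussian posterior N(m, S)."""
    mean = c @ m
    sd = math.sqrt(c @ S @ c)
    return 0.5 * (1.0 + math.erf((mean - threshold) / (sd * math.sqrt(2.0))))

# Hypothetical single-channel response driven mainly by the second basis function.
rng = np.random.default_rng(0)
X = gaussian_basis(n_times=100, n_bases=5, width=8.0)
w_true = np.array([0.0, 2.0, 0.0, 0.0, 0.0])
y = X @ w_true + 0.1 * rng.standard_normal(100)
m, S = bayes_glm(X, y, prior_prec=1e-2, noise_prec=1.0 / 0.1 ** 2)
p_effect = prob_contrast_positive(np.array([0.0, 1.0, 0.0, 0.0, 0.0]), m, S)
```

Because the posterior is Gaussian, any linear contrast has a Gaussian posterior too, so thresholding P(c^T w > threshold) at each source location is all that is needed to build a posterior probability map.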

In this paper, we focus on the problem of extending a given knowledge base by accurately predicting additional true facts based on the facts included in it. This is an essential problem of knowledge representation systems, since knowledge bases typically suffer from incompleteness and lack the ability to reason over their discrete entities and relationships. To this end, we introduce an inducing space nonparametric Bayesian large-margin inference model, capable of reasoning over relationships between pairs of entities. Previous works addressing the entity relationship inference problem model each entity using atomic entity vector representations. In contrast, our method exploits word feature vectors to directly obtain high-dimensional nonlinear inducing space representations for entity pairs. This way, we can extract salient latent characteristics and interaction dynamics within entity pairs that can be useful for inferring their relationships. On this basis, our model performs the relation inference task by postulating a set of binary Dirichlet process mixture large-margin classifiers, presented with the derived inducing space representations of the considered entity pairs. Bayesian inference for this inducing space model is performed under the mean-field inference paradigm. This is made possible by leveraging a recently proposed latent variable formulation of regularized large-margin classifiers that facilitates mean-field parameter estimation. We demonstrate the superiority of our approach over the state of the art by considering the problem of predicting additional true relations between entities given subsets of the WordNet and FreeBase knowledge bases.

Parameter estimation for model-based clustering using a finite mixture of normal inverse Gaussian (NIG) distributions is achieved through variational Bayes approximations. Univariate NIG mixtures and multivariate NIG mixtures are considered. The use of variational Bayes approximations here is a substantial departure from the traditional EM approach and alleviates some of the associated computational complexities and uncertainties. Our variational algorithm is applied to simulated and real data. The paper concludes with discussion and suggestions for future work.

In this paper, we propose a family of non-homogeneous Gauss-Markov fields with Potts region labels model for images to be used in a Bayesian estimation framework, in order to jointly restore and segment images degraded by a known point spread function and additive ...

We consider the problem of inferring and modeling topics in a sequence of documents with known publication dates. The documents at a given time are each characterized by a topic, and the topics are drawn from a mixture model. The proposed model infers the change in the topic mixture weights as a function of time. The details of this general framework may take different forms, depending on the specifics of the model. For the examples considered here we examine base measures based on independent multinomial-Dirichlet measures for representation of topic-dependent word counts. The form of the hierarchical model allows efficient variational Bayesian (VB) inference, of interest for large-scale problems. We demonstrate results and make comparisons to the model when the dynamic character is removed, and also compare to latent Dirichlet allocation (LDA) and topics over time (TOT). We consider a database of NIPS papers as well as the United States presidential State of the Union addresses from 1790 to 2008.
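With a Dirichlet prior on a topic's word distribution and multinomial word counts, the word distribution can be integrated out in closed form (the Dirichlet-multinomial), and the posterior over the word distribution is again Dirichlet with parameters alpha + counts. The functions below are a sketch of that conjugate building block only, not of the paper's dynamic model or its VB updates; the function names are hypothetical.

```python
import math

def dirichlet_multinomial_logpmf(counts, alpha):
    """log p(counts | alpha): probability of a word-count vector with the
    topic's word distribution integrated out under a Dirichlet(alpha) prior."""
    n, a = sum(counts), sum(alpha)
    # log of the multinomial coefficient n! / prod(c!)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return (log_coef + math.lgamma(a) - math.lgamma(a + n)
            + sum(math.lgamma(ak + c) - math.lgamma(ak) for ak, c in zip(alpha, counts)))

def dirichlet_posterior(counts, alpha):
    """Conjugate update: Dirichlet(alpha) prior plus counts gives Dirichlet(alpha + counts)."""
    return [ak + c for ak, c in zip(alpha, counts)]

# Two-word vocabulary with a uniform Dirichlet(1, 1) prior.
print(math.exp(dirichlet_multinomial_logpmf([1, 1], [1.0, 1.0])))  # approximately 1/3
```

Working in log space with `math.lgamma` keeps the computation stable for realistic document lengths and vocabulary sizes, where the Gamma functions themselves would overflow.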