Analysis of Sparse Bayesian Learning

Sparse Bayesian modeling with adaptive kernel learning

IEEE Transactions on Neural Networks, 2009

Sparse kernel methods are very efficient in solving regression and classification problems. The sparsity and performance of these methods depend on selecting an appropriate kernel function, which is typically achieved using a cross-validation procedure. In this paper, we propose an incremental method for supervised learning, which is similar to the relevance vector machine (RVM) but also learns the parameters of the kernels during model training. Specifically, we learn different parameter values for each kernel, resulting in a very flexible model. In order to avoid overfitting, we use a sparsity enforcing prior that controls the effective number of parameters of the model. We present experimental results on artificial data to demonstrate the advantages of the proposed method and we provide a comparison with the typical RVM on several commonly used regression and classification data sets.
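As a loose illustration of the model class this abstract works with (not the authors' algorithm), the sketch below builds an RBF design matrix in which every kernel carries its own width parameter; the widths are simply held fixed and the weights are fitted by a Gaussian-prior MAP step, standing in for the sparsity-enforcing prior and the adaptive kernel learning described above. All names and values are illustrative.

```python
import numpy as np

# Toy 1-D regression data
rng = np.random.default_rng(0)
X = np.linspace(-5, 5, 80)
y = np.sinc(X) + 0.05 * rng.standard_normal(X.shape)

def design_matrix(X, centers, widths):
    """RBF design matrix; each column (basis function) has its own width."""
    d = X[:, None] - centers[None, :]
    return np.exp(-0.5 * (d / widths[None, :]) ** 2)

# Each training input doubles as a kernel centre (RVM-style), but here
# every kernel also carries its own width parameter (fixed in this sketch;
# learning these widths is the paper's contribution, not reproduced here).
centers = X.copy()
widths = np.full_like(centers, 1.0)
Phi = design_matrix(X, centers, widths)

# MAP weights under a simple Gaussian prior (precision alpha) -- a stand-in
# for the sparsity-enforcing prior described in the abstract.
alpha, beta = 1e-2, 1.0 / 0.05**2
A = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
w = beta * np.linalg.solve(A, Phi.T @ y)

print("fit RMSE:", np.sqrt(np.mean((Phi @ w - y) ** 2)))
```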

Bayesian Learning of Sparse Classifiers

2001

Bayesian approaches to supervised learning use priors on the classifier parameters. However, few priors aim at achieving "sparse" classifiers, where irrelevant/redundant parameters are automatically set to zero. Two well-known ways of obtaining sparse classifiers are to use a zero-mean Laplacian prior on the parameters and to use the "support vector machine" (SVM). Whether one uses a Laplacian prior or an SVM, one still needs to specify or estimate the parameters that control the degree of sparseness of the resulting classifiers.
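The connection this abstract relies on is that MAP estimation with a zero-mean Laplacian prior on the weights is equivalent to l1-penalised estimation, which drives irrelevant weights exactly to zero. A minimal sketch, using scikit-learn's l1-penalised logistic regression as a stand-in rather than the authors' own algorithm:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: only the first 3 of 20 features are informative.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
y = (X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] > 0).astype(int)

# MAP under a zero-mean Laplacian prior == l1-penalised logistic regression.
# C plays the role of the sparseness-controlling parameter that, as the
# abstract notes, must still be specified or estimated.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)

print("nonzero weights:", np.sum(clf.coef_ != 0), "of", clf.coef_.size)
```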

Sparse Bayesian learning and the relevance multi-layer perceptron network

2008

We introduce a simple framework for sparse Bayesian learning with multi-layer perceptron (MLP) networks, inspired by Tipping's relevance vector machine (RVM). Like the RVM, a Bayesian prior is adopted that includes separate hyperparameters for each weight, allowing redundant weights and hidden layer units to be identified and subsequently pruned from the network, whilst also providing a means to avoid over-fitting the training data.

A prior for consistent estimation for the relevance vector machine

2004

The Relevance Vector Machine (RVM) provides an empirical Bayes treatment of function approximation by kernel basis expansion. In its original form, the RVM achieves a sparse representation of the approximating function by structuring a Gaussian prior distribution in a way that implicitly puts a sparsity pressure on the coefficients appearing in the expansion. The RVM aims at retaining the tractability of the Gaussian prior while simultaneously achieving the assumed (and desired) sparse representation. This is achieved by specifying independent Gaussian priors for each of the coefficients. In his introductory paper, Tipping shows that for such a prior structure, the use of independent Gamma hyperpriors yields a product of independent Student-t marginal priors for the coefficients, thereby achieving the desired sparsity. However, such a prior structure gives complete freedom to the coefficients, making it impossible to isolate a unique solution to the function estimation task.
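For reference, the marginalisation this abstract describes, in standard RVM notation with Gamma hyperparameters a and b (the derivation is Tipping's, restated here as a sketch):

$$
p(w_i) \;=\; \int_0^\infty \mathcal{N}\!\left(w_i \mid 0, \alpha_i^{-1}\right)\,
\mathrm{Gamma}\!\left(\alpha_i \mid a, b\right)\, d\alpha_i
\;=\; \frac{b^{a}\,\Gamma\!\left(a + \tfrac{1}{2}\right)}{\Gamma(a)\,\sqrt{2\pi}}
\left(b + \frac{w_i^{2}}{2}\right)^{-\left(a + \frac{1}{2}\right)},
$$

which is a Student-t density, so the joint prior p(w) = ∏_i p(w_i) is the product of independent Student-t marginals referred to above.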

Sparse Kernel Learning and the Relevance Units Machine

Lecture Notes in Computer Science, 2009

The relevance vector machine (RVM) is a state-of-the-art technique for constructing sparse kernel regression models. It not only generates a much sparser model but also provides better generalization performance than the standard support vector machine (SVM). In the RVM and the SVM, relevance vectors (RVs) and support vectors (SVs) are both selected from the input vector set. This may limit model flexibility. In this paper we propose a new sparse kernel model called the Relevance Units Machine (RUM). The RUM follows the idea of the RVM under the Bayesian framework but releases the constraint that RVs have to be selected from the input vectors. The RUM treats the relevance units as part of the parameters of the model. As a result, a RUM maintains all the advantages of the RVM and offers superior sparsity. The new algorithm is demonstrated to possess considerable computational advantages over well-known state-of-the-art algorithms.

Fast marginal likelihood maximisation for sparse Bayesian models

Proceedings of the ninth international workshop …, 2003

The 'sparse Bayesian' modelling approach, as exemplified by the 'relevance vector machine', enables sparse classification and regression functions to be obtained by linearly weighting a small number of fixed basis functions from a large dictionary of potential candidates. Such a model conveys a number of advantages over the related and very popular 'support vector machine', but the necessary 'training' procedure, optimisation of the marginal likelihood function, is typically much slower. We describe a new and highly accelerated algorithm which exploits recently elucidated properties of the marginal likelihood function to enable maximisation via a principled and efficient sequential addition and deletion of candidate basis functions.
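A hedged restatement of the criterion this algorithm exploits (notation follows the sparse Bayesian literature; treat the details as a sketch): for a candidate basis function φ_i, with C_{-i} the model covariance computed with φ_i excluded, define sparsity and quality factors

$$
s_i = \boldsymbol{\phi}_i^{\mathsf T} \mathbf{C}_{-i}^{-1} \boldsymbol{\phi}_i,
\qquad
q_i = \boldsymbol{\phi}_i^{\mathsf T} \mathbf{C}_{-i}^{-1} \mathbf{t}.
$$

The marginal likelihood, viewed as a function of the single hyperparameter α_i, then has a unique finite maximum at

$$
\alpha_i = \frac{s_i^{2}}{q_i^{2} - s_i} \qquad \text{if } q_i^{2} > s_i,
$$

and is maximised as α_i → ∞ (the basis function is deleted) when q_i² ≤ s_i. Cycling through the candidates and applying this test gives the sequential addition and deletion of basis functions the abstract describes.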

Performance Evaluation of Latent Variable Models with Sparse Priors

2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07, 2007

A variety of Bayesian methods have recently been introduced for finding sparse representations from overcomplete dictionaries of candidate features. These methods often capitalize on latent structure inherent in sparse distributions to perform standard MAP estimation, variational Bayes, approximation using convex duality, or evidence maximization. Despite their reliance on sparsity-inducing priors, however, these approaches may or may not actually lead to sparse representations in practice, and so it is a challenging task to determine which algorithm and sparse prior are appropriate. Rather than justifying prior selections and modelling assumptions based on the credibility of the full Bayesian model, as is commonly done, this paper bases evaluations on the actual cost functions that emerge from each method. Two minimal conditions are postulated that ideally any sparse learning objective should satisfy. Out of all possible cost functions that can be obtained from the methods described above using (virtually) any sparse prior, a unique function is derived that satisfies these conditions. Both sparse Bayesian learning (SBL) and basis pursuit (BP) are special cases. Later, all methods are shown to be performing MAP estimation using potentially non-factorable implicit priors, which suggests new sparse learning cost functions.

Index Terms: sparse representations, sparse priors, latent variable models, underdetermined inverse problems, Bayesian learning
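For concreteness, the two special cases mentioned can be written in standard notation (λ, σ², and γ are the usual trade-off, noise, and hyperparameter quantities, not symbols taken from this paper). Basis pursuit solves

$$
\min_{\mathbf{x}} \ \tfrac{1}{2}\,\lVert \mathbf{y} - \Phi\mathbf{x} \rVert_2^{2} + \lambda\,\lVert \mathbf{x} \rVert_1 ,
$$

while the type-II (evidence maximisation) cost function of sparse Bayesian learning is

$$
\min_{\boldsymbol{\gamma} \ge 0} \ \mathbf{y}^{\mathsf T} \Sigma_y^{-1} \mathbf{y} + \log \lvert \Sigma_y \rvert,
\qquad
\Sigma_y = \sigma^{2} I + \Phi\, \mathrm{diag}(\boldsymbol{\gamma})\, \Phi^{\mathsf T}.
$$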

A Hierarchical Bayesian Framework for Constructing Sparsity-inducing Priors

Variable selection techniques have become increasingly popular amongst statisticians due to an increased number of regression and classification applications involving high-dimensional data where we expect some predictors to be unimportant. In this context, Bayesian variable selection techniques involving Markov chain Monte Carlo exploration of the posterior distribution over models can be prohibitively computationally expensive, and so attention has been paid to quasi-Bayesian approaches such as maximum a posteriori (MAP) estimation using priors that induce sparsity in such estimates. We focus on this latter approach, expanding on the hierarchies proposed to date to provide a Bayesian interpretation and generalization of state-of-the-art penalized optimization approaches, while simultaneously providing a natural way to include prior information about parameters within this framework. We give examples of how to use this hierarchy to compute MAP estimates for linear and logistic regression as well as sparse precision-matrix estimates in Gaussian graphical models. In addition, an adaptive group lasso method is derived using the framework.
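A standard example of the kind of hierarchy meant here (one simple case, not the paper's full construction) is the Gaussian scale-mixture representation of the Laplace prior behind the lasso:

$$
p(w_i \mid \lambda) \;=\; \int_0^\infty \mathcal{N}(w_i \mid 0, \tau_i)\,
\mathrm{Exp}\!\left(\tau_i \,\middle|\, \tfrac{\lambda^{2}}{2}\right) d\tau_i
\;=\; \frac{\lambda}{2}\, e^{-\lambda \lvert w_i \rvert},
$$

so MAP estimation under the two-level Gaussian/exponential hierarchy reproduces the l1 penalty, and choosing other mixing densities yields other sparsity-inducing penalties.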

An Invariant Bayesian Model Selection Principle for Gaussian Data in a Sparse Representation

IEEE Transactions on Information Theory, 2000

We develop a code length principle which is invariant to the choice of parameterization on the model distributions. An invariant approximation formula for easy computation of the marginal distribution is provided for Gaussian likelihood models. We provide invariant estimators of the model parameters and formulate conditions under which these estimators are essentially a posteriori unbiased for Gaussian models. An upper bound on the coarseness of discretization on the model parameters is deduced. We introduce a discrimination measure between probability distributions and use it to construct probability distributions on model classes. The total code length is shown to be closely related to the NML code length of Rissanen when choosing Jeffreys prior distribution on the model parameters together with a uniform prior distribution on the model classes. Our model selection principle is applied to a Gaussian estimation problem for data in a wavelet representation, and its performance is tested and compared to alternative wavelet-based estimation methods in numerical experiments.

Sparse Bayesian learning using variational Bayes inference based on a greedy criterion

2017 51st Asilomar Conference on Signals, Systems, and Computers, 2017

Compressive sensing (CS) is an evolving area in signal acquisition and reconstruction with many applications [1-3]. In CS the goal is to efficiently measure and then reconstruct a signal under the assumption that the signal is sparse but the number and locations of its nonzeros are unknown. A linear CS problem is modeled as y = A x_s + e, where y ∈ ℝ^M contains the measurements, x_s ∈ ℝ^N is the sparse solution, and e is the noise, with M ≪ N [4-6]. Here A = ΦΨ, where Φ is the sensing matrix and Ψ is a proper basis in which x_s is sparse. There are three main approaches to solving for x_s: greedy-based, convex-based, and sparse Bayesian learning (SBL) algorithms. Here, we consider the SBL approach. Specifically, we consider a Gaussian-Bernoulli prior to promote sparsity in the solution and then use variational Bayes (VB) inference to estimate the variables and parameters of the model. In the Gaussian-Bernoulli model, the sparse solution is defined as x_s = (s ∘ x), where s is a binary support learning vector, x accounts for the values of the solution, and "∘" is the element-wise product [7, 8]. It turns out that using VB inference for the CS problem suffers from overfitting, mainly when the number of measurements is low. For example, for the CluSS-VB algorithm, Yu et al. [8] pointed out that the solution may tend to become non-sparse. In this work, we propose a VB-based SBL algorithm which uses a simple criterion to remove this effect and forces the solution to become sparse. We also discuss and compare the update rules obtained from SBL using a fully hierarchical Bayesian approach via Markov chain Monte Carlo (MCMC) [7], the expectation-maximization (EM) algorithm, and VB inference. As expected, there exists a very close relationship between all these algorithms, and we provide some intuition on how to translate the equations of one approach into another. We also provide simulation results to compare the performance of these algorithms on CS problems.

G-OSBL (VB): An SBL algorithm using VB

Suppose there is a model with parameters Θ, hidden variables collected in x, and a set of observations denoted by y. The approximation to the joint density p(x, Θ | y) can be represented by p(x, Θ | y) ≈ q_x(x) q_θ(Θ). The lower bound on the model log marginal likelihood can then be iteratively optimized by the following updates [9]:
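The referenced updates are not reproduced in the excerpt above; in their generic mean-field form they are (a sketch of the standard coordinate-ascent VB scheme, which the Gaussian-Bernoulli parameterisation then specialises):

$$
q_x(\mathbf{x}) \propto \exp\!\left( \mathbb{E}_{q_\theta(\Theta)}\!\left[ \ln p(\mathbf{y}, \mathbf{x}, \Theta) \right] \right),
\qquad
q_\theta(\Theta) \propto \exp\!\left( \mathbb{E}_{q_x(\mathbf{x})}\!\left[ \ln p(\mathbf{y}, \mathbf{x}, \Theta) \right] \right),
$$

iterated until the lower bound on the log marginal likelihood stops increasing.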