On Bayesian analysis of a finite generalized Dirichlet mixture via a Metropolis-within-Gibbs sampling
Related papers
A Bayesian non-Gaussian mixture analysis: Application to eye modeling
2007 IEEE Conference on …, 2007
Finite mixture models based on the generalized Dirichlet distribution have been shown to be a robust alternative to Gaussian (normal) mixtures. In this paper, we adopt a Bayesian approach for generalized Dirichlet mixture estimation and selection. This approach offers a solid theoretical framework for combining statistical model learning with knowledge acquisition. The parameters are estimated by Monte Carlo simulation, using Gibbs sampling mixed with a Metropolis-Hastings step. The number of clusters is selected using Bayes factors. We have successfully applied the proposed Bayesian framework to model IR eye images. Experimental results demonstrate the robustness, efficiency, and accuracy of the algorithm.
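To make the sampling scheme concrete, below is a minimal sketch of one Metropolis-within-Gibbs sweep for a finite generalized Dirichlet (GD) mixture. It is illustrative only: `gd_logpdf` uses the standard Connor-Mosimann form of the GD density, allocations and mixing weights get conjugate Gibbs updates, and the shape parameters get a symmetric random-walk Metropolis step on the log scale under a flat prior on the logs (an assumption for brevity, not the paper's exact prior).

```python
import numpy as np
from scipy.special import gammaln

def gd_logpdf(x, a, b):
    """Log density of the generalized Dirichlet (Connor-Mosimann form).
    x: d free simplex coordinates (sum < 1); a, b: positive shapes, length d."""
    tail = 1.0 - np.cumsum(x)                    # mass remaining after each coordinate
    g = b - np.append(a[1:] + b[1:], 1.0)        # gamma_i = b_i - a_{i+1} - b_{i+1}; gamma_d = b_d - 1
    return np.sum(gammaln(a + b) - gammaln(a) - gammaln(b)
                  + (a - 1.0) * np.log(x) + g * np.log(tail))

def mwg_sweep(X, z, A, B, pi, rng, step=0.1):
    """One Metropolis-within-Gibbs sweep over allocations z, weights pi,
    and per-component shape matrices A, B (each K x d)."""
    n, K = X.shape[0], pi.size
    for i in range(n):                           # Gibbs: allocation full conditionals
        logp = np.log(pi) + np.array([gd_logpdf(X[i], A[k], B[k]) for k in range(K)])
        p = np.exp(logp - logp.max()); p /= p.sum()
        z[i] = rng.choice(K, p=p)
    pi[:] = rng.dirichlet(1.0 + np.bincount(z, minlength=K))  # Gibbs: Dirichlet full conditional
    for k in range(K):                           # Metropolis: shapes have no conjugate update
        idx = np.flatnonzero(z == k)
        if idx.size == 0:
            continue
        Ap = A[k] * np.exp(step * rng.standard_normal(A[k].size))
        Bp = B[k] * np.exp(step * rng.standard_normal(B[k].size))
        delta = sum(gd_logpdf(X[i], Ap, Bp) - gd_logpdf(X[i], A[k], B[k]) for i in idx)
        if np.log(rng.uniform()) < delta:        # symmetric walk on log-shapes, flat prior on logs
            A[k], B[k] = Ap, Bp
```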
Maximum likelihood estimation of the generalized Dirichlet mixture
The Dirichlet distribution offers high flexibility for modeling data. However, it has certain characteristics that are a handicap in practice. This paper describes a generalization of the Dirichlet distribution, which we call the GDD (Generalized Dirichlet Distribution), that overcomes this handicap. We propose a method for estimating the parameters of a GDD mixture.
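As a quick illustration of why the GDD is convenient to work with, one can sample from it directly through independent Beta variates (the Connor-Mosimann stick-breaking construction). This is a generic sketch, not code from the paper.

```python
import numpy as np

def sample_gd(a, b, rng):
    """Draw from a generalized Dirichlet: Z_i ~ Beta(a_i, b_i) independently,
    X_i = Z_i * prod_{j<i} (1 - Z_j), so (X_1, ..., X_d) lies in the simplex."""
    z = rng.beta(a, b)
    stick = np.concatenate(([1.0], np.cumprod(1.0 - z)[:-1]))  # mass left before step i
    return z * stick

rng = np.random.default_rng(0)
x = sample_gd(np.array([2.0, 3.0]), np.array([4.0, 1.5]), rng)
print(x, 1.0 - x.sum())   # the last simplex cell is the leftover mass
```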
Applied Intelligence, 2015
We develop a variational Bayesian learning framework for the infinite generalized Dirichlet mixture model (i.e. a weighted mixture of Dirichlet process priors based on the generalized inverted Dirichlet distribution), which has proven its capability to model complex multidimensional data. We also integrate a feature selection approach to highlight the features that are most informative, in order to construct an appropriate model in terms of clustering accuracy. Experiments on synthetic data, as well as real data generated from visual scenes and handwritten digit datasets, illustrate and validate the proposed approach.
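The Dirichlet process part of such a variational scheme is typically handled with a truncated stick-breaking representation. The sketch below shows the standard Blei-Jordan-style closed-form updates for the stick posteriors, as one plausible ingredient of this framework rather than the authors' exact derivation.

```python
import numpy as np
from scipy.special import digamma

def expected_log_weights(g1, g2):
    """E[log pi_k] under variational Beta(g1_k, g2_k) stick posteriors
    (truncated stick-breaking representation of a Dirichlet process)."""
    e_log_v = digamma(g1) - digamma(g1 + g2)
    e_log_1mv = digamma(g2) - digamma(g1 + g2)
    return e_log_v + np.concatenate(([0.0], np.cumsum(e_log_1mv)[:-1]))

def update_sticks(resp, alpha):
    """Closed-form variational update of the stick parameters from the
    responsibility matrix resp (n x K); alpha is the DP concentration."""
    Nk = resp.sum(axis=0)                                        # expected counts
    tail = np.concatenate((np.cumsum(Nk[::-1])[-2::-1], [0.0]))  # sum_{j>k} N_j
    return 1.0 + Nk, alpha + tail
```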
IEEE Transactions on Image Processing, 2004
This paper presents an unsupervised algorithm for learning a finite mixture model from multivariate data. This mixture model is based on the Dirichlet distribution, which offers high flexibility for modeling data. The proposed approach for estimating the parameters of a Dirichlet mixture is based on the maximum likelihood (ML) and Fisher scoring methods. Experimental results are presented for the following applications: estimation of artificial histograms, summarization of image databases for efficient retrieval, and human skin color modeling and its application to skin detection in multimedia databases.
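For intuition about the ML side, here is a compact sketch of maximum-likelihood estimation of a single Dirichlet component. It uses Minka's fixed-point iteration (with a Newton inversion of the digamma) rather than the paper's Fisher scoring updates, but it solves the same estimating equations.

```python
import numpy as np
from scipy.special import digamma, polygamma

def dirichlet_mle(X, iters=50):
    """ML fit of Dirichlet parameters to rows of X (points on the simplex)
    via the fixed point psi(alpha_k) = psi(sum alpha) + mean(log x_k)."""
    logx_bar = np.log(X).mean(axis=0)
    alpha = np.ones(X.shape[1])
    for _ in range(iters):
        y = digamma(alpha.sum()) + logx_bar
        # invert the digamma with a standard initialization + Newton steps
        a = np.where(y >= -2.22, np.exp(y) + 0.5, -1.0 / (y - digamma(1.0)))
        for _ in range(5):
            a = a - (digamma(a) - y) / polygamma(1, a)
        alpha = a
    return alpha

rng = np.random.default_rng(1)
X = rng.dirichlet([2.0, 5.0, 3.0], size=2000)
print(dirichlet_mle(X))   # should be close to [2, 5, 3]
```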
Pattern Recognition, 2013
This paper introduces a novel enhancement for unsupervised feature selection based on generalized Dirichlet (GD) mixture models. Our proposal extends the finite mixture model previously developed in [1] to the infinite case via Dirichlet process mixtures, which can actually be viewed as a purely nonparametric model since the number of mixture components can increase as data are introduced. The infinite assumption is used to avoid problems related to model selection (i.e. determination of the number of clusters) and allows simultaneous separation of data into similar clusters and selection of relevant features. Our resulting model is learned within a principled variational Bayesian framework that we have developed. The experimental results reported for both synthetic data and real-world challenging applications involving image categorization, automatic semantic annotation, and retrieval show the ability of our approach to provide accurate models by distinguishing between relevant and irrelevant features without over- or under-fitting the data.
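The feature-selection idea used in this line of work can be summarized by a likelihood in which each feature follows the cluster-specific density with some saliency probability, and a shared background density otherwise. The sketch below uses Gaussian placeholders for both densities purely for illustration; the paper's model uses generalized Dirichlet factors.

```python
import numpy as np
from scipy.stats import norm

def loglik_with_saliency(x, rho, comp_mu, comp_sd, bg_mu, bg_sd):
    """log p(x | cluster) when feature d is cluster-specific with probability
    rho[d] and drawn from a common background density otherwise."""
    lp_comp = norm.logpdf(x, comp_mu, comp_sd)   # relevant-feature density
    lp_bg = norm.logpdf(x, bg_mu, bg_sd)         # irrelevant-feature density
    return np.sum(np.logaddexp(np.log(rho) + lp_comp,
                               np.log1p(-rho) + lp_bg))
```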
Gibbs Sampling Based Bayesian Analysis of Mixtures with Unknown Number of Components
For mixture models with an unknown number of components, two Bayesian approaches considered in earlier work are reconciled here through a simple Gibbs sampling scheme. Specifically, we consider exactly the same direct setup as one of those works, but put a Dirichlet process prior on the mixture components; the latter has also been used previously, albeit in a different setup. The reconciliation we propose here yields a simple Gibbs sampling scheme for learning about all the unknowns, including the unknown number of components. Thus, we completely avoid complicated reversible jump Markov chain Monte Carlo (RJMCMC) methods, yet tackle variable dimensionality simply and efficiently. Moreover, we demonstrate, using both simulated and real data sets and pseudo-Bayes factors, that our proposed model outperforms the competing model while, at the same time, enjoying computational superiority over the competing methods. We also discuss issues related to clustering and argue that, in principle, our approach is capable of learning about the number of clusters in the sample as well as in the population, while the competing approach is suitable for learning about the number of clusters in the sample only.
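The flavor of such a variable-dimension Gibbs sampler can be sketched as a Chinese-restaurant-process allocation sweep in the style of Neal's Algorithm 3. This is a generic illustration, not the authors' exact sampler; the predictive likelihoods are left as placeholder callables.

```python
import numpy as np

def crp_gibbs_sweep(z, X, alpha, pred_loglik, new_loglik, rng):
    """One sweep of allocation updates under a CRP prior. pred_loglik(x, k, z, X)
    is the predictive log-likelihood of x under existing cluster k; new_loglik(x)
    is the prior predictive for a brand-new cluster."""
    for i in range(len(X)):
        z[i] = -1                                    # remove point i from its cluster
        labels, counts = np.unique(z[z >= 0], return_counts=True)
        logp = [np.log(c) + pred_loglik(X[i], k, z, X) for k, c in zip(labels, counts)]
        logp.append(np.log(alpha) + new_loglik(X[i]))
        logp = np.asarray(logp)
        p = np.exp(logp - logp.max()); p /= p.sum()
        j = rng.choice(p.size, p=p)
        z[i] = labels[j] if j < labels.size else (labels.max() + 1 if labels.size else 0)
```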
A sequential algorithm for fast fitting of Dirichlet process mixture models
2013
In this article we propose an improvement on the sequential updating and greedy search (SUGS) algorithm for fast fitting of Dirichlet process mixture models. The SUGS algorithm provides a means for very fast approximate Bayesian inference for mixture data, which is particularly useful when data sets are so large that many standard Markov chain Monte Carlo (MCMC) algorithms cannot be applied efficiently, or take a prohibitively long time to converge. In particular, these ideas are used to initially interrogate the data and to refine models, so that one can potentially apply exact data analysis later on. SUGS relies upon sequentially allocating data to clusters and proceeding with an update of the posterior on the subsequent allocations and parameters that assumes this allocation is correct. Our modification softens this approach by providing a probability distribution over allocations, at a similar computational cost; this approach has an interpretation as a variational Bayes procedure, and hence we term it variational SUGS (VSUGS). Simulated examples show that VSUGS can outperform the original SUGS algorithm, in terms of density estimation and classification, in many scenarios.
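The core VSUGS modification can be sketched in a few lines: where SUGS hard-assigns each incoming point to the argmax cluster, VSUGS keeps the full allocation probability vector and propagates it into weighted parameter updates. The cluster summaries and predictive densities below are placeholders (assumptions), not the paper's exact formulas.

```python
import numpy as np

def vsugs_allocation_probs(x, clusters, alpha, pred_loglik, new_loglik):
    """Probability vector over existing clusters plus one new cluster for the
    next point x; SUGS would take the argmax of this, VSUGS keeps it all.
    Each cluster is a dict with (possibly fractional) count c["n"]."""
    n = sum(c["n"] for c in clusters)
    logp = [np.log(c["n"]) + pred_loglik(x, c) for c in clusters]
    logp.append(np.log(alpha) + new_loglik(x))
    logp = np.asarray(logp) - np.log(n + alpha)
    p = np.exp(logp - logp.max())
    return p / p.sum()
```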
International Journal of Semantic Computing
Clustering as an exploratory technique has been a promising approach for performing data analysis. In this paper, we propose a non-parametric Bayesian inference approach to the clustering problem. This approach is based on infinite multivariate Beta mixture models constructed through the framework of Dirichlet processes. We apply an accelerated variational method to learn the model. The motivation behind this technique is that Dirichlet process mixture models can fit data where the number of components is unknown; for large-scale data, however, this approach is computationally expensive. We overcome this problem with the help of accelerated Dirichlet process mixture models, and the truncation is managed using kd-trees. The performance of the model is validated on real medical applications and compared to three other similar alternatives. The results show that our proposed framework outperforms them.
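The kd-tree acceleration can be pictured as follows: the data are recursively partitioned, all points in a leaf share one responsibility vector, and variational updates then cost on the order of the number of leaves rather than the number of points per iteration. Below is a minimal hand-rolled partitioner to illustrate the idea (an assumption-level sketch, not the paper's implementation).

```python
import numpy as np

def kd_partition(X, idx=None, leaf_size=64):
    """Recursively split the data on the widest dimension at its median and
    return a list of leaf index arrays; an accelerated variational scheme
    would assign one shared responsibility vector per leaf."""
    if idx is None:
        idx = np.arange(X.shape[0])
    if idx.size <= leaf_size:
        return [idx]
    d = int(np.argmax(X[idx].max(axis=0) - X[idx].min(axis=0)))  # widest dimension
    order = idx[np.argsort(X[idx, d])]
    mid = order.size // 2
    return kd_partition(X, order[:mid], leaf_size) + kd_partition(X, order[mid:], leaf_size)

X = np.random.default_rng(2).normal(size=(1000, 4))
leaves = kd_partition(X, leaf_size=100)
print(len(leaves), [len(l) for l in leaves[:3]])
```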
Asian Conference on Machine Learning, 2012
Online algorithms allow data instances to be processed in a sequential way, which is important for large-scale and real-time applications. In this paper, we propose a novel online clustering approach based on a Dirichlet process mixture of generalized Dirichlet (GD) distributions, which can be considered an extension of the finite GD mixture model to the infinite case. Our approach is built on nonparametric Bayesian analysis, where the determination of the number of clusters is sidestepped by assuming an infinite number of mixture components. Moreover, an unsupervised localized feature selection scheme is integrated with the proposed nonparametric framework to improve the clustering performance. By learning the proposed model in an online manner using a variational approach, all the involved parameters and feature saliencies are estimated simultaneously and effectively in closed form. The proposed online infinite mixture model is validated through both synthetic data sets and two challenging real-world applications, namely text document clustering and online human face detection.
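Learning "in an online manner using a variational approach" generally means updating global variational parameters from one mini-batch at a time with a decaying step size. The generic stochastic-variational-style update below conveys the mechanics; it is not the paper's exact closed-form updates.

```python
import numpy as np

def online_vb_update(lam, batch_stats, n_total, batch_size, prior, t,
                     tau0=1.0, kappa=0.6):
    """One online update of a global variational natural parameter lam:
    blend the current value with the estimate implied by the mini-batch,
    rescaled as if the whole data set looked like this batch."""
    rho = (tau0 + t) ** (-kappa)                       # step size, decays with t
    lam_hat = prior + (n_total / batch_size) * batch_stats
    return (1.0 - rho) * lam + rho * lam_hat
```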
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000
We consider the problem of determining the structure of high-dimensional data without prior knowledge of the number of clusters. Data are represented by a finite mixture model based on the generalized Dirichlet distribution. The generalized Dirichlet distribution has a more general covariance structure than the Dirichlet distribution and offers high flexibility and ease of use for the approximation of both symmetric and asymmetric distributions. This makes the generalized Dirichlet distribution more practical and useful. An important problem in mixture modeling is the determination of the number of clusters. Indeed, a mixture with too many or too few components may not be appropriate to approximate the true model. Here, we consider the application of the minimum message length (MML) principle to determine the number of clusters. The MML is derived so as to choose the number of clusters in the mixture model that best describes the data. A comparison with other selection criteria is performed. The validation involves synthetic data, real data clustering, and two interesting real applications: classification of Web pages, and texture database summarization for efficient retrieval.
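To make the MML selection step concrete: for each candidate number of clusters, one fits the mixture and computes an approximate two-part message length, then keeps the K with the smallest score. The penalty below follows a common Figueiredo-Jain-style approximation in which each component's parameters are priced by the data mass it explains; this is one standard form, not necessarily the exact expression derived in the paper.

```python
import numpy as np

def mml_score(loglik, n, weights, params_per_comp):
    """Approximate message length of a fitted K-component mixture; smaller is
    better. weights: mixing proportions; params_per_comp: free parameters per
    component density."""
    weights = np.asarray(weights)
    n_eff = np.maximum(n * weights, 1.0)              # data mass per component
    param_cost = 0.5 * params_per_comp * np.sum(np.log(n_eff))
    weight_cost = 0.5 * (weights.size - 1) * np.log(n)
    return -loglik + param_cost + weight_cost
```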