Bayesian analysis of finite mixtures of multinomial and negative-multinomial distributions
Related papers
Approximate Bayesian computation for finite mixture models
Journal of Statistical Computation and Simulation, 2020
Finite mixture models are used in statistics and other disciplines, but inference for mixture models is challenging. The multimodality of the likelihood function and the so-called label switching problem contribute to the challenge. We propose extensions of the Approximate Bayesian Computation Population Monte Carlo (ABC-PMC) algorithm as an alternative framework for inference on finite mixture models. There are several decisions to make when implementing an ABC-PMC algorithm for finite mixture models, including the choice of the kernel used to move the particles through the iterations, the treatment of the label switching problem, and the selection of informative summary statistics. Examples are presented to demonstrate the performance of the proposed ABC-PMC algorithm for mixture modeling.
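The ABC idea underlying such an algorithm can be sketched with a plain rejection sampler for a two-component Gaussian location mixture. This is a simplification of the paper's ABC-PMC scheme: the summary statistics, tolerance, and priors below are illustrative assumptions, not the authors' choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(w, mu1, mu2, n=200):
    """Draw n points from w*N(mu1, 1) + (1-w)*N(mu2, 1)."""
    z = rng.random(n) < w
    return np.where(z, rng.normal(mu1, 1, n), rng.normal(mu2, 1, n))

def summary(x):
    """Simple summary statistics: sample mean and standard deviation."""
    return np.array([x.mean(), x.std()])

# "Observed" data generated from known parameters
obs = simulate(0.3, -2.0, 2.0)
s_obs = summary(obs)

accepted = []
for _ in range(5000):
    w = rng.uniform(0.05, 0.95)                # prior on the mixture weight
    mu1, mu2 = np.sort(rng.uniform(-5, 5, 2))  # ordered means: one way to avoid label switching
    s = summary(simulate(w, mu1, mu2))
    if np.linalg.norm(s - s_obs) < 1.0:        # accept if summaries are within tolerance
        accepted.append((w, mu1, mu2))

post = np.array(accepted)                      # approximate posterior draws
```

Ordering the candidate means is one crude way to sidestep label switching at the simulation stage; a full ABC-PMC implementation would instead shrink the tolerance over iterations and move particles with a kernel, which is where the design decisions discussed in the paper arise.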
Bayesian analysis of finite Gaussian mixtures
2010 IEEE International Conference on Acoustics, Speech and Signal Processing, 2010
The problem considered in this paper is parameter estimation of a multivariate Gaussian mixture distribution with a known number of components. The paper presents a new Bayesian method which sequentially processes the observed data points by forming candidate sequences of labels assigning data points to mixture components. Using conjugate priors, we derive analytically a recursive formula for the computation of the probability of each label sequence. The practical implementation of this algorithm keeps only a predefined number of the highest-ranked label sequences, with the ranking based on posterior probabilities. We show by numerical simulations that the proposed technique consistently outperforms both the k-means and EM algorithms.
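A toy version of this sequential label-sequence idea can be sketched for a one-dimensional Gaussian mixture with known variance. This is a hypothetical simplification, not the paper's multivariate method: it places conjugate normal priors on the component means only, treats assignment probabilities as uniform, and retains a fixed beam of label sequences ranked by posterior probability.

```python
import numpy as np
from math import log, pi

def log_norm_pdf(x, mean, var):
    return -0.5 * (log(2 * pi * var) + (x - mean) ** 2 / var)

def beam_mixture(data, k=2, s2=1.0, m0=0.0, v0=10.0, beam=20):
    """Sequentially assign points to k Gaussian components with known
    variance s2, keeping only the `beam` label sequences with the highest
    posterior probability. Each component mean has a conjugate N(m0, v0)
    prior, updated analytically as points are assigned."""
    # hypothesis = (log posterior, [(post. mean, post. var)] per component, labels)
    hyps = [(0.0, [(m0, v0)] * k, [])]
    for x in data:
        new = []
        for lp, comps, labels in hyps:
            for j in range(k):
                m, v = comps[j]
                lpred = log_norm_pdf(x, m, v + s2)  # predictive density under comp. j
                v_new = 1.0 / (1.0 / v + 1.0 / s2)  # conjugate update of comp. j
                m_new = v_new * (m / v + x / s2)
                comps2 = list(comps)
                comps2[j] = (m_new, v_new)
                new.append((lp + lpred, comps2, labels + [j]))
        new.sort(key=lambda h: h[0], reverse=True)  # rank by posterior probability
        hyps = new[:beam]                           # keep top-ranked sequences
    return hyps[0]
```

On well-separated data the top-ranked sequence recovers the true grouping up to a permutation of the labels, which is the symmetry the label switching literature is concerned with.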
PLoS ONE, 2015
The label switching problem arises from the nonidentifiability of the posterior distribution under permutations of the component labels when a Bayesian approach is used to estimate the parameters of mixture models. For the case where the number of components is fixed and known, we propose a relabelling algorithm, an allocation variable-based probabilistic relabelling approach (denoted AVP), to deal with the label switching problem. We establish a model for the posterior distribution of the allocation variables in the presence of label switching. The AVP algorithm stochastically relabels the posterior samples according to the posterior probabilities of the established model. Several existing deterministic and probabilistic algorithms are compared with the AVP algorithm, and the success of the proposed approach is demonstrated in simulation studies and on a real dataset.
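A stripped-down illustration of stochastic relabelling (not the AVP model itself): rather than always applying the single best permutation to each MCMC draw, sample a permutation with probability depending on how well the permuted draw matches a reference, so that relabelling uncertainty is retained. The Gaussian-kernel weights, temperature, and reference point are illustrative assumptions.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)

def stochastic_relabel(draws, ref, temp=1.0):
    """For each MCMC draw of the k component parameters, sample a
    permutation with probability proportional to
    exp(-||permuted draw - ref||^2 / temp), instead of deterministically
    taking the loss-minimising permutation."""
    k = draws.shape[1]
    perms = list(permutations(range(k)))
    out = np.empty_like(draws)
    for t, d in enumerate(draws):
        losses = np.array([np.sum((d[list(p)] - ref) ** 2) for p in perms])
        w = np.exp(-(losses - losses.min()) / temp)   # stabilised weights
        choice = perms[rng.choice(len(perms), p=w / w.sum())]
        out[t] = d[list(choice)]
    return out
```

As the temperature shrinks, the sampler concentrates on the loss-minimising permutation and the procedure approaches a deterministic relabelling.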
Bayesian analysis of finite mixture models of distributions from exponential families
Computational Statistics, 2006
This paper deals with the Bayesian analysis of finite mixture models with a fixed number of component distributions from natural exponential families with quadratic variance function (NEF-QVF). A unified Bayesian framework addressing the two main difficulties in this context is presented, namely the choice of the prior distribution and the parameter unidentifiability problem. To deal with the first issue, conjugate prior distributions are used, and an algorithm is developed to calculate the parameters of the least informative prior within the class of conjugate distributions. Regarding the second issue, a general algorithm to solve the label-switching problem is presented. These techniques are easily applied in practice, as shown with an illustrative example.
Gibbs Sampling Based Bayesian Analysis of Mixtures with Unknown Number of Components
For mixture models with unknown number of components, Bayesian approaches, as considered by and , are reconciled here through a simple Gibbs sampling approach. Specifically, we consider exactly the same direct set up as used by , but put Dirichlet process prior on the mixture components; the latter has also been used by albeit in a different set up. The reconciliation we propose here yields a simple Gibbs sampling scheme for learning about all the unknowns, including the unknown number of components. Thus, we completely avoid complicated reversible jump Markov chain Monte Carlo (RJMCMC) methods, yet tackle variable dimensionality simply and efficiently. Moreover, we demonstrate, using both simulated and real data sets, and pseudo-Bayes factors, that our proposed model outperforms that of , while enjoying, at the same time, computational superiority over the methods proposed by and . We also discuss issues related to clustering and argue that in principle, our approach is capable of learning about the number of clusters in the sample as well as in the population, while the approach of is suitable for learning about the number of clusters in the sample only.
Probabilistic relabelling strategies for the label switching problem in Bayesian mixture models
Statistics and Computing, 2010
The label switching problem is caused by the likelihood of a Bayesian mixture model being invariant to permutations of the labels. The permutation can change multiple times between Markov chain Monte Carlo (MCMC) iterations, making it difficult to infer component-specific parameters of the model. Various so-called 'relabelling' strategies exist with the goal of 'undoing' the label switches that have occurred, to enable estimation of functions that depend on component-specific parameters. Most existing approaches rely upon specifying a loss function and relabelling by minimising its posterior expected loss. In this paper we develop probabilistic approaches to relabelling that allow estimation and incorporation of the uncertainty in the relabelling process. Variants of the probabilistic relabelling algorithm are introduced and compared to existing loss-function-based methods. We demonstrate that the idea of probabilistic relabelling can be expressed in a rigorous framework based on the EM algorithm.
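The loss-function baseline that such probabilistic methods are compared against can be sketched directly: relabel each MCMC draw by the permutation that minimises a squared-error loss to a fixed reference (for example a high-posterior draw). The brute-force search over all k! permutations is an illustrative choice that is only practical for small k.

```python
import numpy as np
from itertools import permutations

def loss_relabel(draws, ref):
    """Deterministic relabelling: for each draw of the k component
    parameters, apply the permutation minimising squared-error loss
    to a reference point (brute force over all k! permutations)."""
    k = draws.shape[1]
    perms = [list(p) for p in permutations(range(k))]
    out = np.empty_like(draws)
    for t, d in enumerate(draws):
        best = min(perms, key=lambda p: np.sum((d[p] - ref) ** 2))
        out[t] = d[best]
    return out
```

Unlike a probabilistic relabelling scheme, this assigns each draw a single permutation with certainty, which is precisely the relabelling uncertainty the paper's approach aims to quantify.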
A practical sampling approach for a Bayesian mixture model with unknown number of components
Statistical Papers, 2007
Mixture distributions have become increasingly popular in many scientific fields. Statistical computation and analysis of mixture models, however, are extremely complex due to the large number of parameters involved. Both EM algorithms for likelihood inference and MCMC procedures for Bayesian analysis have various difficulties in dealing with mixtures with an unknown number of components. In this paper, we propose a direct sampling approach to the computation of Bayesian finite mixture models with a varying number of components. This approach requires only knowledge of the density function up to a multiplicative constant. It is easy to implement, numerically efficient and very practical in real applications. A simulation study shows that it performs quite satisfactorily on relatively high-dimensional distributions. A well-known genetic data set is used to demonstrate the simplicity of this method and its power for the computation of high-dimensional Bayesian mixture models.
Bayesian analysis for mixtures of discrete distributions with a non-parametric component
Journal of Applied Statistics, 2015
Bayesian finite mixture modelling is a flexible parametric modelling approach for classification and density fitting. Many application areas require distinguishing a signal from a noise component. In practice, it is often difficult to justify a specific distribution for the signal component, therefore the signal distribution is usually further modelled via a mixture of distributions. However, modelling the signal as a mixture of distributions is computationally challenging due to the difficulties in justifying the exact number of components to be used and due to the label switching problem. This paper proposes the use of a non-parametric distribution to model the signal component. We consider the case of discrete data and show how this new methodology leads to more accurate parameter estimation and smaller classification error. Moreover, it does not incur the label switching problem. We show an application of the method to data generated by ChIP-sequencing experiments.
Combinatorial Mixtures of Multiparameter Distributions
2009
We introduce combinatorial mixtures, a flexible class of models for inference on mixture distributions whose components have multidimensional parameters. The key idea is to allow each element of the component-specific parameter vectors to be shared by a subset of the other components. This approach allows for mixtures that range from very flexible to very parsimonious, and unifies inference on component-specific parameters with inference on the number of components. We develop Bayesian inference and computation approaches for this class of distributions, and illustrate them in an application. This work was originally motivated by the analysis of cancer subtypes: in terms of biological measures of interest, subtypes may be characterized by differences in location, scale, correlation, or any combination of these. We illustrate our approach using data on molecular subtypes of lung cancer.
Relabelling in Bayesian mixture models by pivotal units
In this paper a simple procedure is proposed to deal with label switching when exploring complex posterior distributions by MCMC algorithms. Although it cannot be generalized to every situation, it may be handy in many applications because of its simplicity and very low computational burden. One area where it proves useful is in deriving a sample from the posterior distribution arising from finite mixture models when no simple or rational ordering between the components is available.
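The pivotal idea can be sketched as follows (a hypothetical interface, not the paper's implementation): pick k "pivot" units that are assumed never to be allocated to the same component in any retained draw, then relabel each draw so that pivot j always carries label j.

```python
import numpy as np

def pivotal_relabel(alloc, params, pivots):
    """Relabel each MCMC draw so that pivot unit j always carries label j.
    alloc: (T, n) integer allocation draws; params: (T, k) component
    parameters; pivots: k unit indices assumed to have distinct labels
    in every draw."""
    T, k = params.shape
    out_alloc = alloc.copy()
    out_params = params.copy()
    for t in range(T):
        perm = alloc[t, pivots]        # label currently held by pivot j
        inv = np.empty(k, dtype=int)
        inv[perm] = np.arange(k)       # map: old label -> new label
        out_alloc[t] = inv[alloc[t]]
        out_params[t] = params[t, perm]
    return out_alloc, out_params
```

The cost is one pass over the draws with a vectorised lookup per draw, which is why this kind of relabelling carries a very low computational burden compared with loss-minimisation schemes.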