Bayesian identifiability and misclassification in multinomial data

Misclassified multinomial data: a Bayesian approach

2000

In this paper, the problem of inference with misclassified multinomial data is addressed. Over the last years there has been a significant upsurge of interest in the development of Bayesian methods to make inferences with misclassified data. The wide range of applications for several sampling schemes and the importance of including initial information make Bayesian analysis an...

A Bayesian model for multinomial sampling with misclassified data

Journal of Applied Statistics, 2008

In this paper the issue of making inferences with misclassified data from a noisy multinomial process is addressed. A Bayesian model for making inferences about the proportions and the noise parameters is developed. The problem is reformulated in a more tractable form by introducing auxiliary or latent random vectors. This allows for an easy-to-implement Gibbs sampling-based algorithm to generate samples from the distributions of interest. An illustrative example related to elections is also presented.
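The latent-vector idea in this abstract can be illustrated with a generic data-augmentation Gibbs sampler. The sketch below is not the paper's algorithm: it assumes a Dirichlet prior `alpha` on the true proportions, independent Dirichlet priors `beta[i]` on each row of the misclassification matrix, and latent counts `z[i][j]` (units of true category `i` recorded as category `j`). All function and parameter names are hypothetical.

```python
import random


def dirichlet(alpha, rng):
    """Draw one Dirichlet vector by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def multinomial(n, weights, rng):
    """Allocate n units to categories with probabilities proportional to weights."""
    counts = [0] * len(weights)
    for idx in rng.choices(range(len(weights)), weights=weights, k=n):
        counts[idx] += 1
    return counts


def gibbs_misclassified(observed, alpha, beta, n_iter=1000, burn_in=200, seed=0):
    """Return the posterior mean of the true proportions (assumed model, see lead-in)."""
    rng = random.Random(seed)
    k = len(observed)
    theta = dirichlet(alpha, rng)                    # true proportions
    lam = [dirichlet(beta[i], rng) for i in range(k)]  # lam[i][j] = P(observed j | true i)
    theta_sum = [0.0] * k
    kept = 0
    for it in range(n_iter):
        # Step 1: split each observed count among the possible true categories.
        z = [[0] * k for _ in range(k)]
        for j in range(k):
            weights = [theta[i] * lam[i][j] for i in range(k)]
            column = multinomial(observed[j], weights, rng)
            for i in range(k):
                z[i][j] = column[i]
        # Step 2: conjugate Dirichlet updates given the latent counts.
        theta = dirichlet([alpha[i] + sum(z[i]) for i in range(k)], rng)
        lam = [dirichlet([beta[i][j] + z[i][j] for j in range(k)], rng)
               for i in range(k)]
        if it >= burn_in:
            kept += 1
            for i in range(k):
                theta_sum[i] += theta[i]
    return [s / kept for s in theta_sum]


# Illustrative (made-up) counts and priors; the diagonal-heavy prior on the
# misclassification rows encodes the assumption that most units are recorded correctly.
prior_rows = [[8.0, 1.0, 1.0], [1.0, 8.0, 1.0], [1.0, 1.0, 8.0]]
print(gibbs_misclassified([60, 30, 10], [1.0, 1.0, 1.0], prior_rows, n_iter=400, burn_in=100))
```

Because the observed counts alone do not identify the proportions and the noise parameters separately, the result depends heavily on the prior rows supplied for the misclassification matrix.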

Inference for misclassified multinomial data with covariates

Canadian Journal of Statistics, 2020

This article considers multinomial data subject to misclassification in the presence of covariates which affect both the misclassification probabilities and the true classification probabilities. A subset of the data may be subject to a secondary measurement according to an infallible classifier. Computations are carried out in a Bayesian setting where it is seen that the prior has an important role in driving the inference. In addition, a new and less problematic definition of nonidentifiability is introduced and is referred to as hierarchical nonidentifiability.

Bayesian Identifiability: Contributions to an Inconclusive Debate.

San Martin, E., & González, J. (2010). Bayesian Identifiability: Contributions to an Inconclusive Debate. Chilean Journal of Statistics, 1(2), 69-91

Using the concept of reduction by sufficiency of a Bayesian model, the issue of Bayesian identifiability is discussed. Various statements given in the literature on Bayesian identifiability are revised. Particular attention is put on the possibility of updating unidentified parameters. This issue is discussed under a general framework and also carefully illustrated in a fully discrete Bayesian model.

A New Method for Estimating Model Parameters for Multinomial Data

Journal of Mathematical Psychology, 1998

A new procedure for estimating the parameters of a scientific model is described, and the method is applied and illustrated for the class of experiments with multinomial data structure. The procedure is referred to as the method of population-parameter mapping, and it has a number of novel and advantageous features. The method is a variation of a standard Bayesian analysis. However, instead of directly developing a posterior distribution for the model parameters, this procedure first characterizes the population proportions for the multinomial cells. Random samples are then drawn from the posterior distribution for these proportions, and these samples are mapped to the parameters of the scientific model. This method leads naturally to a definition of model identifiability, and leads to a direct probability estimate of the coherence of the scientific model. Moreover, the new procedure can circumvent the problem of dealing with computationally difficult integrals that frequently occur with Bayesian analyses of complex multinomial models. The method is illustrated by means of several memory measurement models as well as a signal-detection model.
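The sample-then-map idea described above can be sketched on a toy model. The model here is an assumption chosen for brevity and is not taken from the paper: a two-cell detect-or-guess task in which the probability of a correct response is `phi = d + (1 - d) * guess`, so that `d = (phi - guess) / (1 - guess)`. Posterior draws of `phi` that map to a `d` outside [0, 1] are counted as incoherent with the model. The function name and the uniform prior are also hypothetical.

```python
import random


def population_parameter_mapping(counts, n_draws=2000, guess=0.5, seed=0):
    """Map posterior draws of a cell proportion to the toy model parameter d.

    counts = (correct, wrong). Returns (coherence, admissible_d), where
    coherence is the fraction of draws that map to a valid d in [0, 1].
    """
    rng = random.Random(seed)
    correct, wrong = counts
    admissible = []
    for _ in range(n_draws):
        # Posterior of the cell proportion under a uniform Beta(1, 1) prior.
        phi = rng.betavariate(correct + 1, wrong + 1)
        # Invert the assumed model equation phi = d + (1 - d) * guess.
        d = (phi - guess) / (1.0 - guess)
        if 0.0 <= d <= 1.0:
            admissible.append(d)
    return len(admissible) / n_draws, admissible


coherence, d_draws = population_parameter_mapping((70, 30))
print("estimated coherence:", coherence)
print("posterior mean of d:", sum(d_draws) / len(d_draws))
```

The admissible draws approximate the posterior of the model parameter without evaluating any integral over the parameter space, which is the computational point the abstract makes.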

Eliciting Dirichlet and Connor–Mosimann prior distributions for multinomial models

TEST, 2013

This paper addresses the task of eliciting an informative prior distribution for multinomial models. We first introduce a method of eliciting univariate beta distributions for the probability of each category, conditional on the probabilities of other categories. Two different forms of multivariate prior are derived from the elicited beta distributions. First, we determine the hyperparameters of a Dirichlet distribution by reconciling the assessed parameters of the univariate beta conditional distributions. Although the Dirichlet distribution is the standard conjugate prior distribution for multinomial models, it is not flexible enough to represent a broad range of prior information. Second, we use the beta distributions to determine the parameters of a Connor-Mosimann distribution, which is a generalization of a Dirichlet distribution and is also a conjugate prior for multinomial models. It has a larger number of parameters than the standard Dirichlet distribution and hence a more flexible structure. The elicitation methods are designed to be used with the aid of interactive graphical user-friendly software.
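The relation between the two priors can be made concrete with a small sampler. A Connor-Mosimann (generalized Dirichlet) vector can be built by stick-breaking from independent beta variables, and a Dirichlet with parameters `alpha` is the special case in which the second beta parameter at step j equals the sum of the remaining `alpha` values. The sketch below illustrates only that construction, not the elicitation method of the paper; the function names are hypothetical.

```python
import random


def sample_connor_mosimann(a, b, rng):
    """Draw one probability vector of length len(a) + 1 by stick-breaking."""
    x = []
    remaining = 1.0  # mass not yet assigned to a category
    for a_j, b_j in zip(a, b):
        z = rng.betavariate(a_j, b_j)  # share of the remaining mass
        x.append(z * remaining)
        remaining *= 1.0 - z
    x.append(remaining)  # the last category takes what is left
    return x


def dirichlet_as_connor_mosimann(alpha):
    """Return (a, b) for which the Connor-Mosimann law reduces to Dirichlet(alpha)."""
    a = list(alpha[:-1])
    b = [sum(alpha[j + 1:]) for j in range(len(alpha) - 1)]
    return a, b


rng = random.Random(0)
a, b = dirichlet_as_connor_mosimann([2.0, 3.0, 5.0])
print(sample_connor_mosimann(a, b, rng))                    # Dirichlet special case
print(sample_connor_mosimann([2.0, 3.0], [4.0, 9.0], rng))  # freely chosen b values
```

For k categories the Connor-Mosimann form has 2(k - 1) free parameters against k for the Dirichlet, which is the extra flexibility the abstract refers to.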

Bayesian Identification and Partial Identification

2011

The problem of identification and partial identification in econometrics is considered. We carry out a comprehensive analysis of the identification issue from both a classical and a Bayesian point of view. We review three concepts of identification: sampling, measurable and Bayesian identification, and the relationship existing among them. We analyze the concept of exact estimability and its link with Bayesian identification. Examples are provided where measurable identification fails while Bayesian identification does not. Many classical examples of the partial identification literature are reconsidered by using a nonparametric Bayesian approach.

Nonparametric Bayes Modeling of Multivariate Categorical Data

Journal of The American Statistical Association, 2009

Modeling of multivariate unordered categorical (nominal) data is a challenging problem, particularly in high dimensions and cases in which one wishes to avoid strong assumptions about the dependence structure. Commonly used approaches rely on the incorporation of latent Gaussian random variables or parametric latent class models. The goal of this article is to develop a nonparametric Bayes approach, which defines a prior with full support on the space of distributions for multiple unordered categorical variables. This support condition ensures that we are not restricting the dependence structure a priori. We show this can be accomplished through a Dirichlet process mixture of product multinomial distributions, which is also a convenient form for posterior computation. Methods for nonparametric testing of violations of independence are proposed, and the methods are applied to model positional dependence within transcription factor binding motifs.

Revisiting identification concepts in Bayesian analysis

2021

This paper studies the role played by identification in the Bayesian analysis of statistical and econometric models. First, for unidentified models we demonstrate that there are situations where the introduction of a non-degenerate prior distribution can make a parameter that is nonidentified in frequentist theory identified in Bayesian theory. In other situations, it is preferable to work with the unidentified model and construct Markov chain Monte Carlo (MCMC) algorithms for it instead of introducing identifying assumptions. Second, for partially identified models we demonstrate how to construct the prior and posterior distributions for the identified set parameter and how to conduct Bayesian analysis. Finally, for models that contain some parameters that are identified and others that are not we show that marginalizing out the identified parameter from the likelihood with respect to its conditional prior, given the nonidentified parameter, allows the data to be informative abou...

Bayesian Analysis of Binary Data Subject to Misclassification

This paper considers estimation of success probabilities of categorical binary data subject to misclassification errors from the Bayesian point of view. It has been shown that sample proportions are in general biased estimates. This bias is a function of the amount of misclassification and can be substantial. It has been proposed to eliminate the bias by subjecting a portion of the sample to both true and fallible classifiers, resulting in a 2 x 2 table, from which the misclassification rates can be estimated. The rationale is that fallible classifiers are inexpensive relative to infallible ones. Hence if only a part of the sample is measured by the infallible classifier one can obtain a more efficient estimate, for a given sampling budget, than by measuring the whole sample using the infallible classifier.
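The bias mentioned in this abstract follows from a short calculation. If the fallible classifier has sensitivity `se` and specificity `sp`, the probability of an observed success is `q = p * se + (1 - p) * (1 - sp)`, which differs from the true `p` unless both rates equal 1. The sketch below shows that relation, its inversion, and a plug-in estimate of the rates from a 2 x 2 validation table. It is a frequentist illustration under these assumptions, not the Bayesian analysis of the paper, and all names and numbers are hypothetical.

```python
def observed_rate(p, se, sp):
    """Probability that the fallible classifier reports a success."""
    return p * se + (1.0 - p) * (1.0 - sp)


def corrected_rate(q, se, sp):
    """Invert observed_rate for p, clipped to [0, 1]."""
    denom = se + sp - 1.0
    if denom <= 0.0:
        raise ValueError("classifier must be better than chance: se + sp > 1")
    p = (q - (1.0 - sp)) / denom
    return min(1.0, max(0.0, p))


def rates_from_validation(table):
    """Estimate (se, sp) from a doubly classified subsample.

    table[t][o] = number of units with true class t and observed class o,
    with index 1 meaning success and index 0 meaning failure.
    """
    se = table[1][1] / (table[1][0] + table[1][1])
    sp = table[0][0] / (table[0][0] + table[0][1])
    return se, sp


p_true, se, sp = 0.2, 0.9, 0.8
q = observed_rate(p_true, se, sp)
print("observed success probability:", q)
print("bias of the naive proportion:", q - p_true)
print("corrected estimate:", corrected_rate(q, se, sp))
print("rates from a made-up validation table:", rates_from_validation([[40, 10], [5, 45]]))
```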