Estimating the structural distribution function of cell probabilities

Risk-Efficient Sequential Estimation of the Number of Multinomial Cells

Communications in Statistics-theory and Methods, 2004

We consider risk-efficient sequential estimation under squared error loss of the number of classes which are equally likely to occur in a given multinomial population. It is assumed that the sampling cost per observation is constant. Large-sample properties of the sequential estimator are studied. Finally, Monte Carlo simulation is carried out in order to investigate its finite sample behavior. The proposed sequential procedure performs better (in the sense of reducing average stopping time and risk) than the one based on the estimator $K_n$, the number of distinct cells observed in a sample of size n.
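As a rough illustration of why $K_n$ alone is a crude estimator of the number of cells, the sketch below simulates sampling from equally likely cells and compares the observed number of distinct cells with its expectation, $N(1 - (1 - 1/N)^n)$. It is not the sequential procedure from the paper; the function names `expected_distinct` and `simulate_distinct` and the values of `num_cells` and `n` are hypothetical choices made for this example.

```python
import random


def expected_distinct(num_cells, n):
    """E[K_n] for n draws from num_cells equally likely cells."""
    return num_cells * (1.0 - (1.0 - 1.0 / num_cells) ** n)


def simulate_distinct(num_cells, n, rng):
    """Draw n observations uniformly over the cells and count distinct cells seen."""
    return len({rng.randrange(num_cells) for _ in range(n)})


# Illustrative values only (assumptions, not taken from the paper).
rng = random.Random(0)
num_cells, n = 50, 60
k_n = simulate_distinct(num_cells, n, rng)
print("observed K_n:", k_n)
print("expected K_n:", round(expected_distinct(num_cells, n), 2))
```

Since $K_n$ can never exceed the true number of cells, it tends to underestimate that number unless n is large relative to N.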

Inference for the maximum cell probability under multinomial sampling

Naval Research Logistics, 1992

This article investigates inference for $p_{\max}$, the largest cell probability in multinomial trials, for the case of a small to moderate number of trials. Emphasis focuses on point and interval estimation. Both frequentist and Bayesian approaches are developed. The results of extensive simulation investigation are included as well as the analysis of a set of crime data for the city of New Orleans taken from the National Crime Survey.

Estimating a Structural Distribution Function by Grouping

arXiv (Cornell University), 2008

By the method of Poissonization we confirm some existing results concerning consistent estimation of the structural distribution function in the situation of a large number of rare events. Inconsistency of the so-called natural estimator is proved. The method of grouping in cells of equal size is investigated and its consistency derived. A bound on the mean squared error is derived.

Consistent Estimation of the Structural Distribution Function

Scandinavian Journal of Statistics, 2000

Motivated by problems in linguistics, we consider a multinomial random vector for which the number of cells N is not much smaller than the sum of the cell frequencies, i.e. the sample size n. The distribution function of the uniform distribution on the set of all cell probabilities multiplied by N is called the structural distribution function of the cell probabilities. Conditions are given that guarantee that the structural distribution function can be estimated consistently as n increases indefinitely although n/N does not. The natural estimator is inconsistent and we prove consistency of essentially two alternative estimators.
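The definition above can be made concrete with a small sketch. It assumes the structural distribution function is $F(x) = \frac{1}{N}\,\#\{i : N p_i \le x\}$ and takes the "natural" estimator to be the plug-in version that replaces each $p_i$ with the relative frequency $Y_i/n$. That reading, the function names and the parameter values are assumptions made for illustration, not details from the abstract.

```python
import random
from collections import Counter


def structural_df(probs, x):
    """F(x) = (1/N) * #{i : N * p_i <= x} for cell probabilities p_1..p_N."""
    num_cells = len(probs)
    return sum(1 for p in probs if num_cells * p <= x) / num_cells


def natural_estimator(counts, n, x):
    """Plug-in version: each p_i is replaced by the relative frequency counts[i] / n."""
    num_cells = len(counts)
    return sum(1 for c in counts if num_cells * c / n <= x) / num_cells


def sample_counts(probs, n, rng):
    """Draw n multinomial observations and return the count for every cell."""
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    tally = Counter(draws)
    return [tally.get(i, 0) for i in range(len(probs))]


# Illustrative setting (an assumption): uniform cells with n equal to N.
num_cells = n = 1000
probs = [1.0 / num_cells] * num_cells
counts = sample_counts(probs, n, random.Random(0))
print("true F(0.5):     ", structural_df(probs, 0.5))
print("estimated F(0.5):", natural_estimator(counts, n, 0.5))
```

In this uniform setting every $N p_i$ equals 1, so the true $F(0.5)$ is 0, while the plug-in estimate at 0.5 is simply the fraction of empty cells, which stays well away from 0 when n and N are comparable. This is one way to see the inconsistency the abstract refers to.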

A Consistent Estimator of Structural Distribution

Austrian Journal of Statistics, 2020

We consider sparse count data models with the sparsity rate $\epsilon = N/n = O(1)$, where $N = N(n)$ is the number of observations and $n \to \infty$ is the number of cells. In this case the plug-in estimator of the structural distribution of expected frequencies is inconsistent. If $\epsilon = O(n^{-\delta})$ for some $\delta > 0$, the nonparametric maximum likelihood estimator, in general, is also inconsistent. Assuming that some auxiliary information on the expected frequencies is available, we construct a consistent estimator of the structural distribution.

On Dirichlet Multinomial Distributions

Random Walk, Sequential Analysis and Related Topics - A Festschrift in Honor of Yuan-Shih Chow, 2006

Let Y have a symmetric Dirichlet multinomial distribution in $\mathbb{R}^m$, and let $S_m = h(Y_1) + \cdots + h(Y_m)$. We derive a central limit theorem for $S_m$ as the sample size n and the number of cells m tend to infinity at the same rate. The rate of convergence is shown to be of order $m^{1/6}$. The approach is based on approximation of marginal distributions for the Dirichlet multinomial distribution by negative binomial distributions, and a blocking technique similar to that used to study renormalization groups in statistical physics. These theorems generalize and refine results for the classical occupancy problem.

Keywords: occupancy problems; central limit theorem; exchangeable distributions

1 Introduction and main results

Let Y have a multinomial distribution M(n, p) with n trials and success probabilities $p = (p_1, \ldots, p_m)$. Classical occupancy problems concern the counts $\#\{j : Y_j = k\}$ of cells containing exactly k observations, and the coverage, that is, m minus the number of empty cells. If m and n tend to infinity at the same rate and the multinomial distribution is symmetric, $p_1 = \cdots = p_m = 1/m$, Weiss (1958) gives a central limit theorem for the number of cells covered or, equivalently, the number of empty cells. This result has been extended in various directions. Rényi (1962) gives proofs extending Weiss' result in more general limits, Kopocinska and Kopocinski (1992) prove joint asymptotic normality for a collection of these counts, and Englund (1981) gives a Berry-Esseen bound for the error of normal approximation. In asymmetric cases where the cell probabilities $p_j$ vary, Esty (1983) gives a central limit theorem for the coverage and Quine and Robinson (1982) obtain a Berry-Esseen bound. Most relevant to the results presented here, Chen (1980) introduces mixture models, described below, in which the multinomial cell probabilities are sampled from a Dirichlet distribution. Extensions are presented in Chen (1981a, 1981b).
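The occupancy quantities in this introduction can be computed directly from a vector of cell counts. The following is a minimal sketch: the helper names `occupancy_counts` and `coverage` and the example vector are hypothetical and are not notation from the paper.

```python
from collections import Counter


def occupancy_counts(cell_counts):
    """Map k to the number of cells that contain exactly k observations."""
    return Counter(cell_counts)


def coverage(cell_counts):
    """Number of nonempty cells: m minus the number of empty cells."""
    return len(cell_counts) - occupancy_counts(cell_counts)[0]


# Hypothetical count vector for m = 6 cells and n = 7 observations.
y = [0, 2, 1, 0, 3, 1]
print(dict(occupancy_counts(y)))
print(coverage(y))
```

`Counter` returns 0 for a missing key, so `coverage` also works when no cell is empty.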

A maximum likelihood estimator for parameter distributions in heterogeneous cell populations

Procedia Computer Science, 2010

In many biologically relevant situations, cells of a clonal population show a heterogeneous response upon a common stimulus. The computational analysis of such situations requires the study of cell-cell variability and modeling of heterogeneous cell populations. In this work, we consider populations where the behavior of every single cell can be described by a system of ordinary differential equations. Heterogeneity among individual cells is modeled via differences in parameter values and initial conditions. Both are subject to a distribution function which is part of the cell population model.

Poisson loglinear modeling with linear constraints on the expected cell frequencies

Sankhya B, The Indian Journal of Statistics, 2012

In this paper we consider Poisson loglinear models with linear constraints (LMLC) on the expected table counts. Multinomial and product multinomial loglinear models can be obtained by considering that some marginal totals (linear constraints on the expected table counts) have been prefixed in a Poisson loglinear model. Therefore, with the theory developed in this paper, multinomial and product multinomial loglinear models can be considered as a particular case. To carry out inferences on the parameters in the LMLC, an information-theoretic approach is followed, from which the classical maximum likelihood estimators and Pearson chi-square statistics for goodness-of-fit are obtained. In addition, nested hypotheses are proposed as a general procedure for hypothesis testing. Through a simulation study the appropriateness of the proposed inference tools is illustrated.

Closed Form Estimators of Latent Cell Frequencies

Biometrical Journal, 1986

One-stage and two-stage closed form estimators of latent cell frequencies in multidimensional contingency tables are derived from the weighted least squares criterion. The first stage estimator is asymptotically equivalent to the conditional maximum likelihood estimator and does not necessarily have minimum asymptotic variance. The second stage estimator does have minimum asymptotic variance relative to any other existing estimator. The closed form estimators are defined for any number of latent cells in contingency tables of any order under exact general linear constraints on the logarithms of the nonlatent and latent cell frequencies.