Statistical Methodology Research Papers - Academia.edu

In this article, we consider Bayesian inference procedures to test for a unit root in Stochastic Volatility (SV) models. Unit-root tests for the persistence parameter of the SV models, based on the Bayes Factor (BF), have been recently introduced in the literature. In contrast, we propose a flexible class of priors that is noninformative over the entire support of the persistence parameter (including the non-stationarity region). In addition, we show that our model fitting procedure is computationally efficient (using the software WinBUGS). Finally, we show that our proposed test procedures have good frequentist properties in terms of achieving high statistical power, while maintaining low total error rates. We illustrate the above features of our method by extensive simulation studies, followed by an application to a real data set on exchange rates.
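
For concreteness, the following is a canonical first-order SV specification of the kind such unit-root tests address; the notation is a standard one assumed here, not necessarily the paper's own, and the unit-root hypothesis bears on the persistence parameter phi.

```latex
% Standard first-order stochastic volatility model (notation assumed):
% y_t : observed return,  h_t : latent log-volatility
y_t = e^{h_t/2}\,\varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0,1),
\qquad
h_t = \mu + \phi\,(h_{t-1} - \mu) + \sigma_\eta\,\eta_t, \qquad \eta_t \sim \mathcal{N}(0,1),
```

with the unit-root test contrasting \(H_0:\ \phi = 1\) (non-stationary volatility) against \(H_1:\ |\phi| < 1\).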

Improved point estimation and interval estimation for the scale parameter of an exponential distribution

This is a comparative study of various clustering and classification algorithms as applied to differentiating cancer and non-cancer protein samples using mass spectrometry data. Our study demonstrates the usefulness of a feature selection step prior to applying a machine learning tool. A natural and common choice of feature selection tool is the collection of marginal p-values obtained from t-tests of the intensity differences at each m/z ratio in the cancer versus non-cancer samples. We study the effect of the choice of cutoff, in terms of overall Type I error rate control, on the performance of the clustering and classification algorithms that use the significant features. For the classification problem, we also consider m/z selection using the importance measures computed by Breiman's Random Forest algorithm. Using a data set of proteomic analysis of serum from ovarian cancer patients and from cancer-free individuals in the Food and Drug Administration and National Cancer Institute Clinical Proteomics Database, we undertake a comparative study of the net effect of the combination of machine learning algorithm, feature selection tool and cutoff criterion on performance, as measured by an appropriate error rate.
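
As a rough illustration of this screening-then-learning pipeline, here is a minimal sketch in Python; the synthetic data, the Bonferroni cutoff, and all sizes are placeholders of ours, not the paper's.

```python
# Hypothetical sketch: marginal t-test screening followed by Random Forest
# importances, on a synthetic stand-in for the mass-spectrometry data.
import numpy as np
from scipy import stats
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))          # 200 samples x 1000 m/z intensities
y = rng.integers(0, 2, size=200)          # cancer (1) vs non-cancer (0)
X[y == 1, :25] += 0.8                     # a few truly differential features

# Feature screening: two-sample t-test at each m/z ratio.
_, pvals = stats.ttest_ind(X[y == 1], X[y == 0], axis=0)
alpha = 0.05 / X.shape[1]                 # e.g. Bonferroni control of Type I error
selected = np.flatnonzero(pvals < alpha)

# Alternative ranking: Random Forest importance measures (Breiman).
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
top_rf = np.argsort(rf.feature_importances_)[::-1][:len(selected) or 25]
```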

Several models for longitudinal data with nonrandom missingness are available. The selection model of Diggle and Kenward is one of these models. Many authors have noted that this model relies on modelling assumptions, such as the response distribution, that cannot be tested from the observed data, so a sensitivity analysis of the study's conclusions to such assumptions is needed. The stochastic EM algorithm is proposed and developed to handle continuous longitudinal data with nonrandom intermittent missing values when the responses have a non-normal distribution. This is a step in investigating the sensitivity of the parameter estimates to a change in the response distribution. The proposed technique is applied to real data from the International Breast Cancer Study Group.

The failure rate function commonly has a bathtub shape in practice. In this paper we discuss a regression model based on the new Weibull extended distribution, which can be used to model this type of failure rate function. Assuming censored data, we discuss parameter estimation: the maximum likelihood method and a Bayesian approach in which Gibbs algorithms along with Metropolis steps are used to obtain the posterior summaries of interest. We derive the appropriate matrices for assessing the local influence on the parameter estimates under different perturbation schemes, and we also present some ways to perform global influence analysis. Some discussions on case deletion influence diagnostics are developed for the joint posterior distribution based on the Kullback-Leibler divergence. In addition, for different parameter settings, sample sizes and censoring percentages, various simulations are performed to display and compare the empirical distribution of the martingale-type residual with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be straightforwardly extended to the martingale-type residual in log-Weibull extended models with censored data. Finally, we analyze a real data set under a log-Weibull extended regression model. We perform diagnostic analysis and model checking based on the martingale-type residual to select an appropriate model.
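
For reference, a common definition of the martingale-type residual for parametric survival models is the following; the notation is standard and assumed here rather than taken from the paper.

```latex
% Martingale-type residual for subject i (a standard form, assumed here):
% \delta_i = 1 if the lifetime t_i is observed, 0 if censored;
% \hat{S} and \hat{\Lambda} are the fitted survival and cumulative hazard functions.
r_{M_i} = \delta_i + \log \hat{S}(t_i;\hat{\theta})
        = \delta_i - \hat{\Lambda}(t_i;\hat{\theta}).
```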

Thurstone scaling is a widely used tool in marketing research, as well as in areas of applied psychology. The positions of the compared items, or stimuli, on a Thurstone scale are estimated by averaging the quantiles corresponding to the frequencies of each stimulus's preference over the other stimuli. We consider maximum likelihood estimation for Thurstone scaling that utilizes paired comparison data. From this perspective we obtain a binary response regression with a probit or logit link. In addition to the levels on a psychological scale, the suggested approach produces standard errors, t-statistics, and other characteristics of regression quality. This approach can help in both the theoretical interpretation and the practical application of Thurstone modeling.
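
A minimal sketch of this reformulation, assuming a probit link and synthetic paired-comparison data; all names, sizes and the identifiability convention are illustrative choices of ours.

```python
# Hypothetical sketch: paired-comparison data as binary probit regression.
# Each row: item i preferred over item j (1) or not (0); the design matrix has
# +1 for item i and -1 for item j, so the coefficients recover the scale
# values (up to the discriminal-dispersion scaling, sqrt(2) here).
import numpy as np
import statsmodels.api as sm

n_items, n_pairs = 4, 2000
true_scale = np.array([0.0, 0.4, 0.9, 1.5])
rng = np.random.default_rng(1)
i, j = rng.integers(0, n_items, (2, n_pairs))
keep = i != j
i, j = i[keep], j[keep]
prefer = (rng.normal(true_scale[i]) > rng.normal(true_scale[j])).astype(int)

X = np.zeros((len(i), n_items))
X[np.arange(len(i)), i] = 1.0
X[np.arange(len(i)), j] = -1.0
X = X[:, 1:]                      # fix item 0 at zero for identifiability

fit = sm.GLM(prefer, X,
             family=sm.families.Binomial(sm.families.links.Probit())).fit()
print(fit.params, fit.bse)        # estimated scale levels and standard errors
```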

Tests of homogeneity for partially matched-pair data are investigated. Different methods of combining tests of homogeneity based on the Pearson chi-squared test and the McNemar chi-squared test are considered. Numerical and simulation studies are presented to compare the power of these tests. Data from the National Survey of Children's Health of 2003 (NSCH) are used to illustrate the methods.
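
The paper compares several combination rules; as one concrete illustration of the mechanics only, here is a sketch using Fisher's method with made-up counts (the choice of Fisher's method and all numbers are ours).

```python
# Hypothetical sketch: combine a Pearson chi-squared test on the unmatched
# part with a McNemar test on the matched part via Fisher's method.
import numpy as np
from scipy import stats

matched = np.array([[30, 12], [5, 25]])     # paired table (discordant: b=12, c=5)
unmatched = np.array([[40, 20], [28, 33]])  # independent-sample 2x2 table

# McNemar's test on the matched pairs uses only the discordant cells.
b, c = matched[0, 1], matched[1, 0]
p_mcnemar = stats.chi2.sf((b - c) ** 2 / (b + c), df=1)

# Pearson chi-squared test on the unmatched samples.
p_pearson = stats.chi2_contingency(unmatched)[1]

# Fisher's combination: -2 * sum(log p) ~ chi-squared with 2k df under H0.
combined_stat = -2 * (np.log(p_mcnemar) + np.log(p_pearson))
p_combined = stats.chi2.sf(combined_stat, df=4)
print(p_mcnemar, p_pearson, p_combined)
```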

In this paper we describe an exterior product method for computing the Jacobian of a transformation. This alternative procedure is often simpler than the conventional determinant approach. We illustrate the proposed method via commonly used transformations in statistical distribution theory.
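
A familiar example of the technique, computing the polar-coordinate Jacobian with wedge products (our illustration, not necessarily one from the paper):

```latex
x = r\cos\theta,\quad y = r\sin\theta
\;\Rightarrow\;
dx \wedge dy
= (\cos\theta\,dr - r\sin\theta\,d\theta)\wedge(\sin\theta\,dr + r\cos\theta\,d\theta)
= r\,(\cos^2\theta + \sin^2\theta)\,dr\wedge d\theta
= r\,dr\wedge d\theta,
```

using \(dr\wedge dr = d\theta\wedge d\theta = 0\) and \(d\theta\wedge dr = -dr\wedge d\theta\), so the Jacobian \(r\) is read off without evaluating a determinant.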

Linear regression models where the response variable is censored are often considered in statistical analysis. A parametric relationship between the response variable and covariates and normality of random errors are assumptions typically considered in modeling censored responses. In this context, the aim of this paper is to extend the normal censored regression model by considering on one hand that the response variable is linearly dependent on some covariates whereas its relation to other variables is characterized by nonparametric functions, and on the other hand that error terms of the regression model belong to a class of symmetric heavy-tailed distributions capable of accommodating outliers and/or influential observations in a better way than the normal distribution. We achieve a fully Bayesian inference using pth-degree spline smooth functions to approximate the nonparametric functions. The likelihood function is utilized not only to compute some Bayesian model selection measures but also to develop Bayesian case-deletion influence diagnostics based on the q-divergence measures. The newly developed procedures are illustrated with an application and simulated data.
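
Schematically, the extended model has the following partially linear form; the notation is assumed, not copied from the paper.

```latex
% Semiparametric censored regression (schematic; notation assumed):
y_i^{*} = \mathbf{x}_i^{\top}\boldsymbol\beta + \sum_{j=1}^{q} f_j(t_{ij}) + \varepsilon_i,
\qquad y_i = \max(y_i^{*},\, c_i)
\quad \text{(shown for left censoring at } c_i\text{; right censoring is analogous),}
```

with each \(f_j\) approximated by a pth-degree spline and the \(\varepsilon_i\) drawn from a symmetric heavy-tailed family (e.g., Student-t) rather than the normal.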

We present the old-but-new problem of data quality from a statistical perspective, in part with the goal of attracting more statisticians, especially academics, to become engaged in research on a rich set of exciting challenges. The data quality landscape is described, and its research foundations in computer science, total quality management and statistics are reviewed. Two case studies based on an EDA approach to data quality are used to motivate a set of research challenges for statistics that span theory, methodology and software tools.

A number of criteria, test statistics and diagnostic plots have been developed in order to test the adopted distribution assumption. Usually, a simpler distribution is tested against a more complicated one (obtained by adding an extra parameter) which includes the first distribution as a special case (nested distributions). In this paper, two new tests are developed for testing nested distributions. This work concentrates on the case where the extra parameter lies on the boundary of the parameter space. The developed tests are applied to testing the frailty term in frailty models. Simulation results are presented for comparison with other test statistics, and the methods are illustrated on two real data sets.
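
For context, the classical benchmark in this boundary situation (e.g., testing a zero frailty variance) is that the likelihood ratio statistic is not asymptotically chi-squared with one degree of freedom but, under suitable regularity conditions in the sense of Self and Liang, a 50:50 mixture:

```latex
-2\log\Lambda \;\xrightarrow{d}\; \tfrac{1}{2}\chi^2_0 + \tfrac{1}{2}\chi^2_1
\qquad \text{under } H_0 \text{ (parameter on the boundary),}
```

where \(\chi^2_0\) denotes a point mass at zero.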

There has been greatly increased interest in statistical methods in psychiatric research and its applications over the past few decades, in parallel with advances in computers and statistical software. This review aims to describe the main topics related to statistical methods in psychopharmacology, namely the nature of statistics in medicine, problems in data analysis, statistical modelling, developments in statistical technology, statistical reporting and meta-analysis.

The exponential type is characterized in terms of the regression of a (possibly non-linear) function of a record value with its adjacent record values as covariates. Monotone transformations extend this result to more general settings, and these are illustrated with some specific examples.

When subjects are monitored for a recurrent event over a period of time, each individual behaves like an experimental unit within which measurements may be correlated. The subject-specific observation window (i.e. monitoring period) constitutes another factor controlling the accumulation of events and censoring. We develop a procedure for estimating survivor parameters in the presence of the joint effect of correlation and informative monitoring; specifically, for studies in which the survival time for a subject is censored because of deterioration of their physical condition or due to the accumulation of their event occurrences. In this manuscript, we approach the survivor parameter estimation problem with a fully parametric baseline hazard model in which the intensity functions of the inter-event time and the duration of the monitoring period are reconciled through the generalized Koziol-Green (KG) model [12], and the within-experimental-unit correlation is modeled through frailty. We outline the Expectation Maximization (EM) steps for estimating Weibull parameters with correlated recurrent event data under informative monitoring. We apply our method to real-life data.
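
For reference, the classical Koziol-Green assumption ties the censoring distribution to the event-time distribution through a single power parameter (standard form, assumed here):

```latex
S_C(t) \;=\; \left[S_T(t)\right]^{\beta}, \qquad \beta > 0,
```

so that larger \(\beta\) corresponds to heavier informative censoring; the generalized KG model referred to above reconciles the monitoring-period and inter-event intensities through this type of link.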

Missing data, such as non-response, is often problematic in social network analysis, since what is missing may potentially alter the conclusions about what we have observed, in the sense that individual tie-variables typically need to be interpreted in relation to their local neighbourhood and the global structure. Some ad hoc methods for dealing with missing data in social networks have been proposed, but here we consider a model-based approach. We discuss various aspects of fitting exponential family random graph (or p-star) models (ERGMs) to networks with missing data and present a Bayesian data augmentation algorithm for the purpose of estimation. This involves drawing from the fully conditional posterior distribution of the parameters, something made possible by a recently developed algorithm. With ERGMs already having complicated interdependencies, we argue that it is particularly important to provide inference that adequately describes the uncertainty, something that the Bayesian approach caters for. To the extent that we wish to explore the missing parts of the network, the posterior predictive distributions, immediately available at the termination of the algorithm, are at our disposal; this allows us to explore the distribution of what is missing unconditionally on any particular parameter values. Some important features of treating missing data and of the implementation of the algorithm are illustrated using a well known collaboration network.
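
For readers unfamiliar with ERGMs, the family has the exponential-family form below; the intractable normalizing constant is what makes data augmentation machinery of this kind necessary.

```latex
P_{\theta}(Y = y) \;=\; \frac{\exp\{\theta^{\top} s(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^{\top} s(y')\},
```

where \(s(y)\) is a vector of network statistics (edges, triangles, stars, ...) and the sum runs over all graphs on the node set.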

The model finds the county-wise loss and the total loss for the entire state of Florida. The computer team then compiles all information from the atmospheric science, engineering and actuarial components, processes all hurricane-related data and completes the project. The model was submitted to the Florida Commission on Hurricane Loss Projection Methodology for approval; it went through a rigorous review and was revised as per the suggestions of the commission. The final model was approved by the commission for use by the insurance companies in Florida. At every stage of the process, statistical procedures were used to model various parameters and validate the model. This paper presents a brief summary of the main components of the model (meteorology, vulnerability and actuarial) and then focuses on the statistical validation of the same.

In this paper, we consider the simple step-stress model for a two-parameter exponential distribution, when both parameters are unknown and the data are Type-II censored. It is assumed that under the two different stress levels only the scale parameter changes, while the location parameter remains unchanged. It is observed that the maximum likelihood estimators do not always exist, and we obtain the maximum likelihood estimates of the unknown parameters whenever they do. We provide the exact conditional distributions of the maximum likelihood estimators of the scale parameters. Since the construction of exact confidence intervals from the conditional distributions is very difficult, we propose to use the observed Fisher information matrix for this purpose, and we also suggest using the bootstrap method for constructing confidence intervals. Bayes estimates and associated credible intervals are obtained using the importance sampling technique. Extensive simulations are performed to compare the performances of the different confidence and credible intervals in terms of their coverage percentages and average lengths. The performances of the bootstrap confidence intervals are quite satisfactory even for small sample sizes.
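
Under the cumulative exposure model usually paired with this setup (a standard formulation, assumed here rather than quoted from the paper), the lifetime distribution with stress-change point tau, location mu and scale parameters theta_1, theta_2 is:

```latex
F(t) =
\begin{cases}
1 - \exp\!\left\{-\dfrac{t-\mu}{\theta_1}\right\}, & \mu < t < \tau,\\[2ex]
1 - \exp\!\left\{-\dfrac{t-\tau}{\theta_2} - \dfrac{\tau-\mu}{\theta_1}\right\}, & t \ge \tau .
\end{cases}
```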

We propose applying the multiparametric spatiotemporal autoregressive (m-STAR) model as a simple approach to jointly estimating the pattern of connectivity and the strength of contagion by that pattern, including the case where connectivity is endogenous to the dependent variable (selection). We emphasize substantively and theoretically guided (i.e., structural) specifications that can support analyses of estimated spatiotemporal responses to stochastic or covariate shocks and that can distinguish the possible sources of spatial association: common exposure, contagion, and selection (e.g., homophily). We illustrate this approach to dynamic, endogenous interdependence, which parallels models of network-behavior co-evolution in the longitudinal networks literature, with an empirical application that aims to disentangle the roles of economic interdependence, correlated external and internal stimuli, and EU membership in shaping labor market policies in developed democracies in recent years.
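
Schematically (our notation, not necessarily the authors'), the m-STAR model stacks several connectivity matrices, each with its own contagion strength:

```latex
\mathbf{y}_t \;=\; \sum_{k=1}^{K} \rho_k\,\mathbf{W}_k\,\mathbf{y}_t
\;+\; \phi\,\mathbf{y}_{t-1} \;+\; \mathbf{X}_t\boldsymbol\beta \;+\; \boldsymbol\varepsilon_t,
```

so that estimating the \(\rho_k\) jointly with the rest of the model amounts to estimating the pattern of connectivity and the strength of contagion through it.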

When identifying the best model for representing the behavior of rainfall distribution based on a sequence of dry (wet) days, focus is usually given to the fitted model with the smallest number of estimated parameters. If the model with fewer parameters is found inadequate for describing a particular data distribution, a model with a larger number of parameters is recommended. Based on several probability models developed by previous researchers in this field, we propose five types of mixed probability models as alternatives for describing the distribution of dry (wet) spells for daily rainfall events. The mixed probability models combine the log series distribution with three other models: the Poisson distribution (MLPD), the truncated Poisson distribution (MLTPD), and the geometric distribution (MLGD). In addition, the mixture of two log series distributions (MLSD) and the mixed geometric with truncated Poisson distribution (MGTPD) are also introduced as alternative models. Daily rainfall data from 14 selected rainfall stations in Peninsular Malaysia for the period 1975 to 2004 were used in this study. When selecting the best probability model to describe the observed distribution of dry (wet) spells, the Akaike Information Criterion (AIC) was considered. The results revealed that the MLGD was the best probability model to represent the distribution of dry spells over the Peninsular.
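
As a schematic of one such mixture (the MLGD case; the parametrization is assumed, not taken from the paper), the pmf combines a log series component and a geometric component, with fitted models then ranked by AIC:

```latex
P(X = x) \;=\; \omega\,\frac{-\,\theta^{x}}{x\,\ln(1-\theta)}
\;+\; (1-\omega)\,p\,(1-p)^{x-1},
\qquad x = 1, 2, \ldots,\quad 0 < \omega < 1,
\qquad \mathrm{AIC} = -2\ln\hat{L} + 2k .
```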

Using the classical method of moments, we propose a new semiparametric estimation procedure for multi-parameter copula models. Consistency and asymptotic normality of the obtained estimators are established. Considering an Archimedean copula model, an extensive simulation study is carried out comparing these estimators with the pseudo maximum likelihood, rho-inversion and tau-inversion ones. We show that, compared with the other methods, moment-based estimation is quick and simple to use, with reasonable bias and root mean squared error.
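
To illustrate moment matching for an Archimedean copula (this is our own one-parameter illustration, not the authors' estimator), one can equate a sample moment of the pseudo-observations to its model value; the sketch below uses the identity E[UV] = integral of C(u,v) over the unit square for a Clayton copula.

```python
# Hypothetical sketch of moment-based copula estimation: match the sample
# moment E[UV] of the pseudo-observations to its value under a Clayton
# copula C_theta(u,v) = (u^-theta + v^-theta - 1)^(-1/theta).
import numpy as np
from scipy import integrate, optimize, stats

rng = np.random.default_rng(2)
# Synthetic dependent data; pseudo-observations via normalized ranks.
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=500)
u = stats.rankdata(z[:, 0]) / (len(z) + 1)
v = stats.rankdata(z[:, 1]) / (len(z) + 1)
sample_moment = np.mean(u * v)

def model_moment(theta):
    # E[UV] = double integral of C_theta(u, v) over the unit square.
    C = lambda vv, uu: (uu ** -theta + vv ** -theta - 1) ** (-1 / theta)
    return integrate.dblquad(C, 0, 1, 0, 1)[0]

theta_hat = optimize.brentq(lambda t: model_moment(t) - sample_moment, 0.05, 20)
print(theta_hat)
```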

A stationarity test for Markov chain models is proposed in this paper. Most previous test procedures for Markov chain models have been based on the conditional probabilities of the transition matrix. The likelihood ratio test and chi-squared test have been used for procedures such as testing stationarity, the order of the Markov chain, and goodness of fit, for all of which the parameters need to be estimated. This paper uses the efficient score test, an extension of Tsiatis's approach, for testing the stationarity of a Markov chain model based on the marginal distribution. To assess the suitability of the proposed method, a numerical example with real-life data is given.

abundance at each survey date. While abundance estimation was very imprecise for each date, we were able to combine them to obtain good estimates of overall population abundance even though the population was spatially dynamic. The proposed hierarchical model combined submodels and accounted for their sources of uncertainty. Spotted seals were most abundant within the study area (233,700, 95% CI 137,300-793,100), followed by bearded seals (61,800, 95% CI 34,900-171,600) and ribbon seals (61,100, 95% CI 35,300).

When counting the number of chemical parts in air pollution studies or when comparing the occurrence of congenital malformations between a uranium mining town and a control population, we often assume a Poisson distribution for the number of these rare events. Some discussions of sample size calculation under the Poisson model appear elsewhere, but they all focus on testing equality rather than testing equivalence. We discuss sample size and power calculation on the basis of the exact distribution under Poisson models for testing non-inferiority and equivalence with respect to the mean incidence rate ratio. On the basis of large sample theory, we further develop an approximate sample size calculation formula using the normal approximation of a proposed test statistic for testing non-inferiority, and an approximate power calculation formula for testing equivalence. We find that these approximation formulae tend to underestimate the minimum required sample size obtained from the exact test procedure. On the other hand, we find that the power corresponding to the approximate sample sizes can actually be accurate (with respect to Type I error and power) when we apply the asymptotic test procedure based on the normal distribution. We tabulate, in a variety of situations, the minimum mean incidence needed in the standard (or control) population, which can easily be employed to calculate the minimum required sample size from each comparison group for testing non-inferiority and equivalence between two Poisson populations.
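
To give a feel for the large-sample side of such calculations, here is a sketch of one common normal-approximation formula on the log rate-ratio scale; this is our illustration of the general approach, not necessarily the authors' test statistic or formula.

```python
# Hypothetical sketch (not necessarily the paper's formula): approximate
# sample size per group for non-inferiority of the incidence rate ratio
# lambda1/lambda0, via a normal approximation to the log rate ratio, whose
# variance with n subjects per group is roughly (1/lambda1 + 1/lambda0)/n.
import math
from scipy.stats import norm

def n_noninferiority(lam0, lam1, margin, alpha=0.05, power=0.8):
    za, zb = norm.ppf(1 - alpha), norm.ppf(power)
    effect = math.log(lam1 / lam0) - math.log(margin)  # distance from the margin
    nvar = 1 / lam1 + 1 / lam0                         # n * Var(log rate ratio)
    return math.ceil((za + zb) ** 2 * nvar / effect ** 2)

# e.g. control rate 0.5 events/subject, equal true rates, margin ratio 0.8
# (margin < 1 here; direction conventions vary across formulations).
print(n_noninferiority(0.5, 0.5, margin=0.8))
```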

Jeffreys and Shtarkov distributions play an important role in universal coding and minimum description length (MDL) inference, two central areas within the field of information theory. It was recently discovered that in some situations Shtarkov distributions exist while Jeffreys distributions do not. To demonstrate some of these situations we consider in this note the class of natural exponential families (NEF's) and present a general result which enables us to construct numerous classes of infinitely divisible NEF's for which Shtarkov distributions exist and Jeffreys distributions do not. The method used to obtain our general results is based on the variance functions of such NEF's. We first present two classes of parametric NEF's demonstrating our general results and then generalize them to obtain numerous multiparameter classes of the same type.
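
The connection to variance functions can be made explicit: for a NEF parametrized by its mean mu with variance function V(mu), the Fisher information is 1/V(mu), so the Jeffreys prior takes the form below, and its existence hinges on the behaviour of V.

```latex
\pi_{J}(\mu) \;\propto\; I(\mu)^{1/2} \;=\; V(\mu)^{-1/2}.
```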

A new method for analyzing high-dimensional categorical data, Linear Latent Structure (LLS) analysis, is presented. LLS models belong to the family of latent structure models, which are mixture distribution models constrained to satisfy the local independence assumption. LLS analysis explicitly considers a family of mixed distributions as a linear space and LLS models are obtained by imposing linear constraints on the mixing distribution.

This paper deals with a single server Poisson arrival queue with two phases of heterogeneous service along with a Bernoulli schedule vacation model, where after two successive phases of service the server either goes for a vacation with probability p (0 ≤ p ≤ 1) or may continue to serve the next unit, if any, with probability q (= 1 − p). Further, the concept of a multiple vacation policy is also introduced. We obtain the queue size distributions at a departure epoch and at a random epoch, the Laplace-Stieltjes transform of the waiting time distribution and the busy period distribution, along with some mean performance measures. Finally, we discuss some related statistical inference issues.

Spatial sampling design is concerned with the optimal allocation of samples to spatial coordinates in order to improve, in a well-defined sense, the estimation and prediction of spatial random fields. Unfortunately, objective functions in spatial sampling design have so far seemed so complicated that stochastic search algorithms are most often used to optimize these design criteria. Our intention is to show that minimization of the average kriging variance design criterion has a mathematically tractable structure when the random field is considered as a linear regression model with infinitely many random coefficients. Either the Karhunen-Loève expansion or the polar spectral representation of the random field may be used to obtain such a favourable representation. Well-known convex experimental design theory may then be applied to this high dimensional cosine-sine-Bessel surface harmonics random coefficients regression model to calculate spatial sampling designs. We study a monitoring network for rainfall during the monsoon in Pakistan and consider both the optimal deletion of monitoring stations from this network and the subsequent addition of stations to it. Only deterministic optimization algorithms, and no stochastic search algorithms, are used for the task of network optimization. Wind, humidity and elevation are considered as external drift variables determining the rainfall trend.

The paper deals with life distributions that are harmonic new better than renewal used in expectation (HNBRUE). The goal is to derive moment inequalities and use them for further analysis of the class HNBRUE. We establish a new characterization of exponentiality versus HNBRUE. Pitman's asymptotic relative efficiency is employed to assess the performance of the proposed test relative to other available tests. Finally, we carry out numerical simulations to produce a table of critical values for the test.

The two-parameter generalized exponential distribution has recently been used quite extensively to analyze lifetime data. In this paper the two-parameter generalized exponential distribution is embedded in a larger class of distributions obtained by introducing another shape parameter. Because of the additional shape parameter, more flexibility has been introduced into the family. It is observed that the new family is positively skewed and has increasing, decreasing, unimodal and bathtub shaped hazard functions. It can be viewed as a proportional reversed hazard family of distributions. This new family of distributions is analytically quite tractable, and it can also be used quite effectively to analyze censored data. Analyses of two data sets are performed and the results are quite satisfactory.
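
For orientation, the proportional reversed hazard construction referred to here takes a baseline distribution function G and raises it to a power; the generalized exponential distribution itself arises when G is the exponential distribution function:

```latex
F(x;\alpha) \;=\; \left[G(x)\right]^{\alpha}, \qquad \alpha > 0,
\qquad\text{e.g.}\quad
F_{GE}(x;\alpha,\lambda) \;=\; \left(1 - e^{-\lambda x}\right)^{\alpha}, \quad x > 0 .
```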

This paper develops a method for estimating parameters of a vector autoregression (VAR) observed in white noise. The estimation method assumes the noise variance matrix is known and does not require any iterative process. This study provides consistent estimators and shows the asymptotic distribution of the parameters required for conducting tests of Granger causality. Methods in the existing statistical literature cannot be used for testing Granger causality, since under the null hypothesis the model becomes unidentifiable. Measurement error effects on parameter estimates were evaluated by using computational simulations. The results show that the proposed approach produces empirical false positive rates close to the adopted nominal level (even for small samples) and has a good performance around the null hypothesis. The applicability and usefulness of the proposed approach are illustrated using a functional magnetic resonance imaging dataset.
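
Schematically (our notation, not the paper's), the setup is a latent VAR observed through additive white noise with known covariance; Granger non-causality corresponds to zero restrictions on the autoregressive coefficient blocks:

```latex
\mathbf{y}_t = \sum_{i=1}^{p} A_i\,\mathbf{y}_{t-i} + \mathbf{u}_t,
\qquad
\mathbf{x}_t = \mathbf{y}_t + \mathbf{e}_t,
\quad \mathbf{e}_t \sim \mathrm{WN}(\mathbf{0}, \Sigma_e),\ \Sigma_e \text{ known},
```

and the hypothesis that series \(j\) does not Granger-cause series \(i\) is the joint restriction \((A_1)_{ij} = \cdots = (A_p)_{ij} = 0\).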

In the present paper we study properties of the left and right truncated variance of a function of a non-negative random variable that characterize a class of continuous distributions. These properties include characterizations through the relationships the conditional variance has with the truncated expectations and/or the failure rate, as well as through a lower bound to the conditional variance. It is shown that the characteristic properties are linked to those based on the relationship between the conditional means and the failure rates discussed in the literature. The lower bound developed here compares favourably with that given by the Cramér-Rao inequality.

Estimating the parameter of a Dirichlet distribution is an interesting question, since this distribution arises in many situations of applied probability. Classical procedures are based on a sample from the Dirichlet distribution. In this paper we exhibit five different estimators based on only one observation. They rely either on residual allocation model decompositions or on sampling properties of Dirichlet distributions. Two approaches are investigated: the first uses fragment sizes and the second uses size-biased permutations of a partition. Numerical computations based on simulations are supplied. The estimators are finally used to estimate birth probabilities per month.

We evaluate the performance of an extensive family of ARCH models in modelling the daily Value-at-Risk (VaR) of perfectly diversified portfolios in five stock indices, using a number of distributional assumptions and sample sizes. We find, first, that leptokurtic distributions are able to produce better one-step-ahead VaR forecasts; second, that the choice of sample size is important for the accuracy of the forecast, whereas the specification of the conditional mean makes little difference. Finally, the ARCH structure producing the most accurate forecasts is different for every portfolio and specific to each equity index.
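
A minimal sketch of producing a one-step-ahead VaR forecast from one member of this family, a GARCH(1,1) with Student-t innovations, using the third-party arch package; the data and settings are illustrative stand-ins, not the paper's.

```python
# Hypothetical sketch: one-day 99% VaR from a GARCH(1,1) with Student-t
# innovations, fitted with the `arch` package on synthetic percent returns.
import numpy as np
from arch import arch_model
from scipy.stats import t

rng = np.random.default_rng(3)
returns = rng.standard_t(df=6, size=1500)      # stand-in for index returns (%)

am = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1, dist="t")
res = am.fit(disp="off")

fcast = res.forecast(horizon=1)
mu = fcast.mean.values[-1, 0]
sigma = np.sqrt(fcast.variance.values[-1, 0])

nu = res.params["nu"]
q = t.ppf(0.01, nu) * np.sqrt((nu - 2) / nu)   # 1% quantile of standardized t
var_99 = -(mu + q * sigma)                     # VaR reported as a positive loss
print(var_99)
```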

Mark-recapture experiments involve capturing individuals from populations of interest, marking and releasing them at an initial sample time, and recapturing individuals from the same populations on subsequent occasions. The Jolly-Seber model is widely used in open-population models since it can estimate important parameters such as population size, recruitment, and survival. However, one of the Jolly-Seber model assumptions that can be easily violated is that of no tag loss. Cowen and Schwarz [L. Cowen, C.J. Schwarz, The Jolly-Seber model with tag loss, Biometrics 62 (2006) 677-705] developed the Jolly-Seber-Tag-Loss (JSTL) model to avoid this violation; this model was extended to deal with group heterogeneity by Gonzalez and Cowen [S. Gonzalez, L. Cowen, The Jolly-Seber-tag-loss model with group heterogeneity, The Arbutus Review 1 (2010) 30-42].

Intensive care unit (ICU) patients are well known to be highly susceptible to nosocomial (i.e. hospital-acquired) infections due to their poor health and the many invasive therapeutic treatments they undergo. The effects of acquiring such infections in the ICU on mortality are, however, ill understood. Our goal is to quantify these effects using data from the National Surveillance Study of Nosocomial Infections in Intensive Care Units (Belgium). This is a challenging problem because of the presence of time-dependent confounders (such as exposure to mechanical ventilation) which lie on the causal path from infection to mortality. Standard statistical analyses may be severely misleading in such settings and have shown contradictory results. While inverse probability weighting for marginal structural models can be used to accommodate time-dependent confounders, inference for the effect of ICU-acquired infections on mortality under such models is further complicated (a) by the fact that marginal structural models infer the effect of acquiring infection on a given, fixed day in the ICU, which is not well defined when ICU discharge comes prior to that day; (b) by informative censoring of the survival time due to hospital discharge; and (c) by the instability of the inverse weighting estimation procedure. We accommodate these problems by developing inference under a new class of marginal structural models which describe the hazard of death for patients if, possibly contrary to fact, they stayed in the ICU for at least a given number of days s and acquired infection or not on that day. Using these models we estimate that, if patients stayed in the ICU for at least s days, the effect of acquiring infection on day s would be to multiply the subsequent hazard of death by 2.74 (95 per cent conservative CI 1.48; 5.09).

We introduce the Hinde-Demétrio (HD) regression models in order to analyze overdispersed count data, and we mainly investigate the effect of the dispersion parameter. The HD distributions are discrete additive exponential dispersion models (depending on canonical and dispersion parameters) with a third real index parameter p; they are characterized by the unit variance function μ + μ^p. For p = 2, 3, ..., the corresponding distributions are concentrated on the non-negative integers, and are overdispersed and zero-inflated with respect to a Poisson distribution having the same mean. The negative binomial (p = 2), strict arcsine (p = 3) and Poisson (p → ∞) distributions are particular count HD families. In a generalized linear modelling framework, the effect of the dispersion parameter in the HD regression models is, among other things, pointed out through two parametrizations of the mean: the unit and standard means. In this particular additive model, this effect must be negligible within an adequate HD model for fixed p. The estimation of the integer p is also examined separately. The results are illustrated and discussed on a horticultural data set.
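
The variance-function characterization and its special cases, as stated above, can be summarized as:

```latex
V(\mu) \;=\; \mu + \mu^{p},
\qquad
\begin{cases}
p = 2: & \text{negative binomial},\\
p = 3: & \text{strict arcsine},\\
p \to \infty: & \text{Poisson}.
\end{cases}
```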