John W. Tukey's contributions to multiple comparisons

Some General Theory of Multiple Comparison Procedures

2008

Chapter 2 was devoted to the theory of multiple comparison procedures (MCPs) for fixed-effects linear models with independent homoscedastic normal errors, which was the framework for Part I. Much of that theory applies with minor modifications to many of the problems considered in Part II. However, in other cases the theory of Chapter 2 needs to be supplemented and extended, which is the purpose of the present appendix. We assume that the reader is familiar with Chapter 2. Many references to that chapter are made in the sequel. As in Chapter 2, throughout this appendix we restrict attention to the nonsequential (fixed-sample) setting. The following is a summary of this appendix. Section 1 discusses the theory of simultaneous test procedures in arbitrary models. This discussion is based mostly on Gabriel (1969). When a simultaneous test procedure (and more generally a single-step test procedure) addresses hypotheses concerning parametric functions, it can be inverted to obtain a simultaneous confidence procedure for those parametric functions. Conversely, from a given simultaneous confidence procedure one can obtain the associated simultaneous test procedure by applying the confidence-region test method. The relation between simultaneous confidence estimation and simultaneous testing is the topic of Section 2. Finally, Section 3 discusses some theory of step-down test procedures, including the topics of error rate control, optimal choice of nominal significance levels, and directional decisions. Here no general theory for deriving the associated simultaneous confidence estimates is as yet available; some preliminary work in this direction by Kim, Stefánsson, and Hsu (1987) is discussed in Section 4.2.4 of Chapter 2.
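To make the step-down discussion concrete, here is a minimal base-R sketch of Holm's procedure, a classical step-down test: the i-th smallest p-value is compared with the nominal level alpha/(m - i + 1), and testing stops at the first non-rejection. The p-values and alpha below are hypothetical, and this illustrates the general step-down idea rather than any specific procedure from the appendix.

holm_stepdown <- function(p, alpha = 0.05) {
  m <- length(p)
  ord <- order(p)                        # step down from the smallest p-value
  reject <- logical(m)
  for (i in seq_len(m)) {
    if (p[ord[i]] <= alpha / (m - i + 1)) {
      reject[ord[i]] <- TRUE             # reject and move to the next hypothesis
    } else break                         # first non-rejection: retain the rest
  }
  reject
}

holm_stepdown(c(0.001, 0.013, 0.021, 0.048, 0.300))   # compare p.adjust(..., "holm")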

Why we (usually) don't have to worry about multiple comparisons

2012

Applied researchers often find themselves making statistical inferences in settings that would seem to require multiple comparisons adjustments. We challenge the Type I error paradigm that underlies these corrections. Moreover, we posit that the problem of multiple comparisons can disappear entirely when viewed from a hierarchical Bayesian perspective. We propose building multilevel models in the settings where multiple comparisons arise. Multilevel models perform partial pooling (shifting estimates toward each other), whereas classical procedures typically keep the centers of intervals stationary, adjusting for multiple comparisons by making the intervals wider (or, equivalently, adjusting the p values corresponding to intervals of fixed width). Thus, multilevel models address the multiple comparisons problem and also yield more efficient estimates, especially in settings with low group-level variation, which is where multiple comparisons are a particular concern.
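As a small illustration of the partial pooling described above (with hypothetical numbers, and assuming a normal hierarchical model in which the group-level standard deviation tau and the standard errors are known), each group estimate is shrunk toward the overall mean in proportion to its uncertainty, rather than having its interval widened:

y   <- c(28, 8, -3, 7, -1, 1, 18, 12)    # raw group estimates (hypothetical)
se  <- c(15, 10, 16, 11, 9, 11, 10, 18)  # their standard errors (hypothetical)
tau <- 5                                 # assumed group-level sd

mu <- weighted.mean(y, w = 1 / (se^2 + tau^2))  # precision-weighted grand mean
w  <- tau^2 / (tau^2 + se^2)                    # weight kept on each raw estimate
theta <- w * y + (1 - w) * mu                   # partially pooled estimates
cbind(raw = y, pooled = round(theta, 1))

Groups with the noisiest estimates are pulled most strongly toward mu, which is exactly the behavior that attenuates extreme comparisons.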

Optimality in multiple comparison procedures

When many (m) null hypotheses are tested with a single dataset, controlling the number of false rejections is often the principal consideration. Two popular error rates to control are the probability of making at least one false discovery (FWER) and the expected fraction of false discoveries among all rejections (FDR). Scaled multiple comparison error rates form a new family that bridges the gap between these two extremes. For example, the Scaled Expected Value (SEV) criterion limits the expected number of false positives relative to an arbitrary increasing function s of the number of rejections R, that is, E(FP/s(R)). We discuss how to choose in practice which procedure to use, with elements of an optimality theory, by considering the number of false rejections FP separately from the number of correct rejections TP. Using this framework, we show how to choose an element of the new family mentioned above.
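To make the scaled criterion concrete, the following sketch simulates one set of tests and evaluates FP/s(R) for a few scaling functions s: s(R) = 1 gives a raw false-positive count (the FWER side of the family) and s(R) = R gives the false discovery proportion (the FDR side). The mixture, cutoff, and choices of s are illustrative assumptions, not the paper's.

set.seed(1)
m    <- 100
null <- runif(m) < 0.8                            # 80% true nulls (hypothetical)
p    <- ifelse(null, runif(m), rbeta(m, 0.2, 1))  # alternatives give small p
rej  <- p < 0.05                                  # unadjusted rejections

FP <- sum(rej & null); R <- sum(rej)
c(count  = FP,                                  # s(R) = 1
  FDP    = if (R > 0) FP / R else 0,            # s(R) = R
  scaled = if (R > 0) FP / sqrt(R) else 0)      # s(R) = sqrt(R), in between

Averaging FP/s(R) over many such replications estimates the scaled rate E(FP/s(R)).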

On Multiple Comparisons in R

The multcomp package for the R statistical environment allows for multiple comparisons of parameters whose estimates are generally correlated, including comparisons of k groups in general linear models.
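A brief usage sketch in the spirit of that description (the data and model are hypothetical): glht() takes a fitted model plus a specification of the linear functions of interest, here Tukey-type all-pairwise comparisons of a three-level factor.

library(multcomp)

set.seed(2)
d <- data.frame(group = gl(3, 20, labels = c("A", "B", "C")),
                y     = rnorm(60, mean = rep(c(0, 0.5, 1), each = 20)))
fit <- lm(y ~ group, data = d)

cmp <- glht(fit, linfct = mcp(group = "Tukey"))  # all pairwise contrasts
summary(cmp)   # multiplicity-adjusted p-values
confint(cmp)   # simultaneous confidence intervals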

Best (but oft-forgotten) practices: the multiple problems of multiplicity - whether and how to correct for many statistical tests

The American journal of clinical nutrition, 2015

Testing many null hypotheses in a single study results in an increased probability of detecting a significant finding just by chance (the problem of multiplicity). Debates have raged over many years with regard to whether to correct for multiplicity and, if so, how it should be done. This article first discusses how multiple tests lead to an inflation of the α level, then explores the following different contexts in which multiplicity arises: testing for baseline differences in various types of studies, having >1 outcome variable, conducting statistical tests that produce >1 P value, taking multiple "peeks" at the data, and unplanned, post hoc analyses (i.e., "data dredging," "fishing expeditions," or "P-hacking"). It then discusses some of the methods that have been proposed for correcting for multiplicity, including single-step procedures (e.g., Bonferroni); multistep procedures, such as those of Holm, Hochberg, and Šidák; and false discovery rate (FDR) control.
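The α inflation mentioned at the start is easy to compute directly: with m independent tests each at level α, the chance of at least one false positive is 1 - (1 - α)^m. The base-R sketch below (with hypothetical p-values) shows this inflation alongside several of the corrections named above; note that base R's p.adjust() covers Bonferroni, Holm, Hochberg, and the Benjamini-Hochberg FDR procedure, but not Šidák.

alpha <- 0.05
m <- c(1, 5, 10, 20, 50)
round(1 - (1 - alpha)^m, 3)   # P(at least one false positive): 0.05 rises to 0.92

p <- c(0.003, 0.012, 0.019, 0.041, 0.22)   # hypothetical p-values
rbind(bonferroni = p.adjust(p, "bonferroni"),
      holm       = p.adjust(p, "holm"),
      hochberg   = p.adjust(p, "hochberg"),
      BH         = p.adjust(p, "BH"))      # false-discovery-rate adjustment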

Important Facts and Observations about Pairwise Comparisons (the special issue edition)

This study has been inspired by numerous requests for clarification from researchers who often confuse Saaty's Analytic Hierarchy Process (AHP) with the pairwise comparisons (PC) method, taking AHP as the only representation of PC. This study should be regarded as an interpretation and clarification of past investigations of PC. In addition, this article is a reflection on general PC research at a higher level of abstraction: the philosophy of science. It delves into the foundations and implications of pairwise comparisons. Some results of this study are based on a recently published work by Koczkodaj and Szwarc. Finally, open problems are stated for future research.

Simultaneous Inference in General Parametric Models

Biometrical Journal, 2008

Simultaneous inference is a common problem in many areas of application. If multiple null hypotheses are tested simultaneously, the probability of erroneously rejecting at least one of them increases beyond the pre-specified significance level. Simultaneous inference procedures have to be used which adjust for multiplicity and thus control the overall type I error rate. In this paper we describe simultaneous inference procedures in general parametric models, where the experimental questions are specified through a linear combination of elemental model parameters. The framework described here is quite general and extends the canonical theory of multiple comparison procedures in ANOVA models to linear regression problems, generalized linear models, linear mixed effects models, the Cox model, robust linear models, etc. Several examples using a variety of different statistical models illustrate the breadth of the results.
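As a sketch of the generality claimed here, the same machinery applies beyond ANOVA, e.g. to a generalized linear model, with the questions encoded as a matrix K of linear combinations of the coefficients. The data, model, and contrasts below are hypothetical.

library(multcomp)

set.seed(3)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$y <- rbinom(200, 1, plogis(0.3 * d$x1 - 0.5 * d$x2))
fit <- glm(y ~ x1 + x2, data = d, family = binomial)

K <- rbind("x1 effect"   = c(0, 1,  0),     # rows are linear functions of
           "x2 effect"   = c(0, 0,  1),     # (intercept, x1, x2)
           "x1 minus x2" = c(0, 1, -1))
summary(glht(fit, linfct = K))              # simultaneous, adjusted inference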

Robust Multiple Comparisons Based on Combined Probabilities From Independent Tests

Motivated by a situation encountered in the Well Elderly 2 study, the paper considers the problem of robust multiple comparisons based on K independent tests associated with 2K independent groups. A simple strategy is to use an extension of Dunnett's T3 procedure, which is designed to control the probability of one or more Type I errors. However, this method and related techniques fail to take into account the overall pattern of p-values when making decisions about which hypotheses should be rejected. The paper suggests a multiple comparison procedure that does take the overall pattern into account and then describes general situations where this alternative approach makes a practical difference in terms of both power and the probability of one or more Type I errors. For reasons summarized in the paper, the focus is on 20% trimmed means, but in principle the method considered here is relevant to any situation where the Type I error probability of the individual tests can be controlled reasonably well.
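For concreteness, here is a minimal sketch of the baseline strategy the paper improves on: run K independent Yuen-type tests on 20% trimmed means (using winsorized variances) and control the familywise error rate with a Hochberg step-up adjustment of the resulting p-values. The data are simulated, and this illustrates the baseline rather than the paper's pattern-based procedure.

yuen_p <- function(x, y, tr = 0.2) {
  winvar <- function(z) {                 # winsorized sample variance
    z <- sort(z); n <- length(z); g <- floor(tr * n)
    if (g > 0) { z[1:g] <- z[g + 1]; z[(n - g + 1):n] <- z[n - g] }
    var(z)
  }
  hx <- length(x) - 2 * floor(tr * length(x))   # effective sizes after trimming
  hy <- length(y) - 2 * floor(tr * length(y))
  dx <- (length(x) - 1) * winvar(x) / (hx * (hx - 1))
  dy <- (length(y) - 1) * winvar(y) / (hy * (hy - 1))
  tstat <- (mean(x, trim = tr) - mean(y, trim = tr)) / sqrt(dx + dy)
  df <- (dx + dy)^2 / (dx^2 / (hx - 1) + dy^2 / (hy - 1))
  2 * pt(-abs(tstat), df)
}

set.seed(4)
p <- replicate(5, yuen_p(rnorm(30), rnorm(30, mean = 0.6)))  # K = 5 tests
p.adjust(p, "hochberg")   # reject where the adjusted p-value is below alpha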