Best way not to misuse p values is not to draw definitive conclusions about hypotheses

Trafimow D, Haley U, Boje D. BMJ Evidence-Based Medicine, 2022. doi:10.1136/bmjebm-2022-111940

Abstract

A recent article in BMJ Evidence-Based Medicine asserted the following: 'We begin by saying that p values themselves are not flawed. Rather, the use, misuse or abuse of p values in ways antithetical to rigorous scientific pursuits is the flaw.' We show that this assertion is both wrong and misleading. To demonstrate the errors, we start with another wrong assertion on the same page: 'The only information to be gleaned from p values is whether the observed data are likely where the null hypothesis (that no effect exists) [is] true.' This assertion erroneously assumes that p values are based on null hypotheses when instead they are based on null hypotheses plus added assumptions. For example, one assumes random sampling from the population [3], a false assumption in almost every medical paper (or business paper or psychology paper) we have read. So many added assumptions exist that researchers have proposed assumption taxonomies [5]. Taken together, the added assumptions are highly unlikely all to be true, and herein lies the tale.

Statisticians often refer to the model M as including the null hypothesis (or test hypothesis) H and a set of added assumptions A, so that M = H + A. At best, p values indicate how likely the data are given the model, not the hypothesis. However, because A doubtless contains at least one wrong assumption, M is likely false too. Thus, the null hypothesis may be embedded in a wrong model, and small p values fail to reveal whether the error lies in the added assumptions, in the null hypothesis, or in both.

A possible counter is that although most models are false, some might be close to true, so p values may still prove useful in reject versus not-reject decisions. But with dichotomisation, even a slight wrongness, such as lack of random selection, means models are false and researchers should reject them. Models are wrong regardless of p values, thereby compromising their usefulness.

Aguinis et al used the analogy of looking into a murky pool: p values can tell you that the pool likely exists (that you can reject the null hypothesis of no effect), but we also need the effect size to determine the pool's depth (the size of the effect). We argue that yes, effect size matters, but the pool may not exist. A p value cannot tell you that a pool likely exists, because p values do not come from null hypotheses but rather from the wrong models in which they are embedded. P values can help uncover what we know already: that the model is imperfect. In sum, we know the ground is not perfectly uniform, but we can draw no sound conclusion about the pool or the likelihood of observations given the pool's existence (or lack thereof).

In conclusion, the best way not to misuse p values is to avoid using them to draw definitive conclusions about hypotheses.
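The M = H + A point lends itself to a quick demonstration. The sketch below (ours, not from the paper; all function names and parameter values are illustrative assumptions) simulates a one-sample t-test on clustered data: the null hypothesis H (true mean of zero) holds exactly, but the independence assumption in A is violated, and the false-positive rate climbs well above the nominal 5%.

```python
# Minimal sketch: when an added assumption in A fails (here, independence
# of observations), p values mislead even though the null hypothesis H
# is true by construction. Requires numpy and scipy.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)

def false_positive_rate(correlated: bool, n_sims: int = 5000,
                        n_clusters: int = 10, per_cluster: int = 5) -> float:
    """Fraction of simulations with p < 0.05 when the true mean is 0 (H true)."""
    hits = 0
    for _ in range(n_sims):
        if correlated:
            # Shared cluster effects violate the t-test's independence
            # assumption, which belongs to A, not to H.
            cluster = rng.normal(0, 1, n_clusters).repeat(per_cluster)
            noise = rng.normal(0, 1, n_clusters * per_cluster)
            sample = cluster + noise
        else:
            sample = rng.normal(0, 1, n_clusters * per_cluster)
        _, p = ttest_1samp(sample, popmean=0.0)
        hits += p < 0.05
    return hits / n_sims

print(f"independent data: {false_positive_rate(False):.3f}")  # near the nominal 0.05
print(f"clustered data:   {false_positive_rate(True):.3f}")   # well above 0.05
```

On the authors' framing, the excess of small p values in the clustered run indicts the model M as a whole, not the null hypothesis H, which is exactly true in both runs.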
