p-value fallacy - The Skeptic's Dictionary


Many researchers have labored under the misbelief that the p-value gives the probability that their study’s results are just pure random chance. --Regina Nuzzo

The p-value fallacy is "the mistaken idea that a single number can capture both the long-run outcomes of an experiment and the evidential meaning of a single result" (Goodman, 1999). Many people mistakenly think that a p-value of 0.05 means that there is a 5% chance that the null hypothesis is true. What a p-value of 0.05 actually means is this: if the null hypothesis were true, results at least as extreme as those observed would turn up in about 5% of trials over many repetitions. A researcher who rejects the null hypothesis whenever p is 0.05 or less will therefore wrongly reject a true null hypothesis in about 5% of such trials. Thus, the p-value from a single trial does not provide conclusive evidence that a hypothesis is correct. We need many trials before we can assert with confidence whether the null hypothesis is true or false.
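The long-run reading can be made concrete. The following Python sketch is an illustration, not a method taken from Goodman or any other source cited here. It simulates many studies in which the null hypothesis really is true and counts how often they reach p ≤ 0.05. It assumes normally distributed data with a known standard deviation of 1, so that a simple two-sided z-test applies. The function names, the sample size of 30, and the 20,000 trials are arbitrary choices.

```python
import math
import random

def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: mean == 0, assuming a known standard deviation sigma."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

def rejection_rate_under_null(trials=20_000, n=30, alpha=0.05, seed=1):
    """Simulate many studies in which H0 is true and return the
    fraction of them whose p-value is at or below alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # H0 holds by construction: the data are drawn with mean 0
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample) <= alpha:
            rejections += 1
    return rejections / trials

print(rejection_rate_under_null())
```

The printed rate should land close to 0.05. The 0.05 threshold controls how often true null hypotheses get rejected over many trials. Yet the null hypothesis is true in every one of these simulated studies, including the ones that came out "significant," so no single study's p-value tells you the probability that its null hypothesis is true.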

In February 2015, editors of the scientific journal Basic and Applied Social Psychology announced that researchers who submit studies for publication would not be allowed to use the p-value. Biostatistician Jeff Leek of Johns Hopkins University says that "the p-value is the most widely known statistic." He estimates that the p-value has been used in at least three million scientific papers. P-values may be extremely popular, but they are widely misunderstood and are often believed to carry more information than they do.

Many researchers have labored under the misbelief that the p-value gives the probability that their study’s results are just pure random chance. But statisticians say the p-value’s information is much more non-specific, and can be interpreted only in the context of hypothetical alternative scenarios: The p-value summarizes how often results at least as extreme as those observed would show up if the study were repeated an infinite number of times when in fact only pure random chance were at work.

This means that the p-value is a statement about imaginary data in hypothetical study replications, not a statement about actual conclusions in any given study. Instead of being a “scientific lie detector” that can get at the truth of a particular scientific finding, the p-value is more of an “alternative reality machine” that lets researchers compare their results with what random chance would hypothetically produce.*
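The "alternative reality machine" can be built quite literally. The Python sketch below is an illustration based on assumptions of our own. It uses a hypothetical study of 100 coin tosses that came up heads 60 times and models "pure random chance" as a fair coin. The code replays that study many times under chance alone and counts how often the outcome is at least as lopsided as the one observed. An exact binomial calculation is included as a cross-check. The function names and the replication count are illustrative.

```python
import math
import random

def simulated_p_value(heads, tosses, replications=100_000, seed=2):
    """Estimate a two-sided p-value by replaying the study many times with a
    fair coin and counting how often the head count is at least as far from
    tosses/2 as the observed count."""
    rng = random.Random(seed)
    observed_gap = abs(heads - tosses / 2)
    at_least_as_extreme = 0
    for _ in range(replications):
        # each random bit is one fair toss; counting 1-bits gives the number of heads
        h = bin(rng.getrandbits(tosses)).count("1")
        if abs(h - tosses / 2) >= observed_gap:
            at_least_as_extreme += 1
    return at_least_as_extreme / replications

def exact_p_value(heads, tosses):
    """The same quantity computed exactly from the Binomial(tosses, 0.5) distribution."""
    gap = abs(heads - tosses / 2)
    total = sum(math.comb(tosses, k) for k in range(tosses + 1)
                if abs(k - tosses / 2) >= gap)
    return total / 2 ** tosses

print(simulated_p_value(60, 100))
print(exact_p_value(60, 100))
```

The two numbers should agree closely. Both describe imaginary replications under the fair-coin assumption. Neither is the probability that the coin is fair.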

What does this mean for the millions of papers whose grand conclusions rested on the assumption that the p-value was a valid guide to the probability of a hypothesis being true? It means that many papers published in top-tier peer-reviewed journals contain, in the words of psychologist Eric-Jan Wagenmakers of the University of Amsterdam, "plain nonsense."
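A simulation can also show why a p-value below 0.05 does not mean a 5% chance that the null hypothesis is true. The Python sketch below rests entirely on illustrative assumptions that do not come from the sources cited here. Only 10% of the hypotheses being tested have a real effect. Each study has 30 observations. Real effects are half a standard deviation in size. Every study uses a two-sided z-test with known variance. The code tallies how many "significant" results come from studies where the null hypothesis was actually true.

```python
import math
import random

def two_sided_p(sample):
    """Two-sided z-test p-value for H0: mean == 0, assuming a known sd of 1."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))

def share_of_significant_with_true_null(studies=20_000, prior_real=0.1,
                                        effect=0.5, n=30, alpha=0.05, seed=3):
    """Among simulated studies reaching p <= alpha, return the fraction in
    which the null hypothesis was actually true."""
    rng = random.Random(seed)
    significant = 0
    significant_but_null_true = 0
    for _ in range(studies):
        real = rng.random() < prior_real  # does this hypothesis have a real effect?
        mean = effect if real else 0.0
        sample = [rng.gauss(mean, 1.0) for _ in range(n)]
        if two_sided_p(sample) <= alpha:
            significant += 1
            if not real:
                significant_but_null_true += 1
    return significant_but_null_true / significant

print(share_of_significant_with_true_null())
```

Under these assumptions, roughly a third of the "significant" results come from studies in which the null hypothesis was true, far more than 5%. Lowering `prior_real` pushes that share higher still, even though every individual p-value is computed exactly the same way. The p-value alone says nothing about how plausible the hypothesis was to begin with.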

further reading


Goodman, Steven N. 1999. Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy. Annals of Internal Medicine 130(12): 995-1004.

Nuzzo, Regina. 2015. Scientists Perturbed by Loss of Stat Tools to Sift Research Fudge from Fact. Scientific American. April 16. The journal Basic and Applied Social Psychology recently banned the use of p-values and other statistical methods to quantify uncertainty from significance in research results.

Acupuncture, the P-Value Fallacy, and Honesty by Kimball Atwood

P Value Under Fire by Steven Novella "The main problem with P-values is that people use them as a substitute for a thorough analysis of the overall scientific rigor of a study. If the P-value creeps over the 0.05 level then people assume the hypothesis is likely to be true – but that is not what the P-value means."

p-value Wikipedia

Last updated 15-Mar-2016