Methodological and didactical controversies around statistical inference

The significance fallacy in inferential statistics

BMC Research Notes, 2015

Background: Statistical significance is an important concept in empirical science. However, the meaning of the term varies widely. We investigate the intuitive understanding of the notion of significance. Methods: We described the results of two different experiments published in a major psychological journal to a sample of psychology students, labeling the findings as 'significant' versus 'non-significant.' Participants were asked to estimate the effect sizes and sample sizes of the original studies. Results: Labeling the results of a study as significant was associated with estimates of a large effect, but was largely unrelated to sample size. Similarly, non-significant results were estimated as near zero in effect size. Conclusions: Even after considerable training in statistics, students largely equate statistical significance with medium to large effect sizes rather than with large sample sizes. The data show that students assume statistical significance reflects real effects rather than 'statistical tricks' (e.g., increasing sample size).
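The misconception this study documents can be made concrete with a short, hypothetical sketch (not part of the original experiment): in a one-sample z-test the statistic is z = d * sqrt(n), so the very same tiny standardized effect d crosses the significance threshold once n is large enough.

```python
from math import erf, sqrt

def p_value_one_sample_z(effect_size_d, n):
    """Two-sided p-value for a one-sample z-test with known SD.
    z = d * sqrt(n), so p depends on both effect size AND sample size."""
    z = abs(effect_size_d) * sqrt(n)
    phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF at z
    return 2.0 * (1.0 - phi)

# The same tiny effect (d = 0.05) at two sample sizes:
print(p_value_one_sample_z(0.05, 100) < 0.05)    # n = 100: not significant
print(p_value_one_sample_z(0.05, 10000) < 0.05)  # n = 10000: "significant"
```

This is exactly the 'statistical trick' the abstract alludes to: significance was achieved purely by increasing n, while the effect stayed negligible.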

A comparative study of statistical inference from an educational point of view

2015

Inferential statistics is the scientific method for evidence-based knowledge acquisition. Its underlying logic is difficult, and the mathematical methods created for this purpose are based on advanced concepts of probability combined with differing epistemological positions. Many approaches have been developed over the years. Following the classical significance tests of Fisher, the statistical tests of Neyman and Pearson, and decision theory, two further approaches are considered here by qualitative scientific argument: the Bayesian approach, which is linked to a contested conception of probability, and the rerandomization and bootstrap strand, which is bound to simulation. While Barnett (1982) analysed statistical inference from a mathematical/philosophical perspective to shed light on the various approaches, we analyse it from the broader perspective of statistics education and investigate the relative merits of each approach. Some thoughts are developed to reconsider inform...
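As a hypothetical illustration of the Bayesian strand mentioned above (not drawn from the paper), the conjugate Beta-Binomial model shows in a few lines how a prior over a proportion is updated by data, which is the operation whose probabilistic interpretation is contested:

```python
# Conjugate Beta-Binomial update: the posterior of a proportion under a
# Beta(a, b) prior after observing s successes and f failures is
# Beta(a + s, b + f).
def beta_binomial_update(a_prior, b_prior, successes, failures):
    return a_prior + successes, b_prior + failures

def beta_mean(a, b):
    return a / (a + b)

# Uniform Beta(1, 1) prior, then 7 successes in 10 trials:
a, b = beta_binomial_update(1, 1, 7, 3)
print(beta_mean(a, b))  # posterior mean 8/12, i.e. about 0.667
```

The prior parameters and the data here are made up; the point is only that the Bayesian answer is a full posterior distribution rather than an accept/reject decision.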

Statistical Inference in Textbooks: Mathematical and Everyday Contexts

2007

Various terms in the field of statistical inference and their presentation in secondary school textbooks are examined. A comparison of these terms in secondary school textbooks is carried out against their meanings in everyday use as well as in the mathematical context given in two standard university textbooks from the field. We offer evidence that the meanings are not necessarily the same and that in some cases the definition that appears in the secondary school textbook is closer to its everyday use than to its mathematical one. Some implications for school textbook writers are derived.

THEORETICAL FRAMEWORK

Changes in the mathematics secondary school curriculum, including statistics, have taken place in several Western European countries, such as the Netherlands and Spain, since the late 1970s. Some statistical concepts which were previously introduced during the early years at university are now being taught at the secondary level, e.g., confidence intervals and hypothesis testing based on the normal distribution. Some recommendations have even been made to introduce basic concepts of statistical inference during earlier schooling (NCTM, 2000), though without the sophistication and formalization required at the university level. Ample research into the difficulties and obstacles that students encounter when facing statistical inference has also appeared lately. Vallecillos and Batanero (1997) identified difficulties in the learning and understanding of statistical inference in the university context, especially with respect to the concepts of significance levels, parameters, and statistics, in addition to a general understanding of the logic involved in hypothesis testing. Moreno and Vallecillos (2002) researched the secondary school setting. Their study of 15- and 16-year-old students showed that students had misconceptions about statistical inference and carried out incorrect inferences.
They identify representativeness as the concept which presents the most difficulty (Kahneman et al., 1982). Specifically, they point out that representativeness is characterized by the belief that small samples must reproduce the essential characteristics of the population from which they have been taken. Students also find hypothesis testing difficult. Vallecillos (1999) indicates that students possess different ideas about exactly what a hypothesis test is. García-Alonso and García-Cruz (2003) carried out a study using a sample of 50 students who sat for the University Entrance Examination. They concluded that most students (86%) were unable to completely carry out those exam problems which dealt with statistical inference, even though these exercises were no different than the typical

Why We Don’t Really Know What “Statistical Significance” Means: A Major Educational Failure*

The Neyman–Pearson theory of hypothesis testing, with the Type I error rate, α, as the significance level, is widely regarded as statistical testing orthodoxy. Fisher’s model of significance testing, where the evidential p value denotes the level of significance, nevertheless dominates statistical testing practice. This paradox has occurred because these two incompatible theories of classical statistical testing have been anonymously mixed together, creating the false impression of a single, coherent model of statistical inference. We show that this hybrid approach to testing, with its misleading p < α statistical significance criterion, is common in marketing research textbooks, as well as in a large random sample of papers from twelve marketing journals. That is, researchers attempt the impossible by simultaneously interpreting the p value as a Type I error rate and as a measure of evidence against the null hypothesis. The upshot is that many investigators do not know what our most cherished, and ubiquitous, research desideratum—“statistical significance”—really means. This, in turn, signals an educational failure of the first order. We suggest that tests of statistical significance, whether p’s or α’s, be downplayed in statistics and marketing research courses. Classroom instruction should focus instead on teaching students to emphasize the use of confidence intervals around point estimates in individual studies, and the criterion of overlapping confidence intervals when one has estimates from similar studies.
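The recommendation to compare confidence intervals across studies can be sketched as follows; the two studies, their numbers, and the normal-approximation interval are all illustrative assumptions, not taken from the paper:

```python
from math import sqrt

def ci_mean_95(mean, sd, n):
    """Normal-approximation 95% CI for a mean: mean +/- 1.96 * sd / sqrt(n)."""
    half = 1.96 * sd / sqrt(n)
    return mean - half, mean + half

def intervals_overlap(ci_a, ci_b):
    # Two closed intervals overlap iff each starts before the other ends.
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Two hypothetical studies estimating the same quantity:
study_1 = ci_mean_95(mean=10.0, sd=4.0, n=50)
study_2 = ci_mean_95(mean=11.2, sd=5.0, n=40)
print(study_1, study_2, intervals_overlap(study_1, study_2))
```

Note that non-overlap of two 95% intervals implies a difference at a level stricter than 0.05, so the overlap criterion is a conservative heuristic rather than an exact test.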

Issues in Statistical Inference

2002

Being critical of using significance tests in empirical research, the Board of Scientific Affairs (BSA) of the American Psychological Association (APA) convened a task force "to elucidate some of the controversial issues surrounding applications of statistics including significance testing and its alternatives; alternative underlying models and data transformation; and newer methods made possible by powerful computers" (BSA; quoted in the report by Wilkinson & Task Force, 1999, p. 594). Guidelines are stipulated in the report for revising the statistical sections of the APA Publication Manual.

Students’ misconceptions of statistical inference: A review of the empirical evidence from research on statistics education

Educational Research Review, 2007

A solid understanding of inferential statistics is of major importance for designing and interpreting empirical results in any scientific discipline. However, students are prone to many misconceptions regarding this topic. This article structurally summarizes and describes these misconceptions by presenting a systematic review of publications that provide empirical evidence of them. This group of publications was found to be dispersed over a wide range of specialized journals and proceedings, and the methodology used in the empirical studies was very diverse. Three research needs arise from this review: (1) further empirical studies that identify the sources and possible solutions for misconceptions in order to complement the abundant theoretical and statistical discussion about them; (2) new insights into effective research designs and methodologies to perform this type of research; and (3) structured and systematic summaries of findings like the one presented here, concerning misconceptions in other areas of statistics, that might be of interest both for educational researchers and teachers of statistics.

A Comparative Educational Study of Statistical Inference

2013

In his “Comparative Statistical Inference”, Barnett (1982) investigates the various approaches towards statistical inference from a mathematical and philosophical perspective. There have been a few isolated endeavours to develop varied teaching approaches to statistical inference. ‘Comparative statistical inference from an educational perspective’ is long overdue. After discussing Barnett, we give an overview of various attempts to simplify the concepts for teaching. Informal inference is a major endeavour among such projects; resampling and the bootstrap are a newer development in statistical inference which also has some appeal for teaching. In the light of Barnett’s comparative evaluation we develop some essential alternatives for teaching, such as Bayesian versus non-Bayesian approaches. References to Barnett will illustrate that simple solutions might bias the concepts. Rather than optimizing isolated approaches towards teaching statistical inference, a comparative educational study is suggested. The aim o...
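A minimal sketch of the bootstrap strand the authors mention, using made-up data and the percentile method (assuming a simple random sample): resample the data with replacement many times, recompute the statistic each time, and take empirical quantiles of those recomputed statistics.

```python
import random

def bootstrap_ci_mean(sample, n_resamples=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of a sample."""
    rng = random.Random(seed)
    n = len(sample)
    means = sorted(
        sum(rng.choice(sample) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

data = [4.1, 5.2, 6.3, 4.8, 5.9, 5.5, 4.4, 6.1, 5.0, 5.7]  # sample mean 5.3
lo, hi = bootstrap_ci_mean(data)
print(f"95% bootstrap CI for the mean: [{lo:.2f}, {hi:.2f}]")
```

The appeal for teaching is that the whole procedure is visible as a simulation, with no appeal to sampling-distribution theory.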

A Response to White and Gorard: Against Inferential Statistics: How and Why Current Statistics Teaching Gets It Wrong

Statistics Education Research Journal

White and Gorard make important and relevant criticisms of some of the methods commonly used in social science research, but go further by criticising the logical basis for inferential statistical tests. This paper comments briefly on matters on which we broadly agree with them and more fully on matters where we disagree. We agree that too little attention is paid to the assumptions underlying inferential statistical tests and to the design of studies, and that p-values are often misinterpreted. We show why we believe their argument concerning the logic of inferential statistical tests is flawed, explain how White and Gorard misrepresent the protocols of inferential statistical tests, and make brief suggestions for rebalancing the statistics curriculum. First published May 2017 at Statistics Education Research Journal Archives

The Magical Influence of Statistical Significance

This paper examined 1122 statistical tests found in 55 master's theses accredited during 1995-2000 at Mu'tah University. It tried to answer two main questions: first, are researchers still relying on the level of significance (α) as the only criterion to judge the importance of relations and differences? Second, to what extent can practical significance be found along with statistical significance? Results showed that researchers do consider statistical significance the only criterion to judge the importance of their findings: 74.33% of the statistically significant tests had small practical significance, and only 10.27% had large practical significance.
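One common way to check practical alongside statistical significance (a generic illustration, not the paper's exact procedure) is to convert a t statistic into the variance-accounted-for effect size r^2 = t^2 / (t^2 + df) and compare it with conventional cutoffs:

```python
def r_squared_from_t(t, df):
    """Variance-accounted-for effect size recovered from a t test:
    r^2 = t^2 / (t^2 + df). With large df, a "significant" t can
    coexist with a trivially small r^2."""
    return t * t / (t * t + df)

def label(r2):
    # Cohen-style cutoffs (r = .1, .3, .5), used here only for illustration.
    if r2 < 0.01:
        return "negligible"
    if r2 < 0.09:
        return "small"
    if r2 < 0.25:
        return "medium"
    return "large"

# Statistically significant (t = 2.0 with df = 1000 gives p < .05),
# yet tiny in practical terms:
r2 = r_squared_from_t(2.0, 1000)
print(round(r2, 4), label(r2))
```

This is the pattern the paper reports at scale: a large share of significant tests whose practical significance is small.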

Statistical Inference and Evidence-Based Science

International Journal of Aquatic Research and Education, 2011

I presume that many readers may have heard some variation of the quote, attributed to British Prime Minister Benjamin Disraeli and popularized by American humorist Mark Twain (a.k.a. Samuel Clemens), when referring to confusion generated by the use and misuse of quantitative figures: "There are three kinds of mistruth: lies, damned lies, and statistics" (Twain, 1906). One of the "rites of passage" associated with obtaining a graduate degree is being required to complete multiple statistics classes. I try to sympathize with my current students when I reflect on how little I could recall after finishing my first course in tests and measurements as an undergraduate. During my Master's program at Purdue University, I gained a completely undeserved reputation for being a "statistics whiz," bestowed upon me by my fellow student and oft co-conspirator, Larry Bruya (who fulfills my personal definition of a "true friend," wherein a "friend" is said to be one who will bail you out of jail, while a "true friend" is one who sits in the jail cell with you and proclaims, "Golly, that was fun!"). Larry and I took the same first-level statistics class together at Purdue, and in the evenings while studying he would quiz me about what each day's topic meant. I was too dumb to realize that Larry wasn't asking rhetorical questions to challenge me, but that he really didn't know the answers. I figured I didn't want to appear stupid, so I started concocting answers and in the process figured out how to actively learn statistics! Thanks, Larry. There is a sequel to this story decades later. Whenever I make some kind of pronouncement in his presence, Larry has learned to inquire, "Do you really know the answer or are you just making that up?!" Such an inquiry never fails to result in gales of laughter while Larry explains to whoever is gathered our personal story about what he affectionately calls "making up crap about statistics."
An important realization to come from any discussion about statistics, with or without any notion of lying or even just "making up crap," is that comprehending statistics can legitimately be quite confusing, even to those with some basic knowledge. They can be utterly mystifying to those without a degree of quantitative literacy in probability, laws of chance, and elementary statistical procedures. Worse, when statistics have been misused (say it ain't so!) simply to support one's preconceived opinion, all trust in them can go right out the window so that the validity of all statistics becomes suspect.