MECHANISMS, CAUSAL MODELING, AND THE LIMITATIONS OF TRADITIONAL MULTIPLE REGRESSION

Regression, Causation and Prediction

Lecture Notes in Statistics, 1993

Regression is a special case, not a special subject. The problems of causal inference in regression studies are instances of the problems we have considered in the previous chapters, and the solutions are to be found there as well. What is singular about regression is only that a technique so ill suited to causal inference should have found such wide employment to that purpose.

8.1 When Regression Fails to Measure Influence

Regression models are commonly used to estimate the "influence" that regressors X have on an outcome variable, Y.¹ If the relations among the variables are linear, then for each Xi the expected change in Y that would be produced by a unit change in Xi, if all other X variables are forced to be constant, can be represented by a parameter, say ai. It is obvious and widely noted (see, for example, Fox, 1984) that the regression estimate of ai will be incorrect if Xi and Y have one or more unmeasured common causes, or, in more conventional statistical terminology, the estimate will be biased and inconsistent if the error variable for Y is correlated with Xi. To avoid such errors, it is often recommended (Pratt and Schlaifer, 1988) that investigators enlarge the set of potential regressors and determine if the regression coefficients for the original

¹ In linear regression, we understand the "direct influence" of Xi on Y to mean (i) the change in the value of a variable Y that would be produced in each member of a population by a unit change in Xi, with all other X variables forced to be unchanged. Other meanings might be given, for example: (ii) the population average change in Y for a unit change in Xi, with all other X variables forced to be unchanged; (iii) the change in Y in each member of the population for a unit change in Xi; (iv) the population average change in Y for a unit change in Xi; etc.
Under interpretations (iii) and (iv) the regression coefficient is an unreliable estimate whenever Xi also influences other regressors that influence Y. Interpretation (ii) is equivalent to (i) if the units are homogeneous and the stochastic properties are due to sampling; otherwise, regression will be unreliable under interpretation (i) except in special cases, e.g., when the linear coefficients, as random variables, are independently distributed (in which case the analysis given here still applies (Glymour, Spirtes and Scheines, 1991a)).
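The bias from an unmeasured common cause can be illustrated numerically. The sketch below uses entirely invented coefficients (not from the chapter): U is an unmeasured common cause of X and Y, so regressing Y on X alone yields a biased and inconsistent estimate, while including U recovers the structural coefficient.

```python
# Toy illustration of omitted-variable bias (all coefficients invented).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)                # unmeasured common cause of x and y
x = 0.8 * u + rng.normal(size=n)      # x is partly driven by u
y = 1.0 * x + 2.0 * u + rng.normal(size=n)  # true direct effect of x is 1.0

# OLS slope of y on x alone (u omitted): biased away from 1.0.
slope_omitted = np.cov(x, y)[0, 1] / np.var(x)

# OLS including u recovers the structural coefficient on x.
X = np.column_stack([x, u, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(round(slope_omitted, 2))  # noticeably above the true value 1.0
print(round(coef[0], 2))        # close to 1.0
```

With these numbers the omitted-variable slope converges to 1 + 2(0.8)/(0.8² + 1) ≈ 1.98, roughly double the true direct effect; no amount of additional data removes the bias.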

Inferences from regression analysis: are they valid?¹

The focus of this paper is regression analysis. Regression analysis forms the core of a family of techniques including path analysis, structural equation modelling, hierarchical linear modelling, and others. It is perhaps the most widely used quantitative method in the social sciences, especially in economics and sociology, but it has made inroads even in fields like anthropology and history. It forms the principal basis for determining the impact of social policies and, as such, has enormous influence on almost all public policy decisions. This paper raises fundamental questions about the utility of regression analysis for causal inference. I argue that the conditions necessary for regression analysis to yield valid causal inferences are so far from ever being met or approximated that such inferences are never valid. This dismal conclusion follows clearly from examining these conditions in the context of three widely studied examples of applied regression analysis: earnings functions, education production functions, and aggregate production functions. Since my field of specialization is the economics of education, I approach each of these examples from that perspective. Nonetheless, I argue that my conclusions are not particular to the impact of education or to these three examples: the underlying problems exhibited therein generally hold when making causal inferences from regression analyses about other variables and on other topics.

Overall argument

In some fields, regression analysis is used as an ad hoc empirical exercise for moving beyond simple correlations. Researchers are often interested in the impact of a particular independent variable on a particular dependent variable and use regression analysis as a way of controlling for a few covariates.
Despite being common, in many fields such empirical fishing expeditions are frowned upon because the result of particular interest (the coefficient on the key independent variable under examination) will depend on which covariates are selected as "controls". By contrast, most fields nowadays teach that one has to be serious about causal modeling in order to use regression analysis for causal inference. Causal models require certain conditions to hold for regression coefficients to be accurate and unbiased estimates of causal impact. While these conditions are often expressed as properties of regression residuals, they may also be expressed as three necessary conditions for the proper specification of a causal model examining a particular dependent variable (or set of dependent variables):

• All relevant variables are included in the model;

¹ I would like to thank Jim Cobbe and an anonymous reviewer for comments on a draft of this paper. I wish to give special thanks to Sande Milton for his insights and long-term collaboration with me on this topic.
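The fragility of ad hoc control selection can be sketched with a toy earnings-function simulation. All variable names and coefficients below are invented for illustration (they are not the author's data or model); the point is only that the estimated "effect" of the key regressor shifts with each choice of controls.

```python
# Invented simulation: the coefficient on schooling depends on which
# covariates are chosen as controls.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
ability = rng.normal(size=n)                    # often unobserved in practice
schooling = 0.6 * ability + rng.normal(size=n)
experience = -0.3 * schooling + rng.normal(size=n)
earnings = (0.5 * schooling + 0.7 * ability
            + 0.2 * experience + rng.normal(size=n))

def ols_slope(y, regressors):
    """First coefficient from OLS of y on the given regressors plus intercept."""
    X = np.column_stack(list(regressors) + [np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0]

print(ols_slope(earnings, [schooling]))                       # no controls
print(ols_slope(earnings, [schooling, experience]))           # partial controls
print(ols_slope(earnings, [schooling, experience, ability]))  # "full" model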

Theory and Analysis of Total, Direct, and Indirect Causal Effects

Multivariate Behavioral Research, 2014


Causation, Comparison, and Regression

Harvard Data Science Review, 2024

Comparison and contrast are the basic means to unveil causation and learn which treatments work. To build good comparison groups, randomized experimentation is key, yet often infeasible. In such non-experimental settings, we illustrate and discuss diagnostics to assess how well the common linear regression approach to causal inference approximates desirable features of randomized experiments, such as covariate balance, study representativeness, interpolated estimation, and unweighted analyses. We also discuss alternative regression modeling, weighting, and matching approaches and argue they should be given strong consideration in empirical work.

Notes on Causation, Comparison, and Regression

arXiv (Cornell University), 2023

Comparison and contrast are the basic means to unveil causation and learn which treatments work. To build good comparison groups that isolate the average effect of treatment from confounding factors, randomization is key, yet often infeasible. In such non-experimental settings, we illustrate and discuss diagnostics to assess how well the common linear regression approach to causal inference approximates desirable features of randomized experiments, such as covariate balance, study representativeness, interpolated estimation, and unweighted analyses. We also discuss alternative regression modeling, weighting, and matching approaches and argue they should be given strong consideration in empirical work.
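One of the diagnostics mentioned, covariate balance, can be sketched as follows. This is a minimal illustration with invented data; the standardized mean difference statistic is a standard balance diagnostic, and the 0.1 cutoff is a common rule of thumb, not a threshold taken from the article.

```python
# Covariate-balance check via standardized mean differences (invented data).
import numpy as np

rng = np.random.default_rng(2)
# Treated group (n=300) differs systematically from controls (n=700).
age = np.concatenate([rng.normal(50, 8, 300), rng.normal(45, 8, 700)])
income = np.concatenate([rng.normal(60, 15, 300), rng.normal(55, 15, 700)])
treated = np.concatenate([np.ones(300), np.zeros(700)]).astype(bool)

def smd(x, t):
    """Standardized mean difference between treated and control groups."""
    m1, m0 = x[t].mean(), x[~t].mean()
    pooled_sd = np.sqrt((x[t].var(ddof=1) + x[~t].var(ddof=1)) / 2)
    return (m1 - m0) / pooled_sd

for name, x in [("age", age), ("income", income)]:
    d = smd(x, treated)
    flag = "imbalanced" if abs(d) > 0.1 else "balanced"
    print(f"{name}: SMD = {d:.2f} ({flag})")
```

In a well-randomized experiment both SMDs would hover near zero; here both covariates are flagged, signaling that a naive regression comparison rests heavily on model extrapolation.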

Multivariate Analysis: Causation, Control, and Conditionality

Springer eBooks, 2022

Theory building and data analyses based on three or more variables offer many possibilities for refinement and increased accuracy beyond what has been discussed in Chaps. 2 and 3. One of these involves "causal inference." We know that a correlation between two variables, even a strong and statistically significant correlation (one that justifies risking a Type I error), does not provide evidence that the relationship between the two variables involves causality.

The distinction between a correlation and a causal connection is sometimes illustrated by silly but humorous examples. Here is one that we heard in the U.S. a few years ago. Popular folktales pretend that newborn babies are brought to waiting parents by a stork. The image of a baby in a blanket hanging from the stork's beak is familiar, at least in the U.S. Of course, storks do not really deliver babies; but wait, it turns out that there is a strong and significant correlation across a sample of geographic localities between the presence of storks and a relatively high number of babies born each year. Does this mean that we should not be so quick to dismiss the story in the folktale? Of course not. The correlation does not indicate a causal connection. It reflects the impact of a third variable, and that third variable is probably whether a locality is urban or rural. Birth rates are higher in localities that are more rural, and storks are more likely to be found in rural localities. Thus, whether there are both more babies and more storks, or fewer of each, depends on whether the locality is urban or rural. It is the impact of this third variable, rather than a causal relationship between the original two, that causes a measure of storks and a measure of babies to covary.

Well, maybe this folktale is not so humorous after all! At least it is silly! There are many silly examples of things that covary but do not involve a causal relationship.
Consider another example: wearing shorts and eating ice cream covary. Is it
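The third-variable pattern behind the stork story can be sketched with a toy simulation (the numbers and variable names are invented, not from the chapter): a confounder induces a strong marginal correlation that disappears once the locality type is held fixed.

```python
# Toy confounding simulation: urban/rural drives both storks and births.
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
rural = rng.integers(0, 2, n)               # third variable: 1 = rural locality
storks = 5 * rural + rng.normal(size=n)     # more storks in rural localities
births = 3 * rural + rng.normal(size=n)     # higher birth rates in rural localities

overall = np.corrcoef(storks, births)[0, 1]
within_rural = np.corrcoef(storks[rural == 1], births[rural == 1])[0, 1]
within_urban = np.corrcoef(storks[rural == 0], births[rural == 0])[0, 1]

print(overall)        # strong marginal correlation
print(within_rural)   # near zero within each stratum
print(within_urban)
```

Stratifying on the confounder (or, equivalently here, controlling for it) makes the stork-baby association vanish, which is exactly why the marginal correlation says nothing about causation.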

REGRESSION ANALYSIS AND RELEVANCE TO RESEARCH IN SOCIAL SCIENCES

Academic Journal of Accounting and Business Management, 2021

The study reviews regression analysis and its relevance to research in the social sciences, drawing on the various regression analyses used in the social sciences and on the significance of regression analysis as a tool for analyzing data sets. The study adopted a systematic exploratory research design, reviewing related articles, journals, and other prior studies concerning regression analysis and its relevance to the social sciences. After a careful systematic and contextual review, the study found that regression analysis is significant in providing the coefficient of determination, which explains the effect of the independent (explanatory) variable on the explained variable, otherwise known as the regressed variable, and gives an idea of the predictive value of the regression analysis. Regression analysis provides a practical and powerful tool for statistical analysis that can enhance investment decisions; business projections in manufacturing, production, stock price movement, sales, and revenue estimation; and future predictions generally. This review offers a clear and comprehensive account of the relevance of regression analysis in the social sciences, contributing to knowledge in this regard. The study recommends that researchers adopt the required pragmatic and methodological steps when using regression analysis, and that unethical torturing of data be avoided, as this could lead to false results and wrong statistical predictions.
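As a minimal illustration of the prediction-oriented use the review describes, the sketch below fits a simple regression on entirely invented data and reports the coefficient of determination (R²); the variable names are hypothetical.

```python
# Simple regression fit and coefficient of determination (invented data).
import numpy as np

rng = np.random.default_rng(4)
ad_spend = rng.uniform(10, 100, 200)                  # hypothetical explanatory variable
sales = 3.0 * ad_spend + 50 + rng.normal(0, 20, 200)  # hypothetical explained variable

slope, intercept = np.polyfit(ad_spend, sales, 1)
predicted = slope * ad_spend + intercept

# R^2: share of the variance in sales explained by the fitted line.
r2 = 1 - np.sum((sales - predicted) ** 2) / np.sum((sales - sales.mean()) ** 2)
print(round(slope, 1), round(r2, 2))  # slope near 3; R^2 well above 0.85
```

A high R² of this kind supports the prediction uses the review emphasizes (sales and revenue estimation), but, as the other papers above stress, it says nothing by itself about causal impact.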