Determining Predictor Importance In Multiple Regression Under Varied Correlational And Distributional Conditions
Related papers
Psychological Bulletin, 1993
Whenever multiple regression is used to test and compare theoretically motivated models, it is of interest to determine the relative importance of the predictors. Specifically, researchers seek to rank order and scale variables in terms of their importance and to express global statistics of the model as a function of these measures. This article reviews the many meanings of importance of predictors in multiple regression, highlights their weaknesses, and proposes a new method for comparing variables: dominance analysis. Dominance is a qualitative relation defined in a pairwise fashion: One variable is said to dominate another if it is more useful than its competitor in all subset regressions. Properties of the newly proposed method are described and illustrated.
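The pairwise definition above can be made concrete. The sketch below (illustrative code and simulated data, not from the article) checks complete dominance by comparing the R² each of two predictors adds over every subset of the remaining ones:

```python
from itertools import combinations
import numpy as np

def r2(cols, X, y):
    """In-sample R^2 of an OLS fit (with intercept) on the predictors in cols."""
    Z = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    res = y - Z @ beta
    tot = y - y.mean()
    return 1.0 - (res @ res) / (tot @ tot)

def dominates(i, j, X, y):
    """True if predictor i adds more R^2 than j over EVERY subset of the rest."""
    others = [k for k in range(X.shape[1]) if k not in (i, j)]
    for r in range(len(others) + 1):
        for S in combinations(others, r):
            # Same baseline R^2(S), so comparing R^2(S + i) vs R^2(S + j)
            # compares the two increments directly.
            if r2(list(S) + [i], X, y) <= r2(list(S) + [j], X, y):
                return False
    return True

# Illustrative data: x1 has a much larger effect than x2, so x1 dominates x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(size=200)
```

Each pair requires 2^(p-2) subset comparisons, which is why dominance analysis enumerates all subset regressions.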
2022
Researchers often make claims regarding the importance of predictor variables in multiple regression analysis by comparing standardized regression coefficients (standardized beta coefficients). This practice has been criticized as a misuse of multiple regression analysis. As a remedy, I highlight the use of dominance analysis and random forest, a machine learning technique, in this method showcase article to accurately determine predictor importance in multiple regression analysis. To demonstrate the utility of dominance analysis and random forest, I reproduced the results of an empirical study and applied these analytical procedures. The results reconfirmed that multiple regression analysis should always be accompanied by dominance analysis and random forest to identify the unique contribution of individual predictors while considering correlations among predictors. A web application to facilitate the use of dominance analysis and random forest among second language researchers is also introduced.
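Random-forest importance is typically assessed by permutation: shuffle one predictor and measure how much predictive error grows. The sketch below illustrates that idea on simulated data with, for brevity, a plain least-squares fit standing in for the forest (all data and names are illustrative, not from the study):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)  # x3 is pure noise

# Fit any predictive model; a least-squares fit stands in for the forest here.
Z = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)

def predict(M):
    return np.column_stack([np.ones(len(M)), M]) @ beta

base_mse = np.mean((y - predict(X)) ** 2)

# Permutation importance: shuffling x_j breaks its link to y; the resulting
# rise in error measures how much the model relied on x_j.
importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    importance.append(np.mean((y - predict(Xp)) ** 2) - base_mse)
```

With these effect sizes, permuting x1 inflates the error far more than permuting x2, while permuting the irrelevant x3 leaves it essentially unchanged.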
The Dominance Analysis Approach for Comparing Predictors in Multiple Regression
Psychological Methods, 2003
A general method is presented for comparing the relative importance of predictors in multiple regression. Dominance analysis, a procedure that is based on an examination of the R² values for all possible subset models, is refined and extended by introducing several quantitative measures of dominance that differ in the strictness of the dominance definition. These are shown to be intuitive, meaningful, and informative measures that can address a variety of research questions pertaining to predictor importance. The bootstrap is used to assess the stability of dominance results across repeated sampling, and it is shown that these methods provide the researcher with more insights into the pattern of importance in a set of predictors than were previously available.
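The bootstrap stability idea described above can be sketched as follows (simulated two-predictor data, not from the paper; with two predictors, complete dominance reduces to comparing marginal and incremental R² values):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 2))
y = 1.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

def r2(M, y):
    Z = np.column_stack([np.ones(len(y)), M])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ b
    t = y - y.mean()
    return 1.0 - (e @ e) / (t @ t)

# With two predictors, x1 completely dominates x2 when its marginal R^2 and
# its incremental R^2 over the other predictor are both larger.
def x1_dominates_x2(X, y):
    return (r2(X[:, [0]], y) > r2(X[:, [1]], y) and
            r2(X, y) - r2(X[:, [1]], y) > r2(X, y) - r2(X[:, [0]], y))

# Bootstrap: in what share of resamples does the dominance ordering hold?
B = 200
wins = sum(x1_dominates_x2(X[idx], y[idx])
           for idx in (rng.integers(0, n, n) for _ in range(B)))
share = wins / B
```

A share near 1.0 indicates the dominance relation is stable under resampling; shares near 0.5 would flag an ordering too fragile to interpret.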
On Variable Importance in Linear Regression
1998
The paper examines in detail one particular measure of variable importance for linear regression that was theoretically justified by Pratt (1987), but which has since been criticized by Bring (1996) for producing "counterintuitive" results in certain situations, and by other authors for failing to guarantee that importance be non-negative. In the article, the "counterintuitive" result is explored and shown to be a defensible characteristic of an importance measure. It is also shown that negative importance of large magnitude can only occur in the presence of multicollinearity of the explanatory variables, and methods for applying Pratt's measure in such cases are described. The objective of the article is to explain and to clarify the characteristics of Pratt's measure, and thus to assist practitioners who have to choose from among the many methods available for assessing variable importance in linear regression.
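Pratt's measure assigns predictor j the importance β̂ⱼ·rⱼ (standardized beta times zero-order correlation with y), and these contributions sum exactly to R². The sketch below (illustrative simulated data, not from the article) also reproduces the negative-importance behavior under multicollinearity discussed above:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + np.sqrt(1 - 0.9**2) * rng.normal(size=n)  # corr(x1, x2) ~ 0.9
y = x1 - 0.5 * x2 + rng.normal(size=n)

# Standardize everything so the OLS coefficients are standardized betas.
X = np.column_stack([x1, x2])
Xs = (X - X.mean(0)) / X.std(0)
ys = (y - y.mean()) / y.std()

beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
r = np.array([np.corrcoef(Xs[:, j], ys)[0, 1] for j in range(2)])

pratt = beta * r                       # Pratt's importance for each predictor
e = ys - Xs @ beta
R2 = 1.0 - (e @ e) / (ys @ ys)
```

Here x2 has a negative coefficient but a positive correlation with y (inherited from x1), so its Pratt importance is negative, which, as the paper argues, only occurs in the presence of multicollinearity.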
Organizational Research Methods, 2014
Determining independent variable relative importance is a highly useful practice in organizational science. Whereas techniques to determine independent variable importance are available for normally distributed and binary dependent variable models, such techniques have not been extended to multicategory dependent variables (MCDVs). The current work extends previous research on binary dependent variable relative importance analysis to provide a methodology for conducting relative importance analysis on MCDV models from a dominance analysis (DA) perspective. Moreover, the current work provides a set of comprehensive data analytic examples that demonstrate how and when to use MCDV models in a DA and the advantages general DA statistics offer in interpreting MCDV model results. Finally, the current work outlines best practices for determining independent variable relative importance for MCDVs using replicable examples on data from the publicly available General Social Survey. The present work thus contributes to the literature by using in-depth data analytic examples to outline best practices in conducting relative importance analysis for MCDV models and by highlighting the unique information DA results provide about MCDV models.
Interpreting Multiple Linear Regression: A Guidebook of Variable Importance
Multiple regression (MR) analyses are commonly employed in social science fields. It is also common for interpretation of results to reflect overreliance on beta weights (cf. Courville & Thompson, 2001; Nimon, Roberts, & Gavrilova, 2010; Zientek, Capraro, & Capraro, 2008), often resulting in very limited interpretations of variable importance. It appears that few researchers employ other methods to obtain a fuller understanding of what and how independent variables contribute to a regression equation.
Behavior Research Methods, 2011
We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the regression equation, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.
The Importance of Variable Importance
arXiv (Cornell University), 2022
Variable importance is defined as a measure of each regressor's contribution to model fit. Using R² as the fit criterion in linear models leads to the Shapley value (LMG) and proportional marginal variance decomposition (PMVD) as variable importance measures. Similar measures are defined for ensemble models, using random forests as the example. The properties of the LMG and PMVD are compared. Variable importance is proposed to assess regressors' practical effects or "oomph." The uses of variable importance in modelling, interventions and causal analysis are discussed.
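The LMG / Shapley decomposition can be sketched directly from its definition: average each predictor's incremental R² over all orderings in which the predictors can enter the model (illustrative simulated data, not from the paper):

```python
from itertools import permutations
from math import factorial
import numpy as np

rng = np.random.default_rng(3)
n = 400
X = rng.normal(size=(n, 3))
X[:, 1] = 0.6 * X[:, 0] + 0.8 * X[:, 1]      # make x1 and x2 correlated
y = X[:, 0] + X[:, 1] + rng.normal(size=n)   # x3 is irrelevant

def r2(cols):
    Z = np.column_stack([np.ones(n), X[:, list(cols)]])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    e = y - Z @ b
    t = y - y.mean()
    return 1.0 - (e @ e) / (t @ t)

# Shapley value with R^2 as the payoff: average each predictor's incremental
# R^2 over all p! entry orders, so correlated predictors share their overlap.
p = X.shape[1]
lmg = np.zeros(p)
for order in permutations(range(p)):
    entered = []
    for j in order:
        lmg[j] += r2(entered + [j]) - r2(entered)
        entered.append(j)
lmg /= factorial(p)
```

The shares sum to the full-model R² by construction. The p! loop is exact but only feasible for small p; practical implementations aggregate over subsets instead.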