Using linear regression and ANN techniques in determining variable importance

A comparison of methods for assessing the relative importance of input variables in artificial neural networks

Journal of Applied Sciences Research, 2013

Artificial neural networks are considered a powerful statistical modeling technique in the agricultural sciences; however, they provide little information about the contributions of the independent variables in the prediction process. The goal of relative importance analysis is to partition explained variance among multiple predictors to better understand the role played by each predictor. In the present study, a modification to the Connection Weights Algorithm and a novel algorithm are proposed to assess the relative importance of independent variables in a multilayer perceptron neural network, and a comparison in the field of crop production with the Connection Weights Algorithm, Dominance Analysis, Garson's Algorithm, Partial Derivatives, and Multiple Linear Regression is presented. The performance of the two proposed algorithms is studied on empirical data. The Most Squares method (the second proposed algorithm) is found to perform better than the other methods and to agree with the results of multiple linear regression in terms of the partial R²; consequently, it seems to be more reliable.

Neural Network Studies. 2. Variable Selection

Journal of Chemical Information and Modeling, 1996

Quantitative structure-activity relationship (QSAR) studies usually require an estimation of the relevance of a very large set of initial variables. Determining the most important variables theoretically allows better generalization by all pattern recognition methods. This study introduces and investigates five pruning algorithms designed to estimate the importance of input variables in applications of feed-forward artificial neural networks (ANNs) trained by the back-propagation algorithm and to prune nonrelevant ones in a statistically reliable way. The analyzed algorithms produced similar variable estimations for simulated data sets, but differences were detected for real QSAR examples. Improvement of ANN prediction ability was shown after the pruning of redundant input variables. The statistical coefficients computed by ANNs for QSAR examples were better than those of multiple linear regression. Restrictions of the proposed algorithms and the potential use of ANNs are discussed.

A visual method for determining variable importance in an artificial neural network model: An empirical benchmark study

Journal of Targeting, Measurement and Analysis for Marketing, 2003

as a prominent marketing support tool, ANNs are criticised for their failure to explain results. Central to this criticism is their inability to provide interpretation of the network connection weights. The purpose of this study is to demonstrate a methodology for ANN model variable interpretation that uses network connection weights. Empirical marketing data are used to optimise an ANN and a multinomial logit (MNL) model. Response elasticity graphs are built for each ANN model variable by plotting the derivative of the network output with respect to each variable, while

An accurate comparison of methods for quantifying variable importance in artificial neural networks using simulated data

Ecological Modelling, 2004

Artificial neural networks are receiving greater attention in the ecological sciences as a powerful statistical modeling technique; however, they have also been labeled a "black box" because they are believed to provide little explanatory insight into the contributions of the independent variables in the prediction process. In a recent issue of Ecological Modelling, Gevrey et al. (2003) addressed this concern by providing a comprehensive comparison of 8 different methodologies for estimating variable importance in neural networks that are commonly used in ecology. Unfortunately, comparisons of the different methodologies were based on an empirical dataset, which precludes the ability to establish generalizations regarding the true accuracy and precision of the different approaches because the true importance of the variables is unknown. Here, we provide a more appropriate comparison of the different methodologies by using Monte Carlo simulations with data exhibiting defined (and consequently known) numeric relationships. Our results show that a Connection Weight approach that uses raw input-hidden and hidden-output connection weights in the neural network provides the best methodology for accurately quantifying variable importance and should be favoured over the other approaches commonly used in the ecological literature. Average similarity between true and estimated ranked variable importance using this approach was 0.92, whereas similarity coefficients ranged between 0.28 and 0.74 for the other approaches. Furthermore, the Connection Weight approach was the only method that consistently identified the correct ranked importance of all predictor variables, whereas the other methods either only identified the first few important variables in the network or no variables at all. The most notable result was that Garson's Algorithm was the poorest performing approach, yet it is the most commonly used in the ecological literature. In conclusion, this study provides a robust comparison of different methodologies for assessing variable importance in neural networks that can be generalized to other data and from which valid recommendations can be made for future studies.
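
The two weight-based measures named in this abstract can be illustrated with a minimal NumPy sketch. It assumes a single-hidden-layer network with one output and follows the commonly cited textbook forms of the two measures; the function names and the toy weights are hypothetical and are not taken from the paper.

```python
import numpy as np

def connection_weights(w_ih, w_ho):
    """Connection Weight importance: for each input, sum over hidden
    neurons of the raw (signed) input-hidden weight times the raw
    hidden-output weight.
    w_ih: array of shape (n_inputs, n_hidden); w_ho: array of shape (n_hidden,)."""
    return w_ih @ w_ho

def garson(w_ih, w_ho):
    """Garson-style importance: uses absolute weight products, takes each
    input's share within every hidden neuron, sums the shares, and
    rescales so the result adds up to 1."""
    contrib = np.abs(w_ih) * np.abs(w_ho)                  # (n_inputs, n_hidden)
    share = contrib / contrib.sum(axis=0, keepdims=True)   # share per hidden neuron
    total = share.sum(axis=1)
    return total / total.sum()

# Hypothetical toy weights: input 0 reaches the output through two
# hidden neurons with opposite signs, input 1 with the same sign.
w_ih = np.array([[2.0, -2.0],
                 [1.0,  1.0]])
w_ho = np.array([1.0, 1.0])

cw = connection_weights(w_ih, w_ho)
ga = garson(w_ih, w_ho)
print("connection weights:", cw)
print("garson:", ga)
```

With these toy weights the signed products for input 0 cancel in the Connection Weight measure, while the absolute values used in the Garson-style measure do not cancel, so the two measures can rank the same inputs differently.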

An approach for determining relative input parameter importance and significance in artificial neural networks

Ecological Modelling, 2007

Artificial neural network (ANN) models are powerful statistical tools which are increasingly used in modeling complex ecological systems. For interpretation of ANN models, a means of evaluating how systemic parameters contribute to model output is essential. Developing a robust, systematic method for interpreting ANN models is the subject of much current research. We propose a method using sequential randomization of input parameters to determine the relative proportion to which each input variable contributes to the predictive ability of the ANN model (termed the holdback input randomization method or HIPR method). Validity of the method was assessed using a simulated data set in which the relationships between input parameters and output parameters were completely known.
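
The general idea of randomizing one input at a time can be sketched as follows. This is a generic illustration and not the authors' exact HIPR procedure: the choice of uniform draws over each input's observed range, the use of mean squared error, the number of repeats, and the stand-in `predict` function are all assumptions made for the example.

```python
import numpy as np

def randomization_importance(predict, X, y, n_repeats=20, seed=0):
    """For each input column, replace it with random values and record the
    average increase in mean squared error; return the increases rescaled
    to proportions that sum to 1 (all zeros if nothing changes)."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((predict(X) - y) ** 2)
    increase = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        lo, hi = X[:, j].min(), X[:, j].max()
        for _ in range(n_repeats):
            X_rand = X.copy()
            # Assumption: draw replacements uniformly over the observed range.
            X_rand[:, j] = rng.uniform(lo, hi, size=X.shape[0])
            increase[j] += np.mean((predict(X_rand) - y) ** 2) - base_mse
    increase = np.clip(increase / n_repeats, 0.0, None)
    total = increase.sum()
    return increase / total if total > 0 else increase

# Simulated data with a known relationship; the third input is irrelevant.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(500, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1]

def predict(X):
    # Stand-in for a trained model (hypothetical): reproduces the known relationship.
    return 3.0 * X[:, 0] + 1.0 * X[:, 1]

imp = randomization_importance(predict, X, y)
print(imp)
```

In practice `predict` would be the trained network's prediction function, and the error would normally be evaluated on held-back data rather than on the training set.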

Variable Importance Analysis in Default Prediction using Machine Learning Techniques

International Conference on Data Technologies and Applications, 2018

In this study, different data mining techniques were applied to a finance credit data set from a financial institution to provide an automated and objective profitability measurement. A two-step methodology was used: determining the variables to be included in the model, and deciding on the model to classify a potential credit application as "bad credit (default)" or "good credit (not default)". The phrases "bad credit" and "good credit" are used as class labels because this is how they are used in financial sector jargon in Turkey. For this two-step procedure, variable selection algorithms such as Random Forest and Boruta and machine learning algorithms such as Logistic Regression, Random Forest, and Artificial Neural Network were tried. At the end of the feature selection phase, the CRA and III variables were determined to be the most important variables. Moreover, occupation and product number were also predictor variables. In the classification phase, the Neural Network model was the best model, with higher accuracy and low average squared error, and the Random Forest model performed better than the Logistic Regression model.

On Variable Importance in Linear Regression

1998

The paper examines in detail one particular measure of variable importance for linear regression that was theoretically justified by Pratt (1987), but which has since been criticized by Bring (1996) for producing "counterintuitive" results in certain situations, and by other authors for failing to guarantee that importance be non-negative. In the article, the "counterintuitive" result is explored and shown to be a defensible characteristic of an importance measure. It is also shown that negative importance of large magnitude can only occur in the presence of multicollinearity of the explanatory variables, and methods for applying Pratt's measure in such cases are described. The objective of the article is to explain and to clarify the characteristics of Pratt's measure, and thus to assist practitioners who have to choose from among the many methods available for assessing variable importance in linear regression.
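
Pratt's measure is usually stated as the product of a predictor's standardized regression coefficient and its simple correlation with the response, so that the values sum to R². The NumPy sketch below assumes that formulation; the function name and the simulated data are hypothetical and are not taken from the paper.

```python
import numpy as np

def pratt_importance(X, y):
    """Pratt-style importance: standardized coefficient times the simple
    correlation with the response, one value per predictor."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized predictors
    yz = (y - y.mean()) / y.std()               # standardized response
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)  # standardized coefficients
    r = Xz.T @ yz / len(yz)                     # Pearson correlations with y
    return beta * r

# Simulated data with two correlated predictors (illustrative only).
rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])

pratt = pratt_importance(X, y)

# Independent check: R^2 of the same least-squares fit.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
yz = (y - y.mean()) / y.std()
beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
r2 = 1.0 - np.sum((yz - Xz @ beta) ** 2) / np.sum(yz ** 2)

print("Pratt importances:", pratt)
print("sum:", pratt.sum(), "R^2:", r2)
```

Because the values add up to R² without being constrained to be non-negative, an individual value can come out negative when the predictors are correlated, which is the situation the abstract discusses.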

The Use of Artificial Neural Networks for Quantifying the Relative Importance of the Firms’ Performance Determinants

2017

Performance is the outcome of all plans and decisions of a company. It shows the ways companies are governed. Consequently, determining the relative importance of factors influencing performance is important. Therefore, in this study, seven independent variables were determined based on the literature. Then, the significant variables were chosen using Pearson's correlation test. Finally, an artificial neural network was designed to investigate the relative importance of the determinants. In total, 1340 company-year observations were collected from the Tehran Stock Exchange (TSE) from 2001 to 2010. The research results revealed that institutional ownership concentration is the most important factor, followed by state ownership and managerial stock ownership. Debt policy and firm size are ranked lower.

BUILDING A MODEL FOR PREDICTING PRODUCTIVITY AND EVALUATING FACTORS AFFECTING PRODUCTIVITY BY USING ARTIFICIAL NEURAL NETWORKS

Construction productivity is the main indicator of the performance of construction projects for any country. The productivity of construction projects is defined as the output of the system for each unit of input. The main objective of this paper is to identify and analyze factors affecting labor productivity in construction projects in Iraq through a closed questionnaire. The second objective is to construct a mathematical model for predicting construction productivity for formwork columns works, and to assess the factors affecting productivity using sensitivity analysis (Garson algorithm). After distributing, gathering, and analyzing the questionnaire and finding the relative importance index (RII%) using the Likert scale for each influencing factor, the top ten factors affecting the productivity rate were determined. These factors are the independent inputs of the model, and the single dependent output is the productivity rate. The factors are classified on the basis of the values of the relative importance index (RII%) calculated for each factor. Finally, the data were used to develop an artificial neural network (ANN) prediction model. It was found that ANNs are able to predict the total productivity rate for formwork columns works for building projects with a good degree of accuracy: the coefficient of correlation (R) was 94.39% and the average accuracy percentage (AA%) was 85.45%. The sensitivity analysis indicated that V6 (lack of labor surveillance) is ranked first with a relative importance of 27.61%. In contrast, V5 (the ganger's experience) has low importance in the model, with a relative importance of around 0.35%, and is ranked eleventh.