Relevance, prediction and interpretation for linear models with many predictors

A Comparison Study of Ridge Regression and Principal Component Regression with Application

International Journal of Research, 2016

The purpose of this paper is to discuss the multicollinearity problem in regression models and to present some typical ways of handling it. In addition, the paper compares the ridge regression (RR), principal component regression (PCR), and least squares (LS) methods using the mean squared error (MSE) and the accuracy of prediction. The results show that, according to the MSE criterion, RR performs slightly better than PCR, whereas, based on the criterion of model accuracy, PCR performs better than RR. In general, the two biased estimators, RR and PCR, perform better than LS. Keywords: Least Squares; Correlation Matrix; Multicollinearity; Ridge Regression; Principal Component Regression.
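
The paper does not include code, so the following is a minimal illustrative sketch, not the authors' implementation: it generates synthetic collinear predictors and compares LS, RR, and PCR by test-set MSE using scikit-learn. The collinearity level `rho`, the ridge penalty `alpha`, and the number of retained components are my own illustrative choices.

```python
# Sketch: compare LS, RR, and PCR on synthetic collinear data (illustrative only).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n, p, rho = 200, 6, 0.95                                # highly correlated predictors
cov = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)     # equicorrelation structure
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
y = X @ np.arange(1, p + 1) + rng.normal(scale=2.0, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "LS":  LinearRegression(),
    "RR":  Ridge(alpha=1.0),
    "PCR": make_pipeline(PCA(n_components=3), LinearRegression()),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, mean_squared_error(y_te, model.predict(X_te)))
```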

A regression modeling approach with latent predictors∗

2015

Applications aimed at studying the linear relationship between several dependent (response) variables y and several independent (explanatory, predictive) variables x arise in various disciplines (bioinformatics, brain imaging, data mining, genomics, and economics). The traditional approach to this problem is ordinary least squares regression. However, it is not always a good choice when a large number of explanatory variables is available: not only can the (many) regression coefficients become difficult to interpret, but high correlations between the explanatory variables also undermine the stability of the estimates and can make multivariate prediction unreliable. In the specialized literature, many ideas and strategies for dealing with these problems have been proposed, such as standard variable selection methods and penalized (or shrinkage) techniques...

Comparison of partial least squares with other prediction methods via generated data

Journal of Statistical Computation and Simulation, 2020

The purpose of this study is to compare the partial least squares (PLS), ridge regression (RR), and principal components regression (PCR) methods used to fit regressors with severe multicollinearity against a dependent variable. To this end, a large number of varying datasets are generated from the standard normal distribution, allowing for the inclusion of different degrees of collinearity, with 10,000 replications. The design of the study is based on a simulation performed for six different multicollinearity levels and sample sizes. From the generated data, the methods are compared using the mean squared error of the regression parameters. The findings show that each prediction method is affected by the sample size, the number of regressors, and the multicollinearity level. However, in contrast to the literature (say n <= 200), PCR had significantly better results than the other two methods, whatever the number of regressors.
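
A rough sketch of this simulation design, under my own assumptions (the sample size, collinearity level, number of components, and the helper `pcr_coef` are mine, not the paper's): replicate collinear normal data and score PLS, RR, and PCR by the MSE of the estimated coefficients against the true coefficient vector.

```python
# Sketch: Monte Carlo comparison of PLS, RR, PCR by coefficient MSE (illustrative).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import Ridge

def pcr_coef(X, y, k):
    """PCR coefficients in the original predictor space (data centered)."""
    Xc, yc = X - X.mean(0), y - y.mean()
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                                  # first k principal directions
    gamma = np.linalg.lstsq(Xc @ V, yc, rcond=None)[0]
    return V @ gamma

rng = np.random.default_rng(1)
n, p, rho, reps = 100, 5, 0.99, 1000
beta = np.ones(p)                                 # true coefficients
cov = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)

mse = {"PLS": 0.0, "RR": 0.0, "PCR": 0.0}
for _ in range(reps):
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    y = X @ beta + rng.standard_normal(n)
    b_pls = PLSRegression(n_components=2).fit(X, y).coef_.ravel()
    b_rr = Ridge(alpha=1.0).fit(X, y).coef_
    b_pcr = pcr_coef(X, y, k=2)
    for name, b in (("PLS", b_pls), ("RR", b_rr), ("PCR", b_pcr)):
        mse[name] += np.mean((b - beta) ** 2) / reps
print(mse)
```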

On Model Selection Criteria of Multivariate Ridge Regression and Ordinary Least Square Regression

Ridge regression is a regression technique that allows for biased estimation of regression parameters that are quite close to the true values in the presence of correlated predictor variables in the model. The paper therefore highlights the ridge regression estimator as an alternative to the ordinary least squares (OLS) estimator in the presence of multicollinearity, together with the interpretation of model selection criteria under multicollinearity. The method focuses on the application of the ridge regression model to economic data. Monte Carlo simulations from a multivariate normal distribution were performed to compare the characteristics and performance of the models. Based on the mean squared errors and the variance of each model, the analysis shows that the ridge regression models are more effective than ordinary least squares when a multicollinearity problem exists. We therefore conclude that ridge regression is the best model, because it has the smaller MSE and the smaller variance.
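
For reference, in standard notation (not reproduced from the paper), the two estimators being compared are

$$\hat{\beta}_{\mathrm{OLS}} = (X^{\top}X)^{-1}X^{\top}y, \qquad \hat{\beta}_{\mathrm{ridge}}(k) = (X^{\top}X + kI_p)^{-1}X^{\top}y, \quad k > 0.$$

Adding the ridge constant k to the diagonal of X'X improves its conditioning, which is the source of both the bias and the variance reduction.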

A note on ridge regression modeling techniques

In this study, the techniques of the ridge regression model as an alternative to the classical ordinary least squares (OLS) method in the presence of correlated predictors were investigated. One of the basic steps for fitting efficient ridge regression models requires that the predictor variables be scaled to unit length, or to have zero means and unit standard deviations, prior to parameter estimation. This is meant to achieve stable and efficient estimates of the parameters in the presence of multicollinearity in the data. However, despite the benefits of this variable transformation for ridge estimators, many published works on ridge regression have ignored it in their parameter estimations. This work therefore examined the impact of scaling collinear predictor variables on ridge regression estimators. Various results from simulation studies underscored the practical importance of scaling the predictor variables while fitting ridge regression models. A real-life dataset on import activities in the French economy was employed to validate the results from the simulation studies.
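
A minimal sketch of the scaling step the abstract emphasizes (my own construction; the function name and the choice of unit-length scaling to correlation form are assumptions): standardize the predictors before solving the ridge system, then transform the coefficients back to the original units.

```python
# Sketch: ridge estimation on unit-length-scaled predictors (illustrative).
import numpy as np

def ridge_standardized(X, y, k):
    Xc = X - X.mean(axis=0)
    scale = np.sqrt((Xc ** 2).sum(axis=0))   # unit-length (correlation-form) scaling
    Z = Xc / scale                           # Z'Z is the predictor correlation matrix
    yc = y - y.mean()
    b_std = np.linalg.solve(Z.T @ Z + k * np.eye(X.shape[1]), Z.T @ yc)
    return b_std / scale                     # coefficients back in original units
```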

Latent Structure Linear Regression

Applied Mathematics, 2014

A short review is given of standard regression analysis. It is shown that the results presented by program packages are not always reliable. A general framework for linear regression is presented that includes most linear regression methods based on linear algebra. The H-principle of mathematical modelling is presented. It uses the analogy between the modelling task and the measurement situation in quantum mechanics. The principle states that the modelling task should be carried out in steps, where at each step an optimal balance should be determined between the value of the objective function (the fit) and the associated precision. H-methods are different methods for carrying out the modelling task based on the recommendations of the H-principle. They have been applied to different types of data. In general, they provide better predictions than the linear regression methods in the literature.

A Monte Carlo Comparison between Ridge and Principal Components Regression Methods

Applied Mathematical Sciences

A basic assumption of the general linear regression model is that there is no correlation (no multicollinearity) between the explanatory variables. When this assumption is not satisfied, the least squares estimators have large variances, become unstable, and may have the wrong sign. We therefore resort to biased regression methods, which stabilize the parameter estimates. Ridge regression (RR) and principal components regression (PCR) are two of the most popular biased regression methods. In this article, we used Monte Carlo experiments to estimate the regression coefficients by the RR and PCR methods. A comparison between the RR and PCR methods was made in terms of the smaller mean squared error (MSE). Based on this simulation study, we found that the RR method performs better than the PCR method.
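
The wrong-sign instability mentioned in this abstract is easy to reproduce; the following toy example (mine, not from the article) fits OLS and ridge to two nearly identical predictors.

```python
# Sketch: OLS sign instability under near-collinearity vs. ridge (illustrative).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(2)
n = 50
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)      # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.standard_normal(n)         # true coefficients: (1, 1)

print(LinearRegression().fit(X, y).coef_)    # erratic, possibly wrong-signed
print(Ridge(alpha=1.0).fit(X, y).coef_)      # both coefficients near 1
```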

Application of Latent Roots Regression to Multicollinear Data

Journal of Advance Research in Computer Science & Engineering, 2012

Several applications are based on the assessment of a linear model relating a variable y to predictors x1, x2, ..., xp. It often occurs that the predictors are collinear, which results in high instability of the model obtained by means of multiple linear regression using least squares estimation. Several alternative methods have been proposed to tackle this problem, among them ridge regression and principal component regression. We discuss a third method, called latent root regression. This method depends on the eigenvalues and eigenvectors of the matrix A'A, where A is the matrix formed from y and x1, x2, ..., xp. We introduce some properties of latent root regression which give new insight into the determination of a prediction model. Thus, a model may be determined by combining latent root estimators in such a way that the associated mean squared error is minimized. The method is illustrated using three real data sets, namely economic, medical, and environmental data.
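
A schematic sketch of the latent-root step this abstract describes (the function and variable names are mine): standardize A = [y, X] to unit-length columns and take the eigendecomposition of A'A. Small latent roots whose eigenvectors load appreciably on y indicate near-dependencies that carry predictive information.

```python
# Sketch: latent roots and vectors of A'A for A = [y, X] (illustrative).
import numpy as np

def latent_roots(y, X):
    A = np.column_stack([y, X])
    A = A - A.mean(axis=0)
    A = A / np.sqrt((A ** 2).sum(axis=0))        # unit-length columns
    eigvals, eigvecs = np.linalg.eigh(A.T @ A)   # latent roots/vectors, ascending
    return eigvals, eigvecs                      # eigvecs[0, j]: loading on y
```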