Feature selection in credit risk modeling: an international evidence

Feature Selection in a Credit Scoring Model

Mathematics, 2021

This paper applies several classification algorithms (logistic regression, support vector machine, K-nearest neighbors, and random forest) to identify which applicants are likely to default in a credit scoring model. Three feature selection methods are used to mitigate the overfitting caused by the curse of dimensionality in these classification algorithms: one filter method (Chi-squared test and correlation coefficients) and two wrapper methods (forward stepwise selection and backward stepwise selection). The performance of the three methods is assessed using two measures, the mean absolute error and the number of selected features. The methodology is applied to a valuable database from Taiwan. The results suggest that forward stepwise selection yields superior performance with each of the classification algorithms used. The conclusions are related to those in the literature, and their managerial implications are analyzed.
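The forward stepwise wrapper described above can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data, not the paper's Taiwan dataset; the greedy loop adds, at each step, the feature that most improves cross-validated accuracy and stops when no candidate helps.

```python
# Forward stepwise (wrapper) feature selection: greedily add the feature
# that most improves cross-validated accuracy, stop when none improves it.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a credit dataset (4 informative features of 10).
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=4, random_state=0)

selected, remaining = [], list(range(X.shape[1]))
best_score = 0.0
while remaining:
    scores = {f: cross_val_score(LogisticRegression(max_iter=1000),
                                 X[:, selected + [f]], y, cv=5).mean()
              for f in remaining}
    f_best, s_best = max(scores.items(), key=lambda kv: kv[1])
    if s_best <= best_score:      # no candidate improves the CV score
        break
    selected.append(f_best)
    remaining.remove(f_best)
    best_score = s_best

print("selected features:", sorted(selected),
      "CV accuracy:", round(best_score, 3))
```

Backward stepwise selection is the mirror image: start from the full feature set and greedily drop the feature whose removal hurts the cross-validated score least.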

An effective credit scoring model based on feature selection approaches

Recent finance and debt crises have made credit risk management one of the most important issues in financial research, and credit scoring one of the most important issues in financial decision-making. Reliable credit scoring models are crucial for financial agencies to evaluate credit applications and have been widely studied in machine learning and statistics. In this paper, we propose an effective credit scoring model based on feature selection approaches. Feature selection is the process of selecting a subset of relevant features, which can decrease the dimensionality, shorten the running time, and/or improve the classification accuracy. Using the standard k-nearest-neighbors (kNN) rule as the classification algorithm, the feature selection methods are evaluated on classification tasks. Two well-known and readily available datasets, the Australian and German credit datasets, are used to test the algorithm. The results show that the feature selection approaches are superior to state-of-the-art classification algorithms in credit scoring.
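The evaluation protocol above, scoring feature subsets with a standard kNN classifier, can be sketched as below. The filter criterion (absolute Pearson correlation with the label) and the synthetic data are illustrative assumptions, not the paper's exact method or its Australian/German datasets.

```python
# Filter-style ranking (|correlation| with the label) evaluated with a
# k-nearest-neighbors classifier, mirroring the kNN-based evaluation setup.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=12,
                           n_informative=3, random_state=1)

# Rank features by absolute Pearson correlation with the class label.
corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
top = np.argsort(corr)[::-1][:3]          # keep the 3 strongest features

knn = KNeighborsClassifier(n_neighbors=5)
acc_all = cross_val_score(knn, X, y, cv=5).mean()
acc_top = cross_val_score(knn, X[:, top], y, cv=5).mean()
print(f"all features: {acc_all:.3f}, top-3 filter subset: {acc_top:.3f}")
```

Because kNN distances degrade as irrelevant dimensions accumulate, pruning weak features often helps this classifier more than it helps tree- or margin-based models.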

Feature Selection to Optimize Credit Banking Risk Evaluation Decisions for the Example of Home Equity Loans

Mathematics, 2020

Measuring credit risk is essential for financial institutions because of the high risk level associated with incorrect credit decisions. The Basel II agreement recommended the use of advanced credit scoring methods in order to improve the efficiency of capital allocation, and the latest Basel agreement (Basel III) increases the risk-based reserve requirements. Financial institutions currently hold exhaustive datasets regarding their operations; this problem can be addressed by applying a good feature selection method combined with big data techniques for data management. A comparative study of selection techniques is conducted in this work to find the selector that minimizes the mean square error while requiring the least execution time.

On Feature Selection for Credit Scoring

Credit granting is a fundamental question that every credit institution confronts and one of the most complex tasks it has to deal with. The task is based on analyzing and judging a large number of received credit requests. Credit scoring databases are typically large and characterized by redundant and irrelevant features. With so many features, classification methods become more computationally demanding, a difficulty that can be addressed with feature selection methods. Many such methods have been proposed in the literature, notably filter and wrapper methods. Filter methods select the best features by evaluating intrinsic properties of the data, making them fast and simple to implement; however, they are sensitive to redundancy, and the sheer number of filter methods proposed in previous work makes choosing among them difficult. Wrapper methods select the best features according to a classifier's accuracy, making the results well-matched to the predetermined ...
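The filter approach contrasted above can be sketched with a chi-squared score, one of the classic classifier-independent criteria. This is a generic scikit-learn illustration on synthetic data, not this paper's specific method; note that the chi-squared statistic requires non-negative inputs, so features are shifted first.

```python
# Chi-squared filter: score each (non-negative) feature against the label
# independently of any classifier, then keep the top-k scorers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2

X, y = make_classification(n_samples=500, n_features=8,
                           n_informative=3, random_state=2)
X = X - X.min(axis=0)          # chi2 requires non-negative feature values

selector = SelectKBest(chi2, k=3).fit(X, y)
print("chi2 scores:", np.round(selector.scores_, 2))
print("kept feature indices:", np.flatnonzero(selector.get_support()))
```

The speed comes from scoring each feature once, with no model training, which is also why a pure filter cannot see redundancy between two individually strong features.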

Filter- versus wrapper-based feature selection for credit scoring

International Journal of Intelligent Systems, 2005

We address credit scoring as a classification and feature subset selection problem. Within the current framework of sophisticated feature selection methods, we identify the features that carry the most relevant information for distinguishing good loan payers from bad loan payers. The feature selection methods are validated on several real-world datasets with different types of classifiers. We show the advantages of the sub-space approach to classification, discuss many practical issues related to the applicability of feature selection methods, and highlight difficulties that are often insufficiently emphasised in the standard feature selection literature.

A Hybrid Approach for Feature Selection in Data Mining Modeling of Credit Scoring

2020

Recent research shows that data mining techniques can be applied across broad areas of the economy and, in particular, in the banking sector. One of the most pressing issues banks face is loan non-repayment by the population, which is directly related to the credit scoring problem. The main goal of this paper is to show the importance of applying feature selection in data mining models for credit scoring. The study walks through data pre-processing, feature creation, and feature selection as they can be applied in real-life business situations for binary classification problems, using nodes from IBM SPSS Modeler. The results prove that applying a hybrid feature selection model, which yields the optimal number of features, increases credit scoring accuracy. Compared to an expert judgmental approach, the proposed hybrid model is harder to explain but shows better accuracy and flexibility of factor selection, which is an advantage in fast cha...
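A common shape for such hybrid pipelines, which is sketched below as an assumption rather than the paper's exact SPSS Modeler workflow, is a cheap filter stage that prunes the feature pool followed by a wrapper stage that refines the survivors against a classifier.

```python
# Hybrid selection sketch: a fast filter prunes weak features, then a
# wrapper stage (recursive feature elimination) refines the subset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, n_features=20,
                           n_informative=5, random_state=3)

# Stage 1 (filter): keep the 10 features with the highest ANOVA F-score.
f_scores, _ = f_classif(X, y)
keep = np.argsort(f_scores)[::-1][:10]

# Stage 2 (wrapper): RFE narrows those 10 down to a final 5.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(X[:, keep], y)
final = keep[rfe.support_]
print("final feature indices:", sorted(final))
```

The filter stage keeps the wrapper's model-fitting cost manageable, while the wrapper stage recovers the redundancy awareness that a pure filter lacks.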

Credit scoring using machine learning algorithms

Zimbabwe Journal of Science and Technology, 2018

Credit risk mitigation is an area of renewed interest due to the 2007-2008 financial crisis, and masses of data are now collected by financial institutions. This has left risk analysts with the daunting task of adequately determining the creditworthiness of an individual. In the search for highly efficient credit scoring models, financial institutions can adopt sophisticated machine learning techniques. We employ the AUROC approach to make a comparative analysis of machine learning classification methods, performing 10-fold cross-validation for model selection on the German Credit dataset from the UCI database. The results show that Lasso regression provides the best estimation for default with an AUROC of 0.8048, followed by the Random Forest model with an AUROC of 0.7869. The widely used logit model performed better than the linear Support Vector Machine, with AUROCs of 0.7678 and 0.7581 respectively. Moreover, by the Kolmogorov-Smirnov test, we show that the other machine learning techniques outperform the widely used logit model in how well they separate the "good" class from the "bad" class.
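The comparison protocol above, 10-fold cross-validated AUROC across several classifiers, can be sketched as follows. Synthetic data stands in for the UCI German Credit dataset, and the model roster is an approximation of the paper's (an L1-penalized logit in place of Lasso regression), so the numbers will not match the reported ones.

```python
# 10-fold cross-validated AUROC comparison of several classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=15, random_state=4)

models = {
    "L1 (lasso-style) logit": LogisticRegression(penalty="l1",
                                                 solver="liblinear"),
    "random forest": RandomForestClassifier(n_estimators=100,
                                            random_state=4),
    "logit": LogisticRegression(max_iter=1000),
    "linear SVM": SVC(kernel="linear"),
}
aucs = {name: cross_val_score(m, X, y, cv=10, scoring="roc_auc").mean()
        for name, m in models.items()}
for name, auc in sorted(aucs.items(), key=lambda kv: -kv[1]):
    print(f"{name:22s} AUROC = {auc:.4f}")
```

AUROC is threshold-free, which is why it is preferred over raw accuracy for credit data, where the "bad" class is usually a small minority.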

Credit Risk Assessment Using Statistical and Machine Learning: Basic Methodology and Risk Modeling Applications

Computational Economics, 2000

Risk assessment of financial intermediaries is an area of renewed interest due to the financial crises of the 1980's and 90's. An accurate estimation of risk, and its use in corporate or global financial risk models, could be translated into a more efficient use of resources. One important ingredient to accomplish this goal is to find accurate predictors of individual risk in the credit portfolios of institutions. In this context we make a comparative analysis of different statistical and machine learning modeling methods of classification on a mortgage loan data set with the motivation to understand their limitations and potential. We introduced a specific modeling methodology based on the study of error curves. Using state-of-the-art modeling techniques we built more than 9,000 models as part of the study. The results show that CART decision-tree models provide the best estimation for default with an average 8.31% error rate for a training sample of 2,000 records. As a result of the error curve analysis for this model we conclude that if more data were available, approximately 22,000 records, a potential 7.32% error rate could be achieved. Neural Networks provided the second best results with an average error of 11.00%. The K-Nearest Neighbor algorithm had an average error rate of 14.95%. These results outperformed the standard Probit algorithm which attained an average error rate of 15.13%. Finally we discuss the possibilities to use this type of accurate predictive model as ingredients of institutional and global risk models.
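The error-curve methodology above can be sketched as a learning curve: train the same model on growing training-set sizes and track held-out error, then extrapolate how much additional data would help. The decision-tree settings and synthetic data below are illustrative assumptions, not the paper's mortgage dataset or its 9,000-model study.

```python
# Error-curve sketch: a CART-style decision tree trained on growing
# training-set sizes, with held-out error recorded at each size.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=10,
                           n_informative=5, random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=5)

errors = {}
for n in (100, 400, 800, 1500):
    tree = DecisionTreeClassifier(max_depth=5, random_state=5)
    tree.fit(X_tr[:n], y_tr[:n])
    errors[n] = 1.0 - tree.score(X_te, y_te)   # held-out error rate

for n, err in errors.items():
    print(f"n = {n:5d}  test error = {err:.3f}")
```

Fitting a decaying curve (e.g. a power law) to these points is what lets one project the error achievable at a larger, as-yet-uncollected sample size, as the paper does for 22,000 records.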

A Novel Credit Scoring Prediction Model based on Feature Selection Approach and Parallel Random Forest

Background/Objectives: This article presents a feature selection method to improve the accuracy and computation speed of credit scoring models. Methods/Analysis: We propose a credit scoring model based on a parallel Random Forest classifier and a feature selection method to evaluate the credit risks of applicants. By integrating Random Forest into the feature selection process, the importance of features can be accurately evaluated in order to remove irrelevant and redundant features. Findings: An algorithm to select the best features was developed using the best average and median scores and the lowest standard deviation as the feature-scoring rules. Consequently, the dimension of the feature set can be reduced to the smallest possible number, allowing a remarkable runtime reduction. The proposed model can thus perform feature selection and model parameter optimization at the same time to improve its efficiency. Its performance was experimentally assessed using two public datasets, the Australian and German credit datasets. The results showed improved accuracy compared to other commonly used feature selection methods. In particular, our method attains an average accuracy of 76.2% with a significantly reduced running time of 72 minutes on the German credit dataset, and the highest average accuracy of 89.4% with a running time of only 50 minutes on the Australian credit dataset. Applications/Improvements: This method can be usefully applied in credit scoring models to improve accuracy with a significantly reduced runtime.
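The core idea of scoring features with a Random Forest can be sketched as below. The above-the-mean cutoff and the synthetic data are illustrative assumptions; the paper's actual scoring rules combine average, median, and standard deviation of the scores, and its forest runs in parallel.

```python
# Random-forest importance filter: rank features by impurity-based
# importance and keep those scoring above the mean importance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=15,
                           n_informative=4, random_state=6)

rf = RandomForestClassifier(n_estimators=200, random_state=6).fit(X, y)
importances = rf.feature_importances_          # normalized to sum to 1
kept = np.flatnonzero(importances > importances.mean())
print("kept feature indices:", kept)
print("importance captured:", round(importances[kept].sum(), 3))
```

Unlike single-feature filters, forest importances are computed while features compete inside the same trees, so redundant copies of a strong feature tend to split the credit between them and drop in rank.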

Machine learning predictivity applied to consumer creditworthiness

Future Business Journal, 2020

Credit risk evaluation plays a relevant role for financial institutions, since lending may result in real and immediate losses. In particular, default prediction is one of the most challenging activities in managing credit risk. This study analyzes the adequacy of borrower classification models using a Brazilian bank's loan database and exploring machine learning techniques. We develop Support Vector Machine, Decision Tree, Bagging, AdaBoost and Random Forest models, and compare their predictive accuracy with a benchmark based on a Logistic Regression model. Comparisons are analyzed using the usual classification performance metrics. Our results show that Random Forest and AdaBoost perform better than the other models, while Support Vector Machine models show poor performance with both linear and nonlinear kernels. Our findings suggest there are value-creating opportunities for banks to improve default prediction models by exploring machine learning techniques.