Feature Selection to Optimize Credit Banking Risk Evaluation Decisions for the Example of Home Equity Loans
Related papers
An effective credit scoring model based on feature selection approaches
Recent finance and debt crises have made credit risk management one of the most important issues in financial research. Credit scoring is one of the most important problems in financial decision-making. Reliable credit scoring models are crucial for financial agencies evaluating credit applications and have been widely studied in machine learning and statistics. In this paper, we propose an effective credit scoring model based on feature selection approaches. Feature selection is the process of selecting a subset of relevant features, which can decrease dimensionality, shorten running time, and/or improve classification accuracy. Using the standard k-nearest-neighbors (kNN) rule as the classification algorithm, the feature selection methods are evaluated on classification tasks. Two well-known and readily available datasets, the Australian and German credit datasets, were used to test the algorithm. The results show that the feature selection approaches are superior to state-of-the-art classification algorithms in credit scoring.
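The core idea of the abstract above, scoring a candidate feature subset by the accuracy a kNN classifier achieves on it, can be sketched as follows. This is a minimal illustration on an invented toy "credit" dataset, not the paper's actual pipeline; the function names and data are hypothetical.

```python
import math

def knn_predict(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = sorted((math.dist(row, x), label)
                   for row, label in zip(train_X, train_y))
    votes = [label for _, label in dists[:k]]
    return max(set(votes), key=votes.count)

def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of kNN restricted to a feature subset."""
    Xs = [[row[f] for f in features] for row in X]
    hits = 0
    for i in range(len(Xs)):
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, Xs[i], k) == y[i]
    return hits / len(Xs)

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 5.0], [0.2, 1.0], [0.15, 9.0],
     [0.9, 4.0], [1.0, 2.0], [0.95, 8.0]]
y = [0, 0, 0, 1, 1, 1]
print(loo_accuracy(X, y, [0]))  # → 1.0, informative feature alone
print(loo_accuracy(X, y, [1]))  # noise feature alone scores poorly
```

A subset-search procedure would call `loo_accuracy` on each candidate subset and keep the best one.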
Feature Selection in a Credit Scoring Model
Mathematics, 2021
This paper proposes different classification algorithms (logistic regression, support vector machine, K-nearest neighbors, and random forest) to identify which candidates are likely to default in a credit scoring model. Three different feature selection methods are used to mitigate overfitting arising from the curse of dimensionality in these classification algorithms: one filter method (Chi-squared test and correlation coefficients) and two wrapper methods (forward stepwise selection and backward stepwise selection). The performance of these three methods is discussed using two measures, the mean absolute error and the number of selected features. The methodology is applied to a valuable credit database from Taiwan. The results suggest that forward stepwise selection yields superior performance with each of the classification algorithms used. The conclusions are related to those in the literature, and their managerial implications are analyzed.
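Forward stepwise selection, which the paper above finds superior, greedily grows a feature set by repeatedly adding whichever feature most improves a scoring function, stopping when no addition helps. A minimal sketch, with an invented scorer standing in for cross-validated model performance (feature names and signal values are hypothetical):

```python
def forward_stepwise(features, score, max_features=None):
    """Greedy forward selection: repeatedly add the feature that most
    improves score(subset); stop when no remaining feature helps."""
    selected, best = [], float("-inf")
    limit = max_features or len(features)
    while len(selected) < limit:
        candidates = [f for f in features if f not in selected]
        top_score, top_f = max((score(selected + [f]), f)
                               for f in candidates)
        if top_score <= best:
            break  # no candidate improves the current subset
        selected.append(top_f)
        best = top_score
    return selected

# Hypothetical scorer: two features carry signal, one is useless.
signal = {"income": 0.4, "debt_ratio": 0.3, "zip_code": 0.0}
score = lambda subset: sum(signal[f] for f in subset)
print(forward_stepwise(list(signal), score))
# → ['income', 'debt_ratio']; 'zip_code' is never added
```

Backward stepwise selection is the mirror image: start from the full set and repeatedly drop the least useful feature.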
On Feature Selection for Credit Scoring
Granting credit is a fundamental question that every credit institution confronts and one of the most complex tasks it has to deal with. This task is based on analyzing and judging a large number of received credit requests. Typically, credit scoring databases are large and characterized by redundant and irrelevant features. With so many features, classification methods become more computationally demanding. This difficulty can be addressed with feature selection methods, many of which have been proposed in the literature, such as filter and wrapper methods. Filter methods select the best features by evaluating fundamental properties of the data, making them fast and simple to implement. However, they are sensitive to redundancy, and the sheer number of filter methods proposed in previous work makes choosing among them difficult. Wrapper methods select the best features according to the classifier's accuracy, making results well-matched to the predetermined ...
Feature selection in credit risk modeling: an international evidence
Economic Research-Ekonomska Istraživanja, 2021
This paper aims to discover a suitable combination of contemporary feature selection techniques and robust prediction classifiers. To examine the impact of the feature selection method on classifier performance, we use two Chinese and three other real-world credit scoring datasets. The feature selection methods used are the least absolute shrinkage and selection operator (LASSO) and multivariate adaptive regression splines (MARS). The examined classifiers are classification and regression trees (CART), logistic regression (LR), artificial neural networks (ANN), and support vector machines (SVM). Empirical findings confirm that LASSO feature selection, followed by the robust classifier SVM, demonstrates remarkable improvement and outperforms the other competitive classifiers. Moreover, ANN also offers improved accuracy with feature selection methods; LR can only improve classification efficiency by performing feature selection via LASSO. Nonetheless, CART does not show improvement in any combination. The proposed credit scoring modeling strategy may be used to develop policy, progressive ideas, and operational guidelines for effective credit risk management at lending and other financial institutions. The findings of this study have practical value, as to date there is no consensus about the combination of feature selection method and prediction classifier.
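LASSO performs feature selection by driving the coefficients of uninformative features exactly to zero. A minimal coordinate-descent sketch on a tiny invented regression problem (the data and the lambda value are hypothetical; real credit pipelines would use a tuned solver such as scikit-learn's):

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become 0."""
    return (z - t) if z > t else (z + t) if z < -t else 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO via cyclic coordinate descent, minimizing
    (1/2n) * ||y - Xw||^2 + lam * ||w||_1."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with feature j's contribution removed
            r = [y[i] - sum(w[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z
    return w

# Feature 0 drives y (roughly y = 2*x0); feature 1 is irrelevant.
X = [[1.0, 1.0], [2.0, -1.0], [3.0, -1.0], [4.0, 1.0]]
y = [2.0, 4.0, 6.0, 8.0]
w = lasso_cd(X, y, lam=0.5)
print(w)  # coefficient of the irrelevant feature is exactly 0.0
```

The selected feature subset is then simply the indices with nonzero coefficients.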
A Hybrid Approach for Feature Selection in Data Mining Modeling of Credit Scoring
2020
Recent research shows that data mining techniques can be applied in broad areas of the economy and, in particular, in the banking sector. One of the most pressing issues banks face is the problem of non-repayment of loans by the population, which relates to the credit scoring problem. The main goal of this paper is to show the importance of applying feature selection in data mining modeling of credit scoring. The study shows processes of data pre-processing, feature creation, and feature selection that are applicable to real-life business situations for binary classification problems, using nodes from IBM SPSS Modeler. The results prove that applying a hybrid model of feature selection, which obtains the optimal number of features, leads to an increase in credit scoring accuracy. Compared to an expert judgmental approach, the proposed hybrid model is harder to explain but shows better accuracy and flexibility of factor selection, which is an advantage in fast cha...
A FRAMEWORK FOR FEATURE SELECTION USING XGBOOST FOR PREDICTION BANKING RISK
Machine learning methods have become one of the dominant approaches in the effort to find accurate predictions. Given the presence of large quantities of high-dimensional data (which may come in a variety of noisy forms) and the lack of a comprehensive understanding at the molecular level, mining microarray data presents strong challenges in dealing with large amounts of data. To account for these challenges, related work has proposed different feature selection methods that can be used to build an accurate predictor. Feature selection is a very important pre-processing step that discovers the parts of the data that contribute to the predictive performance of the model. This study's goal is a framework for feature selection using XGBoost, linear regression, and logistic regression for banking risk prediction. The Kaggle dataset we used comprises 350 examples and 19 features. The data were analyzed using SPSS, WEKA, and Python. Based on our findings, this model exhibits greater effectiveness than traditional feature selection methods. According to our results, XGBoost demonstrated superior discrimination capabilities compared to the alternative feature selection methods, namely linear regression and logistic regression, as evaluated by the AUC metric for forecast accuracy. The respective accuracy rates achieved by XGBoost and the two other methods were 91.43%, 89.12%, and 87%.
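The AUC metric used above to compare XGBoost against the regression baselines has a simple pairwise definition: the probability that a randomly chosen positive (defaulter) receives a higher score than a randomly chosen negative, with ties counting half. A small self-contained sketch (scores and labels are invented):

```python
def auc(scores, labels):
    """Area under the ROC curve via the pairwise definition:
    P(random positive outscores random negative), ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # → 1.0 (perfect ranking)
print(auc([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0]))  # → 0.75 (one bad pair)
```

Unlike raw accuracy, AUC depends only on the ranking of scores, which is why it is the usual yardstick for comparing models' discrimination ability.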
Filter- versus wrapper-based feature selection for credit scoring
International Journal of Intelligent Systems, 2005
We address the problem of credit scoring as a classification and feature subset selection problem. Based on the current framework of sophisticated feature selection methods, we identify features that contain the most relevant information for distinguishing good loan payers from bad loan payers. The feature selection methods are validated on several real-world datasets with different types of classifiers. We show the advantages that follow from using the sub-space approach to classification. We discuss many practical issues related to the applicability of feature selection methods, and we show and discuss some difficulties that tend to be insufficiently emphasised in the standard feature selection literature.
A Three-Stage Feature Selection Using Quadratic Programming for Credit Scoring
Applied Artificial Intelligence, 2013
Many classification techniques have been successfully applied to credit scoring tasks. However, using them blindly may lead to unsatisfactory results. Generally, credit datasets are large and are characterized by redundant features and nonrelevant data. Hence, classification techniques and model accuracy can be hampered. To overcome this problem, this study explores a variety of filter and wrapper feature selection methods for reducing nonrelevant features. We argue that these two types of selection techniques are complementary to each other. A fusion strategy is then proposed to sequentially combine the ranking criteria of multiple filters and a wrapper method. Evaluations on three credit datasets show that feature subsets selected by the fusion methods are either superior to or at least as adequate as those selected by the individual methods.
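One common way to fuse the ranking criteria of multiple filters, as the abstract above proposes, is Borda-style rank aggregation: each filter orders the features best-first, and a feature's fused score is the sum of its positions across filters. A minimal sketch with invented feature names and rankings (the paper's exact fusion rule may differ):

```python
def aggregate_ranks(rankings):
    """Fuse several best-first feature rankings: a feature's score is
    the sum of its positions across rankings (lower is better)."""
    scores = {}
    for ranking in rankings:
        for pos, feat in enumerate(ranking):
            scores[feat] = scores.get(feat, 0) + pos
    return sorted(scores, key=lambda f: (scores[f], f))

# Hypothetical rankings from two filter criteria.
chi2_rank = ["income", "age", "debt", "zip"]
corr_rank = ["debt", "income", "age", "zip"]
fused = aggregate_ranks([chi2_rank, corr_rank])
print(fused)  # → ['income', 'debt', 'age', 'zip']
```

A wrapper stage would then search only among the top-ranked features of the fused list, which is far cheaper than wrapping over the full feature set.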
2022 10th International Conference on Information and Communication Technology (ICoICT), 2022
Machine learning has evolved as a multidisciplinary study in the last few years and has gained popularity in big data analytics, including in the banking industry. Numerous methods can be used in predictive analytics through supervised machine learning, for either regression or classification problems. In the banking industry, credit quality is one of the core focuses, since it is one of the main areas reviewed regularly by regulators and impacts banks' profitability. This research is intended to give recommendations on how to select an appropriate machine learning technique and perform feature selection and sensitivity analysis on a bank's credit data with more than one million records that is highly imbalanced, i.e., 97.5% of the data is in one category. Using several supervised machine learning classification methods, including the application of SMOTE (synthetic minority oversampling technique), computational results are compared and summarized, resulting in a recommendation of the most appropriate technique for big and extremely imbalanced datasets, namely the Tree Ensemble method with SMOTE, with the computational issue solved through data sampling without significantly reducing accuracy. It is also concluded that an optimum number of features will increase model accuracy; however, a significant reduction in the number of features will not necessarily increase model accuracy. The research is expected to be useful for the banking industry, especially in credit portfolio analytics, and for other industries with big and imbalanced datasets, to perform predictive analytics that supports business objectives. Further research is possible to cover more in-depth analytics for the decision-making process in banking.
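SMOTE, used above to counter the 97.5% class imbalance, generates synthetic minority examples by interpolating between a minority point and one of its nearest minority neighbours. A minimal sketch on invented 2-D points (production work would use a library implementation such as imbalanced-learn's):

```python
import math
import random

def smote(minority, n_new, k=2, rng=None):
    """Minimal SMOTE sketch: each synthetic sample lies on the segment
    between a random minority point and one of its k nearest
    minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        t = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append([b + t * (n - b) for b, n in zip(base, nb)])
    return synthetic

# Hypothetical minority class (e.g., defaulters) in feature space.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_new=4)
print(new_points)  # 4 synthetic points inside the minority region
```

Because each new point is a convex combination of two real minority samples, oversampling enriches the minority region without duplicating records verbatim.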
The Mahalanobis-Taguchi System (MTS) is a relatively new collection of methods proposed for diagnosis and forecasting using multivariate data. It consists of two main parts: part 1, the selection of useful variables in order to reduce the complexity of multi-dimensional systems, and part 2, diagnosis and prediction, which predict membership in the abnormal group according to the remaining useful variables. The main purpose of this research is to present a new method for selecting useful variables by combining the concept of Mahalanobis distance with integer programming. Due to the inaccuracy of, and the difficulties in, selecting useful variables by the design-of-experiments method, we use an innovative and accurate method to solve the problem. The proposed model finds solutions faster and performs better than other common methods.
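The Mahalanobis distance at the heart of MTS measures how far an observation lies from the "normal" group, scaled by that group's covariance. A self-contained 2-D sketch (the reference data are invented, and this shows only the distance computation, not the integer-programming variable selection the paper proposes):

```python
def mean(v):
    return sum(v) / len(v)

def mahalanobis_2d(x, data):
    """Mahalanobis distance of a 2-D point x from a reference group,
    using the group's mean and sample covariance (inverted
    analytically for the 2x2 case)."""
    xs, ys = [p[0] for p in data], [p[1] for p in data]
    mx, my = mean(xs), mean(ys)
    n = len(data) - 1
    sxx = sum((a - mx) ** 2 for a in xs) / n
    syy = sum((b - my) ** 2 for b in ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mx, x[1] - my
    # d^2 = (x - mu)^T S^{-1} (x - mu)
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 ** 0.5

normal = [[0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(mahalanobis_2d([0.5, 0.5], normal))  # → 0.0, at the group mean
print(mahalanobis_2d([3.0, 3.0], normal))  # large: an "abnormal" point
```

In MTS, variables are judged useful by how much they sharpen the separation between normal and abnormal distances; the paper replaces the usual design-of-experiments screening with an integer program over that choice.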