A Hybrid Approach for Feature Selection in Data Mining Modeling of Credit Scoring
Related papers
An effective credit scoring model based on feature selection approaches
Recent finance and debt crises have made credit risk management one of the most important issues in financial research, and credit scoring is central to financial decision-making. Reliable credit scoring models are crucial for financial agencies to evaluate credit applications and have been widely studied in machine learning and statistics. In this paper, we propose an effective credit scoring model based on feature selection approaches. Feature selection is the process of selecting a subset of relevant features, which can decrease the dimensionality, shorten the running time, and/or improve the classification accuracy. The feature selection methods are evaluated in classification tasks using the standard k-nearest-neighbors (kNN) rule as the classification algorithm. Two well-known and readily available datasets, the Australian and German credit datasets, are used to test the algorithm. The results obtained with the feature selection approaches are reported to be superior to those of state-of-the-art classification algorithms in credit scoring.
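To make the evaluation idea in this abstract concrete, here is a minimal sketch of scoring a candidate feature subset with a kNN classifier under leave-one-out validation. It is not the paper's implementation: the function names, the choice of k = 3, and the eight-row toy dataset (one informative column, one large-scale noise column) are all assumptions made for illustration.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    dists = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, feature_idx, k=3):
    """Leave-one-out accuracy of kNN using only the columns in `feature_idx`."""
    projected = [[row[j] for j in feature_idx] for row in X]
    hits = 0
    for i, (query, label) in enumerate(zip(projected, y)):
        rest_X = projected[:i] + projected[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, query, k) == label
    return hits / len(X)


# Hypothetical toy data: column 0 separates the classes, column 1 is noise
# on a much larger scale, so it dominates the Euclidean distance.
X = [
    [0.1, 50], [0.2, 10], [0.3, 90], [0.4, 30],
    [0.9, 55], [1.0, 12], [1.1, 88], [1.2, 33],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

acc_selected = loo_accuracy(X, y, [0])      # informative feature only
acc_all = loo_accuracy(X, y, [0, 1])        # all features, noise included
```

On this toy data the subset containing only the informative column scores higher than the full feature set, which is the effect feature selection is meant to exploit for distance-based classifiers.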
Journal of Retailing and Consumer Services, 2015
Data mining techniques have numerous applications in the credit scoring of bank customers. One of the most popular data mining techniques is classification. Previous research has demonstrated that feature selection (FS) algorithms and ensemble classifiers can improve banks' performance in credit scoring problems. In this domain, the main issue is the simultaneous, hybrid use of several FS and ensemble learning classification algorithms, together with their parameter settings, in order to achieve higher performance. Accordingly, the present paper develops a hybrid data mining model of feature selection and ensemble learning classification algorithms in three stages. The first stage deals with data gathering and pre-processing. In the second stage, four FS algorithms are employed: principal component analysis (PCA), genetic algorithm (GA), information gain ratio, and the relief attribute evaluation function. Here, the parameter settings of the FS methods are based on the classification accuracy obtained with the support vector machine (SVM) classification algorithm. After the appropriate model is chosen for each set of selected features, the features are applied to the base and ensemble classification algorithms. In this stage, the best FS algorithm and its parameter settings are identified for the modeling stage of the proposed model. In the third stage, the classification algorithms are applied to the dataset prepared by each FS algorithm. The results showed that PCA is the best FS algorithm in the second stage. In the third stage, the classification results showed that the artificial neural network (ANN) adaptive boosting (AdaBoost) method has higher classification accuracy. Ultimately, the paper verifies and proposes the hybrid model as an effective and robust model for credit scoring.
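Of the four FS algorithms named above, information gain ratio is the simplest to show in a few lines. The sketch below computes it for categorical features: information gain about the class label divided by the entropy of the feature itself. The function names and the four-row example (housing status, gender, credit label) are hypothetical and are not taken from the paper's data.

```python
from collections import Counter
import math


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by split info."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the feature's values.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A constant feature has zero split info and carries no information.
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy example: housing status predicts the label, gender does not.
labels = ["good", "good", "bad", "bad"]
housing = ["own", "own", "rent", "rent"]
gender = ["m", "f", "m", "f"]

ratio_housing = gain_ratio(housing, labels)
ratio_gender = gain_ratio(gender, labels)
```

Ranking features by this ratio and keeping the top few is one common way to turn the measure into a selector; how many features to keep is a parameter the abstract says was tuned against SVM accuracy.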
Feature Selection in a Credit Scoring Model
Mathematics, 2021
This paper applies several classification algorithms (logistic regression, support vector machine, K-nearest neighbors, and random forest) to identify which candidates are likely to default in a credit scoring model. Three different feature selection methods are used to mitigate the overfitting caused by the curse of dimensionality in these classification algorithms: one filter method (Chi-squared test and correlation coefficients) and two wrapper methods (forward stepwise selection and backward stepwise selection). The performance of these three methods is discussed using two measures, the mean absolute error and the number of selected features. The methodology is applied to a valuable database from Taiwan. The results suggest that forward stepwise selection yields superior performance for each of the classification algorithms used. The conclusions obtained are related to those in the literature, and their managerial implications are analyzed.
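Forward stepwise selection, the wrapper method this abstract reports as strongest, can be sketched as a greedy loop: start with no features, repeatedly add the one that lowers the error most, and stop when no addition helps. The sketch below is an illustration under assumptions, not the paper's code. It scores subsets with the leave-one-out mean absolute error of a 1-nearest-neighbour classifier, and the toy data and function names are hypothetical.

```python
import math


def loo_mae(X, y, cols):
    """Leave-one-out mean absolute error of a 1-nearest-neighbour classifier."""
    errors = 0
    for i, row in enumerate(X):
        query = [row[j] for j in cols]
        nearest = min(
            (k for k in range(len(X)) if k != i),
            key=lambda k: math.dist([X[k][j] for j in cols], query),
        )
        errors += abs(y[nearest] - y[i])
    return errors / len(X)


def forward_stepwise(X, y, score=loo_mae):
    """Greedily add the feature that lowers the error most; stop when none helps."""
    remaining = list(range(len(X[0])))
    selected, best_err = [], float("inf")
    while remaining:
        err, col = min((score(X, y, selected + [c]), c) for c in remaining)
        if err >= best_err:
            break
        selected.append(col)
        remaining.remove(col)
        best_err = err
    return selected, best_err


# Hypothetical toy data: column 0 is informative, column 1 is large-scale noise.
X = [
    [0.1, 50], [0.2, 10], [0.3, 90], [0.4, 30],
    [0.9, 55], [1.0, 12], [1.1, 88], [1.2, 33],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

selected, err = forward_stepwise(X, y)
```

With 0/1 labels the mean absolute error equals the misclassification rate, so this scorer reports both of the abstract's measures at once: the error and, via the length of `selected`, the number of features kept. Backward stepwise selection is the mirror image, starting from all features and removing one at a time.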
Advanced Engineering Informatics, 2020
The aim of this paper is to propose a new hybrid data mining model based on a combination of various feature selection and ensemble learning classification algorithms, in order to support the decision-making process. The model is built in several stages. In the first stage, the initial dataset is preprocessed and, apart from applying different preprocessing techniques, we paid great attention to feature selection. Five different feature selection algorithms were applied, and their results, based on the ROC and accuracy measures of a logistic regression algorithm, were combined using different voting types. We also proposed a new voting method, called if_any, that outperformed all other voting methods as well as the results of any single feature selection algorithm. In the next stage, four different classification algorithms (generalized linear model, support vector machine, naive Bayes, and decision tree) were run on the dataset obtained from the feature selection process. These classifiers were combined into eight different ensemble models using the soft voting method. On a real dataset, the experimental results show that the hybrid model based on features selected by the if_any voting method and the ensemble GLM + DT model achieves the highest performance and outperforms all other ensemble and single classifier models.
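The abstract names the if_any voting method without defining it. The sketch below assumes the most literal reading, namely that a feature is kept if at least one selector chose it, and contrasts that with majority and unanimous voting. The selector names, feature names, and the `combine_votes` helper are all hypothetical.

```python
def combine_votes(selections, rule="if_any"):
    """Merge per-selector feature lists into one list under a voting rule.

    `selections` maps a selector name to the features it picked.
    Assumed rules: "if_any" keeps a feature chosen by at least one selector,
    "majority" by more than half of them, "unanimous" by all of them.
    """
    counts = {}
    for chosen in selections.values():
        for feat in set(chosen):
            counts[feat] = counts.get(feat, 0) + 1
    n = len(selections)
    threshold = {"if_any": 1, "majority": n // 2 + 1, "unanimous": n}[rule]
    return sorted(f for f, c in counts.items() if c >= threshold)


# Hypothetical output of three feature selectors on a credit dataset.
selections = {
    "chi2": ["income", "age"],
    "info_gain": ["income", "debt_ratio"],
    "relief": ["income", "age", "tenure"],
}

kept_if_any = combine_votes(selections, "if_any")
kept_majority = combine_votes(selections, "majority")
kept_unanimous = combine_votes(selections, "unanimous")
```

Under this reading, if_any is the most permissive rule: it keeps the union of all selections, which trades a larger feature set for a lower chance of discarding a feature that only one selector recognises as useful.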
On Feature Selection for Credit Scoring
Credit granting is a fundamental question that every credit institution confronts and one of the most complex tasks it has to deal with. This task is based on analyzing and judging a large number of received credit requests. Credit scoring databases are often large and characterized by redundant and irrelevant features. With so many features, classification methods become more computationally demanding. This difficulty can be addressed by using feature selection methods. Many such methods have been proposed in the literature, such as filter and wrapper methods. Filter methods select the best features by evaluating the fundamental properties of the data, making them fast and simple to implement. However, they are sensitive to redundancy, and so many filtering methods have been proposed in previous work that choosing among them is itself a problem. Wrapper methods select the best features according to the classifier's accuracy, making results well-matched to the predetermined ...
Financial transactions through banks and financial institutions have increased. Therefore, credit scoring is a critical task for forecasting customers' credit. We created nine different credit scoring models by combining three feature selection methods with three decision tree algorithms. The models are implemented on three datasets and their accuracy is compared. Two of the datasets are taken from the UCI repository (the Australian and German datasets), and the third comes from a car leasing company in Iran. Results show that combining feature selection methods with decision tree algorithms (hybrid models) produces more accurate models than those without feature selection.
Credit scoring in banks and financial institutions via data mining techniques: A literature review
Journal of AI and Data Mining, 2013
This paper presents a comprehensive review of studies on the application of data mining techniques to credit scoring from 2000 to 2012. So far, there have not been adequate literature reviews in the field of data mining applications in credit scoring. Using a novel research approach, this paper conducts an academic and systematic literature review that includes all of the journals in the ScienceDirect online journal database. The studies are categorized into enterprise, individual, and small and midsized (SME) company credit scoring. Data mining techniques are also categorized into single classifiers, hybrid methods, and ensembles. Variable selection methods are investigated separately because variable selection is a major issue in credit scoring problems. The findings of this literature review reveal that data mining techniques are mostly applied to individual credit scoring and that there is inadequate research on enterprise and SME credit scoring. Also ensemble method...
A Novel Hybrid Data Mining Framework for Credit Evaluation
2016
The Internet loan business has received extensive attention recently. Providing lenders with accurate credit scoring profiles of borrowers is a challenge because of the tremendous number of loan requests and the limited information about borrowers. Existing approaches are not suitable for the Internet loan business because of the unique features of individual credit data. In this paper, we propose a unified data mining framework consisting of feature transformation, feature selection, and a hybrid model to address these challenges. Extensive experimental results on realistic datasets show that our proposed framework is an effective solution.
A Proposed Classification of Data Mining Techniques in Credit Scoring
Credit scoring has become a very important issue due to the recent growth of the credit industry, which leaves bank credit departments facing a large amount of credit data. Analyzing this volume of data manually is clearly infeasible in both economic and manpower terms, so data mining techniques have been employed for this purpose. Many data mining methods have been proposed to handle credit scoring problems, each with its own strengths and limitations, but there is no comprehensive reference introducing the most used data mining methods in credit scoring. The aim of this study is to provide a comprehensive literature survey of data mining techniques applied in the credit scoring context. Such a reference can help researchers become aware of the most common methods in credit scoring evaluation, find their limitations, improve them, and suggest new methods with better capabilities. Finally, we note the limitations of the most frequently proposed methods and suggest the method that is more applicable than the others.
Analysis of Distinct Feature Groups in the Credit Scoring Problem
J. Inf. Data Manag., 2021
Registration and financial data have traditionally been used for the credit scoring problem. However, slight improvements in the reliability of the scores positively impact financial companies. Therefore, exploring new features is a strategic task. This work analyzes the importance of new feature groups not commonly employed for the credit scoring task, alongside others already in use. We categorized features from open credit scoring datasets, such as German and Australian, and compared their groups with those of a company dataset used in this work. Our dataset contains unusual feature groups, such as historical, geolocation, web behavior, and demographic data. In our analyses, we first conducted bivariate tests with each feature pair to assess their individual importance. Second, we ran an XGBoost machine learning model with each feature group to evaluate each group's importance. We also applied feature selection with binary Particle Swarm Optimization to assess the groups' importance when combined....