Feature selection using Bayesian and multiclass Support Vector Machines approaches: Application to bank risk prediction
Related papers
Comparison of SVM and FSVM for predicting bank failures using chi-square feature selection
Journal of Physics: Conference Series
Bankruptcy does not happen suddenly; there are early warning signs that can be found by examining a bank's financial statements. In this research, we aim to find the best bankruptcy prediction model to give regulators an early warning, helping them prevent or lessen negative effects on the economic system. We apply SVM and a modification of SVM that adds a fuzzy membership function, called FSVM, to analyze the health of banks. We chose machine learning for bankruptcy prediction because it can produce results faster than traditional statistical methods. Prediction accuracy is measured on a dataset of 65 Turkish banks, each described by 20 financial ratios. To further improve prediction accuracy, we also apply chi-square feature selection (CSFS) to filter out irrelevant features among the 20 in our dataset. CSFS ranks all 20 features by chi-square score, from the most relevant to the least relevant. We then select the best 5, 10, and 15 features, which together with the full 20-feature set gives four datasets to be classified into healthy and non-healthy banks. We found that using 5 features with the SVM classifier gives the highest prediction accuracy, 98.28%. In most cases, SVM performs better than FSVM.
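The chi-square ranking step can be sketched in a few lines. This is a minimal pure-Python sketch, assuming nonnegative feature values (which the chi-square statistic requires) and the same statistic computed by scikit-learn's `chi2`: for each feature, the class-wise sums of its values are compared with the sums expected if the feature were independent of the class. The function names and toy data are hypothetical and not taken from the paper.

```python
from collections import Counter


def chi2_scores(X, y):
    """Chi-square score of each nonnegative feature column of X against labels y."""
    n_samples = len(y)
    class_counts = Counter(y)
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for cls, count in class_counts.items():
            # Observed: sum of feature j over the samples of this class.
            observed = sum(row[j] for row, label in zip(X, y) if label == cls)
            # Expected under independence: the class share of the feature total.
            expected = total * count / n_samples
            if expected > 0:  # skip zero expected counts to avoid division by zero
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(X, y, k):
    """Indices of the k highest-scoring features, most relevant first."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data: feature 0 separates the classes, features 1 and 2 do not.
X = [[5, 1, 2], [4, 1, 3], [0, 1, 2], [1, 1, 3]]
y = [1, 1, 0, 0]
print(select_k_best(X, y, 1))
```

In the setting described above, one would call `select_k_best` with k = 5, 10, and 15 on the 20 ratios; ratios that can take negative values would first need rescaling (for example, min-max) because the statistic assumes nonnegative inputs.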
A discriminating study between three categories of banks based on statistical learning approaches
Intelligent Data Analysis, 2016
This work addresses the problem of knowledge extraction within the banking domain using statistical learning systems. Our main goal is to assess the power of accounting ratios to discriminate between Islamic, mixed, and conventional banks in the Gulf Cooperation Council (GCC) region. To this end, we use two popular statistical learning methods, Support Vector Machines (SVM) and Random Forests (RF), and carry out a thorough comparative study of them for variable ranking and selection within a nonlinear multiclass framework. Experiments on several simulated datasets and on the real dataset show that RF performs slightly better than SVM. In the real application, we relied on financial semantics based on experts' domain knowledge to decide between the competing approaches. The results show the importance of the mutual financial information between some ratios for distinguishing the three categories of banks. Moreover, we show that mixed banks are closer to conventional ones. Finally, RF proves more robust to the selection bias problem, and classification accuracy improves slightly when ratios are selected.
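Variable ranking for a nonlinear multiclass classifier can be illustrated with permutation importance, a model-agnostic criterion: shuffle one feature column and measure how much accuracy drops. This is only a sketch; the abstract does not say which ranking criterion the authors used for SVM and RF, and `predict` stands in for any trained three-class model (the toy rule below is hypothetical).

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled.

    predict: callable mapping a list of rows (lists) to a list of class labels.
    """
    rng = random.Random(seed)
    baseline = accuracy(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row with column j replaced by the shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical three-class rule that only looks at feature 0.
def toy_predict(rows):
    return [0 if r[0] < 1 else (1 if r[0] < 2 else 2) for r in rows]


X = [[0.5, 7], [1.5, 3], [2.5, 9], [0.2, 1], [1.7, 4], [2.9, 2]]
y = toy_predict(X)
print(permutation_importance(toy_predict, X, y))
```

Because the toy rule ignores feature 1, shuffling it never changes a prediction and its importance is exactly zero, while feature 0 receives a positive score.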
International Journal of Computational Economics and Econometrics, 2013
We propose a Support Vector Machine (SVM)-based structural model to forecast the collapse of banking institutions in the USA using publicly disclosed information from their financial statements on a four-year rolling window. In our approach, the optimum input variable set is defined from a large data set using an iterative relevance-based selection procedure. We train an SVM model to classify banks as solvent and insolvent. The resulting model shows significant ability to forecast bank defaults.
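One way to organize financial-statement data on a four-year rolling window is sketched below: each window pairs four consecutive years of inputs with a later year as the forecasting target. The abstract does not give the exact window layout, so the function name, the one-year-ahead target, and the assumption that the years are consecutive with no gaps are all hypothetical.

```python
def rolling_windows(years, width=4, horizon=1):
    """Return (input_years, target_year) pairs for a rolling window.

    Each window uses `width` consecutive years of statements as inputs and the
    year `horizon` years after the last input year as the target. Assumes the
    given years are consecutive with no gaps.
    """
    years = sorted(years)
    windows = []
    for start in range(len(years) - width - horizon + 1):
        inputs = years[start:start + width]
        target = years[start + width + horizon - 1]
        windows.append((inputs, target))
    return windows


for inputs, target in rolling_windows(range(2005, 2011)):
    print(inputs, "->", target)
```

Each window would then yield one training set (ratios from the input years, solvent or insolvent status in the target year), and the window slides forward one year at a time.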
Mathematics, 2020
Measuring credit risk is essential for financial institutions because incorrect credit decisions carry a high level of risk. The Basel II agreement recommended the use of advanced credit scoring methods to improve the efficiency of capital allocation, and the latest agreement (Basel III) increases the risk-based reserve requirements. Financial institutions now hold extensive datasets on their operations, a problem that can be addressed by combining a good feature selection method with big data techniques for data management. This work presents a comparative study of feature selection techniques to find the selector that minimizes the mean square error and requires the least execution time.
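The two selection criteria described above, mean square error and execution time, can be combined in a small comparison harness. The selectors and the `evaluate` function are hypothetical placeholders for the feature selectors and regression model being compared, and ranking by error first and time second is an assumption of this sketch; the abstract does not say how the two criteria are weighed.

```python
import time


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def compare_selectors(selectors, evaluate, X, y):
    """Time each selector and score its feature subset.

    selectors: dict mapping a name to a callable (X, y) -> list of feature indices.
    evaluate: callable mapping a list of feature indices to a mean squared error.
    Returns (mse, seconds, name, features) tuples, lowest MSE first, then fastest.
    """
    results = []
    for name, select in selectors.items():
        start = time.perf_counter()
        features = select(X, y)
        elapsed = time.perf_counter() - start
        results.append((evaluate(features), elapsed, name, features))
    results.sort(key=lambda r: (r[0], r[1]))
    return results
```

Timing only the selection call, not the evaluation, keeps the execution-time figure attributable to the selector itself.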
A FRAMEWORK FOR PREDICTION BANKING RISK USING MACHINE LEARNING TECHNIQUES
One of the main challenges facing banks is determining the proper level of liquidity. Risk differs widely from bank to bank, and a careful understanding of the various risk factors helps predict expected liquidity from historical data. Real-world datasets often have missing values, which can bias results. The most widely adopted method for dealing with missing data is to delete observations that have missing values, but this loses precision and can introduce bias. The purpose of this study is to forecast banks' liquidity risk. We also present a method for dealing with missing data using machine learning methods. We used a dataset available through Kaggle containing 350 cases and 19 attributes. SPSS and the WEKA tool were used to analyze the data. ROC and accuracy were used to assess and compare three classification models: Decision Tree (DT), Support Vector Machine (SVM), and Random Forest (RF). The results were acceptable: with the 66-fold setting, DT, SVM, and RF each reached 97.47%, the best accuracy compared with 10-fold evaluation.
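The contrast between deleting incomplete observations and filling them in can be made concrete with a small sketch. Mean imputation is used here only as the simplest baseline; the abstract does not specify which machine learning method the authors used for missing data. Representing missing values as `None` is an assumption of this sketch.

```python
def listwise_delete(rows):
    """Drop every row that contains a missing value (None)."""
    return [row for row in rows if all(v is not None for v in row)]


def mean_impute(rows):
    """Replace each missing value with the mean of the observed values in its column."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        # Arbitrary fallback of 0.0 when a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


rows = [[1.0, 10.0], [None, 20.0], [3.0, None], [5.0, 40.0]]
print(listwise_delete(rows))
print(mean_impute(rows))
```

Deletion keeps only two of the four rows here, which is the loss of precision the abstract mentions; mean imputation keeps every row but shrinks the variance of each imputed column, which is why model-based imputation is often preferred.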
Artificial Learning And Support Vector Machines: Default Risk Prediction
2011
The financial crisis that started in 2008 has shown how much work remains to be done to predict precisely the bankruptcy of borrowers who ask their banks for credit. In this paper I focus on large companies, using a database kindly provided by Unicredit, one of the most important European banking groups. The size and complexity of the problem required simplifying the database and using Principal Component Analysis (PCA) to reduce the problem to a dimension manageable by the Support Vector Machine (SVM) software chosen for this study. The best configuration found correctly classified 84% of all companies, a result higher than many others reported in the literature.
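The PCA reduction step can be illustrated with a minimal pure-Python sketch that centers the data, builds the sample covariance matrix, finds the leading principal component by power iteration, and projects each company onto it. It keeps a single component for brevity; the number of components retained in the study is not stated in the abstract, and the function name is hypothetical.

```python
import math


def first_principal_component(X, n_iter=200):
    """Return (component, projections) for the leading principal component of X.

    X is a list of rows with at least two rows.
    """
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue,
    # provided the unit start vector is not orthogonal to it.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # zero covariance (constant data): keep the start vector
        v = [x / norm for x in w]
    # Coordinates of each centered row along the component.
    projections = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, projections


component, scores = first_principal_component([[1, 1], [2, 2], [3, 3], [4, 4]])
print(component, scores)
```

For data lying on the line y = x, the component is (1/√2, 1/√2) and each row collapses to a single coordinate, which is the kind of dimension reduction that makes the problem tractable for the SVM.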
JURTEKSI (Jurnal Teknologi dan Sistem Informasi)
Non-performing loans (NPLs) are a risk that credit unions must face; to avoid them, prospective debtors need to be screened. Using previous loan data, support vector machines and naïve Bayes can serve as classification methods for deciding on NPL risk. We use a dataset of 61 records and process it with the Orange 3.30 application to compare SVM with linear (SVM-L), polynomial (SVM-P), RBF (SVM-R), and sigmoid (SVM-S) kernels against naïve Bayes. We use cross-validation with various numbers of folds to measure the classification results, and a confusion matrix to measure classification results on the training data. Naïve Bayes scores highest in accuracy, and SVM-R scores highest in F1, precision, and recall. SVM-P scores lowest in accuracy, F1, precision, and recall. Naïve Bayes scores highest in proportion of predicted for the true negative class and proportion of actual for the true positive class. SVM-S scores highest in proportion of predicted for the true positive class and proportion of actual for the true negative class. SVM-P scores lowest in both proportion of predicted and proportion of actual.
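The evaluation described above, k-fold cross-validation scored through a confusion matrix, can be sketched as follows. The fold assignment (contiguous blocks, no shuffling or stratification) and the choice of positive class are assumptions of this sketch; Orange's own cross-validation may assign folds differently.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from the 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


print([len(f) for f in k_fold_indices(61, 5)])
print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Each fold in turn serves as the test set while the model is trained on the others. For a given class, the diagonal entry of a column-normalized confusion matrix ("proportion of predicted") equals that class's precision, and the diagonal entry of a row-normalized one ("proportion of actual") equals its recall.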
A FRAMEWORK FOR FEATURE SELECTION USING XGBOOST FOR PREDICTION BANKING RISK
Machine learning methods have become one of the dominant approaches for obtaining accurate predictions. Given the presence of large quantities of high-dimensional data (which may come in a variety of noisy forms) and the lack of a comprehensive understanding at the molecular level, mining microarray data presents strong challenges in dealing with large amounts of data. To address these challenges, related work has proposed various feature selection methods for building accurate predictors. Feature selection is a very important data pre-processing step that identifies the features that contribute to the model's predictive performance. The goal of this study is a framework for feature selection using XGBoost, Linear Regression, and Logistic Regression for banking risk prediction. We used available Kaggle datasets comprising 350 examples and 19 attributes. The data were analyzed using SPSS, WEKA, and Python. Based on our findings, this model is more effective than traditional feature selection methods. According to our results, XGBoost showed superior discrimination compared with the alternative feature selection methods, Linear Regression and Logistic Regression, as evaluated by the AUC metric. The accuracy rates achieved by XGBoost, Linear Regression, and Logistic Regression were 91.43%, 89.12%, and 87%, respectively.
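Since the comparison above rests on the AUC metric, it may help to recall how AUC is computed: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The pairwise sketch below computes it directly from model scores; it is quadratic in the number of examples, which is fine for a dataset of a few hundred cases but not for large ones.

```python
def roc_auc(y_true, scores, positive=1):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Here three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75. Unlike accuracy, this value does not depend on any decision threshold, which is why AUC and accuracy figures for the same models can rank them differently.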