EFN-SMOTE: An improved unbalanced data set oversampling based on fuzzy C-means for Credit Cards Fraud detection
Related papers
International Journal of Data and Network Science
Credit card fraud poses a significant challenge for both consumers and organizations worldwide, particularly with the increasing reliance on credit cards for financial transactions. Therefore, it is crucial to establish effective mechanisms to detect credit card fraud. However, the uneven distribution of instances between the two classes in the credit card dataset hinders traditional machine learning techniques, as they tend to prioritize the majority class, leading to inaccurate fraud predictions. To address this issue, this paper focuses on the use of the Elbow Fuzzy Noise Filtering SMOTE (EFN-SMOTE) technique, an oversampling approach, to handle unbalanced data. EFN-SMOTE partitions the dataset into multiple clusters using the Elbow method, applies noise filtering to each cluster, and then employs SMOTE to synthesize new minority instances based on the nearest majority instance to each minority instance, thereby improving the model’s ability to perceive the decision boundary. E...
International Journal of Computing and Digital Systems
Credit card fraud has negatively affected the market economic order and broken the confidence and interest of stakeholders, financial institutions, and consumers. Losses from card fraud are increasing every year, with billions of dollars being lost. Machine learning methods use large volumes of data as examples to improve the performance of classification models. Financial institutions use machine learning to identify fraudulent patterns in large amounts of historical financial records. However, detecting credit card fraud remains a significant challenge for business intelligence technologies, as most datasets containing credit card transactions are highly imbalanced. To overcome this challenge, this paper proposes the use of the data-point approach in machine learning. An experimental study was conducted applying oversampling with SMOTE, a data-point technique, to an imbalanced credit card dataset. State-of-the-art classical machine learning algorithms, namely Support Vector Machines, Logistic Regression, Decision Tree, and Random Forest classifiers, were used to perform the classifications, and performance was evaluated using precision, recall, F1-score, and average precision. The results show that when the data is highly imbalanced, the model struggles to detect fraudulent transactions. After applying SMOTE-based oversampling, the ability to predict the positive class improved significantly.
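For readers unfamiliar with the oversampling step named in this abstract: SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a minimal sketch of that idea using only the Python standard library. It is not the code used in the paper; the function name `smote`, its parameters, and the toy `fraud` points are illustrative assumptions, and the base sample is drawn at random rather than looping over every minority sample as the original algorithm does.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples.

    Simplified sketch: brute-force Euclidean neighbours, random base sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature fraud samples, for illustration only.
fraud = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(fraud, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class. In practice, oversampling of this kind is applied to the training split only, so that no synthetic points leak into the evaluation data.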
Springer eBooks, 2021
This research presents an innovative approach to detecting fraudulent credit card transactions. A prevailing problem in detecting such transactions is that they occur rarely relative to legitimate (authorized) transactions. Any recorded data will therefore show a stark imbalance between the number of minority (fraudulent) and majority (legitimate) class samples. This imbalanced distribution of the training data among classes makes it hard for any learning algorithm to learn the features of the minority class. In this thesis work, we analyze the impact of applying class-balancing techniques to the training data, namely oversampling of the minority class (using the SMOTE algorithm) and undersampling of the majority class (using CMTNN). Popular classification algorithms, namely Artificial Neural Network (ANN), Support Vector Machine (SVM), Extreme Gradient Boosting (XGB), Logistic Regression (LR), and Random Forest (RF), are trained on the balanced data to quantify the performance improvement provided by our approach. The experiments show that the hybrid approach, which integrates the Complementary Neural Network and the Synthetic Minority Oversampling Technique, achieves an accuracy of 99% and an AUC of 99.7% with the Random Forest classifier, improving on simple undersampling and oversampling.
International Journal of Information Technology
Credit card fraud is a growing problem, and it escalated during COVID-19 because authorities in many countries required people to use cashless transactions. Every year, billions of Euros are lost to fraudulent credit card transactions; fraud detection systems are therefore essential for financial institutions. Because the classes are not equally represented in the credit card dataset, machine learning models are trained to favor the majority class, which leads to inaccurate fraud predictions. In this research, we therefore focus on processing unbalanced data with an undersampling technique to obtain more accurate results with different machine learning algorithms. We propose a framework that is based on clustering the dataset using fuzzy C-means and selecting similar fraud and normal instances that have the same features, which guarantees the integrity between the data features. Keywords: Under-sampling technique · Fuzzy C-means · Credit card fraud detection · Machine learning · Unbalanced dataset
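As background for the clustering step named in this abstract: fuzzy C-means assigns each point a degree of membership in every cluster instead of a single hard label, and alternates between updating memberships and updating cluster centers. The sketch below shows those two standard update rules in plain Python. It illustrates generic fuzzy C-means only, not the paper's instance-selection framework; the function names, the fuzzifier default `m=2.0`, and the toy data are assumptions made for illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return u[j][i]: degree to which point j belongs to cluster i."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if 0.0 in dists:
            # The point sits exactly on a center: assign it fully to that cluster.
            hit = dists.index(0.0)
            row = [1.0 if i == hit else 0.0 for i in range(len(centers))]
        else:
            # Standard rule: u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)).
            row = [1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                   for d_i in dists]
        memberships.append(row)
    return memberships


def fcm_update_centers(points, memberships, m=2.0):
    """Recompute each center as the mean of all points weighted by u ** m."""
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centers = []
    for i in range(n_clusters):
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        centers.append([
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)
        ])
    return centers


# Hypothetical data: two well-separated groups and two initial center guesses.
points = [[0.0, 0.0], [0.2, 0.1], [4.0, 4.0], [4.1, 3.9]]
centers = [[0.0, 0.0], [4.0, 4.0]]
for _ in range(10):
    u = fcm_memberships(points, centers)
    centers = fcm_update_centers(points, u)
```

Each row of `u` sums to 1, so a point halfway between two centers receives a membership of 0.5 in each. An undersampling scheme built on this could use the memberships to decide which majority instances to keep, but how the paper does so is not described in the abstract.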
Handling Class Imbalance in Credit Card Fraud using Resampling Methods
International Journal of Advanced Computer Science and Applications
Credit card-based online payments have grown intensely, compelling financial organisations to implement and continuously improve their fraud detection systems. However, credit card fraud datasets are heavily imbalanced, and different types of misclassification errors may carry different costs, so it is essential to control those errors, to a certain degree, and to trade them off against each other. Classification techniques are promising solutions for distinguishing fraud from non-fraud transactions. Unfortunately, under certain conditions, classification techniques do not perform well when the numbers of minority and majority cases differ greatly. Hence, in this study, three resampling methods, Random Under Sampling, Random Over Sampling, and Synthetic Minority Oversampling Technique, were applied to the credit card dataset to address the rarity of fraud events. The three resampled datasets were then classified using classification techniques. Performance was measured by sensitivity, specificity, accuracy, precision, area under the curve (AUC), and error rate. The findings showed that, after resampling the dataset, the models were more practicable, gave better performance, and were statistically better.
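Most of the measures listed in this abstract follow directly from the four cells of a binary confusion matrix. The sketch below computes sensitivity, specificity, precision, accuracy, and error rate in plain Python (AUC is omitted because it needs ranked scores rather than hard labels). The function name `confusion_metrics`, the convention of returning 0.0 for an undefined ratio, and the toy labels are assumptions for illustration, not part of the study.

```python
def confusion_metrics(y_true, y_pred, positive=1):
    """Compute basic binary classification metrics from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    total = tp + tn + fp + fn

    def ratio(num, den):
        # Assumed convention: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of fraud cases caught (recall)
        "specificity": ratio(tn, tn + fp),  # share of legitimate cases passed
        "precision": ratio(tp, tp + fp),    # share of fraud alerts that are correct
        "accuracy": ratio(tp + tn, total),
        "error_rate": ratio(fp + fn, total),
    }


# Hypothetical data: 1000 transactions, 10 fraudulent, and a model that
# always predicts "legitimate".
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000
metrics = confusion_metrics(y_true, y_pred)
print(metrics["accuracy"], metrics["sensitivity"])  # prints: 0.99 0.0
```

The toy example shows why accuracy alone is a poor guide on imbalanced data: a classifier that never flags fraud scores 99% accuracy while catching none of the fraudulent transactions, which is why sensitivity and specificity are reported alongside it.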
Handling Class Imbalance in Credit Card Fraud Using Various Sampling Techniques
American Journal of Multidisciplinary Research and Innovation
Over the last few decades, credit card fraud (CCF) has been a severe problem for both cardholders and card providers. Credit card transactions are expanding fast as internet technology advances, and they rely heavily on the internet. With advanced technology and increased credit card usage, fraud rates are becoming a problem for the economy. However, the credit card dataset is highly imbalanced and skewed. Many classification techniques are used to classify fraud and non-fraud, but under certain conditions they may not generate the best results. Different types of sampling techniques, such as under-over sampling, Synthetic Minority Oversampling, and Adaptive Synthetic techniques, have been used to overcome the class imbalance problem in the credit card dataset. The sampled datasets are then classified using different machine learning techniques such as Decision Tree, Random Forest, K-Nearest Neighbors, Logistic Regression, and Naive Bayes. Recall, F1-score, accuracy, precision, and error...
Oversampling Techniques in Machine Learning Detection of Credit Card Fraud
Journal of Internet Technology and Secured Transactions
More than ever before, the trend of doing things online has been explored and successfully implemented in many areas, including online shopping, online learning, and working online, to name but a few. However, it has brought challenges with it, including the fraudulent use of credit cards in online purchases, academic integrity in online learning (especially in online exams), and how to keep people engaged in meetings when working and studying online while still giving them adequate privacy. This paper addresses the timely detection of fraudulent credit card use, to avoid as many negative effects as possible in the world of e-commerce and to help maintain consumer confidence. In the current study, the machine learning algorithm LightGBM is used to detect fraudulent credit card transactions in a real-life dataset of customers' credit card transactions. The performance of this classifier is compared with that of two state-of-the-art classifiers, Decision Tree and Random Forest, which are extensively used for such problems. Since the data is imbalanced between the fraudulent and non-fraudulent classes, the Synthetic Minority Oversampling Technique (SMOTE) is used as the data sampling technique. SMOTE oversampling performed best on all classifiers, and LightGBM obtained a precision of 1 for both the fraudulent and non-fraudulent classes.
Credit Card Fraud Detection: A Novel Approach to Imbalanced Data Sets in Classification Problems
medium.com, 2024
This study explores novel techniques for addressing the challenges posed by imbalanced datasets in credit card fraud detection using machine learning and deep learning approaches. Leveraging exploratory data analysis (EDA), the study emphasizes the importance of manual interventions in data preprocessing and model training. Specifically, the study proposes a technique that selectively removes rows from the majority class while retaining informative rows with distinguishing features. By preserving class-specific values and pruning common values, the proposed approach aims to enhance classification performance and mitigate dataset imbalance. Experimental results demonstrate that training models on deliberately reduced and more balanced datasets leads to improved predictive accuracy and consistency compared to standard methods. The study underscores the significance of EDA and manual interventions in optimizing fraud detection algorithms and highlights the potential of tailored data preprocessing techniques in handling imbalanced data in classification problems.
An Unbalanced Data Classification Model Using Hybrid Sampling Technique for Fraud Detection
Lecture Notes in Computer Science
Detecting fraud is a challenging task, as fraud coexists with the latest in technology. The difficulty in detecting fraud is that the dataset is unbalanced: the non-fraudulent class heavily dominates the fraudulent class. In this work, we treat fraud detection as an unbalanced data classification problem and propose a model based on a hybrid sampling technique that combines random under-sampling with over-sampling using SMOTE. Here, SMOTE is used to widen the data region corresponding to minority samples, and random under-sampling of the majority class is used to balance the class distribution. The value difference metric (VDM) is used as the distance measure in SMOTE. We conducted experiments with the classifiers k-NN, Radial Basis Function networks, C4.5, and Naive Bayes, with varied levels of SMOTE, on an insurance fraud dataset. To evaluate the learned classifiers, we chose the fraud catching rate and the non-fraud catching rate, in addition to the overall accuracy of the classifier, as performance measures. Results indicate that our approach yields high prediction rates for both the fraud and non-fraud classes.
Classification of datasets is one of the major issues encountered by the data mining community. The problem is heightened when real-world datasets are also imbalanced in nature. A dataset is imbalanced when the observations belonging to a rare class are greatly outnumbered by the observations of another class. The class with the greater number of observations is called the majority or negative class, while the class with rare observations is referred to as the minority or positive class. The literature presents a number of resampling techniques that address the problem of class imbalance. One of the most important strategies is to resample the dataset to balance the number of minority or majority observations by over-sampling or under-sampling, respectively. This paper aims to investigate and analyze the performance of the most widely used oversampling procedure, the Synthetic Minority Oversampling Technique (SMOTE), for different thresholds of oversampling using four classifiers on three credit scoring datasets.