Fraud Detection Using Random Forest Classifier, Logistic Regression, and Gradient Boosting Classifier Algorithms on Credit Cards

Credit Card Fraud Detection Using Random Forest Classification

International Journal for Research in Applied Science and Engineering Technology

Credit card fraud is committed using the account holder's card number, card details, and personal information. E-commerce payment systems process the payments for online transactions. The model is used to identify whether a new transaction is fraudulent or not. The aim is to detect 100% of the fraudulent transactions while minimizing incorrect fraud classifications. A standard scalar model is initially trained with the normal behavior of a card holder. If an incoming credit card transaction is not accepted by the trained standard scalar model with sufficiently high probability, it is considered to be fraudulent. A receiver operating characteristic (ROC) curve, which plots test sensitivity as the y coordinate against 1 - specificity, or false positive rate (FPR), as the x coordinate, is an effective method of estimating the quality or performance of diagnostic tests. The application techniques reviewed are significant for minimizing credit card fraud. Some issues remain when genuine credit card customers are misclassified as fraudulent. SMOTE is a statistical technique for increasing the number of cases in a dataset in a balanced way. Random forest builds multiple decision trees and integrates them to obtain a stable prediction, with an accuracy of about 98.6%.
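The plot of sensitivity against false positive rate described in this abstract can be illustrated with a minimal sketch. It is not the paper's code: the function names `roc_points` and `auc`, the toy labels, and the scores are all assumptions made for illustration, and only the Python standard library is used.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    # Sort by score, highest first, so each step lowers the threshold.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (hypothetical): 1 = fraud, 0 = legitimate; scores are made up.
labels = [1, 0, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
curve = roc_points(labels, scores)
print(curve)
print(round(auc(curve), 4))
```

Each point on the curve corresponds to one threshold. An area close to 1 means fraudulent transactions tend to receive higher scores than legitimate ones.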

Handling Class Imbalance in Credit Card Fraud using Resampling Methods

International Journal of Advanced Computer Science and Applications

Credit card based online payments have grown intensely, compelling financial organisations to implement and continuously improve their fraud detection systems. However, credit card fraud datasets are heavily imbalanced, and different types of misclassification errors may have different costs, so it is essential to control them, to a certain degree, and to balance those errors. Classification techniques are promising solutions for separating fraud from non-fraud transactions. Unfortunately, classification techniques do not perform well when the minority and majority classes differ hugely in size. Hence in this study, three resampling methods, Random Under Sampling, Random Over Sampling and Synthetic Minority Oversampling Technique, were applied to the credit card dataset to address the rarity of fraud events. Then, the three resampled datasets were classified using classification techniques. Performance was measured by sensitivity, specificity, accuracy, precision, area under curve (AUC) and error rate. The findings showed that, after resampling the dataset, the models were more practicable, gave better performance and were statistically better.
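The two random resampling methods named in this abstract can be sketched in a few lines. This is an illustrative sketch, not the study's implementation: the helper names are hypothetical, the data is a toy example, and the code assumes a binary label where the rarer label is the fraud class.

```python
import random
from collections import Counter


def _split_indices(y):
    """Return (minority indices, majority indices) for a binary label list."""
    counts = Counter(y)
    minority_label = min(counts, key=counts.get)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    return minority, majority


def random_undersample(X, y, seed=0):
    """Keep all minority rows and an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y)
    keep = minority + rng.sample(majority, len(minority))
    return [X[i] for i in keep], [y[i] for i in keep]


def random_oversample(X, y, seed=0):
    """Duplicate random minority rows until both classes have the same size."""
    rng = random.Random(seed)
    minority, majority = _split_indices(y)
    extra = rng.choices(minority, k=len(majority) - len(minority))
    keep = majority + minority + extra
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data (hypothetical): 2 fraud rows (label 1) among 20 transactions.
X = [[float(i)] for i in range(20)]
y = [1, 1] + [0] * 18
X_under, y_under = random_undersample(X, y)
X_over, y_over = random_oversample(X, y)
print(Counter(y_under))
print(Counter(y_over))
```

Undersampling discards majority rows and so loses information, while oversampling repeats minority rows exactly, which is the limitation that synthetic methods such as SMOTE try to address.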

A SMOTe based Oversampling Data-Point Approach to Solving the Credit Card Data Imbalance Problem in Financial Fraud Detection

International Journal of Computing and Digital Systems

Credit card fraud has negatively affected the market economic order and broken the confidence and interest of stakeholders, financial institutions, and consumers. Losses from card fraud are increasing every year, with billions of dollars being lost. Machine Learning methods use large volumes of data as examples for learning to improve the performance of classification models. Financial institutions use Machine Learning to identify fraudulent patterns from large amounts of historical financial records. However, the detection of credit card fraud remains a significant challenge for business intelligence technologies, as most datasets containing credit card transactions are highly imbalanced. To overcome this challenge, this paper proposes the use of the data-point approach in machine learning. An experimental study was conducted applying oversampling with SMOTe, a data-point approach technique, to an imbalanced credit card dataset. State-of-the-art classical machine learning algorithms, namely Support Vector Machines, Logistic Regression, Decision Tree and Random Forest classifiers, were used to perform the classifications, and performance was evaluated using precision, recall, F1-score, and average precision. The results show that if the data is highly imbalanced, the model struggles to detect fraudulent transactions. After applying the SMOTe-based oversampling technique, there was a significant improvement in the ability to predict the positive class.
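The core idea of SMOTE, creating synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's code and not the Imblearn implementation: the function name `smote_sketch`, the parameter `k`, and the toy fraud points are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority (fraud) samples with two made-up features.
fraud = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote_sketch(fraud, n_new=6)
for point in new_points:
    print([round(v, 3) for v in point])
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region the minority class already occupies instead of being exact duplicates.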

Analysis of Credit Card Fraud detection using Machine Learning models on balanced and imbalanced datasets

International Journal of Emerging Trends in Engineering Research, 2021

With the advent of modern transaction technology, many people use online transactions to transfer money from one person to another. Credit card fraud, a rising problem in the financial sector, goes unnoticed most of the time. A lot of research is going on in this area. The Credit Card Fraud Detection project was developed to spot whether a new transaction is fraudulent or not, using knowledge of previous data. We use various predictive models to ascertain how accurately they predict whether a transaction is abnormal or regular. Decision Tree, Logistic Regression, SVM and Naïve Bayes are the classification algorithms used to separate non-fraud and fraud transactions. In modern conditions, where data may vary in a matter of minutes or even seconds, conventional classification techniques may not perform well. When a dataset involves large differences in data distribution, as well as changing data with high dimensionality and volume, supervised learning comes up short. Hence we may resort to unsupervised learning, semi-supervised learning or other means to cope with that. The number of online transactions has grown enormously, and credit card transactions hold an enormous share of them. More people are using credit cards for shopping, e-commerce, e-wallets and even education. Therefore, banks and other stakeholders give fraud detection applications priority and value. Fraudulent transactions fall into different categories: they may be online or offline. Our paper deals with the online category and one of many methods to handle it, namely machine learning.

Oversampling Techniques in Machine Learning Detection of Credit Card Fraud

Journal of Internet Technology and Secured Transactions

More than ever before, the trend of doing things online has been explored and successfully implemented in many areas, including online shopping, online learning, and working online, to name but a few. However, it has brought with it challenges, including the fraudulent use of credit cards in online purchases, the challenge of academic integrity in online learning, especially in online exams, and how to keep people engaged in meetings when working and studying online while still giving them adequate privacy. This paper deals with detecting the fraudulent use of credit cards in a timely manner, to avoid as many negative effects as possible in the world of e-commerce and to help maintain consumer confidence. Thus, in the current study, the machine learning algorithm LightGBM has been used to detect fraudulent credit card transactions in a real-life dataset containing customers' credit card transactions. The performance of this classifier is compared with two state-of-the-art classifiers, Decision Tree and Random Forests, which are extensively used for solving such problems. Since there is data imbalance between the fraudulent and non-fraudulent classes, the data sampling technique used is the Synthetic Minority Oversampling Technique (SMOTE). SMOTE oversampling performed best on all classifiers, and LightGBM obtained a precision value of 1 for both the fraudulent and non-fraudulent classes.

Detection of Credit Card Fraud with Machine Learning Methods and Resampling Techniques

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)

Banks and other financial institutions provide credit card facilities, but despite advances in technology, fraud in credit card transactions is still common, so a system is needed that can detect fraudulent transactions quickly and accurately. Therefore, this study aims to classify fraudulent transactions. The proposed method is Ensemble Learning, tested using the Boosting type with three variations, namely XGBoost, Gradient Boosting, and AdaBoost. Then, to maximize the performance of the model, the dataset is processed with the Synthetic Minority Oversampling Technique (SMOTE) function from the Imblearn library, applied to the training data to handle the imbalanced dataset conditions. The dataset used in this study is entitled "Credit Card Fraud Detection" and contains a total of 284807 records divided into two classes: Not Fraud and Fraud. The proposed model achieved a recall of 92% with Gradient Boosting, where the results increased by 10.37% compared to...

Credit Card Fraud Detection Using Linear Discriminant Analysis (LDA), Random Forest, and Binary Logistic Regression

BAREKENG: Jurnal Ilmu Matematika dan Terapan

The growth of electronic payment usage is turning the financial burden of credit card fraud into a major challenge for finance and technology companies. It is therefore crucial that they continuously advance their fraud detection systems. In this research, we treat fraud detection as a classification problem and compare three methods: Linear Discriminant Analysis (LDA), Random Forest, and Binary Logistic Regression. The dataset used contains transactions made by credit cards. The challenge in this analysis is that the dataset is highly unbalanced, so SMOTE is applied to the data. The dataset contains only continuous features that are transformed into Principal Component Scores (PCs). The results show that the binary logistic regression algorithm, the Random Forest algorithm, and Linear Discriminant Analysis achieve greater AUC values with variables processed by SMOTE than with the original variables. The largest AUC value was obtained...

Handling Class Imbalance in Credit Card Fraud Using Various Sampling Techniques

American Journal of Multidisciplinary Research and Innovation

Over the last few decades, credit card fraud (CCF) has been a severe problem for both cardholders and card providers. Credit card transactions are expanding fast as internet technology advances and rely significantly on the internet. With advanced technology and increased credit card usage, fraud rates are becoming a problem for the economy. However, the credit card dataset is highly imbalanced and skewed. Many classification techniques are used to classify fraud and non-fraud, but under certain conditions they may not generate the best results. Different sampling techniques, such as under- and oversampling, Synthetic Minority Oversampling, and Adaptive Synthetic techniques, have been used to overcome the class imbalance problem in the credit card dataset. Then, the sampled datasets are classified using different machine learning techniques such as Decision Tree, Random Forest, K-Nearest Neighbors, Logistic Regression, and Naive Bayes. Recall, F1-score, accuracy, precision, and error...

Data Balancing for Credit Card Fraud Detection Using Complementary Neural Networks and SMOTE Algorithm

Springer eBooks, 2021

This research presents an innovative approach to detecting fraudulent credit card transactions. A common and dominant problem in detecting fraudulent credit card transactions is how rarely they occur relative to legitimate (authorized) transactions. Therefore, any recorded data will have a stark imbalance between the number of minority (fraudulent) and majority (legitimate) class samples. This imbalanced distribution of the training data among classes makes it hard for any learning algorithm to learn the features of the minority class. In this thesis work, we analyze the impact of applying class-balancing techniques to the training data, namely oversampling (using the SMOTE algorithm) for the minority class and undersampling (using CMTNN) for the majority class. Popular classification algorithms, namely Artificial Neural Network (ANN), Support Vector Machine (SVM), Extreme Gradient Boosting (XGB), Logistic Regression (LR), and Random Forest (RF), are run on the balanced data, and the results are used to quantify the performance improvement provided by our approach. The experiments show that the hybrid approach, which integrates Complementary Neural Network and Synthetic Minority Oversampling Technique, achieves an accuracy of 99% and an AUC of 99.7% with the Random Forest classification algorithm, compared to simple undersampling and oversampling.

An Experimental Study with Imbalanced Classification Approaches for Credit Card Fraud Detection

IEEE Access

Credit card fraud is a criminal offense. It causes severe damage to financial institutions and individuals. Therefore, the detection and prevention of fraudulent activities are critically important to financial institutions. Fraud detection and prevention are costly, time-consuming and labor-intensive tasks. A number of significant research works have been dedicated to developing innovative solutions to detect different types of fraud. However, these solutions have proved ineffective. According to Cifa, 33,305 cases of credit card identity fraud were reported between January and June 2018. Various weaknesses of existing solutions have been reported in the literature. Among them, imbalanced classification is the most critical and a well-known problem. Imbalanced classification arises when the data set has a small number of observations of the minority class compared to the majority class. In this problem, the ratio fraud : legitimate is very small, which makes it extremely difficult for the classification algorithm to detect fraud cases. In this paper, we conduct a rigorous experimental study of the solutions that tackle the imbalanced classification problem. We explored these solutions along with the machine learning algorithms used for fraud detection. We identified their weaknesses and summarized the results that we obtained using a labeled credit card fraud dataset. According to our study, imbalanced classification approaches are ineffective, especially when data are highly imbalanced. Our study reveals that the existing approaches result in a large number of false alarms, which are costly to financial institutions. This may lead to inaccurate detection as well as increase the occurrence of fraud cases.
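A short worked example can show why a very small fraud : legitimate ratio produces many false alarms even for a classifier that looks accurate. The counts below are hypothetical and chosen only for illustration; they are not results from the paper, and the function name `confusion_metrics` is an assumption.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute common evaluation metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),  # share of fraud cases that are caught
        "specificity": tn / (tn + fp),  # share of legitimate cases passed
        "precision": tp / (tp + fp),    # share of alarms that are real fraud
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
    }


# Hypothetical counts: 100,000 transactions, 200 of them fraudulent (0.2%).
# The classifier catches 90% of fraud and passes 99% of legitimate traffic.
metrics = confusion_metrics(tp=180, fp=998, tn=98_802, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this hypothetical setting accuracy is close to 99%, yet 998 of the 1,178 alarms are false, so precision is only about 15%. A model that flagged nothing at all would still reach 99.8% accuracy, which is why accuracy alone is a weak measure on highly imbalanced data.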