Benjamin Borketey - Academia.edu
Papers by Benjamin Borketey
International Journal of Automation, Artificial Intelligence and Machine Learning, 2024
What traveler features should be considered when designing airline travel insurance policies,
and can predictive modeling enhance the accuracy of purchase predictions? Motivated by the
increased need to safeguard investments due to frequent flight interruptions and cancellations
during the COVID-19 pandemic and its travel restrictions, we investigate the uptake of flight
travel insurance using predictive models. This study applies various machine learning
techniques to a dataset consisting of 1,987 travelers, examining whether they purchased travel
insurance (a binary classification problem). Performance metrics such as misclassification rate,
precision, recall, F-score, and the area under the receiver operating characteristic curve (AUC)
are used to assess model effectiveness. The models were optimized using cross-validation on
the training data. Among the models tested, eXtreme Gradient Boosting Machine (XGBoost)
achieved the highest accuracy rate of 86%, along with the best AUC, precision, recall, and
specificity, indicating a 98% accuracy in predicting who will purchase travel insurance. Other
robust models, such as ensemble methods and neural networks, also demonstrated strong
performance, with similar AUC and precision scores. Features such as annual income, age,
travel history, and education history were found to be the most significant predictors, while
chronic disease history had little impact. Parsimonious predictive models, using only the most
important variables, yielded better performance. Our findings highlight the critical role of
predictive accuracy in helping insurers mitigate the financial risk due to travel interruptions.
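The performance metrics named in this abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of those definitions, not the authors' code; the function name `classification_metrics` and the counts passed to it are hypothetical and are not taken from the paper's results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives, and false negatives.
    """
    total = tp + fp + tn + fn
    precision = tp / (tp + fp)  # share of predicted buyers who actually bought
    recall = tp / (tp + fn)     # share of actual buyers the model found
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),  # share of non-buyers correctly rejected
        "f1": 2 * precision * recall / (precision + recall),
    }


# Illustrative counts only (assumed for this example, not reported in the study).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy and misclassification rate always sum to one, which is why a single headline accuracy figure is usually reported alongside precision, recall, and specificity rather than instead of them.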
Journal of Data Analysis and Information Processing, 2024
Credit card fraud remains a significant challenge, with financial losses and consumer protection at stake. This study addresses the need for practical, real-time fraud detection methodologies. Using a Kaggle credit card dataset, I address class imbalance with the Synthetic Minority Oversampling Technique (SMOTE) to enhance modeling efficiency. I compare several machine learning algorithms, including Logistic Regression, Linear Discriminant Analysis, K-nearest Neighbors, Classification and Regression Tree, Naive Bayes, Support Vector Machine, Random Forest, XGBoost, and Light Gradient-Boosting Machine, to classify transactions as fraud or genuine. Rigorous evaluation metrics, such as AUC, PRAUC, F1, KS, Recall, and Precision, identify the Random Forest as the best performer in detecting fraudulent activities. The Random Forest model successfully identifies approximately 92% of transactions scoring 90 and above as fraudulent, equating to a detection rate of over 70% for all fraudulent transactions in the test dataset. Moreover, the model captures more than half of the fraud in each bin of the test dataset. SHAP values provide model explainability, with the SHAP summary plot highlighting the global importance of individual features, such as "V12" and "V14". SHAP force plots offer local interpretability, revealing the impact of specific features on individual predictions. This study demonstrates the potential of machine learning, particularly the Random Forest model, for real-time credit card fraud detection, offering a promising approach to mitigate financial losses and protect consumers.
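The core idea behind SMOTE, as mentioned in this abstract, is to create synthetic minority-class rows by interpolating between an existing minority row and one of its nearest minority-class neighbors. The sketch below illustrates that idea using only the standard library; it is not the study's implementation, and the function `smote_sketch`, its parameters, and the toy `fraud_rows` data are assumptions made for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority rows by interpolating toward a near neighbor."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])  # one of the k nearest neighbors
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy two-feature minority class (hypothetical values, not the Kaggle data).
fraud_rows = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 1.8]]
new_rows = smote_sketch(fraud_rows, n_new=6)
print(len(new_rows), "synthetic rows generated")
```

Because each synthetic row lies on a line segment between two real minority rows, it stays within the range of the observed minority data. Oversampling of this kind is normally applied to the training split only, so that the test set still reflects the real class balance.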
Journal of Data Analysis and Information Processing, 2024
Researchers have extensively explored the impact of wages on individuals' decisions to engage in property crimes. While most past studies have relied on macro-level data to investigate the relationship between crime rates and hourly wages, this paper takes a novel approach by using micro-level data to examine the influence of hourly wages on the likelihood of stealing an item valued at $50 or more. The estimation results reveal that an increase in hourly wage leads to a decrease in the probability of theft, all other factors being held constant. Further estimation by gender revealed that hourly wages paid to both males and females have no bearing on the decision to steal. Additionally, the analysis of differences in theft probabilities across gender and race demonstrates that males consistently exhibit a higher likelihood of engaging in theft than females across various racial groups.
Real Time Fraud Detection Using Machine Learning, 2024
Credit card fraud remains a significant challenge, with financial losses and consumer protection at stake. This study addresses the need for practical, real-time fraud detection methodologies. Using a Kaggle credit card dataset, I address class imbalance with the Synthetic Minority Oversampling Technique (SMOTE) to enhance modeling efficiency. I compare several machine learning algorithms, including Logistic Regression, Linear Discriminant Analysis, K-nearest Neighbors, Classification and Regression Tree, Naive Bayes, Support Vector Machine, Random Forest, XGBoost, and Light Gradient-Boosting Machine, to classify transactions as fraud or genuine. Rigorous evaluation metrics, such as AUC, PRAUC, F1, KS, Recall, and Precision, identify the Random Forest as the best performer in detecting fraudulent activities. The Random Forest model successfully identifies approximately 92% of transactions scoring 90 and above as fraudulent, equating to a detection rate of over 70% for all fraudulent transactions in the test dataset. Moreover, the model captures more than half of the fraud in each bin of the test dataset. SHAP values provide model explainability, with the SHAP summary plot highlighting the global importance of individual features, such as "V12" and "V14". SHAP force plots offer local interpretability, revealing the impact of specific features on individual predictions. This study demonstrates the potential of machine learning, particularly the Random Forest model, for real-time credit card fraud detection, offering a promising approach to mitigate financial losses and protect consumers.