Comparative Study of Machine Learning Algorithms for Performing Ham or Spam Classification in SMS

Purpose: Fraud is rampant in the current era of technology, in which large amounts of information are easily accessible. Everyone therefore needs to be able to tell whether a received message is legitimate or fraudulent. This research classifies SMS messages as ham or spam. Its aims are to build a model that helps classify messages and to determine which machine learning method performs ham or spam classification most accurately and efficiently. Methods: The classification was carried out with five machine learning algorithms: Random Forest, Logistic Regression, Support Vector Classification (SVC), Gradient Boosting, and XGBoost Classifier. Results: In testing, Random Forest achieved an accuracy of 97.28%, Logistic Regression 94.67%, SVC 97.93%, and XGBoost Classifier 96.47%. The best precision was 98%, obtained with Random Forest. The best recall was 94% and the best F1-score was 95%, both obtained with SVC. Novelty: This research compares several algorithms. XGBoost has rarely been used in previous research to classify messages as ham or spam. We focus on giving brief information based on a comparison of the algorithms and on showing the best algorithm for classifying messages as ham or spam. In addition, the machine learning model built in this research achieves better accuracy than those reported in previous research.
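The Results above report accuracy, precision, recall, and F1-score. As a reminder of how these four values relate, the following is a minimal sketch in plain Python that computes them from true and predicted labels. It assumes that spam is treated as the positive class; the abstract does not state whether its precision, recall, and F1 values are per-class or averaged. The function name `evaluate`, the `Metrics` container, and the ten example labels are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham flagged as spam
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    tn = len(pairs) - tp - fp - fn                               # ham kept as ham
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return Metrics(accuracy, precision, recall, f1)


# Hypothetical labels, invented for illustration only (not the study's data).
y_true = ["ham", "ham", "ham", "ham", "ham", "ham", "spam", "spam", "spam", "spam"]
y_pred = ["ham", "ham", "ham", "ham", "ham", "spam", "spam", "spam", "spam", "ham"]

m = evaluate(y_true, y_pred)
print(f"accuracy={m.accuracy:.2f} precision={m.precision:.2f} "
      f"recall={m.recall:.2f} f1={m.f1:.2f}")
```

In this toy example, one ham message is flagged as spam and one spam message is missed, so accuracy (0.80) is higher than precision, recall, and F1 (0.75 each). This is one reason why a comparison such as the one above reports all four measures and not accuracy alone.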