Comparative Study of Machine Learning Algorithms for Performing Ham or Spam Classification in SMS

Spam or Ham Prediction

Nowadays, we frequently use e-mail, one of the main communication channels in the electronic environment. It plays an important role in our lives for many reasons, such as personal communication, business-focused activities, marketing, and advertising. E-mail makes life easier by meeting many different types of communication needs. On the other hand, it can make life difficult when it is used outside of its intended purposes. Spam e-mails can be not only annoying for receivers but also dangerous for their information security. Detecting and preventing spam e-mails has therefore become a separate issue. The dataset is analyzed with supervised machine learning techniques (SMLT) to capture information such as variable identification, uni-variate, bi-variate, and multi-variate analysis, and missing value treatment; data validation, data cleaning/preparation, and data visualization are carried out on the entire given dataset. The aim is to propose a machine learning-based method that classifies e-mails as spam or ham with the best accuracy, obtained by comparing supervised classification algorithms.

Appropriate Detection of HAM and Spam Emails Using Machine Learning Algorithm

A smart, automatic anti-spam framework is vital because of the sharp increase in unsolicited e-mail attacks and the inherently malicious dynamics of those attacks on social, personal, and professional work. There is an increased risk of identity theft, theft of sensitive information, financial loss, damage to reputation, and other crimes that threaten the privacy of the victim. When the multidimensional feature set of email is taken into account, current methods are rather fallible. We believe that an artificial intelligence-based strategy is the most effective one going forward, particularly unsupervised machine learning. The purpose of this study is to explore the application of unsupervised learning to ham and spam clustering in mail by comparing the Random Forest, Logistic, Random Tree, Bayes Net, and Naïve Bayes algorithms with LSTM algorithms, using the frequency weighting of words and validating the best accuracy.

A Novel Approach to Detect Spam and Smishing SMS using Machine Learning Techniques

International Journal of E-Services and Mobile Applications, 2020

A smishing attack is generally performed by sending a fake short message service (SMS) message that contains a link to a malicious webpage or application. Smishing messages are a subclass of spam SMS and are more harmful than ordinary spam messages. Various solutions are available to detect spam messages. However, no existing solution filters smishing messages from spam messages. Therefore, this article presents a novel method to filter smishing messages from spam messages. The proposed approach is divided into two phases. The first phase separates spam messages from ham messages. The second phase filters smishing messages from the spam messages. The performance of the proposed method is evaluated on various machine learning classifiers using a dataset of ham and spam messages. The simulation results indicate that the proposed approach can detect spam messages with an accuracy of 94.9% and can filter smishing messages with an accuracy of 96% on neural network classi...
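
The two-phase design described in this abstract can be sketched as a small cascade. This is only an illustration under stated assumptions: `two_phase_filter`, `toy_is_spam`, and `toy_is_smishing` are hypothetical names, and the keyword and URL rules are toy placeholders standing in for trained classifiers, not the features or models of the cited article.

```python
from typing import Callable

# A classifier is modeled here as any function from message text to a yes/no decision.
Classifier = Callable[[str], bool]


def two_phase_filter(message: str, is_spam: Classifier, is_smishing: Classifier) -> str:
    """Phase 1 separates ham from spam; phase 2 runs only on messages flagged as spam."""
    if not is_spam(message):
        return "ham"
    return "smishing" if is_smishing(message) else "spam"


# Toy stand-ins (assumptions): a real system would plug in trained models here.
def toy_is_spam(message: str) -> bool:
    text = message.lower()
    return any(word in text for word in ("free", "winner", "prize", "verify"))


def toy_is_smishing(message: str) -> bool:
    text = message.lower()
    return "http://" in text or "https://" in text


def label(message: str) -> str:
    """Convenience wrapper that wires the toy classifiers into the cascade."""
    return two_phase_filter(message, toy_is_spam, toy_is_smishing)
```

Because the second classifier only ever sees messages that the first one flagged, a message with a link but no spam signal stays in the ham class in this sketch.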

A Comparative Analysis of SMS Spam Detection Employing Machine Learning Methods

IEEE, 2022

In recent times, the growth in mobile phone usage has resulted in a huge number of spam messages. Spammers continuously apply new tricks, which makes managing or preventing spam messages a challenging task. The aim of this study is to detect spam messages in order to prevent different cybercrimes, as spam messages have become a security threat. In this paper, we build on previous studies of the SMS spam problem to achieve better accuracy using several techniques such as Support Vector Machine, K-Nearest Neighbor, Naïve Bayes, Random Forest, Logistic Regression, and others. Our results indicate that Support Vector Machine achieved the highest accuracy of 99%, suggesting it might be useful as an effective machine learning system for future research.

Review on Efficient Spam Detection Technique using Machine Learning

2022

People's communication methods are being transformed by electronic mail because of its affordability, speed, and simplicity. Due to their widespread exposure, spam emails have become a serious roadblock in electronic communication. The amount of time users spend sifting through incoming mail and eliminating spam necessitates the implementation of spam detection software. The main objective is to create suitable filters that can correctly recognise these emails and deliver outstanding performance in the majority of cases. This project uses spam detection to tell spam from valid email. SVM, a machine learning method, is employed for this purpose. Spam detection can benefit greatly from SVMs and other machine learning approaches. The project's classification is feature-based. In the email world, spam refers to unsolicited commercial communications or emails that deceive the recipient. With the use of artificial intelligence and machine learning, spam messages can be identified. Spam filtering is a popular application of machine learning techniques: machine learning classifiers are used to label emails as either ham (legitimate messages) or spam (unwanted messages).

An Improved Machine Learning-Based Short Message Service Spam Detection System

International Journal of Computer Network and Information Security, 2019

The use of Short Message Services (SMS) as a mechanism of communication has resulted in the loss of sensitive information such as credit card details, medical information, and bank account details (user name and password). Several machine learning-based approaches have been proposed to address this problem, but they are still unable to detect modified SMS spam messages accurately enough. Thus, in this research, a stack ensemble of four machine learning algorithms, consisting of Random Forest (RF), Logistic Regression (LR), Multilayer Perceptron (MLP), and Support Vector Machine (SVM), was employed to detect SMS spam more accurately. The simulation was carried out using Python Scikit-learn tools. The performance of the proposed model was evaluated by benchmarking it against an existing model. The evaluation results showed that the proposed model has an increase of 3.03% in accuracy, 8.94% in recall, and 2.17% in F-measure, and a decrease of 4.55% in precision over the existing model. This indicates that the proposed model reduces the false alarm rate and thus detects spam more accurately. In conclusion, the ensemble method performed better than any individual algorithm and can be adopted by network service providers for better Quality of Service.
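
The metrics named in this abstract are all derived from the same four confusion-matrix counts. The sketch below shows the standard formulas, assuming spam is the positive class and taking the false alarm rate to be FP / (FP + TN); the function name `classification_metrics` and the counts are made up for illustration and are not results from the cited study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common metrics from confusion-matrix counts (spam = positive class)."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        # Assumption: false alarm rate = share of ham messages wrongly flagged as spam.
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Made-up counts for illustration only.
example = classification_metrics(tp=80, fp=10, tn=900, fn=20)
```

Reporting several of these together matters on spam datasets, where ham usually dominates and accuracy alone can look high even when many spam messages are missed.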

Machine Learning Based Classification for Spam Detection

Sakarya Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 2023

Electronic messages, i.e. e-mails, are a communication tool frequently used by individuals and organizations. While e-mail is extremely practical to use, it is necessary to consider its vulnerabilities. Spam e-mails are unsolicited messages created to promote a product or service, often sent frequently. It is very important to classify incoming e-mails in order to protect against malware that can be transmitted via e-mail and to reduce possible unwanted consequences. Spam email classification is the process of identifying and distinguishing spam emails from legitimate emails. This classification can be done through various methods such as keyword filtering, machine learning algorithms, and image recognition. The goal of spam email classification is to prevent unwanted and potentially harmful emails from reaching the user's inbox. In this study, Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), and Artificial Neural Network (ANN) algorithms are used to classify spam emails, and the results are compared. Algorithms with different approaches were used to determine the best solution for the problem. A total of 5558 spam and non-spam e-mails were analyzed, and the performance of the algorithms was reported in terms of accuracy, precision, sensitivity, and F1-score. The most successful result was obtained with the RF algorithm, with an accuracy of 98.83%. In this study, high success was achieved by classifying spam emails with machine learning algorithms. In addition, experimental studies showed that better results are obtained than in similar studies in the literature.
1. Introduction
With the widespread use of the Internet, electronic communication has become more preferred. One of the most important tools of electronic communication is electronic messages, which we call e-mail. Today, individuals and organizations have one or more email accounts. Instant delivery of messages, no cost, and ease of use increase the importance and prevalence of e-mail [1]. According to Statista Research Department data, the number of actively used e-mail accounts in 2020 was more than 4 billion. This number is estimated to increase to 4.6 billion in 2025. In 2020, 306 billion e-mails were sent and received every day, and this number is expected to exceed 376 billion in 2025 [2]. E-mail is not only practical to use but also has various vulnerabilities. These include the e-mail account being hijacked in various ways, e-mails containing advertisements etc. hijacking your computer by installing software on it when you click on the advertisement, and the installed software disrupting communication by sometimes filling the

SMS Spam Detection using Supervised Learning

Turkish Journal of Computer and Mathematics Education (TURCOMAT), 2021

Over the last decade, the use of short message services has been growing. Text messages are more powerful for corporations than even e-mail, because about 80 percent of e-mails remain unopened while 98 percent of smartphone users read their text messages by the end of the day. Spam, which refers to any irrelevant text messages sent via mobile networks, has also become more common. For consumers, such messages are seriously irritating. Because of regional content and the use of abbreviated words, SMS spam detection is more challenging than e-mail spam detection; unfortunately, very few existing studies address these challenges. Much of the current research that has attempted to filter spam has focused on manually identified features. This paper aims to address these concerns. Filtering is one of the most effective strategies among the methods developed to stop spam. Nowadays, machine learning techniques are used to process spam SMS automatically at a very good rate. The goal of this research is to differentiate between ham and spam messages by developing an accurate and responsive classification model that provides good accuracy with a low false positive rate.

A Review on Various Approaches on Spam Detection of Mobile Phone SMS

2023

This paper presents a review of spam and ham SMS detection on mobile phones. Spam messages are a modern problem because of the availability of inexpensive bulk SMS bundles and the fact that messages elicit greater response rates owing to the one-on-one, personalized nature of the service. In this study, messages are classified into two categories, spam and ham. The dataset consists of the text of SMS messages together with a label indicating whether or not each record is a legitimate message. A spam record is one whose label designates the message text as junk. In SMS spam, marketers use text messages to send unwanted advertisements to specific clients. To address this, we use the SMS spam dataset to compare the machine learning methods used to detect spam and non-spam messages and to determine the accuracy threshold.

Analysis of Spam Messages Using Various Machine Learning Classifier

Background: As the number of people using social media increases, data generation also increases, and the generated data may be safe or unsafe. In applications such as Twitter and e-mail, we receive many e-mails or tweets that contain both dangerous and useful content. To stay safe from these threats, we need a filter that separates useful messages from spam and keeps us from falling into a trap. One approach to doing this is explained in this paper. The algorithm followed is the Naïve Bayes classifier. The paper also compares Naïve Bayes, KNN, and Logistic Regression on the same spam filtering problem, together with term frequency-inverse document frequency (TF-IDF).
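
As an illustration of the Naïve Bayes approach mentioned in this abstract, the following is a minimal multinomial Naïve Bayes sketch with Laplace smoothing. It is written under assumptions: it uses raw word counts instead of TF-IDF weights to stay short, the tokenizer is a plain whitespace split, and the training messages and the names `train_naive_bayes` and `predict` are invented for the example rather than taken from the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(messages, labels):
    """Collect per-class word counts, class frequencies, and the vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocabulary = set()
    for text, label in zip(messages, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return word_counts, class_counts, vocabulary


def predict(text, model):
    """Return the class with the highest log-posterior under Laplace smoothing."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy data for illustration only.
train_texts = [
    "win a free prize now",
    "free entry claim your prize",
    "are we meeting for lunch today",
    "call me when you get home",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(train_texts, train_labels)
```

Calling `predict("claim your free prize", model)` on this toy model selects the spam class, because every known word in the message occurs only in the spam training messages.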