Survey of machine learning methods for spam e-mail classification

MACHINE LEARNING METHODS FOR SPAM E-MAIL CLASSIFICATION

The increasing volume of unsolicited bulk e-mail (also known as spam) has generated a need for reliable anti-spam filters. Machine learning techniques are nowadays used to filter spam e-mail automatically with a very high success rate. In this paper we review some of the most popular machine learning methods (Bayesian classification, k-NN, ANNs, SVMs, artificial immune systems and rough sets) and their applicability to the problem of spam e-mail classification. Descriptions of the algorithms are presented, along with a comparison of their performance on the SpamAssassin spam corpus.

E-mail Spam Classification using KNN and Naive Bayes

Highlights in Science, Engineering and Technology

E-mail spam filtering has recently become a critical issue in network security, and multiple machine learning techniques have been applied to this classification problem. With the emergence of machine learning frameworks, many tasks are now handled by effective machine learning algorithms with satisfactory performance and high speed. However, the performance of different algorithms under given circumstances still lacks an intuitive demonstration. Hence, this study focuses on the performance of two widely used algorithms (KNN and Naive Bayes), measured by accuracy and running time, and compares the particular advantages of each algorithm when classifying e-mails. The paper feeds thousands of spam samples to the two algorithms and analyzes both results, indicating that the KNN classifier performs better when determining the spam messages while the opposite is true for the Naive Bayes classifier. Thus, designers...
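The kind of comparison this abstract describes can be sketched in a few lines. The following is a minimal illustration only, not the paper's implementation: the toy training set, the whitespace tokenizer, the class names `NaiveBayes` and `KNN`, the choice of cosine similarity with k = 3, and the helper `timed_predict` are all assumptions made for the example.

```python
import math
import time
from collections import Counter

# Toy, made-up training data (an assumption, not the paper's dataset).
TRAIN = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
    ("please review the monday report", "ham"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, data):
        self.word_counts = {}        # label -> Counter of token counts
        self.doc_counts = Counter()  # label -> number of training messages
        self.vocab = set()
        for text, label in data:
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokenize(text):
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class KNN:
    """k-nearest neighbours with majority vote over cosine similarity."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, data):
        self.examples = [(Counter(tokenize(t)), label) for t, label in data]
        return self

    def predict(self, text):
        query = Counter(tokenize(text))
        ranked = sorted(self.examples,
                        key=lambda ex: cosine(query, ex[0]), reverse=True)
        votes = Counter(label for _, label in ranked[:self.k])
        return votes.most_common(1)[0][0]


def timed_predict(model, text):
    """Return (predicted label, elapsed seconds) for one prediction."""
    start = time.perf_counter()
    label = model.predict(text)
    return label, time.perf_counter() - start


if __name__ == "__main__":
    for model in (NaiveBayes().fit(TRAIN), KNN(k=3).fit(TRAIN)):
        label, seconds = timed_predict(model, "free money offer")
        print(type(model).__name__, label, f"{seconds:.6f}s")
```

On a real corpus the two metrics would be computed over a held-out test set rather than a single message; the sketch only shows where accuracy and running time would be measured.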

Comparison of Algorithms on Machine Learning For Spam Email Classification

IJISTECH (International Journal of Information System and Technology), 2021

The rapid growth of email use and the convenience it provides have made email the most frequently used means of communication. Along with this growth, many parties abuse email as a means of advertising, phishing and sending other unimportant messages. Such messages are called spam email. One way to address the problem of spam emails is to use filtering techniques based on the content of the email. In earlier studies on the classification of spam emails, the Naïve Bayes method was the most commonly used. Therefore, in this study the researchers add the Random Forest and K-Nearest Neighbor (KNN) methods for comparison, in order to find which methods have better accuracy in classifying spam emails. Based on the results of the trial, applying the classification algorithms to spam emails resulted in an accuracy of 83.5% for Naïve Bayes, 83.5% for Random Forest and 82.75% for KNN.

A Comprehensive Review on Email Spam Classification with Machine Learning Methods

International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 2023

This comprehensive review delves into the realm of email spam classification, scrutinizing the efficacy of various machine learning methods employed in the ongoing battle against unwanted email communication. The paper synthesizes a wide array of research findings, methodologies, and performance metrics to provide a holistic perspective on the evolving landscape of spam detection. Emphasizing the pivotal role of machine learning in addressing the dynamic nature of spam, the review explores the strengths and limitations of popular algorithms such as Naive Bayes, Support Vector Machines, and neural networks. Additionally, it examines feature engineering, dataset characteristics, and evolving threats, offering insights into the challenges and opportunities within the field. With a focus on recent advancements and emerging trends, this review aims to guide researchers, practitioners, and developers in the ongoing pursuit of robust and adaptive email spam classification systems.

An Empirical Performance Comparison of Machine Learning Methods for Spam E-Mail Categorization

2004

The increasing volume of unsolicited bulk e-mail (also known as spam) has generated a need for reliable anti-spam filters. Using a classifier based on machine learning techniques to automatically filter out spam e-mail has drawn many researchers' attention. In this paper, we review some relevant ideas and carry out a set of systematic experiments on e-mail categorization, conducted with four machine learning algorithms applied to different parts of an e-mail. Experimental results reveal that the header of an e-mail provides very useful information for all the machine learning algorithms considered to detect spam e-mail.
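Feeding "different parts of e-mail" to a classifier first requires separating the header from the body. The following is a minimal sketch of that step using Python's standard `email` package; the sample message `RAW` and the helper name `split_parts` are made up for illustration, and the abstract does not say how the authors performed the split.

```python
from email import message_from_string

# A made-up sample message (an assumption for illustration only).
RAW = """From: promo@example.com
To: user@example.org
Subject: You won a prize
Content-Type: text/plain

Click here to claim your free prize.
"""


def split_parts(raw):
    """Return (header_text, body_text) for a raw RFC 822 style message."""
    msg = message_from_string(raw)
    # Join all header fields into one string a tokenizer could consume.
    header_text = " ".join(f"{name}: {value}" for name, value in msg.items())
    payload = msg.get_payload()
    # Simplification: multipart messages return a list of parts here,
    # which this sketch does not handle and treats as an empty body.
    body_text = payload if isinstance(payload, str) else ""
    return header_text, body_text


if __name__ == "__main__":
    header, body = split_parts(RAW)
    print("HEADER:", header)
    print("BODY:", body.strip())
```

Each of the two strings could then be turned into features separately, which is one way to compare how much the header and the body each contribute.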

Efficient Spam Email Classification using Machine Learning Algorithms

In today's digital age, since email is the main form of communication, the identification of email spam is a critical issue. In addition to consuming a lot of time and money, email spam is also a security and privacy risk. In this paper, we provide a means of email spam detection that employs machine learning algorithms. The features required for training the ML models were engineered after analysis of a content-based filtering email dataset obtained from the Kaggle website. We tested several types of machine learning algorithms and analyzed their level of performance on the dataset. Our findings demonstrate how effective the suggested approach is in identifying email spam, with a highest accuracy of 99.8% and an RMSE of 0.2. We applied various ML classifier algorithms, such as Decision Tree, Voting Classifier, Random Forest, Logistic Regression and so on, to our dataset, compared them with each other, and found which one best suits the dataset with the highest accuracy. This method can be useful in email clients or servers to detect spam emails automatically and enhanced

A Comparative Analysis of Machine Learning Techniques for Spam Detection

World Academy of Research in Science and Engineering, 2019

Data science is an emerging multidisciplinary field that employs algorithms, processes and scientific methods to extract information and insights from data in various forms, both structured and unstructured, much like data mining and predictive analysis. Advertisements and bulk emails, also called spam, make up an estimated 62% of worldwide internet traffic. Since 1978, when the first unwanted mail was sent, technology has advanced, but the detection of spam remains a time-consuming and costly problem in the field of mathematical sciences. The current study evaluates the effectiveness and efficiency of various machine learning techniques, including K-NN, decision tree, random forest, Naive Bayes and SVM, for spam detection. A data set comprising 962 emails, containing both genuine emails and spam, has been used in this study. Some deep learning techniques for the classification of spam are also suggested for better performance.

Comparison of Three Machine Learning Models for the Detection of Emails Spam

Recently, machine learning has been applied to different major areas such as text classification, machine translation, and spam detection. The strong performance of machine learning algorithms in several fields has given people the opportunity to hand some of their hard tasks over to machine learning systems. These tasks seem effortless for machines and take less time, which matters because the amount of text or spam that needs to be classified is huge. Hence, in this paper, we propose three different models for the task of email spam detection. The three models are trained and validated on a public spam dataset. Experimentally, the models performed differently, and Naïve Bayes outperformed the other machine learning algorithms in terms of accuracy and other evaluation metrics.

A Review Article On Enhancing Email Spam Filter's Accuracy Using Machine Learning

International Journal of Innovative Research in Computer Science and Technology (IJIRCST), 2023

In today's era, almost everyone uses email on a daily basis. In our proposed research, we suggest a machine learning-based strategy for enhancing email spam filters' accuracy. Traditional rule-based filters have grown less effective as spam emails have multiplied exponentially. Models can be trained to identify emails as spam or not using machine learning algorithms, particularly supervised learning. We need to create a simple and straightforward machine learning model in order to reach more accurate results when categorizing email spam. We picked the Naive Bayes technique for our model since it is quicker and more accurate than other algorithms. The suggested method can be incorporated into current email systems to enhance spam filtering functionality. This review paper provides an overview of the machine learning model we have suggested.