Detecting attacks on e-mail
Related papers
Classification for Spam Filtering using Naive Bayes
2015
An efficient anti-spam filter that would block all spam without blocking any legitimate messages is a growing need. To address this problem, we examine the effectiveness of a statistically based approach, Naïve Bayesian anti-spam filtering, which is content-based and self-learning (adaptive) in nature. Additionally, we designed a derivative filter based on relative numbers of tokens. We train the filters using a large corpus of legitimate messages and spam, and we test them using new incoming personal messages. More specifically, four filtering techniques available for a Naïve Bayesian filter are evaluated. We look at the effectiveness of each technique, and we evaluate different threshold values in order to find an optimal anti-spam filter configuration. Based on cost-sensitive measures, we conclude that additional safety precautions are needed before a Bayesian anti-spam filter can be put into practice. However, our technique can make a positive contribution as a first-pass filter.
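The abstract above describes a content-based Naïve Bayesian filter whose decision depends on a threshold. A minimal sketch of that general idea follows. It assumes a multinomial model with Laplace smoothing and whitespace tokenization; the class name `NaiveBayesFilter`, the toy training messages, and the threshold values are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real filters do more.
    return text.lower().split()


class NaiveBayesFilter:
    """Hypothetical multinomial Naive Bayes spam filter with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}
        self.vocab = set()

    def train(self, text, label):
        tokens = tokenize(text)
        self.token_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def _log_joint(self, tokens, label):
        # log P(label) + sum of log P(token | label) over known tokens
        total_docs = sum(self.doc_counts.values())
        log_p = math.log(self.doc_counts[label] / total_docs)
        denom = sum(self.token_counts[label].values()) + self.alpha * len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # tokens never seen in training are ignored
                log_p += math.log((self.token_counts[label][t] + self.alpha) / denom)
        return log_p

    def spam_probability(self, text):
        tokens = tokenize(text)
        log_spam = self._log_joint(tokens, "spam")
        log_ham = self._log_joint(tokens, "ham")
        m = max(log_spam, log_ham)  # subtract the max for numerical stability
        e_spam, e_ham = math.exp(log_spam - m), math.exp(log_ham - m)
        return e_spam / (e_spam + e_ham)

    def is_spam(self, text, threshold=0.5):
        # A higher threshold blocks fewer legitimate messages but misses more spam.
        return self.spam_probability(text) > threshold


# Toy training corpus (made up for illustration only).
demo = NaiveBayesFilter()
demo.train("win free money now", "spam")
demo.train("free prize claim now", "spam")
demo.train("meeting agenda for monday", "ham")
demo.train("lunch on monday", "ham")
```

Calling `demo.is_spam("free money")` with the default threshold flags the message, while a very strict threshold such as `0.999` lets it through. This is the kind of trade-off the threshold evaluation in the abstract is concerned with.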
An innovative spam filtering model based on support vector machine
… Intelligence for Modelling, …, 2005
Spam is commonly defined as unsolicited email messages and the goal of spam categorization is to distinguish between spam and legitimate email messages. Many researchers have been trying to separate spam from legitimate emails using machine learning algorithms based ...
An Evaluation of Naive Bayesian Anti-Spam Filtering
It has recently been argued that a Naive Bayesian classifier can be used to filter unsolicited bulk e-mail ("spam"). We conduct a thorough evaluation of this proposal on a corpus that we make publicly available, contributing towards standard benchmarks. At the same time we investigate the effect of attribute-set size, training-corpus size, lemmatization, and stop-lists on the filter's performance, issues that had not been previously explored. After introducing appropriate cost-sensitive evaluation measures, we reach the conclusion that additional safety nets are needed for the Naive Bayesian anti-spam filter to be viable in practice.
Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '00, 2000
The growing problem of unsolicited bulk e-mail, also known as "spam", has generated a need for reliable anti-spam e-mail filters. Filters of this type have so far been based mostly on manually constructed keyword patterns. An alternative approach has recently been proposed, whereby a Naive Bayesian classifier is trained automatically to detect spam messages. We test this approach on a large collection of personal e-mail messages, which we make publicly available in "encrypted" form, contributing towards standard benchmarks. We introduce appropriate cost-sensitive measures, investigating at the same time the effect of attribute-set size, training-corpus size, lemmatization, and stop lists, issues that have not been explored in previous experiments. Finally, the Naive Bayesian filter is compared, in terms of performance, to a filter that uses keyword patterns, and which is part of a widely used e-mail reader.
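The abstracts above mention cost-sensitive measures without defining them. One common way to make spam evaluation cost-sensitive is to treat each blocked legitimate message as λ times as costly as a missed spam message. The sketch below assumes that weighting and computes a weighted accuracy and a total cost ratio (the cost of using no filter divided by the cost of using the filter). The function names, argument names, and example counts are hypothetical.

```python
def weighted_accuracy(ham_passed, ham_blocked, spam_caught, spam_missed, lam=1.0):
    """Accuracy where every legitimate message counts as `lam` messages."""
    n_ham = ham_passed + ham_blocked
    n_spam = spam_caught + spam_missed
    return (lam * ham_passed + spam_caught) / (lam * n_ham + n_spam)


def total_cost_ratio(ham_blocked, spam_caught, spam_missed, lam=1.0):
    """Cost with no filter (all spam gets through) over cost with the filter.

    Values above 1 mean the filter is better than having no filter at all.
    """
    n_spam = spam_caught + spam_missed
    cost_with_filter = lam * ham_blocked + spam_missed
    if cost_with_filter == 0:
        return float("inf")  # a perfect filter has zero cost
    return n_spam / cost_with_filter


# Made-up counts: 100 legitimate and 100 spam messages.
counts = dict(ham_passed=95, ham_blocked=5, spam_caught=80, spam_missed=20)
wacc_equal = weighted_accuracy(**counts, lam=1.0)
wacc_strict = weighted_accuracy(**counts, lam=9.0)
tcr_equal = total_cost_ratio(5, 80, 20, lam=1.0)
tcr_strict = total_cost_ratio(5, 80, 20, lam=9.0)
```

With these counts, raising λ lowers the total cost ratio, because the five blocked legitimate messages dominate the cost. This illustrates why a filter that looks acceptable under plain accuracy may still need additional safety nets.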
Developing a Spam Email Detector
2015
Abstract: Email is important for many types of group communication and has become widely used by millions of people, individuals and organizations. At the same time, it has become prone to threats. The most common such threat is spam, also known as unsolicited bulk email or junk email. To detect spam, this work proposes a spam detection approach using a Naive Bayesian (NB) classifier, which identifies email messages as being spam or legitimate based on the content (i.e. body) of these messages. Each email is represented as a bag of its body's words (features). To keep up with spammers' latest techniques, a robust yet up-to-date dataset, the CSDMC2010 spam corpus (last updated 2014), a set of raw email messages, was considered. To perform best, NB's environment was integrated with a list of 149 features proposed to include those commonly used by most spam emails. The CSDMC2010 dataset was used to train and test the NB classifier. Certai...
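The bag-of-words representation mentioned in this abstract can be sketched as follows. The sketch assumes a simple regular-expression tokenizer and a fixed feature list. The four words in `FEATURES` are placeholders chosen for illustration and are not the 149 features used in the paper.

```python
import re
from collections import Counter

# Hypothetical feature list; the paper's actual 149 features are not given here.
FEATURES = ["free", "offer", "click", "meeting"]


def bag_of_words(body):
    """Map an email body to word counts, ignoring word order and case."""
    return Counter(re.findall(r"[a-z0-9']+", body.lower()))


def feature_vector(body, features=FEATURES):
    """Project the bag of words onto a fixed, ordered feature list."""
    counts = bag_of_words(body)
    return [counts[f] for f in features]


vector = feature_vector("Click here for a FREE offer. Free!")
```

Each position in `vector` holds the number of times the corresponding feature word occurs in the body, which is the form a Naive Bayesian classifier can consume.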
Machine Learning Based Classification for Spam Detection
Sakarya Üniversitesi Fen Bilimleri Enstitüsü dergisi/Sakarya Üniversitesi fen bilimleri enstitüsü dergisi, 2023
Electronic messages, i.e. e-mails, are a communication tool frequently used by individuals or organizations. While e-mail is extremely practical to use, it is necessary to consider its vulnerabilities. Spam e-mails are unsolicited messages created to promote a product or service, often sent frequently. It is very important to classify incoming e-mails in order to protect against malware that can be transmitted via e-mail and to reduce possible unwanted consequences. Spam email classification is the process of identifying and distinguishing spam emails from legitimate emails. This classification can be done through various methods such as keyword filtering, machine learning algorithms and image recognition. The goal of spam email classification is to prevent unwanted and potentially harmful emails from reaching the user's inbox. In this study, Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM) and Artificial Neural Network (ANN) algorithms are used to classify spam emails and the results are compared. Algorithms with different approaches were used to determine the best solution for the problem. 5558 spam and non-spam e-mails were analyzed and the performance of the algorithms was reported in terms of accuracy, precision, sensitivity and F1-Score metrics. The most successful result was obtained with the RF algorithm with an accuracy of 98.83%. In this study, high success was achieved by classifying spam emails with machine learning algorithms. In addition, it has been proved by experimental studies that better results are obtained than similar studies in the literature.
1. Introduction With the widespread use of the Internet, electronic communication has become more preferred. One of the most important tools of electronic communication is electronic messages, which we call e-mail. Today, individuals or organizations have one or more email accounts.
Instant delivery of messages, no cost and ease of use increase the importance and prevalence of e-mail [1]. According to Statista Research Department data, the number of actively used e-mail accounts in 2020 was more than 4 billion. This number is estimated to increase to 4.6 billion in 2025. In 2020, 306 billion e-mails were sent and received every day, and this number is expected to exceed 376 billion in 2025 [2]. The use of e-mail is not only practical but also has various vulnerabilities. It is possible for the e-mail account to be hijacked in various ways, for e-mails containing advertisements etc. to hijack your computer by installing software on it when you click on the advertisement, and for the installed software to disrupt communication by sometimes filling the
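The four metrics named in the abstract of this study (accuracy, precision, sensitivity and F1-score) can all be derived from confusion-matrix counts. The sketch below shows the standard definitions, treating spam as the positive class. The function name and the example counts are made up and do not reproduce the study's results.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics, with spam as the positive class.

    tp: spam correctly flagged       fp: legitimate mail wrongly flagged
    fn: spam that slipped through    tn: legitimate mail correctly passed
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    if precision + sensitivity:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
```

Precision penalizes legitimate mail that is wrongly flagged, while sensitivity penalizes spam that slips through. The F1-score is the harmonic mean of the two.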
Efficient Support Vector Machines for Spam Detection: A Survey
Nowadays, the increasing volume of spam has become an annoyance for internet users. Spam is commonly defined as unsolicited email messages, and the goal of spam detection is to distinguish between spam and legitimate email messages. Much spam contains viruses, Trojan horses or other harmful software that may lead to failures in computers and networks; it also consumes network bandwidth and storage space and slows down email servers. In addition, it provides a medium for distributing harmful code and/or offensive content. Since there is no complete solution to this problem, the need for effective spam filters is increasing. In recent years, machine learning techniques have been applied to the automatic filtering of spam. The Support Vector Machine (SVM) is a powerful, state-of-the-art machine learning algorithm that is a good option for separating spam from legitimate email. In this article, we consider the evaluation criteria of SVM for spam detection and filtering. https://sites.google.com/site/ijcsis/
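To make the SVM idea concrete, here is a minimal sketch of a linear SVM trained by sub-gradient descent on the regularized hinge loss. This is a simple stand-in chosen for illustration and is not a method described in the survey. The two features (counts of two hypothetical keywords), the toy data, and the hyperparameters are all assumptions.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on lam/2 * ||w||^2 + hinge loss; labels are +1 / -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:
                # Inside the margin or misclassified: hinge term is active.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Correct with enough margin: only the regularizer shrinks w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 (spam) or -1 (legitimate)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy features: [count of "free", count of "meeting"] (hypothetical).
X = [[2, 0], [3, 0], [1, 0], [0, 2], [0, 1], [0, 3]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
```

On this linearly separable toy set, the learned weight for the first feature ends up positive and the weight for the second negative, so `predict(w, b, [2, 0])` yields the spam label. Practical spam filters would use a much larger feature space and a tested SVM library.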
E-Mail Spam Detection Using Supportive Vector Machine
IRJET, 2022
With the rapid growth of the internet, classifying messages into ham and spam has become a tedious job. Spam is defined as an unwanted message sent to somebody via email or other messaging channels. Many messages or emails contain viruses or phishing material intended to breach the recipient's privacy by exposing confidential information. Many viruses try to get into the computer through spam messages, so we built our project using SVM, aiming to improve the model's spam detection using neural networks in order to reduce the error. The features involved include regression, naive Bayes and several other regression models.
A Review Article On Enhancing Email Spam Filter's Accuracy Using Machine Learning
International Journal of Innovative Research in Computer Science and Technology (IJIRCST), 2023
In today's era, almost everyone uses email on a daily basis. In our proposed research, we suggest a machine learning-based strategy for enhancing email spam filters' accuracy. Traditional rule-based filters have grown less effective as spam emails have multiplied exponentially. Models can be trained to identify emails as spam or not using machine learning algorithms, particularly supervised learning. We need to create a simple and straightforward machine learning model in order to reach more accurate results while categorizing email spam. We picked the Naive Bayes technique for our model since it is quicker and more accurate than other algorithms. The suggested method can be incorporated into current email systems to enhance spam filtering functionality. This review paper provides an overview of the machine learning model we have suggested.
IJRASET, 2021
E-mail is the most common method of communication because of its accessibility, the rapid exchange of messages, and its low cost of distribution. E-mail is one of the most secure media for online communication and for transferring data or messages over the internet. With its growing popularity, the quantity of unsolicited data has also increased rapidly. Spam causes traffic issues and bottlenecks that limit the available memory, bandwidth, power and computing speed. To filter data, different approaches exist that automatically detect and remove these unwanted messages. There are several email spam filtering techniques, such as knowledge-based techniques, clustering techniques, learning-based techniques, heuristic processes and so on. This paper presents a survey of various existing email spam filtering systems based on Machine Learning Techniques (MLT) such as Naive Bayes, SVM, K-Nearest Neighbor, Bayes Additive Regression, KNN Tree, and rules. We give the classification, evaluation and comparison of some email spam filtering systems and summarize the accuracy rates of various existing approaches.