Survey on Email Classification Techniques/Algorithms

A Comparative Study for Email Classification

Email has become one of the fastest and most economical forms of communication. However, the increase in email users has resulted in a dramatic increase in spam emails during the past few years. In this paper, email data was classified using four different classifiers (Neural Network, SVM classifier, Naïve Bayesian Classifier, and J48 classifier). The experiment was performed with different data sizes and different feature sizes. The final classification result is '1' if the email is spam and '0' otherwise. This paper shows that the simple J48 classifier, which builds a binary tree, can be efficient for datasets that can be classified with a binary tree.

An Automation Technique for Email Classification

International Journal of Advance Research and Innovative Ideas in Education, 2018

Email categorization is proposed using the Naive Bayes classification algorithm. The categorization is based not only on the body but also on the header of an email message. The metadata provide additional information that can be exploited to improve the categorization capability. Results of experiments on real email data demonstrate the feasibility of our approach. The system categorizes real email data into three types, i.e. Primary, Social and Shopping. In particular, categorization based only on the header information is comparable or superior to that based on all the information in a message. As email communication becomes prevalent, all kinds of emails are generated. The aim is to classify emails for better visual representation and easy access to high-priority, important mails. For example, the internal communications department of a company distributes an email message to all employees to remind them of the deadline for timecard submission.
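The idea of feeding both header and body to Naive Bayes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the field names (`from`, `subject`, `body`), the field-prefixed tokens and the add-one smoothing are all assumptions, and the training emails are made up.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(email):
    """Turn an email dict into tokens, tagging each word with its field."""
    tokens = []
    # "from" and "subject" are assumed header fields; "body" is the message text.
    for field in ("from", "subject", "body"):
        for word in re.findall(r"[a-z0-9]+", email.get(field, "").lower()):
            tokens.append(f"{field}:{word}")
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, emails, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for email, label in zip(emails, labels):
            self.token_counts[label].update(tokenize(email))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def predict(self, email):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)  # log prior
            for tok in tokenize(email):
                if tok in self.vocab:  # tokens never seen in training are ignored
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data, one email per category named in the abstract.
train_emails = [
    {"from": "alice@example.com", "subject": "meeting agenda",
     "body": "please review the project report before the meeting"},
    {"from": "notify@social.example", "subject": "new friend request",
     "body": "bob wants to connect and commented on your photo"},
    {"from": "deals@shop.example", "subject": "discount sale",
     "body": "your order has shipped, save on your next order"},
]
train_labels = ["Primary", "Social", "Shopping"]
model = NaiveBayes().fit(train_emails, train_labels)
```

Because header words are kept as separate tokens, the same model can classify a message from its header alone by omitting the `body` field.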

Email Classification Research Trends: Review and Open Issues

IEEE Access, 2017

Personal and business users prefer to use e-mail as one of the crucial sources of communication. The usage and importance of e-mails continuously grow despite the prevalence of alternative means, such as electronic messages, mobile applications, and social networks. As the volume of business-critical e-mails continues to grow, the need to automate the management of e-mails increases for several reasons, such as spam e-mail classification, phishing e-mail classification, and multi-folder categorization, among others. This paper comprehensively reviews articles on e-mail classification published in 2006-2016 by exploiting the methodological decision analysis in five aspects, namely, e-mail classification application areas, data sets used in each application area, feature space utilized in each application area, e-mail classification techniques, and the use of performance measures. A total of 98 articles (56 articles from Web of Science core collection databases and 42 articles from Scopus database) are selected. To achieve the objective of the study, a comprehensive review and analysis is conducted to explore the various areas where e-mail classification was applied. Moreover, various public data sets, features sets, classification techniques, and performance measures are examined and used in each identified application area. This review identifies five application areas of e-mail classification. The most widely used data sets, features sets, classification techniques, and performance measures are found in the identified application areas. The extensive use of these popular data sets, features sets, classification techniques, and performance measures is discussed and justified. The research directions, research challenges, and open issues in the field of e-mail classification are also presented for future researchers.

A Review of Text Classification Approaches for E-mail Management

ijetch.org

Abstract—The continuing explosive growth of textual content within the World Wide Web has given rise to the need for sophisticated Text Classification (TC) techniques that combine efficiency with high quality of results. E-mail filtering and email organization is an ...

E-Mail Classification Mechanism using Classifier of Data Mining Technique

Personal and industrial users rely on email as one of their crucial means of communication. As the volume of business-critical emails continues to grow, the need to automate the management of emails increases for numerous reasons, such as spam email categorization, phishing email categorization, and multi-folder classification, among others. We propose a novel method named "Semantic Match", which extends a learning model with the support of the WordNet tool. The proposed technique is capable of handling the similarity of the extracted features and of computing measures such as accuracy and complexity. Here we use the WordNet dictionary, which provides a semantic dictionary for English. WordNet is a large lexical database of English. We also use the SVM and KNN classification methods to classify the emails appropriately.

Email Categorization using Hybrid Supervised and Unsupervised Approach

2014

With the growth of the internet, the use of email for electronic communication has increased drastically. As a result, mailboxes become congested, giving rise to the problem of email overload, which can be addressed with email categorization or email management. Email categorization is a multifaceted problem with many difficulties. Many schemes have been proposed for solving this problem with either a supervised or an unsupervised approach. With those approaches, once a categorization model is built, it is hard to change it to handle dynamic situations. As email reflects current information from around the globe, email content changes with the passage of time. Concept drift is the situation that arises from changes in the underlying data distribution over a period of time. The problem of detecting and handling concept drift arises from the dynamic nature of email. This paper proposes a dynamic hybrid scheme that combines supervised and unsupervised approaches for the detection and handling of concept drift. An initial classifier is built with a classification algorithm, and then a clustering algorithm is applied to the classifier's 'General' category to detect concept drift. If drift is detected, a new cluster is formed for the newly emerging concept and an appropriate label is assigned to that cluster.
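The hybrid flow described above (classify first, then cluster whatever lands in 'General') might look roughly like the sketch below. The abstract does not name the algorithms used, so the keyword-overlap classifier, the Jaccard-based leader clustering, the thresholds and the labelling rule are all stand-in assumptions chosen to keep the example short.

```python
import re
from collections import Counter


def words(text):
    """Lower-cased set of alphabetic tokens in a message."""
    return set(re.findall(r"[a-z]+", text.lower()))


def classify(text, category_keywords):
    """Supervised stand-in: category with the most keyword hits, else 'General'."""
    tokens = words(text)
    best, hits = "General", 0
    for category, keywords in category_keywords.items():
        overlap = len(tokens & keywords)
        if overlap > hits:
            best, hits = category, overlap
    return best


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_general(texts, threshold=0.3):
    """Leader clustering: join the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of token sets; element 0 is the leader
    for text in texts:
        tokens = words(text)
        for cluster in clusters:
            if jaccard(tokens, cluster[0]) >= threshold:
                cluster.append(tokens)
                break
        else:
            clusters.append([tokens])
    return clusters


def detect_drift(emails, category_keywords, min_size=3):
    """Return (label, size) for each sufficiently large cluster inside 'General'."""
    general = [e for e in emails if classify(e, category_keywords) == "General"]
    new_concepts = []
    for cluster in cluster_general(general):
        if len(cluster) >= min_size:
            counts = Counter(tok for tokens in cluster for tok in tokens)
            # Label with the most frequent token; ties are broken alphabetically.
            label = min(counts, key=lambda t: (-counts[t], t))
            new_concepts.append((label, len(cluster)))
    return new_concepts


# Hypothetical categories and messages for illustration only.
category_keywords = {"Work": {"meeting", "report"}, "Shopping": {"order", "discount"}}
emails = [
    "Project meeting moved to Friday",
    "Your order has shipped",
    "Cloud webinar invite",
    "Join the cloud webinar",
    "Cloud webinar recording",
    "Lunch on Sunday",
]
new_concepts = detect_drift(emails, category_keywords)
```

In this toy run the three webinar messages fall into 'General', form one cluster of size three, and are reported as an emerging concept, while the single lunch message is too small a cluster to count as drift.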

Comparison of four email classification algorithms using WEKA

IJCSIS Vol 17 No 2, 2019

Being a fast and economical means of communication has prompted many to use email as their main communication medium for both official and personal purposes. However, the rapid increase in the number of e-mail users has resulted in a dramatic increase in the amount of spam in recent years. Spam mail has become an increasing menace, as it exposes users to virus threats, communication overload, wasted time, irritation and disturbance. Hence, there is a need to develop efficient spam filters. Several text-mining classification algorithms are being employed to classify mails as legitimate or otherwise. These algorithms need to be compared using machine learning software in order to determine which is more efficient at classifying e-mails. In this study, four spam classification algorithms, namely LAZY-IBK, Naïve Bayes, BayesNet, and J48, were investigated for their classification accuracy in the WEKA environment. An in-depth analysis of previous studies and descriptions of the four classification algorithms are presented. An experimental analysis was performed that compares the four classification algorithms on the basis of parameters such as 'accuracy', 'precision', 'recall', 'F-measure' and 'false positive rate', and the results were analyzed. The results reveal that J48 gave the most accurate results among the four algorithms.
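For reference, the five measures named above can be computed from a confusion matrix as in the sketch below. It uses the standard textbook definitions with spam as the positive class (1) and legitimate mail as 0; the function names and the sample labels are illustrative and are not taken from the study or from WEKA.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN with 1 = spam (positive) and 0 = legitimate."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:  # predicted legitimate, actually spam
            fn += 1
    return tp, fp, tn, fn


def spam_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F-measure and false positive rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)

    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero on empty classes

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Made-up labels: 4 spam and 4 legitimate messages.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = spam_metrics(y_true, y_pred)
```

The false positive rate matters most for spam filtering, since it measures how much legitimate mail is wrongly flagged.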

IJERT-Email Categorization using Hybrid Supervised and Unsupervised Approach

International Journal of Engineering Research and Technology (IJERT), 2014

https://www.ijert.org/email-categorization-using-hybrid-supervised-and-unsupervised-approach
https://www.ijert.org/research/email-categorization-using-hybrid-supervised-and-unsupervised-approach-IJERTV3IS060795.pdf

Email Categorization Using Multi-stage Classification Technique

Eighth International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2007), 2007

This paper presents an innovative email categorization approach using a serialized multi-stage classification ensemble technique. Many approaches are used in practice for email categorization to control the menace of spam emails in different ways. Content-based email categorization employs filtering techniques that use classification algorithms to learn to predict spam e-mails from a corpus of training e-mails. This process achieves substantial performance with some false positive (FP) tradeoffs. Studies with different classification algorithms have found that the outputs vary from one classifier to another on the same email corpora. In this paper we propose a multi-stage classification technique using different popular learning algorithms with an analyser, which reduces the FP problem substantially and increases classification accuracy compared to similar existing techniques.
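One way to picture a multi-stage scheme with an analyser is sketched below. The abstract does not describe the actual stages or the analyser's rules, so everything here is hypothetical: three toy rule-based stages stand in for trained classifiers, and the analyser is assumed to resolve disagreements by strict majority vote.

```python
import re


def keyword_stage(text):
    """Stage 1 (toy): flag mail that contains typical spam words."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return 1 if tokens & {"free", "winner"} else 0


def link_stage(text):
    """Stage 2 (toy): flag mail that contains two or more links."""
    return 1 if text.lower().count("http") >= 2 else 0


def caps_stage(text):
    """Stage 3 (toy): flag mail written mostly in capital letters."""
    letters = [c for c in text if c.isalpha()]
    upper = sum(c.isupper() for c in letters)
    return 1 if letters and upper / len(letters) > 0.5 else 0


def majority_analyser(votes):
    """Assumed analyser: call it spam only if a strict majority of stages agree."""
    return 1 if sum(votes) > len(votes) / 2 else 0


def multi_stage_classify(text, stages, analyser):
    """Run the stages in sequence; unanimous votes pass through, others go to the analyser."""
    votes = [stage(text) for stage in stages]
    if len(set(votes)) == 1:
        return votes[0]
    return analyser(votes)


stages = [keyword_stage, link_stage, caps_stage]
spam_label = multi_stage_classify(
    "FREE prize winner http://a.example http://b.example", stages, majority_analyser
)
ham_label = multi_stage_classify("Are you free for lunch?", stages, majority_analyser)
```

The second message shows the intended effect: the keyword stage alone would flag it as spam because of the word "free", but the other stages disagree and the analyser overrules that false positive.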

Email classification analysis using machine learning techniques

Applied computing and informatics, 2022

Purpose: In this digital era, email is the most pervasive form of communication between people. Many users become victims of spam emails and have their data exposed. Design/methodology/approach: Researchers contribute to solving this problem by focusing on advanced machine learning algorithms and improved models for detecting spam emails, but there is still a gap in features. Features also play an important role in achieving good results. To evaluate the performance of the applied classifiers, 10-fold cross-validation is used. Findings: The results confirm that spam emails are correctly classified with an accuracy of 98.00% for the Support Vector Machine and 98.06% for the Artificial Neural Network, compared to the other applied machine learning classifiers. Originality/value: In this paper, Point-Biserial correlation is applied to each feature with respect to the class label of the University of California Irvine (UCI) spambase email dataset to select the best features. Extensive experiments are conducted on the selected features by training the different classifiers.
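Point-biserial feature selection can be sketched as below, using the standard formula r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), where M1 and M0 are the feature means for spam and non-spam and s_n is the population standard deviation of the feature. The ranking by absolute correlation, the cut-off `k`, the feature names and the data are assumptions for illustration; the abstract does not state how the paper thresholds the scores.

```python
import math
import statistics


def point_biserial(values, labels):
    """Point-biserial correlation between a numeric feature and a 0/1 label."""
    ones = [v for v, y in zip(values, labels) if y == 1]
    zeros = [v for v, y in zip(values, labels) if y == 0]
    n1, n0, n = len(ones), len(zeros), len(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if n1 == 0 or n0 == 0 or spread == 0:
        return 0.0  # correlation is undefined; treat the feature as uninformative
    mean_gap = statistics.fmean(ones) - statistics.fmean(zeros)
    return mean_gap / spread * math.sqrt(n1 * n0 / (n * n))


def select_features(rows, labels, names, k):
    """Keep the k features with the largest absolute correlation to the label."""
    scores = {
        name: abs(point_biserial([row[i] for row in rows], labels))
        for i, name in enumerate(names)
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)[:k]


# Hypothetical word-frequency features; 1 = spam, 0 = not spam.
names = ["freq_free", "freq_meeting", "noise"]
rows = [
    [0.9, 0.0, 0.5],
    [0.8, 0.1, 0.4],
    [0.1, 0.7, 0.5],
    [0.0, 0.9, 0.4],
]
labels = [1, 1, 0, 0]
selected = select_features(rows, labels, names, k=2)
```

Using the absolute value keeps features that are strongly associated with legitimate mail as well as those associated with spam, since both help a classifier separate the classes.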