Email Prioritization Using Machine Learning
Related papers
Analysis of Machine Learning Algorithms for Email Classification Using NLP
In an exponentially growing world, people use email across all industries, including education. It is therefore very important to differentiate between legitimate and spam email. In this paper, we preprocess emails using natural language processing and apply several machine-learning algorithms to analyze their performance on email classification. Performance is measured by accuracy and F1 score. The results show that the ANN outperforms the other algorithms, with a best accuracy of 98.80% and an F1 score of 0.977778. Keywords: natural language processing; machine learning; spam classification; emails
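The two measures named in this abstract can be computed directly from predicted and true labels. The following is a minimal sketch, not the paper's evaluation code: the function name `accuracy_and_f1` and the example labels are made up for illustration, and spam is assumed to be the positive class.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels; `positive` marks the spam class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical labels for illustration only: 1 = spam, 0 = legitimate.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

Reporting F1 alongside accuracy matters when spam and legitimate mail are imbalanced, since accuracy alone can look high for a classifier that rarely predicts the minority class.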
Information Sciences, 2007
In this paper we study supervised and semi-supervised classification of e-mails. We consider two tasks: filing e-mails into folders and spam e-mail filtering. Firstly, in a supervised learning setting, we investigate the use of random forest for automatic e-mail filing into folders and spam e-mail filtering. We show that random forest is a good choice for these tasks as it runs fast on large and high dimensional databases, is easy to tune and is highly accurate, outperforming popular algorithms such as decision trees, support vector machines and naïve Bayes. We introduce a new accurate feature selector with linear time complexity. Secondly, we examine the applicability of the semi-supervised co-training paradigm for spam e-mail filtering by employing random forests, support vector machines, decision trees and naïve Bayes as base classifiers. The study shows that a classifier trained on a small set of labelled examples can be successfully boosted using unlabelled examples to an accuracy rate only 5% lower than that of a classifier trained on all labelled examples. We investigate the performance of co-training with one natural feature split and show that in the domain of spam e-mail filtering it can be as competitive as co-training with two natural feature splits.
Content Based E-Mail Classification
International Journal of Scientific Research in Science, Engineering and Technology, 2021
Electronic mail (e-mail) has established a significant place in the lives of information users. E-mail is a major mode of information sharing because it is a fast and effective way to communicate, and it plays an important role in both personal and professional life. The rapid increase in the number of account holders over the last few decades, together with the growing volume of mail, has also created serious issues. Content-based mail classification sorts contacts into four classes: Private, Public, Newsletter, and Anonymous. Every user has the right to choose their own keyword (a semi-private password). Contacts who know the user's keyword are classified as private contacts, and those who are unknown to the user are classified as anonymous contacts. Upon verification, an anonymous contact can be reclassified as public or private. Newsletters and group mails are classified as newsletter contacts. The rest are highly likely to be junk mail or spam. In this project, a spam detector that identifies an email as either spam or ham is built using n-gram analysis. The system classifies mails based on the user's contacts, so that any mail from a contact the user knows well is displayed.
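The abstract does not say how its n-gram analysis is carried out, so the following is only a sketch of one common reading: word-level n-grams counted separately over spam and ham messages, with a crude count-difference score. The function names, the toy messages, and the scoring rule are all assumptions made for illustration.

```python
from collections import Counter


def word_ngrams(text, n=2):
    """Split text into lowercase word n-grams (bigrams by default)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(messages, n=2):
    """Count how often each n-gram occurs across a list of messages."""
    counts = Counter()
    for message in messages:
        counts.update(word_ngrams(message, n))
    return counts


# Hypothetical toy corpora, invented for this example.
spam = ["win a free prize now", "claim your free prize today"]
ham = ["meeting moved to friday", "see you at the meeting"]
spam_counts = ngram_counts(spam)
ham_counts = ngram_counts(ham)


def spam_score(text, n=2):
    """Positive when the text shares more n-grams with spam than with ham."""
    grams = word_ngrams(text, n)
    return sum(spam_counts[g] for g in grams) - sum(ham_counts[g] for g in grams)


print(spam_score("free prize inside"))
print(spam_score("see you at lunch"))
```

A real detector would normally feed such n-gram counts into a trained classifier rather than compare raw counts, but the feature extraction step looks much the same.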
A Comparative Study for Email Classification
Email has become one of the fastest and most economical forms of communication. However, the increase in email users has resulted in a dramatic increase in spam emails during the past few years. In this paper, email data was classified using four different classifiers (Neural Network, SVM classifier, Naïve Bayesian classifier, and J48 classifier). The experiment was performed with different data sizes and different feature sizes. The final classification result is '1' if the email is spam and '0' otherwise. This paper shows that the simple J48 classifier, which builds a binary tree, can be efficient for datasets that can be classified as a binary tree.
Survey on Email Classification Techniques/Algorithms
Journal of emerging technologies and innovative research, 2015
E-mail plays an important role in many forms of communication and is used in all types of organizations. It is self-evident that e-mail has become a central means for the discussion of engineering work and the sharing of digital assets that define the product and its production process. Engineering communication research has shown that the volume of communication is indicative of the progress being made within an engineering project. E-mail conversations therefore increase as the product grows, and the data in the communication also increases. It becomes difficult to handle the data in e-mails, so e-mails need to be classified. Here we have studied different classification techniques that help to classify large e-mail data. Index Terms: e-mail classification, filtering, structured and unstructured data, Naïve Bayes
A Review of Text Classification Approaches for E-mail Management
ijetch.org
Abstract: The continuing explosive growth of textual content within the World Wide Web has given rise to the need for sophisticated Text Classification (TC) techniques that combine efficiency with high quality of results. E-mail filtering and email organization is an ...
Email Classification Research Trends: Review and Open Issues
IEEE Access, 2017
Personal and business users prefer to use e-mail as one of the crucial sources of communication. The usage and importance of e-mails continuously grow despite the prevalence of alternative means, such as electronic messages, mobile applications, and social networks. As the volume of business-critical e-mails continues to grow, the need to automate the management of e-mails increases for several reasons, such as spam e-mail classification, phishing e-mail classification, and multi-folder categorization, among others. This paper comprehensively reviews articles on e-mail classification published in 2006-2016 by exploiting the methodological decision analysis in five aspects, namely, e-mail classification application areas, data sets used in each application area, feature space utilized in each application area, e-mail classification techniques, and the use of performance measures. A total of 98 articles (56 articles from Web of Science core collection databases and 42 articles from Scopus database) are selected. To achieve the objective of the study, a comprehensive review and analysis is conducted to explore the various areas where e-mail classification was applied. Moreover, various public data sets, features sets, classification techniques, and performance measures are examined and used in each identified application area. This review identifies five application areas of e-mail classification. The most widely used data sets, features sets, classification techniques, and performance measures are found in the identified application areas. The extensive use of these popular data sets, features sets, classification techniques, and performance measures is discussed and justified. The research directions, research challenges, and open issues in the field of e-mail classification are also presented for future researchers.
Phrases and feature selection in e-mail classification
2004
In this paper we study the effectiveness of using a phrase-based representation in e-mail classification, and the effect this approach has on a number of machine learning algorithms. We also evaluate various feature selection methods and reduction levels for the bag-of-words representation on several learning algorithms and corpora. The results show that the phrase-based representation and feature selection methods can be used to increase the performance of e-mail classifiers.
Developing a Reliable System for Real-Life Emails Classification Using Machine Learning Approach
Intelligent Systems and Networks, 2021
The cyber world has become accessible, public, and commonly used to distribute and exchange messages among malicious actors, terrorists, and illegally motivated persons. Electronic mail is one of the most frequently used means of transferring information over the internet. E-mails are among the most important forms of digital evidence that courts in various countries and communities use to convict, which drives researchers to keep improving e-mail analysis with state-of-the-art technology for finding digital evidence in e-mails. This work introduces a distinctive technique for analyzing e-mails. It is based on consecutive phases, starting with data processing, extraction, and compilation, then implementing the SWARM algorithm to adjust the output and to process these e-mails for realistic and precise results by tuning the support vector machine algorithm. For e-mail forensic analysis, the system includes all the sentiment terms along with positive and negative cases. It can work with the machine learning algorithm (SentiWordNet 3.0). The Enron data set is used to test the proposed framework. In the best case, the accuracy rate reaches 92%.
An Automation Technique for Email Classification
International Journal of Advance Research and Innovative Ideas in Education, 2018
Email categorization is proposed using the Naive Bayes classification algorithm. The categorization is based not only on the body but also on the header of an email message. The metadata provide additional information that can be exploited to improve the categorization capability. Results of experiments on real email data demonstrate the feasibility of our approach. The system categorizes real email data into three types: Primary, Social, and Shopping. In particular, categorization based only on the header information is comparable or superior to that based on all the information in a message. As email communication becomes prevalent, all kinds of emails are generated. The goal is to classify emails for better visual representation and easy access to high-priority mails. For example, the internal communications department of a company distributes an email message to all employees to remind them of the timecard submission deadline.
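A Naive Bayes categorizer that draws on both header and body, as this abstract describes, can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the authors' system: the `sender`/`subject`/`body` field names, the `hdr:` prefix that keeps header tokens distinct from body tokens, the Laplace smoothing, and the three training emails are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokens(email):
    """Tokenize header fields and body; header tokens get an 'hdr:' prefix."""
    header_text = (email["sender"] + " " + email["subject"]).lower()
    header = [f"hdr:{word}" for word in header_text.split()]
    body = email["body"].lower().split()
    return header + body


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.class_docs = Counter()              # emails seen per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocab = set()

    def fit(self, emails, labels):
        for email, label in zip(emails, labels):
            self.class_docs[label] += 1
            for tok in tokens(email):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, email):
        total = sum(self.class_docs.values())
        best_label, best_logp = None, -math.inf
        for label in self.class_docs:
            logp = math.log(self.class_docs[label] / total)  # class prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens(email):
                logp += math.log((self.word_counts[label][tok] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical training emails, invented for this example.
train = [
    ({"sender": "alice@work.example", "subject": "project meeting",
      "body": "please review the report"}, "Primary"),
    ({"sender": "notify@social.example", "subject": "new friend request",
      "body": "someone wants to connect"}, "Social"),
    ({"sender": "deals@shop.example", "subject": "sale ends today",
      "body": "discount on your order"}, "Shopping"),
]
model = NaiveBayes().fit([e for e, _ in train], [y for _, y in train])
print(model.predict({"sender": "deals@shop.example", "subject": "big sale",
                     "body": "order discount today"}))
```

Passing an email with an empty `body` exercises the header-only setting that the abstract reports as comparable or superior to using the whole message.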