Manipulation of Email Data Using Machine Learning and Data Visualization

Email classification analysis using machine learning techniques

Applied computing and informatics, 2022

Purpose: In this digital era, email is the most pervasive form of communication between people. Many users fall victim to spam emails, and their data are exposed. Design/methodology/approach: Researchers address this problem by focusing on advanced machine learning algorithms and improved models for detecting spam emails, but there is still a gap in features, which also play an important role in achieving good results. The performance of the applied classifiers is evaluated with 10-fold cross-validation. Findings: The results confirm that spam emails are correctly classified with an accuracy of 98.00% for the Support Vector Machine and 98.06% for the Artificial Neural Network, compared with the other applied machine learning classifiers. Originality/value: In this paper, Point-Biserial correlation is applied to each feature with respect to the class label of the University of California Irvine (UCI) spambase email dataset to select the best features. Extensive experiments are conducted on the selected features by training the different classifiers.
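As an illustration of the feature-selection idea in this abstract, the following is a minimal sketch of point-biserial correlation between one continuous feature and a 0/1 class label, using only the Python standard library. The function names, the toy data, and the selection threshold of 0.3 are assumptions made for the example; the abstract does not state how the paper ranks or cuts off features.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(feature, labels):
    """Point-biserial correlation between a continuous feature and 0/1 labels.

    r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2), where s_n is the
    population standard deviation of the feature over all samples.
    """
    ones = [x for x, y in zip(feature, labels) if y == 1]
    zeros = [x for x, y in zip(feature, labels) if y == 0]
    n, n1, n0 = len(feature), len(ones), len(zeros)
    spread = pstdev(feature)
    # A constant feature or a single-class sample carries no signal.
    if spread == 0 or n1 == 0 or n0 == 0:
        return 0.0
    return (mean(ones) - mean(zeros)) / spread * sqrt(n1 * n0 / (n * n))


def select_features(columns, labels, threshold=0.3):
    """Keep feature names whose |r_pb| reaches the threshold.

    The threshold is a hypothetical value chosen for illustration only.
    """
    return [
        name
        for name, values in columns.items()
        if abs(point_biserial(values, labels)) >= threshold
    ]


if __name__ == "__main__":
    labels = [1, 1, 0, 0]  # 1 = spam, 0 = legitimate (toy data)
    columns = {"freq_free": [3, 4, 1, 2], "freq_the": [1, 2, 1, 2]}
    print(select_features(columns, labels))
```

On the toy data, the feature that differs between the two classes is kept and the one with identical class means is dropped.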

Developing a Reliable System for Real-Life Emails Classification Using Machine Learning Approach

Intelligent Systems and Networks, 2021

The cyber world has become accessible and public, and it is commonly used to distribute and exchange messages between malicious actors, terrorists, and illegally motivated persons. Electronic mail is one of the most frequently used means of transferring information over the Internet. E-mails are among the most important pieces of digital evidence that courts in various countries and communities use to convict, which drives researchers to keep improving e-mail analysis with state-of-the-art technology in order to find digital evidence in e-mails. This work introduces a distinctive technique for analyzing emails. It is based on consecutive phases, starting with data processing, extraction, and compilation, followed by the SWARM algorithm, which tunes the Support Vector Machine so that the emails are classified with realistic and precise results. For email forensic analysis, the system includes all sentiment terms, covering both positive and negative cases. It also works with SentiWordNet 3.0. The Enron dataset is used to test the proposed framework. In the best case, the accuracy reaches 92%.

Machine Learning Based Classification for Spam Detection

Sakarya Üniversitesi Fen Bilimleri Enstitüsü Dergisi, 2023

Electronic messages, i.e. e-mails, are a communication tool frequently used by individuals and organizations. While e-mail is extremely practical to use, its vulnerabilities must be considered. Spam e-mails are unsolicited messages created to promote a product or service, often sent repeatedly. It is very important to classify incoming e-mails in order to protect against malware that can be transmitted via e-mail and to reduce possible unwanted consequences. Spam email classification is the process of identifying and distinguishing spam emails from legitimate emails. This classification can be done through various methods such as keyword filtering, machine learning algorithms, and image recognition. The goal of spam email classification is to prevent unwanted and potentially harmful emails from reaching the user's inbox. In this study, Random Forest (RF), Logistic Regression (LR), Naive Bayes (NB), Support Vector Machine (SVM), and Artificial Neural Network (ANN) algorithms are used to classify spam emails, and the results are compared. Algorithms with different approaches were used to determine the best solution for the problem. A total of 5558 spam and non-spam e-mails were analyzed, and the performance of the algorithms was reported in terms of accuracy, precision, sensitivity, and F1-score. The most successful result was obtained with the RF algorithm, with an accuracy of 98.83%. In this study, high success was achieved in classifying spam emails with machine learning algorithms. In addition, the experiments show better results than similar studies in the literature.

1. Introduction With the widespread use of the Internet, electronic communication has become increasingly preferred. One of the most important tools of electronic communication is the electronic message, which we call e-mail. Today, individuals and organizations have one or more e-mail accounts. Instant delivery of messages, zero cost, and ease of use increase the importance and prevalence of e-mail [1]. According to Statista Research Department data, there were more than 4 billion actively used e-mail accounts in 2020, and this number is estimated to increase to 4.6 billion in 2025. In 2020, 306 billion e-mails were sent and received every day, and this number is expected to exceed 376 billion in 2025 [2]. E-mail is practical, but it also has various vulnerabilities: the e-mail account may be hijacked in various ways, e-mails containing advertisements and the like may hijack your computer by installing software when you click on the advertisement, and the installed software may disrupt communication by sometimes filling the
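The four metrics named in the abstract of this study (accuracy, precision, sensitivity, and F1-score) can all be derived from the confusion-matrix counts of a binary classifier. The sketch below shows one way to compute them in plain Python; the function names and the toy labels are assumptions for illustration and are not taken from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, sensitivity (recall), and F1-score."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Toy labels: 1 = spam, 0 = non-spam.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    print(classification_metrics(y_true, y_pred))
```

Reporting precision and sensitivity alongside accuracy matters when spam and non-spam are unevenly represented, because accuracy alone can look high for a classifier that rarely flags spam.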

A Comparative Study for Email Classification

Email has become one of the fastest and most economical forms of communication. However, the increase in email users has resulted in a dramatic increase in spam emails during the past few years. In this paper, email data was classified using four different classifiers (Neural Network, SVM classifier, Naïve Bayesian classifier, and J48 classifier). The experiment was performed with different data sizes and different feature sizes. The final classification result is '1' if the email is spam and '0' otherwise. This paper shows that a simple J48 classifier, which builds a binary tree, can be efficient for a dataset that can be classified as a binary tree.

Analysis of Spam Messages Using Various Machine Learning Classifier

Background: As the number of people using social media increases, data generation also increases, and the data generated may be safe or unsafe. In applications such as Twitter and email, we receive many emails or tweets that contain both dangerous and useful content. To stay safe from these threats, we need a filter that separates useful messages from spam and keeps us from falling into a trap. One approach to this is explained in this paper. The algorithm followed is the Naïve Bayes classifier. The paper also compares Naïve Bayes, KNN, and Logistic Regression on the same spam-filtering problem, together with term frequency-inverse document frequency (TF-IDF).
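To make the TF-IDF weighting mentioned in this abstract concrete, here is a minimal standard-library sketch. It uses the plain textbook variant (term frequency normalised by document length, inverse document frequency as the natural log of N over document frequency); this choice, the function name, and the toy messages are assumptions, since the abstract does not say which variant the paper uses, and libraries often apply smoothed formulas instead.

```python
from collections import Counter
from math import log


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / document length
    idf = ln(number of documents / documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append(
            {
                term: (count / len(doc)) * log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


if __name__ == "__main__":
    # Toy, already tokenized messages (hypothetical data).
    messages = [["win", "cash", "now"], ["meeting", "now"]]
    for vector in tf_idf(messages):
        print(vector)
```

A term that appears in every message, such as "now" in the toy data, receives weight zero, while terms specific to one message are weighted up; the resulting vectors can then be fed to a classifier such as Naïve Bayes, KNN, or Logistic Regression.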

Analysis of Machine Learning Algorithms for Email Classification Using NLP

In an exponentially growing world, people use email across all areas of industry, including the educational field. It is therefore important to differentiate between legitimate and spam email. In this paper, we preprocess emails using natural language processing and apply several machine learning algorithms to analyze their performance on email classification. Performance is measured by accuracy and F1 score. The results show that ANN outperforms the other algorithms, with a best accuracy of 98.80% and an F1 score of 0.977778. Keywords: Natural language processing; Machine learning; spam classification; emails
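The abstract does not list its preprocessing steps, so the following is only a sketch of what a typical NLP preprocessing pass for email text might look like: lowercasing, keeping alphabetic tokens, and dropping stop words. The stop-word list, the function name, and the sample sentence are all hypothetical.

```python
import re

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = frozenset(
    {"a", "an", "the", "and", "or", "to", "of", "in", "is", "you", "your", "for"}
)


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in stop_words]


if __name__ == "__main__":
    print(preprocess("Claim YOUR free prize, now!!!"))
```

The token lists produced this way would then be turned into numeric features before any classifier, ANN or otherwise, is trained on them.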

Efficient Spam Email Classification using Machine Learning Algorithms

In today's digital age, since email is the main form of communication, the identification of email spam is a critical issue. In addition to consuming a lot of time and money, email spam is also a security and privacy risk. In this paper, we provide a means of email spam detection that employs machine learning algorithms. The features required for training the ML models were engineered after analysis of a content-based filtering email dataset obtained from the Kaggle website. We tested several types of machine learning algorithms and analyzed their performance on the dataset. Our findings demonstrate how effective the suggested approach is in identifying email spam, with a highest accuracy of 99.8% and an RMSE of 0.2. We applied various ML classifier algorithms, such as Decision Tree, Voting Classifier, Random Forest, Logistic Regression and so on, to our dataset, compared them with each other, and found which suits the dataset best with the highest accuracy. This method can be useful in email clients or servers to detect spam emails automatically and enhanced
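A voting classifier, one of the models listed in this abstract, combines the outputs of several base classifiers. The sketch below assumes hard (majority) voting over 0/1 predictions and also shows how RMSE is computed on such labels; whether the paper uses hard or soft voting is not stated, and the function names and toy predictions are hypothetical.

```python
from collections import Counter
from math import sqrt


def majority_vote(predictions):
    """Combine per-classifier label lists into one list by majority vote.

    `predictions` holds one list of labels per classifier, all of equal
    length. Using an odd number of classifiers avoids ties on binary labels.
    """
    voted = []
    for sample_votes in zip(*predictions):
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long numeric sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Toy outputs of three hypothetical base classifiers (1 = spam).
    tree = [1, 0, 1, 0]
    forest = [1, 1, 0, 0]
    logreg = [0, 1, 1, 0]
    combined = majority_vote([tree, forest, logreg])
    print(combined, rmse([1, 0, 1, 0], combined))
```

For 0/1 labels, the squared error of each sample is either 0 or 1, so RMSE reduces to the square root of the misclassification rate.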

Comparative Study On Supervised Machine Learning Algorithms For Spam Mail Detection

International Journal of Scientific & Technology Research, 2020

Electronic mail (E-mail) is used to exchange messages between people via the Internet. E-mail protocols such as Simple Mail Transfer Protocol (SMTP), Post Office Protocol (POP), and Internet Message Access Protocol (IMAP) are used to transfer messages from sender to receiver. Flaws in E-mail protocols, together with the growth of online businesses and advertising companies, give rise to E-mail-based intimidation. E-mail spam is also called junk mail. Handling spam mail is today one of the major problems in software companies, since spam mail causes traffic problems and bottlenecks that limit memory space, computing power, and speed, and users have to spend more time detecting and deleting spam mails. Machine learning models are used to overcome this problem. They are categorized into supervised, unsupervised, and semi-supervised learning models. Supervised learning models are used to classify E-mails and to filter and prevent spam mail. The proposed work explores t...

IJERT-Email Spam Detection and Data Optimization using NLP Techniques

International Journal of Engineering Research and Technology (IJERT), 2021

https://www.ijert.org/email-spam-detection-and-data-optimization-using-nlp-techniques https://www.ijert.org/research/email-spam-detection-and-data-optimization-using-nlp-techniques-IJERTV10IS080049.pdf
Today, spam mail is one of the biggest issues faced by everyone on the Internet. Email is widely used to share information and files because it is easy and cheap. However, spam emails affect professionals and individuals alike, and the volume of spam emails and spam messages increases every day. Such emails are mostly sent to earn income or to advertise for the sender's benefit. The growing amount of spam causes traffic congestion and wastes the time of those who receive it. The real cost of spam emails is much higher than one might imagine. Spam emails sometimes contain links that lead to malware, and some people get irritated when they see an inbox full of spam. Users can also be easily drawn into financial fraud by spam such as job alerts, commercial mails, and offer emails, which may cause mental stress as well. To reduce these risks, a machine learning model is proposed that distinguishes spam from non-spam emails and optimizes the data by removing unwanted mails, including advertisements, useless emails, and fraudulent mails. The proposed system detects spam and ham emails using a dataset containing spam mails, removes the emails identified as spam, and calculates the amount of storage used before and after their removal.
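The storage comparison described at the end of this abstract can be sketched as follows. The keyword-based `looks_like_spam` predicate is only a hypothetical stand-in for the trained model, and measuring storage as the UTF-8 byte length of each message body is likewise an assumption made for the example.

```python
def storage_report(emails, is_spam):
    """Remove messages flagged as spam and report storage before and after.

    `emails` is a list of message bodies (str); `is_spam` is any callable
    returning True for spam. Sizes are UTF-8 byte lengths of the bodies.
    """
    before = sum(len(body.encode("utf-8")) for body in emails)
    kept = [body for body in emails if not is_spam(body)]
    after = sum(len(body.encode("utf-8")) for body in kept)
    return kept, before, after


def looks_like_spam(body):
    """Hypothetical placeholder for a trained spam classifier."""
    return "win" in body.lower()


if __name__ == "__main__":
    inbox = ["hello team", "WIN cash now", "lunch?"]
    kept, before, after = storage_report(inbox, looks_like_spam)
    print(f"kept {len(kept)} emails, {before} bytes before, {after} bytes after")
```

Passing the classifier in as a callable keeps the storage accounting independent of whichever model is actually used for detection.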

Machine Learning-Based Detection of Spam Emails

Sci. Program., 2021

Social communication has evolved, with e-mail still being one of the most common means of communication, used for both formal and informal purposes. With many languages being digitized for the electronic world, the use of English is still abundant. However, various native languages of different regions are gradually emerging. The Urdu language, which comes from South Asia, mostly Pakistan, is also gaining ground as a medium of communication on social media platforms, on websites, and in emails. With the increased usage of emails, the number and variety of Urdu spam content also increase. Spam emails are inappropriate and unwanted messages, usually sent to breach security. They include phishing URLs, advertisements, and commercial segments, and are sent indiscriminately to a large number of recipients. Thus, such content is always a hazard for the user, and many studies have been carried out to detect it. However, there is a dire need to detect spam emails, which have content written in Urd...