ONLINE EMAIL CLASSIFICATION USING ANT CLUSTERING
Related papers
A Review of Text Classification Approaches for E-mail Management
ijetch.org
Abstract: The continuing explosive growth of textual content on the World Wide Web has created a need for sophisticated Text Classification (TC) techniques that combine efficiency with high-quality results. E-mail filtering and email organization is an ...
An Automation Technique for Email Classification
International Journal of Advance Research and Innovative Ideas in Education, 2018
Email categorization is proposed using the Naive Bayes classification algorithm. The categorization is based not only on the body but also on the header of an email message, because this metadata provides additional information that can be exploited to improve categorization. Results of experiments on real email data, categorized into three types (Primary, Social and Shopping), demonstrate the feasibility of the approach. In particular, categorization based only on the header information is comparable or superior to categorization based on all the information in a message. As email communication becomes prevalent, all kinds of emails are generated, and classifying them gives a better visual representation and easier access to high-priority mail. For example, the internal communications department of a company may send an email message to all employees to remind them of the timecard submission deadline.
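To make the header-plus-body idea concrete, here is a minimal multinomial Naive Bayes sketch. It assumes each email is a dict with hypothetical `from`, `subject` and `body` keys, and it prefixes header tokens with their field name so they count as separate features. The abstract does not give the paper's feature set, preprocessing or smoothing, so these are illustrative choices.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(email):
    """Turn a (hypothetical) email dict into tokens.

    Header tokens are prefixed with their field name, so a word in the
    subject is a different feature from the same word in the body.
    """
    features = []
    for field in ("from", "subject"):
        words = re.findall(r"[a-z0-9]+", email.get(field, "").lower())
        features += [f"{field}:{w}" for w in words]
    features += re.findall(r"[a-z0-9]+", email.get("body", "").lower())
    return features


class NaiveBayesEmailClassifier:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()             # emails seen per category
        self.token_counts = defaultdict(Counter)  # token frequencies per category
        self.vocab = set()

    def fit(self, emails, labels):
        for email, label in zip(emails, labels):
            feats = extract_features(email)
            self.class_counts[label] += 1
            self.token_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, email):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior plus smoothed log likelihoods; unseen tokens are skipped
            score = math.log(count / total)
            denom = sum(self.token_counts[label].values()) + self.alpha * len(self.vocab)
            for tok in extract_features(email):
                if tok in self.vocab:
                    score += math.log((self.token_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train = [
    {"from": "hr@corp.com", "subject": "Timecard deadline", "body": "submit your timecard by friday"},
    {"from": "friend@social.net", "subject": "Party photos", "body": "see the photos from the party"},
    {"from": "deals@shop.com", "subject": "Sale today", "body": "discount on shoes order now"},
]
model = NaiveBayesEmailClassifier().fit(train, ["Primary", "Social", "Shopping"])
print(model.predict({"from": "deals@shop.com", "subject": "New discount", "body": "order shoes"}))  # Shopping
```

Because header tokens are kept separate, a header-only variant is easy to try: drop the body tokens in `extract_features`.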
Web Page Classification with an Ant Colony Algorithm
2004
This paper utilizes Ant-Miner – the first Ant Colony algorithm for discovering classification rules – in the field of web content mining, and shows that it is more effective than C5.0 in two sets of BBC and Yahoo web pages used in our experiments. It also investigates the benefits and dangers of several linguistics-based text preprocessing techniques to reduce the large numbers of attributes associated with web content mining.
Survey on Email Classification Techniques/Algorithms
Journal of emerging technologies and innovative research, 2015
E-mail plays an important role in communication and is used in all types of organizations. It is self-evident that e-mail has become a central means of discussing engineering work and sharing the digital assets that define a product and its production process. Engineering communication research has shown that the volume of communication is indicative of the progress being made within an engineering project, so e-mail conversations increase as a product grows, and so does the data they carry. This data becomes difficult to handle, which creates the need to classify emails. Here we study different classification techniques that help classify large volumes of email data.
Index Terms: Email classification, Filtering, Structured and unstructured data, Naïve Bayes
Email Categorization using Hybrid Supervised and Unsupervised Approach
2014
With the growth of the internet, the use of email for electronic communication has increased drastically. Mailboxes become congested, giving rise to the problem of email overload, which can be addressed by email categorization or email management. Email categorization is a multifaceted problem with many difficulties. Many schemes have been proposed that solve it with either a supervised or an unsupervised approach, but once such a categorization model is built, it is hard to change it to handle dynamic situations. Because email reflects current information from around the globe, its content changes over time. Concept drift is the situation that arises when the underlying data distribution changes over a period of time, and the dynamic nature of email makes detecting and handling concept drift necessary. This paper proposes a dynamic hybrid scheme that combines supervised and unsupervised approaches to detect and handle concept drift. An initial classifier is built with a classification algorithm, and a clustering algorithm is then applied to the classifier's 'General' category to detect concept drift. If drift is detected, a new cluster is formed for the newly emerging concept and an appropriate label is assigned to that cluster.
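The drift-detection step can be sketched as follows, assuming the supervised classifier has already routed an email to 'General'. This is not the paper's algorithm. The single-pass (leader) clustering, the Jaccard similarity, the `threshold` and `min_size` values and the `GeneralCategoryDriftDetector` name are all assumptions chosen to keep the example small.

```python
import re
from collections import Counter


def tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class GeneralCategoryDriftDetector:
    """Leader clustering over emails that a classifier placed in 'General'.

    A cluster that grows to `min_size` members is treated as a newly emerging
    concept and receives a label built from its two most frequent words.
    """

    def __init__(self, threshold=0.3, min_size=3):
        self.threshold = threshold  # minimum Jaccard similarity to join a cluster (assumed)
        self.min_size = min_size    # members needed to call a cluster a new concept (assumed)
        self.clusters = []          # each: {"leader": set, "members": list, "label": str or None}

    def add(self, text):
        """Cluster one 'General' email; return a new label if drift is detected."""
        toks = tokens(text)
        best = max(self.clusters, key=lambda c: jaccard(toks, c["leader"]), default=None)
        if best is None or jaccard(toks, best["leader"]) < self.threshold:
            # too far from every existing leader: this email starts a new cluster
            best = {"leader": toks, "members": [], "label": None}
            self.clusters.append(best)
        best["members"].append(toks)
        if best["label"] is None and len(best["members"]) >= self.min_size:
            counts = Counter(t for member in best["members"] for t in member)
            best["label"] = "-".join(t for t, _ in counts.most_common(2))
            return best["label"]
        return None


detector = GeneralCategoryDriftDetector()
for subject in ["Vaccine booster appointment", "Lunch menu Friday",
                "Vaccine booster clinic", "Vaccine booster schedule"]:
    label = detector.add(subject)
    if label:
        print("new concept:", label)
```

A cluster that accumulates enough similar 'General' emails is reported once; in a fuller system its label would become a new category for the classifier.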
An Ant Colony Optimization Based Feature Selection for Web Page Classification
The increased popularity of the web has led to a huge amount of information being added to it, and as a result of this explosive growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features, such as HTML/XML tags, URLs, hyperlinks and text content, that should be considered during automated classification. The aim of this study is to reduce the number of features used, in order to improve the runtime and accuracy of web page classification. In this study, we used an ant colony optimization (ACO) algorithm to select the best features and then applied the well-known C4.5, naive Bayes and k-nearest-neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments and showed that using ACO for feature selection improves both the accuracy and the runtime performance of classification. We also showed that the proposed ACO-based algorithm selects better features than the well-known information gain and chi-square feature selection methods.
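A bare-bones version of ACO feature selection is sketched below. Each ant builds a subset of `k` features, choosing each feature with probability proportional to pheromone^alpha times heuristic^beta; pheromone then evaporates and is reinforced on the iteration's best subset. The function name, its parameters and the fitness callback are hypothetical. In the study the fitness would come from a classifier such as C4.5, naive Bayes or kNN, whereas the usage below plugs in a toy score.

```python
import random


def aco_feature_selection(n_features, k, fitness, heuristic=None, n_ants=10,
                          n_iters=30, alpha=1.0, beta=1.0, rho=0.2, seed=0):
    """Pick k of n_features with a basic ant colony search (illustrative).

    fitness(subset) -> non-negative score, e.g. validation accuracy of a
    classifier trained on those features (supplied by the caller).
    heuristic: optional positive per-feature prior (e.g. information gain).
    """
    rng = random.Random(seed)
    eta = heuristic if heuristic is not None else [1.0] * n_features
    tau = [1.0] * n_features  # pheromone on each feature
    best_subset, best_score = None, float("-inf")

    for _ in range(n_iters):
        iter_subset, iter_score = None, float("-inf")
        for _ in range(n_ants):
            # each ant samples k distinct features, weighted by tau^alpha * eta^beta
            candidates = list(range(n_features))
            subset = []
            while len(subset) < k:
                weights = [tau[f] ** alpha * eta[f] ** beta for f in candidates]
                chosen = rng.choices(candidates, weights=weights)[0]
                subset.append(chosen)
                candidates.remove(chosen)
            score = fitness(subset)
            if score > iter_score:
                iter_subset, iter_score = subset, score
        if iter_score > best_score:
            best_subset, best_score = iter_subset, iter_score
        # evaporate all trails, then reinforce the features of this iteration's best ant
        tau = [(1 - rho) * t for t in tau]
        for f in iter_subset:
            tau[f] += rho * max(iter_score, 0.0)

    return sorted(best_subset), best_score


useful = {0, 3}  # toy ground truth: only features 0 and 3 matter
subset, score = aco_feature_selection(
    n_features=6, k=2, fitness=lambda s: len(useful & set(s)) / len(useful))
print(subset, score)
```

Fixing the subset size `k` keeps the example short; many ACO feature-selection variants instead let each ant decide when to stop adding features.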
Content Based E-Mail Classification
International Journal of Scientific Research in Science, Engineering and Technology, 2021
Electronic mail (e-mail) has established a significant place in users' lives. It is a major mode of information sharing because it is a fast and effective way to communicate, in both personal and professional life. The rapid increase in the number of account holders over the last few decades, and in the volume of mail, has also created serious issues. Content-based mail classification sorts mail into four categories: Private, Public, Newsletter and Anonymous. Every user chooses their own keyword (a semi-private password). Contacts who know the user's keyword are classified as private contacts, and unknown users are classified as anonymous contacts. Upon verification, an anonymous contact can be classified as public or private. Newsletters and group mails are classified as newsletter contacts, and the remaining mails are highly likely to be junk mail or spam. In this project, a spam detector that labels an email as either spam or ham is built using n-gram analysis. The system classifies mail based on the user's contacts, so that any mail from a contact the user knows well is displayed.
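The abstract does not say which n-gram model the spam detector uses. As one possible reading, this sketch builds character-trigram frequency profiles for spam and ham and labels a new message by its cosine similarity to each profile. The trigram size, the profile approach and the `NGramSpamDetector` name are assumptions.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    text = " ".join(text.lower().split())  # lowercase and normalize whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NGramSpamDetector:
    """Profile-based spam/ham labelling with character n-grams (illustrative)."""

    def __init__(self, n=3):
        self.n = n
        self.profiles = {"spam": Counter(), "ham": Counter()}

    def train(self, text, label):
        # add this message's n-gram counts to the profile of its class
        self.profiles[label].update(char_ngrams(text, self.n))

    def classify(self, text):
        grams = char_ngrams(text, self.n)
        return max(self.profiles, key=lambda lbl: cosine(grams, self.profiles[lbl]))


detector = NGramSpamDetector()
for text in ["win a free prize now", "claim your free cash prize"]:
    detector.train(text, "spam")
for text in ["meeting notes attached", "lunch at noon tomorrow"]:
    detector.train(text, "ham")
print(detector.classify("free prize inside"))  # spam
```

Character n-grams are fairly robust to the deliberate misspellings common in spam, which is one reason to prefer them over whole words in a small sketch like this.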
A CLUSTERING TECHNIQUE FOR EMAIL CONTENT MINING
In today's internet, with a great many e-documents, such as HTML pages and digital libraries, occupying a considerable amount of cyberspace, organizing these documents has become a practical need. Clustering is an important technique that organizes a large number of objects into smaller coherent groups, which helps make efficient and effective use of these documents for information retrieval and other NLP tasks. Email is one of the e-documents most frequently used by individuals and organizations, and email categorization is one of the major tasks of email mining: categorizing emails into different groups makes them easier to retrieve and maintain. Like other e-documents, emails can be classified using clustering algorithms. This paper suggests a similarity measure called Similarity Measure for Text Processing for email clustering. The measure takes three situations into account: the feature appears in both emails, the feature appears in only one email, and the feature appears in neither email. The effectiveness of the suggested measure is analyzed on the Enron email data set to categorize emails, and the results indicate that its efficiency is better than that of other measures.
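The three cases the measure distinguishes can be illustrated with one formulation, assumed here. A feature present in both emails contributes a value in (0.5, 1] that shrinks as the two weights diverge, a feature present in only one email contributes a fixed penalty `-lam`, and a feature absent from both contributes nothing. The per-feature scale `sigma`, the penalty `lam` and the function name are assumptions, and the exact definition in the paper may differ.

```python
import math


def smtp_similarity(d1, d2, sigma, lam=1.0):
    """Similarity of two term-weight vectors (one assumed formulation).

    d1, d2: non-negative feature weights for two emails, same length.
    sigma:  positive per-feature scale, e.g. the spread of that feature's
            non-zero weights across the collection (assumed).
    lam:    penalty for a feature that appears in only one email (assumed).
    """
    total, counted = 0.0, 0
    for w1, w2, s in zip(d1, d2, sigma):
        if w1 > 0 and w2 > 0:
            # case 1: feature in both emails; 1.0 when weights match, towards 0.5 as they diverge
            total += 0.5 * (1 + math.exp(-((w1 - w2) / s) ** 2))
            counted += 1
        elif w1 > 0 or w2 > 0:
            # case 2: feature in only one email
            total -= lam
            counted += 1
        # case 3: feature in neither email contributes nothing
    f = total / counted if counted else 0.0
    return (f + lam) / (1 + lam)  # rescale from [-lam, 1] to [0, 1]


print(smtp_similarity([1, 2, 0], [1, 2, 0], sigma=[1, 1, 1]))  # 1.0
print(smtp_similarity([1, 0], [0, 1], sigma=[1, 1]))           # 0.0
```

The average is taken only over features present in at least one email, and the final rescaling maps the result to [0, 1]: identical term vectors score 1 and emails sharing no features score 0.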
An experimental framework for email categorization and management
Proceedings of the 24th annual international ACM …, 2001
Many problems are difficult to explore adequately until a prototype exists to elicit user feedback. One such problem is a system that automatically categorizes and manages email: because of a myriad of user interface issues, a prototype is necessary to determine which techniques and technologies are effective in the email domain. This paper describes the implementation of an add-in for Microsoft Outlook 2000™ that addresses two problems with email: (1) helping manage the inbox by automatically classifying email based on user folders, and (2) aiding search and retrieval by providing a list of emails relevant to the selected item. This add-in is a first step toward an experimental system for studying other issues related to information management. The system has been set up to allow experimentation with other classification algorithms, and the source code is available online to promote further experimentation.
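The abstract does not describe the add-in's classification algorithm. As an illustration of the two functions it names, this sketch assumes TF-IDF term weighting with cosine similarity: `related` returns the emails most similar to a selected one, and `suggest_folder` takes a majority vote over the folders of those neighbours (a k-nearest-neighbour rule). All function names and the sample data are hypothetical.

```python
import math
import re
from collections import Counter


def tfidf_vectors(texts):
    """TF-IDF weights (raw count x log(N / document frequency)) for each text."""
    counts = [Counter(re.findall(r"[a-z0-9]+", t.lower())) for t in texts]
    df = Counter(term for c in counts for term in c)
    n = len(texts)
    return [{term: tf * math.log(n / df[term]) for term, tf in c.items()} for c in counts]


def cosine(a, b):
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related(vectors, i, top=3):
    """Indices of up to `top` emails most similar to email i (similarity > 0)."""
    scored = sorted(((cosine(vectors[i], v), j) for j, v in enumerate(vectors) if j != i),
                    reverse=True)
    return [j for score, j in scored[:top] if score > 0]


def suggest_folder(vectors, folders, i, k=3):
    """Majority folder among the k nearest filed emails; None if no evidence."""
    votes = Counter(folders[j] for j in related(vectors, i, top=k) if folders[j] is not None)
    return votes.most_common(1)[0][0] if votes else None


emails = ["budget report for q3 finance", "finance budget meeting",
          "family dinner sunday", "sunday family picnic", "q3 finance budget draft"]
folders = ["Finance", "Finance", "Family", "Family", None]  # last email not yet filed
vectors = tfidf_vectors(emails)
print(suggest_folder(vectors, folders, 4))  # Finance
print(related(vectors, 2, top=1))           # [3]
```

Reusing one similarity function for both tasks keeps the sketch small: the folder suggestion is just a vote over the same neighbours that the relevance list shows.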
Email Categorization using Hybrid Supervised and Unsupervised Approach
International Journal of Engineering Research and Technology (IJERT), 2014
https://www.ijert.org/email-categorization-using-hybrid-supervised-and-unsupervised-approach
https://www.ijert.org/research/email-categorization-using-hybrid-supervised-and-unsupervised-approach-IJERTV3IS060795.pdf