Email Classification Research Trends: Review and Open Issues (original) (raw)

Personal and business users prefer to use e-mail as one of the crucial sources of communication. The usage and importance of e-mails continuously grow despite the prevalence of alternative means, such as electronic messages, mobile applications, and social networks. As the volume of business-critical e-mails continues to grow, the need to automate the management of e-mails increases for several reasons, such as spam e-mail classification, phishing e-mail classification, and multi-folder categorization, among others. This paper comprehensively reviews articles on e-mail classification published in 2006-2016 by exploiting the methodological decision analysis in five aspects, namely, e-mail classification application areas, data sets used in each application area, feature space utilized in each application area, e-mail classification techniques, and the use of performance measures. A total of 98 articles (56 articles from Web of Science core collection databases and 42 articles from Scopus database) are selected. To achieve the objective of the study, a comprehensive review and analysis is conducted to explore the various areas where e-mail classification was applied. Moreover, various public data sets, features sets, classification techniques, and performance measures are examined and used in each identified application area. This review identifies five application areas of e-mail classification. The most widely used data sets, features sets, classification techniques, and performance measures are found in the identified application areas. The extensive use of these popular data sets, features sets, classification techniques, and performance measures is discussed and justified. The research directions, research challenges, and open issues in the field of e-mail classification are also presented for future researchers.