A Three-Step Preprocessing Algorithm for Minimizing E-Mail Document’s Atypical Characteristics
Abstract
Documents in wide use today contain many atypical characteristics. In particular, non-standard text appears more frequently in e-mail documents than in other documents because of the extensive use of informal expressions such as slang and abbreviations. The results of automatic document classification can differ significantly depending on the characteristics of the documents being classified, as well as on the classifier’s performance. We propose a three-step preprocessing algorithm, applied in stages, for accurate automatic classification of e-mail documents into each category. This research identifies the characteristics of e-mail documents and applies the three-step preprocessing algorithm to minimize their atypical characteristics.
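The abstract does not spell out the three preprocessing steps, so the following is only a minimal sketch of one kind of normalization it motivates: mapping informal expressions such as slang and abbreviations to standard forms before an e-mail is passed to a classifier. The lookup table `INFORMAL_TO_STANDARD` and the function `normalize_email_text` are hypothetical illustrations, not the authors' algorithm, and the dictionary entries are made up for the example.

```python
import re

# Hypothetical lookup table mapping informal e-mail expressions to standard forms.
# These entries are illustrative only; they are not taken from the paper.
INFORMAL_TO_STANDARD = {
    "asap": "as soon as possible",
    "fyi": "for your information",
    "thx": "thanks",
    "pls": "please",
    "u": "you",
}

# Tokens are runs of lowercase letters, digits, or apostrophes; punctuation is dropped.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def normalize_email_text(text: str) -> list[str]:
    """Lowercase and tokenize the text, then expand any known informal token."""
    tokens = TOKEN_RE.findall(text.lower())
    normalized = []
    for tok in tokens:
        # Unknown tokens pass through unchanged; known ones may expand to several words.
        expansion = INFORMAL_TO_STANDARD.get(tok, tok)
        normalized.extend(expansion.split())
    return normalized


if __name__ == "__main__":
    print(normalize_email_text("Pls send the report ASAP, thx!"))
```

Running the script prints `['please', 'send', 'the', 'report', 'as', 'soon', 'as', 'possible', 'thanks']`. A dictionary lookup keeps the step transparent and easy to extend, but a real system would need a much larger, domain-specific vocabulary and some handling of ambiguous tokens such as "u".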
Author information
Authors and Affiliations
- Department of Computer Science and Engineering, Ewha Womans University, 11-1 Daehyun-dong, Seodaemun-ku, Seoul, 120-750, Korea
Ok-Ran Jeong & Dong-Sub Cho
Editor information
Editors and Affiliations
- School of Electrical and Electronic Engineering, Nanyang Technological University, Block S1, Nanyang Avenue, 639798, Singapore
Lipo Wang
- Honda Research Institute Europe GmbH, Offenbach/Main, Germany
Yaochu Jin
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jeong, O.-R., Cho, D.-S. (2005). A Three-Step Preprocessing Algorithm for Minimizing E-Mail Document’s Atypical Characteristics. In: Wang, L., Jin, Y. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2005. Lecture Notes in Computer Science, vol 3614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11540007_68
- DOI: https://doi.org/10.1007/11540007_68
- Publisher Name: Springer, Berlin, Heidelberg
- Print ISBN: 978-3-540-28331-7
- Online ISBN: 978-3-540-31828-6
- eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science