A Three-Step Preprocessing Algorithm for Minimizing E-Mail Document’s Atypical Characteristics

Abstract

Documents in wide use today include many atypical characteristics. In particular, non-standard text appears more frequently in e-mail documents than in other documents because of the extensive use of informal expressions such as slang and abbreviations. The results of automatic document classification may differ significantly depending on the characteristics of the documents being classified, as well as on the classifier’s performance. We propose a three-step, staged preprocessing algorithm to enable accurate automatic classification for each e-mail category. This research identifies the characteristics of e-mail documents and applies a three-step preprocessing algorithm that can minimize their atypical characteristics.

Author information

Authors and Affiliations

  1. Department of Computer Science and Engineering, Ewha Womans University, 11-1 Daehyun-dong, Seodaemun-ku, Seoul, 120-750, Korea
    Ok-Ran Jeong & Dong-Sub Cho

Editor information

Editors and Affiliations

  1. School of Electrical and Electronic Engineering, Nanyang Technological University, Block S1, Nanyang Avenue, 639798, Singapore
    Lipo Wang
  2. Honda Research Institute Europe GmbH, Offenbach/Main, Germany
    Yaochu Jin

Rights and permissions

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jeong, OR., Cho, DS. (2005). A Three-Step Preprocessing Algorithm for Minimizing E-Mail Document’s Atypical Characteristics. In: Wang, L., Jin, Y. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2005. Lecture Notes in Computer Science, vol 3614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11540007_68
