Intelligent spam classification for mobile text message

Abstract

This paper analyses intelligent spam filtering techniques for SMS (Short Message Service) text, in the context of mobile text message spam. The unique characteristics of SMS content suggest that not all approaches will be equally effective or efficient. The paper compares several popular spam filtering techniques on a publicly available SMS spam corpus to identify the methods that work best for SMS text. The results can give hints for optimized spam detection in mobile text messages.
