Feature selection for text classification: A review
Aggarwal CC, Hinneburg A, Keim DA (2001) On the surprising behavior of distance metrics in high dimensional spaces. In: ICDT, vol 1. Springer, pp 420–434
Apté C, Damerau F, Weiss SM (1994) Automated learning of decision rules for text categorization. ACM Trans Inf Syst 12(3):233–251
Aslam JA, Frost M (2003) An information-theoretic measure for document similarity. In: Proceedings of ACM SIGIR, pp 449–450
Baccianella S, Esuli A, Sebastiani F (2014) Feature selection for ordinal text classification. Neural Comput 26(3):557–591
Baecchi C, Uricchio T, Bertini M, Del Bimbo A (2016) A multimodal feature learning approach for sentiment analysis of social network multimedia. Multimed Tools Appl 75(5):2507–2525
Baeza-Yates R, Ribeiro-Neto B (1999) Modern information retrieval. ACM, New York Google Scholar
Ballan L, Bertini M, Uricchio T, Del Bimbo A (2015) Data-driven approaches for social image and video tagging. Multimed Tools Appl 74(4):1443–1468
Basu T, Murthy C (2016) A supervised term selection technique for effective text categorization. Int J Mach Learn Cybern 7(5):877–892
Bermejo P, de la Ossa L, Gámez JA, Puerta JM (2012) Fast wrapper feature subset selection in high-dimensional datasets by means of filter re-ranking. Knowl-Based Syst 25(1):35–44
Brown G (2009) A new perspective for information theoretic feature selection. In: Artificial intelligence and statistics, pp 49–56
Chandrashekar G, Sahin F (2014) A survey on feature selection methods. Comput Electr Eng 40(1):16–28
Chen J, Huang H, Tian S, Qu Y (2009) Feature selection for text classification with naïve Bayes. Expert Syst Appl 36(3):5432–5435
Choi SS, Cha SH, Tappert CC (2010) A survey of binary similarity and distance measures. J Syst Cybern Inform 8(1):43–48
Chou CH, Sinha AP, Zhao H (2010) A hybrid attribute selection approach for text classification. J Assoc Inf Syst 11(9):491
Cohen WW (1995) Fast effective rule induction. In: Proceedings of the twelfth international conference on machine learning, pp 115–123
Combarro EF, Montanes E, Diaz I, Ranilla J, Mones R (2005) Introducing a family of linear measures for feature selection in text categorization. IEEE Trans Knowl Data Eng 17(9):1223–1232
Cunningham P, Delany SJ (2007) k-nearest neighbour classifiers. Multiple Class Syst 34:1–17
Das S (2001) Filters, wrappers and a boosting-based hybrid for feature selection. In: ICML, vol 1, pp 74–81
Dasgupta A, Drineas P, Harb B, Josifovski V, Mahoney MW (2007) Feature selection methods for text classification. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 230–239
Dash M, Liu H (1997) Feature selection for classification. Intell Data Anal 1(1–4):131–156
Deerwester S, Dumais ST, Furnas GW, Landauer TK, Harshman R (1990) Indexing by latent semantic analysis. J Amer Soc Inform Sci 41(6):391
Domingos P, Pazzani M (1997) On the optimality of the simple Bayesian classifier under zero-one loss. Mach Learn 29(2/3):103–130
Dumais S, Platt J, Heckerman D, Sahami M (1998) Inductive learning algorithms and representations for text categorization. In: Proceedings of the seventh international conference on Information and knowledge management. ACM, pp 148–155
Dy JG, Brodley CE (2004) Feature selection for unsupervised learning. J Mach Learn Res 5:845–889
Fang Y, Zhang J, Zhang S, Lei C, Hu X (2017) Supervised feature selection algorithm based on low-rank and manifold learning. In: Proceedings of the 13th international conference on advanced data mining and applications, ADMA 2017. Singapore, pp 273–286
Forman G (2003) An extensive empirical study of feature selection metrics for text classification. J Mach Learn Res 3:1289–1305
Forman G (2004) A pitfall and solution in multi-class feature selection for text classification. In: Proceedings of the 21st international conference on machine learning. ACM, p 38
Fragoudis D, Meretakis D, Likothanassis S (2005) Best terms: an efficient feature-selection algorithm for text categorization. Knowl Inf Syst 8(1):16–33
Fu AY, Wenyin L, Deng X (2006) Detecting phishing web pages with visual similarity assessment based on earth mover’s distance (EMD). IEEE Trans Dependable Secure Comput 3(4)
Galavotti L, Sebastiani F, Simi M (2000) Experiments on the use of feature selection and negative evidence in automated text categorization. In: International conference on theory and practice of digital libraries. Springer, pp 59–68
Gao B, Liu TY, Feng G, Qin T, Cheng QS, Ma WY (2005) Hierarchical taxonomy preparation for text categorization using consistent bipartite spectral graph copartitioning. IEEE Trans Knowl Data Eng 17(9):1263–1273
Ghareb AS, Bakar AA, Hamdan AR (2016) Hybrid feature selection based on enhanced genetic algorithm for text categorization. Expert Syst Appl 49:31–47
Günal S (2012) Hybrid feature selection for text classification. Turkish J Electr Eng Comput Sci 20(2):1296–1311
Gütlein M, Frank E, Hall M, Karwath A (2009) Large-scale attribute selection using wrappers. In: IEEE symposium on computational intelligence and data mining, 2009. CIDM’09. IEEE, pp 332–339
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3(3):1157–1182
He X, Cai D, Niyogi P (2006) Laplacian score for feature selection. In: Advances in neural information processing systems, pp 507–514
Hinneburg A, Aggarwal CC, Keim DA (2000) What is the nearest neighbor in high dimensional spaces? In: 26th international conference on very large databases, pp 506–515
Hu R, Cheng D, He W, Wen G, Zhu Y, Zhang J, Zhang S (2017) Low-rank feature selection for multi-view regression. Multimed Tools Appl 76(16):17479–17495
Hu R, Zhu X, Cheng D, He W, Yan Y, Song J, Zhang S (2017) Graph self-representation method for unsupervised feature selection. Neurocomputing 220:130–137
Huang A (2008) Similarity measures for text document clustering. In: NZCSRSC, pp 49–56
Jain A, Zongker D (1997) Feature selection: evaluation, application, and small sample performance. IEEE Trans Pattern Anal Mach Intell 19(2):153–158
Jian L, Li J, Shu K, Liu H (2016) Multi-label informed feature selection. In: IJCAI, pp 1627–1633
Joachims T (1998) Text categorization with support vector machines: Learning with many relevant features. Mach Learn: ECML-98:137–142
John GH, Kohavi R, Pfleger K et al. (1994) Irrelevant features and the subset selection problem. In: Machine learning: proceedings of the eleventh international conference, pp 121–129
Johnson R, Zhang T (2014) Effective use of word order for text categorization with convolutional neural networks. arXiv:1412.1058
Kalousis A, Prados J, Hilario M (2007) Stability of feature selection algorithms: a study on high-dimensional spaces. Knowl Inf Syst 12(1):95–116
Kim SB, Han KS, Rim HC, Myaeng SH (2006) Some effective techniques for naive Bayes text classification. IEEE Trans Knowl Data Eng 18(11):1457–1466
Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1–2):273–324
Lam SL, Lee DL (1999) Feature reduction for neural network based text categorization. In: Proceedings of the 6th international conference on database systems for advanced applications, 1999. IEEE, pp 195–202
Largeron C, Moulin C, Géry M (2011) Entropy based feature selection for text categorization. In: Proceedings of the 2011 ACM symposium on applied computing. ACM, pp 924–928
Levandowsky M, Winter D (1971) Distance between sets. Nature 234(5323):34–35
Lewis DD, Ringuette M (1994) A comparison of two learning algorithms for text categorization. In: 3rd annual symposium on document analysis and information retrieval, vol 33, pp 81–93
Lin D (1998) An information-theoretic definition of similarity. In: Proceedings of international conference on machine learning, vol 98, pp 296–304
Lin Y, Lv F, Zhu S, Yang M, Cour T, Yu K, Cao L, Huang T (2011) Large-scale image classification: fast feature extraction and SVM training. In: 2011 IEEE conference on computer vision and pattern recognition (CVPR). IEEE, pp 1689–1696
Lin YS, Jiang JY, Lee SJ (2014) A similarity measure for text classification and clustering. IEEE Trans Knowl Data Eng 26(7):1575–1590
Liu H, Setiono R (1997) Feature selection and classification-a probabilistic wrapper approach. In: Proceedings of the 9th international conference on industrial and engineering applications of AI and ES, pp 419–424
Liu H, Yu L (2005) Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 17(4):491–502
Ma Z, Nie F, Yang Y, Uijlings JR, Sebe N (2012) Web image annotation via subspace-sparsity collaborated feature selection. IEEE Trans Multimed 14(4):1021–1030
Manku GS, Jain A, Das Sarma A (2007) Detecting near-duplicates for web crawling. In: Proceedings of the 16th international conference on World Wide Web. ACM, pp 141–150
McCallum A, Nigam K (1998) Employing EM and pool-based active learning for text classification. In: Proceedings of the 15th international conference on machine learning, pp 350–358
McCallum A, Nigam K et al. (1998) A comparison of event models for naive Bayes text classification. In: AAAI-98 workshop on learning for text categorization, vol 752. Madison, WI, pp 41–48
Mladenić D (1998) Feature subset selection in text-learning. In: European conference on machine learning. Springer, pp 95–100
Molina LC, Belanche L, Nebot À (2002) Feature selection algorithms: a survey and experimental evaluation. In: Proceedings of the 2002 IEEE international conference on data mining (ICDM 2002). IEEE, pp 306–313
Ng HT, Goh WB, Low KL (1997) Feature selection, perceptron learning, and a usability case study for text categorization. In: ACM SIGIR forum, vol 31. ACM, pp 67–73
Oh IS, Lee JS, Moon BR (2004) Hybrid genetic algorithms for feature selection. IEEE Trans Pattern Anal Mach Intell 26(11):1424–1437
Pappas N, Popescu-Belis A (2015) Combining content with user preferences for non-fiction multimedia recommendation: a study on TED lectures. Multimed Tools Appl 74(4):1175–1197
Pietramala A, Policicchio VL, Rullo P, Sidhu I (2008) A genetic algorithm for text classification rule induction. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, pp 188–203
Pudil P, Novovičová J, Kittler J (1994) Floating search methods in feature selection. Pattern Recogn Lett 15(11):1119–1125
Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106
Quinlan JR (2014) C4.5: programs for machine learning. Elsevier
Robertson SE, Walker S (1994) Some simple effective approximations to the 2-poisson model for probabilistic weighted retrieval. In: Proceedings of the 17th ACM SIGIR conference. ACM, pp 232–241
Rocchio JJ (1971) Relevance feedback in information retrieval. In: The SMART retrieval system: experiments in automatic document processing
Rogati M, Yang Y (2002) High-performing feature selection for text classification. In: Proceedings of the eleventh international conference on information and knowledge management. ACM, pp 659–661
Rubner Y, Tomasi C, Guibas LJ (2000) The earth mover’s distance as a metric for image retrieval. Int J Comput Vis 40(2):99–121
Ruiz ME, Srinivasan P (2002) Hierarchical text categorization using neural networks. Inf Retr 5(1):87–118
Schuldt C, Laptev I, Caputo B (2004) Recognizing human actions: a local SVM approach. In: Proceedings of the 17th international conference on pattern recognition, 2004. ICPR 2004, vol 3. IEEE, pp 32–36
Schütze H, Hull DA, Pedersen JO (1995) A comparison of classifiers and document representations for the routing problem. In: Proceedings of the 18th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 229–237
Scott S, Matwin S (1999) Feature engineering for text classification. In: ICML, vol 99, pp 379–388
Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput Surv 34(1):1–47
Song Q, Ni J, Wang G (2013) A fast clustering-based feature subset selection algorithm for high-dimensional data. IEEE Trans Knowl Data Eng 25(1):1–14
Strehl A, Ghosh J, Mooney R (2000) Impact of similarity measures on web-page clustering. In: Workshop on artificial intelligence for web search (AAAI 2000), vol 58, p 64
Strehl A, Ghosh J (2000) Value-based customer grouping from large retail data-sets. In: Proceedings of SPIE, vol 4057, pp 33–42
Eyheramendy S, Madigan D (2005) A novel feature selection score for text categorization. In: Proceedings of the workshop on feature selection for data mining, in conjunction with the 2005 SIAM international conference on data mining. SIAM, pp 1–8
Taira H, Haruno M (1999) Feature selection in SVM text categorization. In: AAAI/IAAI, pp 480–486
Tang B, Kay S, He H (2016) Toward optimal feature selection in naive Bayes for text categorization. IEEE Trans Knowl Data Eng 28(9):2508–2521
Tang J, Alelyani S, Liu H (2014) Feature selection for classification: a review. Data Classification: Algorithms and Applications, p 37
Uğuz H (2011) A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm. Knowl-Based Syst 24(7):1024–1032
Uysal AK (2016) An improved global feature selection scheme for text classification. Expert Syst Appl 43:82–92
Uysal AK, Gunal S (2012) A novel probabilistic feature selection method for text classification. Knowl-Based Syst 36:226–235
Uysal AK, Gunal S (2014) Text classification using genetic algorithm oriented latent semantic features. Expert Syst Appl 41(13):5938–5947
Vapnik VN (1998) Statistical learning theory, vol 1. Wiley, New York
Vergara JR, Estévez PA (2014) A review of feature selection methods based on mutual information. Neural Comput Appl 24(1):175–186
Wan X (2007) A novel document similarity measure based on earth mover’s distance. Inf Sci 177(18):3718–3730
Wan X, Peng Y (2005) The earth mover’s distance as a semantic measure for document similarity. In: Proceedings of the 14th ACM international conference on information and knowledge management. ACM, pp 301–302
Wang D, Zhang H, Liu R, Lv W, Wang D (2014) t-test feature selection approach based on term frequency for text categorization. Pattern Recogn Lett 45:1–10
Wang J, Zhao P, Hoi SC, Jin R (2014) Online feature selection and its applications. IEEE Trans Knowl Data Eng 26(3):698–710
Weber R, Schek HJ, Blott S (1998) A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: VLDB, vol 98, pp 194–205
Weston J, Mukherjee S, Chapelle O, Pontil M, Poggio T, Vapnik V (2001) Feature selection for SVMs. In: Advances in neural information processing systems, pp 668–674
Wiener E, Pedersen JO, Weigend AS et al. (1995) A neural network approach to topic spotting. In: Proceedings of SDAIR-95, 4th annual symposium on document analysis and information retrieval, vol 317. Las Vegas, NV, p 332
Wu X, Yu K, Ding W, Wang H, Zhu X (2013) Online feature selection with streaming features. IEEE Trans Pattern Anal Mach Intell 35(5):1178–1192
Xing EP, Jordan MI, Karp RM et al. (2001) Feature selection for high-dimensional genomic microarray data. In: ICML, vol 1, pp 601–608
Yan J, Liu N, Zhang B, Yan S, Chen Z, Cheng Q, Fan W, Ma WY (2005) OCFS: optimal orthogonal centroid feature selection for text categorization. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval. ACM, pp 122–129
Yang Y (1999) An evaluation of statistical approaches to text categorization. Inf Retr 1(1):69–90
Yang Y, Chute CG (1994) An example-based mapping method for text categorization and retrieval. ACM Trans Inf Syst 12(3):252–277
Yang Y, Pedersen JO (1997) A comparative study on feature selection in text categorization. In: ICML, vol 97, pp 412–420
Yang Y, Liu X (1999) A re-examination of text categorization methods. In: Proceedings of the 22nd ACM SIGIR, pp 42–49
Yang J, Liu Y, Zhu X, Liu Z, Zhang X (2012) A new feature selection based on comprehensive measurement both in inter-category and intra-category for text categorization. Inf Process Manag 48(4):741–754
Zhang S, Li X, Zong M, Zhu X, Wang R (2017) Efficient kNN classification with different numbers of nearest neighbors. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2017.2673241
Zhao Z, Wang L, Liu H, Ye J (2013) On similarity preserving feature selection. IEEE Trans Knowl Data Eng 25(3):619–632
Zhao S, Yao H, Zhao S, Jiang X, Jiang X (2016) Multi-modal microblog classification via multi-task learning. Multimed Tools Appl 75(15):8921–8938
Zheng Z, Wu X, Srihari R (2004) Feature selection for text categorization on imbalanced data. ACM SIGKDD Explor Newsl 6(1):80–89
Zhu X, Zhang S, Jin Z, Zhang Z, Xu Z (2011) Missing value estimation for mixed-attribute data sets. IEEE Trans Knowl Data Eng 23(1):110–121
Zhu X, Zhang L, Huang Z (2014) A sparse embedding and least variance encoding approach to hashing. IEEE Trans Image Process 23(9):3737–3750
Zhu X, Li X, Zhang S (2016) Block-row sparse multiview multilabel learning for image classification. IEEE Trans Cybern 46(2):450–461
Zhu X, Li X, Zhang S, Ju C, Wu X (2017) Robust joint graph sparse coding for unsupervised spectral feature selection. IEEE Trans Neural Netw Learn Syst 28(6):1263–1275
Zhu X, Li X, Zhang S, Xu Z, Yu L, Wang C (2017) Graph PCA hashing for similarity search. IEEE Trans Multimed 19(9):2033–2044
Zhu X, Suk H, Wang L, Lee S, Shen D (2017) A novel relational regularization feature selection method for joint regression and classification in AD diagnosis. Med Image Anal 38:205–214
Zhu X, Suk HI, Huang H, Shen D (2017) Low-rank graph-regularized structured sparse regression for identifying genetic biomarkers. IEEE Trans Big Data 3(4):405–414
Zhu X, Zhang S, Hu R, Zhu Y et al. (2018) Local and global structure preservation for robust unsupervised spectral feature selection. IEEE Trans Knowl Data Eng 30(3):517–529