Gene Ranking from Microarray Data for Cancer Classification–A Machine Learning Approach

Abstract

Traditional gene selection methods often select the top-ranked genes according to their individual discriminative power. We propose applying feature evaluation measures that are widely used in machine learning but less common in the DNA microarray field. We also include sequential gene subset selection approaches. In our study, we use several well-known criteria (filters and wrappers) to rank attributes, and a greedy search procedure combined with three subset evaluation measures. Two completely different machine learning classifiers are applied to perform the class prediction. The comparison is performed on two well-known DNA microarray data sets. We find that most of the top-ranked genes appear in the lists of relevant, informative genes detected by previous studies on these data sets.
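
As a rough illustration of the general idea of ranking genes and then running a greedy search over the ranking, the sketch below may help. Everything in it is an assumption made for illustration: the signal-to-noise filter score, the leave-one-out nearest-centroid evaluator, the toy expression matrix, and all function names. None of these are the criteria, subset evaluation measures, classifiers, or data sets used in the paper.

```python
from statistics import mean, pstdev


def rank_genes(X, y):
    """Rank gene indices by a signal-to-noise filter score, best first.

    Assumed score (not the paper's): |mean_0 - mean_1| / (sd_0 + sd_1).
    """
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, c in zip(X, y) if c == 0]
        b = [row[g] for row, c in zip(X, y) if c == 1]
        denom = (pstdev(a) + pstdev(b)) or 1e-12  # guard against zero spread
        scores.append(abs(mean(a) - mean(b)) / denom)
    return sorted(range(len(scores)), key=lambda g: -scores[g])


def loo_accuracy(X, y, genes):
    """Wrapper-style subset evaluation: leave-one-out nearest-centroid accuracy.

    Assumes every class keeps at least one sample when one is held out.
    """
    hits = 0
    for i in range(len(X)):
        centroids = {}
        for c in set(y):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == c]
            centroids[c] = [mean(r[g] for r in rows) for g in genes]

        def dist(c):
            return sum((X[i][g] - m) ** 2 for g, m in zip(genes, centroids[c]))

        hits += min(centroids, key=dist) == y[i]
    return hits / len(X)


def greedy_select(X, y, ranking, evaluate):
    """Walk the ranking once, keeping a gene only if it improves the score."""
    selected, best = [], 0.0
    for g in ranking:
        score = evaluate(X, y, selected + [g])
        if score > best:
            selected, best = selected + [g], score
    return selected, best


# Toy expression matrix (hypothetical): 6 samples x 3 genes, two classes.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 4.0, 0.9],
    [0.9, 6.0, 0.1],
    [3.0, 5.5, 0.8],
    [3.1, 4.5, 0.3],
    [2.9, 5.0, 0.7],
]
y = [0, 0, 0, 1, 1, 1]

ranking = rank_genes(X, y)
selected, best = greedy_select(X, y, ranking, loo_accuracy)
print(ranking, selected, best)
```

In this toy setting the first gene separates the two classes on its own, so the greedy pass stops adding genes after it. With real microarray data the evaluator would normally be computed inside a proper cross-validation loop to avoid selection bias.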

This research was supported by the Spanish Research Agency CICYT under grants TIN2004–00159 and TIN2004-06689C0303.

Author information

Authors and Affiliations

Department of Computer Science, University of Seville, Avenida Reina Mercedes s/n, 41012, Sevilla, Spain
Roberto Ruiz & Beatriz Pontes
Area of Computer Science, University of Pablo de Olavide, Ctra. de Utrera, km. 1, 41013, Sevilla, Spain
Raúl Giráldez & Jesús S. Aguilar–Ruiz

Editor information

Editors and Affiliations

School of Design, Engineering and Computing, Bournemouth University, UK
Bogdan Gabrys
Centre for SMART Systems, School of Environment and Technology, University of Brighton, BN2 4GJ, Brighton, UK
Robert J. Howlett
School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, SA, 5095, Mawson Lakes, Australia
Lakhmi C. Jain

Cite this paper

Ruiz, R., Pontes, B., Giráldez, R., Aguilar–Ruiz, J.S. (2006). Gene Ranking from Microarray Data for Cancer Classification–A Machine Learning Approach. In: Gabrys, B., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2006. Lecture Notes in Computer Science, vol 4252. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893004_161

DOI: https://doi.org/10.1007/11893004_161
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46537-9
Online ISBN: 978-3-540-46539-3
eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings Computer Science
