Gene Ranking from Microarray Data for Cancer Classification–A Machine Learning Approach

Abstract

Traditional gene selection methods often select the top-ranked genes according to their individual discriminative power. We propose to apply feature evaluation measures that are widely used in machine learning but less common in DNA microarray analysis. We also include sequential gene subset selection approaches. In our study, we use several well-known criteria (filters and wrappers) to rank attributes, and a greedy search procedure combined with three subset evaluation measures. Two very different machine learning classifiers are applied to perform the class prediction. The comparison is performed on two well-known DNA microarray data sets. Most of the top-ranked genes appear in the lists of relevant, informative genes reported by previous studies of these data sets.
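The ranking-then-greedy-search idea in the abstract can be sketched as follows. The abstract does not name the specific ranking criteria, subset evaluation measures, or classifiers, so everything here is an illustrative assumption: the signal-to-noise filter score, the nearest-centroid classifier used as a wrapper, the toy data, and all function names are hypothetical stand-ins, not the authors' method.

```python
from statistics import mean, pstdev

def snr_score(values, labels):
    """Assumed filter score for one gene: |mean0 - mean1| / (sd0 + sd1)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = pstdev(a) + pstdev(b)
    return abs(mean(a) - mean(b)) / (spread if spread > 0 else 1e-12)

def rank_genes(X, labels):
    """Return gene indices sorted from most to least discriminative."""
    n_genes = len(X[0])
    scores = [snr_score([row[g] for row in X], labels) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)

def loo_accuracy(X, labels, genes):
    """Assumed wrapper measure: leave-one-out accuracy of a nearest-centroid classifier."""
    hits = 0
    for i in range(len(X)):
        # Build class centroids from every sample except the held-out one.
        centroids = {}
        for c in set(labels):
            rows = [X[j] for j in range(len(X)) if j != i and labels[j] == c]
            centroids[c] = [mean(r[g] for r in rows) for g in genes]
        point = [X[i][g] for g in genes]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        hits += pred == labels[i]
    return hits / len(X)

def greedy_select(X, labels, ranking):
    """Walk the ranking once; keep a gene only if it improves the wrapper score."""
    selected, best = [], 0.0
    for g in ranking:
        score = loo_accuracy(X, labels, selected + [g])
        if score > best:
            selected, best = selected + [g], score
    return selected, best

# Toy expression matrix (rows = samples, columns = genes); values are made up.
X = [
    [5.0, 1.0, 2.0],
    [3.0, 1.2, 2.5],
    [4.0, 0.9, 3.5],
    [4.5, 3.0, 3.0],
    [3.5, 3.2, 4.0],
    [5.5, 2.9, 4.5],
]
labels = [0, 0, 0, 1, 1, 1]

ranking = rank_genes(X, labels)
selected, best = greedy_select(X, labels, ranking)
print(ranking, selected, best)
```

On this toy matrix the second gene (index 1) separates the two classes on its own, so the greedy pass keeps it and rejects the others because they do not raise the leave-one-out accuracy. Real microarray data sets have thousands of genes and few samples, so the wrapper score should be estimated with care to avoid selection bias.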

This research was supported by the Spanish Research Agency CICYT under grants TIN2004-00159 and TIN2004-06689C0303.

Author information

Authors and Affiliations

  1. Department of Computer Science, University of Seville, Avenida Reina Mercedes s/n, 41012, Sevilla, Spain
    Roberto Ruiz & Beatriz Pontes
  2. Area of Computer Science, University of Pablo de Olavide, Ctra. de Utrera, km. 1, 41013, Sevilla, Spain
    Raúl Giráldez & Jesús S. Aguilar–Ruiz

Authors

  1. Roberto Ruiz
  2. Beatriz Pontes
  3. Raúl Giráldez
  4. Jesús S. Aguilar–Ruiz

Editor information

Editors and Affiliations

  1. School of Design, Engineering and Computing, Bournemouth University, UK
    Bogdan Gabrys
  2. Centre for SMART Systems, School of Environment and Technology, University of Brighton, BN2 4GJ, Brighton, UK
    Robert J. Howlett
  3. School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, SA, 5095, Mawson Lakes, Australia
    Lakhmi C. Jain

Rights and permissions

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ruiz, R., Pontes, B., Giráldez, R., Aguilar–Ruiz, J.S. (2006). Gene Ranking from Microarray Data for Cancer Classification–A Machine Learning Approach. In: Gabrys, B., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2006. Lecture Notes in Computer Science, vol 4252. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893004_161
