Filter versus wrapper gene selection approaches in DNA microarray domains
Review
Iñaki Inza et al. Artif Intell Med. 2004 Jun.
Abstract
DNA microarray experiments, which generate thousands of gene expression measurements, are used to collect information from tissue and cell samples regarding gene expression differences that could be useful for disease diagnosis, distinction of the specific tumor type, etc. One important application of gene expression microarray data is the classification of samples into known categories. Because DNA microarray technology measures gene expression en masse, the resulting data have a number of features (genes) far exceeding the number of samples. Because the predictive accuracy of supervised classifiers that try to discriminate between the classes of the problem decays in the presence of irrelevant and redundant features, a dimensionality reduction process is essential. We propose the application of a gene selection process, which also enables the biology researcher to focus on promising gene candidates that actively contribute to classification in these large-scale microarrays. Two basic approaches to feature selection appear in the machine learning and pattern recognition literature: the filter and wrapper techniques. Filter procedures are used in most works in the area of DNA microarrays. In this work, a group of different filter metrics is compared with a wrapper sequential search procedure. The comparison is performed on two well-known DNA microarray datasets using four classic supervised classifiers. The study is carried out over both the original continuous gene expression data and the same data discretized into three intervals. While two well-known filter metrics are proposed for continuous data, four classic filter measures are used over discretized data. The same wrapper approach is used for both continuous and discretized data.
The application of filter and wrapper gene selection procedures leads to considerably better accuracy than the approach without gene selection, coupled with notable dimensionality reductions. Although the wrapper approach is mostly more accurate than the filter metrics, this improvement comes at a considerable computational cost. We note that most of the genes selected by the proposed filter and wrapper procedures in discrete and continuous microarray data appear in the lists of relevant, informative genes detected by previous studies over these datasets. The aim of this work is to contribute to the field of gene selection in DNA microarray datasets. Through an extensive comparison with the more popular filter techniques, we would like to contribute to the expansion and study of the wrapper approach in this type of domain.
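The distinction between the two families can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the filter metrics or the classifiers, so the t-like score, the nearest-centroid classifier, the leave-one-out estimate, the synthetic data, and all function names below are assumptions chosen for illustration. A filter scores each gene independently of any classifier, while a wrapper sequential search adds genes one at a time and re-estimates classifier accuracy for every candidate, which is where the extra computational cost comes from.

```python
import numpy as np

def filter_rank(X, y):
    """Filter: rank genes by a classifier-independent score.
    The score (mean difference over summed standard deviations)
    is an illustrative choice, not a metric named in the abstract."""
    a, b = X[y == 0], X[y == 1]
    score = np.abs(a.mean(0) - b.mean(0)) / (a.std(0) + b.std(0) + 1e-12)
    return np.argsort(-score)  # best-scoring gene first

def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier
    (a stand-in for whichever supervised classifier is wrapped)."""
    n = len(y)
    hits = 0
    for i in range(n):
        mask = np.arange(n) != i
        Xtr, ytr = X[mask], y[mask]
        classes = np.unique(ytr)
        centroids = np.array([Xtr[ytr == c].mean(0) for c in classes])
        dist = ((centroids - X[i]) ** 2).sum(1)
        hits += int(classes[np.argmin(dist)] == y[i])
    return hits / n

def wrapper_forward(X, y, max_genes=5):
    """Wrapper: sequential forward search guided by estimated accuracy.
    Stops when no candidate gene improves the current best accuracy."""
    selected, best = [], 0.0
    while len(selected) < max_genes:
        candidates = [g for g in range(X.shape[1]) if g not in selected]
        accs = [loo_accuracy(X[:, selected + [g]], y) for g in candidates]
        top = int(np.argmax(accs))
        if accs[top] <= best:
            break
        selected.append(candidates[top])
        best = accs[top]
    return selected, best

# Synthetic stand-in for a microarray: 30 samples, 200 genes,
# with gene 7 made strongly informative (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 200))
y = np.array([0] * 15 + [1] * 15)
X[y == 1, 7] += 6.0

ranking = filter_rank(X, y)
selected, acc = wrapper_forward(X, y)
print("top filter gene:", ranking[0])
print("wrapper subset:", selected, "LOO accuracy:", acc)
```

In this sketch the filter evaluates each gene once, whereas each step of the wrapper retrains and tests the classifier for every remaining gene, so the wrapper's cost grows with both the number of genes and the number of search steps.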