A hybrid approach for gene selection and classification using support vector machine
Related papers
International Journal of Computer Science & Engineering Survey, 2011
DNA microarray technology has modernized biology research: scientists can now measure the expression levels of thousands of genes simultaneously in a single experiment. Gene expression profiles, which represent the state of a cell at the molecular level, have great potential as a medical diagnosis tool. But compared to the number of genes involved, available training data sets generally have a fairly small sample size for classification. These training data limitations constitute a challenge to certain classification methodologies. Feature selection techniques can be used to extract the marker genes that most influence classification accuracy by eliminating unwanted noisy and redundant genes. This paper presents a review of feature selection techniques that have been employed in microarray-based cancer classification, and of the predominant role of SVM in cancer classification.
In gene expression datasets, classification is a high-dimensional and risky task because a large number of features are irrelevant or redundant. Classification requires both a feature selection method and a classifier; hence this paper proposes a method for choosing a suitable combination of attribute selection and classification algorithms that gives good accuracy as well as computational efficiency, generalization performance and feature interpretability. In this paper, a comparative study was carried out using some well-known feature selection methods such as FCBF, ReliefF,
Gene Selection for Cancer Classification using Support Vector Machines
Machine Learning, 2002
DNA micro-arrays now permit scientists to screen thousands of genes simultaneously and determine whether those genes are active, hyperactive or silent in normal or cancerous tissue. Because these new micro-array devices generate bewildering amounts of raw data, new analytical methods must be developed to sort out whether cancer tissues have distinctive signatures of gene expression over normal tissues or other types of cancer tissues.
Gene selection from microarray data for cancer classification—a machine learning approach
Computational Biology and Chemistry, 2005
A DNA microarray can track the expression levels of thousands of genes simultaneously. Previous research has demonstrated that this technology can be useful in the classification of cancers. Cancer microarray data normally contains a small number of samples which have a large number of gene expression levels as features. To select relevant genes involved in different types of cancer remains a challenge. In order to extract useful gene information from cancer microarray data and reduce dimensionality, feature selection algorithms were systematically investigated in this study. Using a correlation-based feature selector combined with machine learning algorithms such as decision trees, naïve Bayes and support vector machines, we show that classification performance at least as good as published results can be obtained on acute leukemia and diffuse large B-cell lymphoma microarray data sets. We also demonstrate that a combined use of different classification and feature selection approaches makes it possible to select relevant genes with high confidence. This is also the first paper which discusses both computational and biological evidence for the involvement of zyxin in leukaemogenesis.
2017 10th International Conference on Electrical and Electronics Engineering (ELECO), 2017
In this study, three different feature selection algorithms are compared using Support Vector Machines as the classifier for cancer classification through gene expression data. The ability of feature selection algorithms to select an optimal gene subset for a cancer type is evaluated by the classification ability of the selected genes. A publicly available microarray dataset is employed for gene expression values. The selected gene subsets were able to classify subtypes of the considered cancer type with high accuracy, showing that these feature selection methods are applicable to biomarker gene selection.
Constantly improving gene expression technology offers the ability to measure the expression levels of thousands of genes in parallel. Gene expression data is expected to significantly aid the development of efficient cancer diagnosis and classification platforms. A key issue that needs to be addressed is the selection of a small number of genes that contribute to a disease from the thousands of genes measured on microarrays, which are inherently noisy. This work deals with finding the small subset of informative genes from gene expression microarray data that maximizes classification accuracy. This paper introduces a new hybrid algorithm combining a Genetic Algorithm and a Support Vector Machine for gene selection and classification. We show that the classification accuracy of the proposed algorithm is superior to that of a number of current state-of-the-art methods on two widely used benchmark datasets. The informative genes from the best subset are validated by comparing them with biological results reported by biologists and computer scientists, in order to explore their biological plausibility.
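The idea of a genetic-algorithm wrapper around a classifier can be sketched as follows. This is only an illustrative sketch and not the algorithm from the paper: the bit-mask encoding, truncation selection, one-point crossover, size penalty and all parameter values are assumptions, and a nearest-centroid classifier stands in for the SVM so that the example needs nothing beyond the Python standard library. The expression values are made up.

```python
import random

def centroid_loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`.
    Assumption: this is a dependency-free stand-in for the SVM."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_label, best_dist = None, float("inf")
        for label in sorted(set(y)):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            if not rows:
                continue
            # Squared distance from sample i to the class centroid.
            dist = sum((X[i][g] - sum(r[g] for r in rows) / len(rows)) ** 2
                       for g in genes)
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)

def fitness(mask, X, y, penalty=0.01):
    """Accuracy minus a small (hypothetical) penalty per selected gene."""
    genes = [g for g, bit in enumerate(mask) if bit]
    return centroid_loo_accuracy(X, y, genes) - penalty * len(genes)

def ga_select(X, y, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Evolve bit masks over genes; returns the fittest mask found."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, X, y), reverse=True)
        parents = pop[: pop_size // 2]        # truncation selection, keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)         # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, X, y))

# Made-up toy data: 8 samples x 6 genes; only gene 0 separates the classes.
X = [
    [5.0, 0.3, 1.2, 0.8, 2.1, 0.5], [4.8, 1.1, 0.4, 1.9, 0.7, 1.6],
    [5.3, 0.6, 1.8, 0.2, 1.4, 0.9], [4.9, 1.7, 0.9, 1.1, 0.3, 1.3],
    [1.1, 0.4, 1.5, 0.9, 1.8, 0.7], [0.9, 1.2, 0.6, 1.6, 0.5, 1.4],
    [1.3, 0.8, 1.1, 0.4, 1.2, 1.0], [1.0, 1.5, 0.7, 1.3, 0.6, 1.1],
]
y = ["tumor"] * 4 + ["normal"] * 4
best = ga_select(X, y)
print("selected gene indices:", [g for g, bit in enumerate(best) if bit])
```

Because the top half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. With real microarray data the fitness function would typically be a cross-validated SVM accuracy, which is far more expensive to evaluate.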
Hybrid Feature Selection and Ensemble Learning Methods for Gene Selection and Cancer Classification
International Journal of Advanced Computer Science and Applications, 2021
A promising research field in bioinformatics and data mining is the classification of cancer based on gene expression results. Not all genes support efficient sample classification. Thus, to identify the appropriate genes that help efficiently distinguish samples, a robust feature selection method is needed. Redundancy in gene expression data contributes to low classification performance. This paper presents a combination of gene selection and classification methods using ranking and wrapper approaches. In the ranking methods, information gain was used to reduce the dimensionality to 1% and 5%. Then, in the wrapper methods, K-nearest neighbors and Naïve Bayes were used with Best First, Greedy Stepwise, and Rank Search. Several combinations were investigated because it is known that no single model can give the best results using different datasets for all circumstances. Therefore, combining multiple feature selection methods and applying different classification models...
Microarray data, often characterised by high dimensionality and small sample sizes, is used for cancer classification problems that classify the given (tissue) samples as diseased or healthy on the basis of an analysis of the gene expression profile. The goal of feature selection is to find the most relevant features among the thousands of related features of a particular problem domain. The focus of this study is a method that relaxes the maximum-accuracy criterion for feature selection and selects the combination of feature selection method and classifier that, using a small subset of features, obtains an accuracy not statistically significantly different from the maximum accuracy. By selecting a classifier that employs a small number of features along with good accuracy, the risk of overfitting (bias) is reduced. This has been corroborated empirically using some common attribute selection methods (ReliefF, SVM-RFE, FCBF, and Gain Ratio) and classifiers (3-Nearest Neighbour, Naive Bayes and SVM) applied to 6 different microarray cancer data sets. We use hypothesis testing to compare several configurations and select particular configurations that perform well with a small number of genes on these data sets.
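As a rough illustration of the ranking step, the sketch below scores each gene by information gain and keeps a fixed fraction of the top-ranked ones. It assumes the expression values have already been discretized (here into "hi"/"lo"), which the abstract does not describe; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction in `labels` after splitting on a discrete feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def top_fraction_by_gain(columns, labels, fraction):
    """Indices of the top `fraction` of features, ranked by information gain."""
    keep = max(1, round(len(columns) * fraction))
    ranked = sorted(range(len(columns)),
                    key=lambda j: information_gain(columns[j], labels),
                    reverse=True)
    return ranked[:keep]

# Made-up toy data: 4 discretized genes (one list per gene), 6 samples.
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
columns = [
    ["hi", "hi", "hi", "lo", "lo", "lo"],   # separates the classes perfectly
    ["hi", "lo", "hi", "lo", "hi", "lo"],   # nearly uninformative
    ["hi", "hi", "lo", "lo", "lo", "lo"],   # partly informative
    ["lo", "lo", "lo", "lo", "lo", "lo"],   # constant, zero gain
]
selected = top_fraction_by_gain(columns, labels, 0.5)
print(selected)  # [0, 2]
```

With thousands of genes, `fraction=0.01` or `fraction=0.05` would correspond to the 1% and 5% reductions mentioned in the abstract; the reduced set would then be handed to a wrapper search.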
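The relaxed selection criterion can be sketched as follows: find the configuration with the highest mean accuracy, then pick the configuration with the fewest genes whose accuracy is not significantly different from it. The abstract does not say which hypothesis test is used, so the exact paired sign-flip permutation test here is an assumption, as are the configuration names, the significance level and the per-fold accuracies.

```python
from itertools import product

def paired_permutation_pvalue(a, b):
    """Exact two-sided sign-flip permutation test on paired differences."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    count = total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        # Small tolerance so floating-point ties count as "at least as extreme".
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            count += 1
    return count / total

def smallest_adequate(configs, alpha=0.05):
    """configs: list of (name, n_genes, per-fold accuracies).
    Returns the configuration with the fewest genes whose accuracy is not
    significantly different from the best mean accuracy."""
    best = max(configs, key=lambda c: sum(c[2]) / len(c[2]))
    adequate = [c for c in configs
                if c is best or paired_permutation_pvalue(c[2], best[2]) >= alpha]
    return min(adequate, key=lambda c: c[1])

# Made-up per-fold accuracies for three hypothetical configurations.
configs = [
    ("ReliefF + SVM, 200 genes", 200, [0.95, 0.90, 0.95, 0.90, 0.95, 0.90, 0.95, 0.90]),
    ("FCBF + SVM, 20 genes",      20, [0.90, 0.95, 0.90, 0.90, 0.95, 0.90, 0.95, 0.90]),
    ("Gain Ratio + NB, 5 genes",   5, [0.80, 0.75, 0.80, 0.78, 0.82, 0.76, 0.81, 0.77]),
]
choice = smallest_adequate(configs)
print(choice[0])  # FCBF + SVM, 20 genes
```

In this toy example the 5-gene configuration is rejected because it is worse than the best on every fold (p = 2/256), while the 20-gene configuration is statistically indistinguishable from the 200-gene one and is therefore preferred.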
Gene Selection for Cancer Classification through Ensemble of Methods
Lecture Notes in Computer Science, 2009
The paper develops methods for selecting the most important genes on the basis of gene expression microarray data corresponding to different types of cancer. A special two-stage selection strategy is proposed. In the first stage, we apply a few different methods of assessing the importance of genes. Each method stresses different aspects of the problem. In the second stage, the selected genes are compared, and the genes chosen most frequently by all methods of selection are treated as the most important and representative for the particular type of problem. The results of selection are analyzed using PCA, and the selected genes form the input to the SVM classifier that recognizes the classes of cancer. The numerical experiments confirm the efficiency of the proposed approach.
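The second stage of such a strategy amounts to a vote across ranking methods. The sketch below is a minimal illustration under stated assumptions: each method is assumed to return a ranked list of gene names, and the `top_k` and `min_votes` thresholds, the method names and the gene names are all hypothetical rather than taken from the paper.

```python
from collections import Counter

def consensus_genes(rankings, top_k, min_votes):
    """Keep genes that appear in the top_k of at least min_votes rankings."""
    votes = Counter()
    for ranking in rankings:
        votes.update(ranking[:top_k])
    chosen = [g for g, v in votes.items() if v >= min_votes]
    # Most-voted first; ties broken by name for a stable order.
    return sorted(chosen, key=lambda g: (-votes[g], g))

# Made-up rankings from three hypothetical gene-importance methods.
rankings = {
    "method_a": ["g7", "g2", "g9", "g1", "g4"],
    "method_b": ["g2", "g7", "g4", "g3", "g9"],
    "method_c": ["g2", "g5", "g7", "g8", "g1"],
}
consensus = consensus_genes(rankings.values(), top_k=3, min_votes=2)
print(consensus)  # ['g2', 'g7']
```

The consensus list would then serve as the input feature set for the classifier.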
A novel gene selection algorithm for cancer classification using microarray datasets
BMC Medical Genomics, 2019
Background: Microarray datasets are an important medical diagnostic tool as they represent the states of a cell at the molecular level. Available microarray datasets for classifying cancer types generally have a fairly small sample size compared to the large number of genes involved. This is known as the curse of dimensionality, which is a challenging problem. Gene selection is a promising approach that addresses this problem and plays an important role in the development of efficient cancer classification, because only a small number of genes are related to the classification problem. Gene selection addresses many problems in microarray datasets, such as reducing the number of irrelevant and noisy genes and selecting the most related genes to improve the classification results. Methods: An innovative Gene Selection Programming (GSP) method is proposed to select relevant genes for effective and efficient cancer classification. GSP is based on the Gene Expression Programming (GEP) method with a newly defined population initialization algorithm, a new fitness function definition, and improved mutation and recombination operators. A Support Vector Machine (SVM) with a linear kernel serves as the classifier of the GSP. Results: Experimental results on ten microarray cancer datasets demonstrate that GSP is effective and efficient in eliminating irrelevant and redundant genes/features from microarray datasets. The comprehensive evaluations and comparisons with other methods show that GSP gives a better compromise in terms of all three evaluation criteria, i.e., classification accuracy, number of selected genes, and computational cost. The gene set selected by GSP has shown superior performance in cancer classification compared to those selected by up-to-date representative gene selection methods. Conclusion: The gene subset selected by GSP can achieve higher classification accuracy with less processing time.