Hybridizing the Dimensionality Reduction Approaches for Cancer Classification Using Genes Expression Analysis
Related papers
In gene expression datasets, classification involves high dimensionality and risk, since a large number of features are irrelevant and redundant. Classification requires both a feature selection method and a classifier; hence this paper proposes a method for choosing a suitable combination of attribute selection and classification algorithms that gives good accuracy as well as computational efficiency, generalization performance and feature interpretability. In this paper, a comparative study is carried out with some well-known feature selection methods such as FCBF, ReliefF,
International Journal of Computer Science & Engineering Survey, 2011
DNA microarray technology has modernized biology research in such a way that scientists can now measure the expression levels of thousands of genes simultaneously in a single experiment. Gene expression profiles, which represent the state of a cell at a molecular level, have great potential as a medical diagnosis tool. But compared to the number of genes involved, available training data sets generally have a fairly small sample size for classification. These training data limitations constitute a challenge to certain classification methodologies. Feature selection techniques can be used to extract the marker genes that influence classification accuracy by eliminating the unwanted noisy and redundant genes. This paper presents a review of feature selection techniques that have been employed in microarray-based cancer classification, and of the predominant role of SVM in cancer classification.
Gene Selection from Gene Expression Data using Genetic Algorithm for Cancer Classification
2006
Constantly improving gene expression technology offers the ability to measure the expression levels of thousands of genes in parallel. Gene expression data is expected to significantly aid in the development of efficient cancer diagnosis. A key issue that needs to be addressed is the selection of a small number of genes that contribute to a disease from the thousands of genes measured on microarrays, which are inherently noisy.
Efficient Machine Learning Technique for Tumor Classification Based on Gene Expression Data
In bioinformatics research, cancer classification is a crucial domain. Microarray technology is commonly used to identify specific illnesses. A small number of genes identified in clinical applications can lead to low-cost medicines that can help estimate a patient's survival time or diagnose cancer. Because microarray data contain many more genes than samples, high dimensionality is a serious concern. In this study, the genes in the microarray data were ranked using F-statistics, t-statistics, and signal-to-noise ratio (SNR). The top-m ranked genes are then analyzed using optimization approaches to retrieve useful information. The methods employed include the genetic algorithm (GA), particle swarm optimization (PSO), cuckoo search (CS), and shuffled frog leaping with Lévy flight (SFLLF). Classification is done using the support vector machine (SVM), the k-nearest neighbor classifier (KNN), and the naive Bayes classifier (NBC). Lung Cancer Michigan, AML-ALL, Colon Tumour, Lung Harvard2, and others are among the datasets used for experimental analysis. The classifiers are assessed using 5-fold cross-validation. The findings demonstrate that the suggested two-step feature selection approaches are effective in selecting relevant genes from microarray data for cancer classification.
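The first step of such a two-step scheme, ranking genes by a filter statistic and keeping the top m, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the SNR formula, so the commonly used two-class form (difference of class means divided by the sum of class standard deviations) is assumed, and the function names `snr_score` and `top_m_genes` and the toy data are hypothetical.

```python
from statistics import mean, stdev

def snr_score(values, labels):
    """Signal-to-noise ratio of one gene for a two-class problem.

    Assumed definition (not stated in the abstract):
    (mean0 - mean1) / (sd0 + sd1).
    """
    class0 = [v for v, y in zip(values, labels) if y == 0]
    class1 = [v for v, y in zip(values, labels) if y == 1]
    spread = stdev(class0) + stdev(class1)
    if spread == 0:
        return 0.0  # constant in both classes: treat as uninformative
    return (mean(class0) - mean(class1)) / spread

def top_m_genes(expression, labels, m):
    """Return the indices of the m genes with the largest |SNR|.

    expression: list of samples, each a list of per-gene values.
    """
    n_genes = len(expression[0])
    scores = []
    for g in range(n_genes):
        column = [sample[g] for sample in expression]
        scores.append(abs(snr_score(column, labels)))
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return ranked[:m]

# Toy data: 6 samples x 4 genes. Gene 2 separates the classes strongly,
# gene 0 more weakly, genes 1 and 3 not at all.
expression = [
    [1.0, 5.0, 0.1, 3.0],
    [1.2, 4.0, 0.2, 3.1],
    [0.9, 6.0, 0.0, 2.9],
    [1.5, 5.5, 2.0, 3.0],
    [1.6, 4.5, 2.1, 3.2],
    [1.4, 5.0, 1.9, 2.8],
]
labels = [0, 0, 0, 1, 1, 1]
selected = top_m_genes(expression, labels, m=2)
print(selected)  # [2, 0]
```

The genes kept by this filter would then be handed to the optimization step (GA, PSO, and so on), which searches over subsets of a much smaller candidate pool.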
Gene expression data classification using genetic algorithm-based feature selection
TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES, 2021
In this study, hybrid methods are proposed for feature selection and classification of gene expression datasets. In the proposed genetic algorithm/support vector machine (GA-SVM) and genetic algorithm/k-nearest neighbor (GA-KNN) hybrid methods, the genetic algorithm is improved using Pearson's correlation coefficient, Relief-F, or mutual information. The crossover and selection operations of the genetic algorithm are specialized. Eight different gene expression datasets are used in the classification process. The classification performance of the proposed methods is compared with the traditional GA-KNN and GA-SVM wrapper methods and with other studies in the literature. Classification results demonstrate that the proposed methods obtain higher accuracy rates than the other methods on all datasets.
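A plain GA wrapper of the kind this abstract takes as its baseline can be sketched as below. It is an illustrative sketch only: it uses a generic GA (truncation selection, one-point crossover, bit-flip mutation, elitism) rather than the specialized operators or the Pearson/Relief-F/mutual-information improvements the paper proposes, and a leave-one-out 1-nearest-neighbour accuracy as the fitness. All names, the per-gene penalty, and the toy data are assumptions.

```python
import random

def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-NN classifier on the masked genes."""
    genes = [g for g, keep in enumerate(mask) if keep]
    if not genes:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if i == j:
                continue
            d = sum((X[i][g] - X[j][g]) ** 2 for g in genes)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)

def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small per-gene penalty (penalty weight is assumed)."""
    return loo_1nn_accuracy(X, y, mask) - penalty * sum(mask)

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search for a binary gene mask that maximizes the wrapper fitness."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        parents = scored[: pop_size // 2]      # truncation selection
        children = [scored[0][:]]              # elitism: keep the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_genes - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=lambda m: fitness(X, y, m))

# Toy data: 12 samples x 8 genes of noise, with the class signal in gene 3.
data_rng = random.Random(1)
y = [0] * 6 + [1] * 6
X = [[data_rng.random() for _ in range(8)] for _ in y]
for row, label in zip(X, y):
    row[3] += 3.0 * label
best = ga_select(X, y)
print(best, loo_1nn_accuracy(X, y, best))
```

Because every fitness evaluation retrains and tests the classifier, a wrapper like this is far more expensive than a filter ranking, which is one reason hybrid schemes first shrink the gene pool with a filter.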
A novel approach to combining clustering and feature selection is presented. Feature selection for clustering is a problem rarely addressed in the literature. Although there has recently been some work in the area, there is a lack of extensive empirical evaluation to assess the potential of each method. The approach implements a wrapper strategy for feature selection, in the sense that the features are selected directly by optimizing the discriminative power of the partitioning algorithm used. Experiments with real-world datasets demonstrate that our method is able to infer both meaningful partitions and meaningful subsets of features. In this paper, we present a comparative study of four feature selection heuristics by applying them to two sets of data. The first set consists of gene expression profiles from colon biopsy samples, and the second of gene expression profiles from the ALL/AML data set. Based on the features chosen by these methods, error rates of several clustering algorithms were obtained for analysis. Results confirm the utility of feature selection for clustering.
Constantly improving gene expression technology offers the ability to measure the expression levels of thousands of genes in parallel. Gene expression data is expected to significantly aid in the development of efficient cancer diagnosis and classification platforms. A key issue that needs to be addressed is the selection of a small number of genes that contribute to a disease from the thousands of genes measured on microarrays, which are inherently noisy. This work deals with finding a small subset of informative genes from gene expression microarray data that maximizes classification accuracy. This paper introduces a new hybrid Genetic Algorithm and Support Vector Machine algorithm for the gene selection and classification task. We show that the classification accuracy of the proposed algorithm is superior to a number of current state-of-the-art methods on two widely used benchmark datasets. The informative genes from the best subset are validated by comparing them with biological results reported by biologists and computer science researchers, in order to explore their biological plausibility.
Classification of human cancer diseases by gene expression profiles
Applied Soft Computing, 2017
The advent of DNA microarrays has enabled the simultaneous monitoring of the expression levels of a large number of genes. In the proposed methodology, Information Gain (IG) is first used for feature selection, then a Genetic Algorithm (GA) is employed for feature reduction, and finally Genetic Programming (GP) is used for cancer type classification.
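The information gain step at the start of such a pipeline can be illustrated with a short sketch. It covers only the IG scoring, not the GA or GP stages, and it assumes each gene is discretized by a single threshold into "low" and "high" expression. The abstract does not say how expression values are discretized, so the thresholds, function names and toy data here are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels, threshold):
    """IG of the binary split values <= threshold vs. values > threshold."""
    low = [y for v, y in zip(values, labels) if v <= threshold]
    high = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    # Weighted entropy remaining after the split; empty parts contribute nothing.
    remainder = sum(len(part) / n * entropy(part) for part in (low, high) if part)
    return entropy(labels) - remainder

labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
gene_a = [2.1, 2.4, 1.9, 0.3, 0.2, 0.5]  # cleanly separates the classes
gene_b = [1.0, 0.2, 1.1, 0.9, 0.3, 1.2]  # mixes the classes
ig_a = information_gain(gene_a, labels, threshold=1.0)
ig_b = information_gain(gene_b, labels, threshold=0.95)
print(ig_a, ig_b)
```

Genes would be ranked by this score and only the highest-scoring ones passed on to the GA, so that the evolutionary search starts from a reduced candidate set.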
Performance Study of Cancer Selection/Classification Algorithms Based on Microarray Data
Microarray data plays an important role in detecting and classifying all types of cancer tissue. In cancer research, the relatively low number of samples in microarray data has always caused problems in designing classifiers. Microarray data is therefore preprocessed with gene selection techniques, and genes that carry no information are discarded. A proper gene selection method can effectively improve the efficiency of disease (cancer) classification. The purpose of this article is to compare different algorithms for extracting informative genes, along with different classification algorithms. First, the ReliefF algorithm, information gain and normalized mutual information are introduced as feature extraction algorithms, and their characteristics are noted. Then three classification algorithms are compared in terms of classification accuracy: the two proposed algorithms, Bayesian Linear Discriminant Analysis (BLDA) and a modified Support Vector Machine (ν-Support Vector Machine), and a Probabilistic Neural Network. Implementation results show that the combination of normalized mutual information and the BLDA classifier performs best among the methods considered. With this combination, the classification accuracy on the blood cancer database is 95.34 percent.
A Study on Computational Process in Gene Expression Data
This paper examines the various methods for disease identification from gene expression data and their challenges. The basic role of these techniques is the classification and categorization of gene expression, expression analysis, pattern recognition, and identification. The paper provides a comprehensive survey of microarray data analysis techniques and proposes a processing component for disease identification. For the healthcare provider, it is essential to maintain data quality, because the data is used to provide cost-effective healthcare treatments to patients. Healthcare administration retains the microarray data, which is refined and analyzed by experts to identify the disease. Analyzing microarray data manually makes identification and classification complicated, because the data suffers from problems such as missing information, empty values, and incorrect entries. Without quality information there are no valuable results. For successful data mining, such impediments in health data are among the major difficulties in examining medical data. It is therefore essential to maintain data quality and accuracy so that data mining can support effective decisions. The major goal of this survey is to review data mining techniques for developing a prediction model for disease susceptibility using gene expression data. The microarray data is pre-processed to analyze gene expression and to classify over-expressed and under-expressed data. The classified gene data is then clustered, and feature selection is applied to discover a pattern. Finally, association mining is applied to the organized set of gene expression data to identify the disease. This paper presents efficient techniques to replace the manual identification of diseases.
1. Introduction
DNA microarrays offer the ability to look at the expression of thousands of genes in a single experiment. One of the significant applications of microarray technology is disease identification and classification. Through microarray technology, researchers can classify specific diseases according to different expression levels in normal and developing cells, determine the relationships among genes, and recognize the critical genes in the development of disease [1]. The main task of microarray classification is to construct a classifier from historical microarray gene expression data and then use the classifier to classify future incoming data. Owing to the rapid development of DNA microarray technology, gene selection techniques and classification techniques are being devised for better use of classification algorithms on microarray gene expression data. The analysis of large gene expression data sets is becoming a challenge in disease classification [2]. Thus gene selection is one of the significant aspects. Efficient gene selection can considerably ease the computational burden of the subsequent classification task and can yield a much smaller and more compact gene set without loss of classification performance. In classifying microarray data, the key objective of gene selection is to search for the genes that retain the greatest amount of information about the set and reduce the classification error [3]. Data mining techniques typically fall into either supervised or unsupervised classes. Microarray technologies provide a powerful tool by which the expression patterns of thousands of genes can be examined simultaneously, with applications ranging from disease diagnosis to treatment response. Gene expression is the conversion of the DNA sequence into an mRNA sequence by transcription, which is then translated into amino acid sequences called proteins.
The key challenge in classifying gene expression data is the curse of dimensionality: there is a huge number of genes (features) compared with the small sample sizes [3]. To overcome this, feature selection is used to identify differentially expressed genes and to eliminate irrelevant genes. Gene selection remains a significant task for improving the accuracy and speed of classification systems. In general, feature selection methods fall into three kinds: filter, wrapper, and embedded methods. They are categorized by how the feature selection method is combined with the construction of the classification model. An extensive body of literature is available on gene selection techniques for constructing effective classification models. In this paper, we