Network-based Prediction of Cancer under Genetic Storm
Related papers
A Classification Framework Applied to Cancer Gene Expression Profiles
Journal of Healthcare Engineering, 2013
Classification of cancer based on gene expression has provided insight into possible treatment strategies. Thus, developing machine learning methods that can successfully distinguish among cancer subtypes or normal versus cancer samples is important. This work discusses supervised learning techniques that have been employed to classify cancers. Furthermore, a two-step feature selection method based on an attribute estimation method (e.g., ReliefF) and a genetic algorithm was employed to find a set of genes that can best differentiate between cancer subtypes or normal versus cancer samples. The application of different classification methods (e.g., decision tree, k-nearest neighbor, support vector machine (SVM), bagging, and random forest) on 5 cancer datasets shows that no classification method universally outperforms all the others. However, k-nearest neighbor and linear SVM generally improve the classification performance over other classifiers. Finally, incorporating diverse types of genomic data (e.g., protein-protein interaction data and gene expression) increases the prediction accuracy as compared to using gene expression alone.
A Study of Network-based Approach for Cancer Classification
The advent of high-throughput techniques such as microarrays has enabled researchers to elucidate cellular processes, which is useful for pathology and medicine. Microarray gene expression data have accordingly been explored and applied in various types of studies, e.g., gene association, gene classification, and the construction of gene networks. Unfortunately, because gene expression data typically comprise few samples and thousands of genes, they pose both biological and technical problems. Artificial intelligence techniques coupled with statistical methods can give promising results in addressing these problems. These approaches fall into two well-known families: supervised and unsupervised methods. Together, the two can work well for classification and clustering, that is, for class prediction and class discovery. In this paper we review the benefit of network-based approaches, which draw on interaction data, for classification in identifying cancer classes.
Gene selection from microarray data for cancer classification—a machine learning approach
Computational Biology and Chemistry, 2005
A DNA microarray can track the expression levels of thousands of genes simultaneously. Previous research has demonstrated that this technology can be useful in the classification of cancers. Cancer microarray data normally contains a small number of samples which have a large number of gene expression levels as features. To select relevant genes involved in different types of cancer remains a challenge. In order to extract useful gene information from cancer microarray data and reduce dimensionality, feature selection algorithms were systematically investigated in this study. Using a correlation-based feature selector combined with machine learning algorithms such as decision trees, naïve Bayes and support vector machines, we show that classification performance at least as good as published results can be obtained on acute leukemia and diffuse large B-cell lymphoma microarray data sets. We also demonstrate that a combined use of different classification and feature selection approaches makes it possible to select relevant genes with high confidence. This is also the first paper which discusses both computational and biological evidence for the involvement of zyxin in leukaemogenesis.
2017 10th International Conference on Electrical and Electronics Engineering (ELECO), 2017
In this study, three different feature selection algorithms are compared using Support Vector Machines as the classifier for cancer classification through gene expression data. The ability of feature selection algorithms to select an optimal gene subset for a cancer type is evaluated by the classification ability of the selected genes. A publicly available microarray dataset is employed for gene expression values. The selected gene subsets were able to classify subtypes of the considered cancer type with high accuracy, showing that these feature selection methods are applicable to biomarker gene selection.
Hybrid Correlation based Gene Selection for Accurate Cancer Classification of Gene Expression Data
International Journal of Computer Applications, 2012
Microarray data has been widely applied to cancer classification, where the purpose is to classify and predict the category of a sample by its gene expression profile. A DNA microarray is a gene chip that yields expression levels for a huge number of genes on a relatively small number of samples. However, only a small number of genes contribute to accurate classification of cancer. Therefore, the challenging task is to identify a small subset of informative genes that carries the maximum amount of information about the class while also minimizing classification errors. In this paper, we propose a hybrid negative correlated method, which combines the features from various correlation-based feature selection techniques, for the generation of mutually exclusive informative feature sets. We test the effectiveness of the proposed approach using a neural network based classifier on two benchmark gene expression data sets: the colon dataset and the leukemia dataset. The obtained results are encouraging, as features based on the hybrid negative correlated method give better recognition accuracy than positive correlated and other negative correlated features.
Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review
Bioengineering
Cancer is a term that denotes a group of diseases caused by the abnormal growth of cells that can spread in different parts of the body. According to the World Health Organization (WHO), cancer is the second major cause of death after cardiovascular diseases. Gene expression can play a fundamental role in the early detection of cancer, as it is indicative of the biochemical processes in tissue and cells, as well as the genetic characteristics of an organism. Deoxyribonucleic acid (DNA) microarrays and ribonucleic acid (RNA)-sequencing methods for gene expression data allow quantifying the expression levels of genes and produce valuable data for computational analysis. This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods. Both conventional and deep learning-based approaches are reviewed, with an emphasis on the application of deep learning models due to their comparative advantages for identifying gene patterns that a...
In gene expression datasets, classification is a high-dimensional and risky task because a large number of features are irrelevant and redundant. Classification requires both a feature selection method and a classifier; hence this paper proposes a method for choosing a suitable combination of attribute selection and classification algorithms that achieves good accuracy as well as computational efficiency, generalization performance, and feature interpretability. In this paper, a comparative study was carried out with some well-known feature selection methods such as FCBF, ReliefF,
An Efficient Cancer Classification Model Using Microarray and High-Dimensional Data
Computational Intelligence and Neuroscience
Cancer is widely considered one of the leading causes of death. One of the most effective tools for cancer diagnosis, prognosis, and treatment is expression profiling based on gene microarrays. For each data point (sample), gene expression data usually cover tens of thousands of genes. As a result, the data are large-scale, high-dimensional, and highly redundant. The classification of gene expression profiles is considered to be an NP-hard problem. Feature (gene) selection is one of the most effective methods to handle this problem. A hybrid cancer classification approach is presented in this paper, and several machine learning techniques were used in the hybrid model: Pearson’s correlation coefficient as a correlation-based feature selector and reducer, a Decision Tree classifier that is easy to interpret and does not require a parameter, and Grid Search CV (cross-validation) to optimize the maximum depth hyperparameter. Seven ...
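The correlation-based selection step of the hybrid model described above can be illustrated with a minimal sketch. It assumes that each gene is scored by the absolute Pearson correlation between its expression values and a binary class label, and that genes below a cutoff are dropped; the abstract does not confirm either detail. The function names, the 0.5 threshold, and the toy data are hypothetical, and the decision tree and grid search stages are not shown.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def select_genes(expression, labels, threshold=0.5):
    """Return indices of genes whose |r| with the label meets the threshold.

    expression: list of samples, each a list of per-gene values.
    threshold: hypothetical cutoff chosen only for this illustration.
    """
    n_genes = len(expression[0])
    kept = []
    for g in range(n_genes):
        column = [sample[g] for sample in expression]
        if abs(pearson(column, labels)) >= threshold:
            kept.append(g)
    return kept

# Toy data (made up): 4 samples x 3 genes; labels 0 = normal, 1 = tumor.
expression = [
    [1.0, 5.0, 2.0],
    [1.2, 5.0, 1.0],
    [3.0, 5.0, 2.0],
    [3.1, 5.0, 1.0],
]
labels = [0, 0, 1, 1]
selected = select_genes(expression, labels)
print(selected)
```

In this toy example only the first gene tracks the label; the second is constant and the third is uncorrelated with it, so both are filtered out before any classifier would be trained.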
Studies in Informatics and Control
Gene selection from gene expression data for cancer prediction has been an area of intensive research, aiming at identifying the minimal and optimal set of candidate genes that could generate accurate predictive performance. The two major problems encountered in this process are the high dimensionality of data with comparatively few instances and the need to categorize records under multiple classes. In this paper we propose a novel approach called Rank-Weight Feature Selection that utilizes the filtering capacity of more than one feature selection algorithm to detect the minimal set of predictive genes that generate higher predictor performance in categorizing and predicting diverse oncogenic gene expression data. The filtered features (genes) are weighted based on the number of feature relevance algorithms reporting them to be significant. The ranked genes are then used to validate the proposed method by utilizing ten classifiers over five diverse gene expression datasets. The results proved that the proposed approach, using the most relevant and minimal set of genes, generated higher predictive performance with fewer features than previously reported results, and they commend classifiers based on their accuracy and reliability in predicting cancer data.
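The weighting idea in this abstract, where each gene is weighted by the number of feature relevance algorithms that report it as significant, can be sketched as a simple vote count. This is only an illustration under stated assumptions, not the published Rank-Weight Feature Selection procedure: the selector names, gene identifiers, tie-breaking rule, and the cutoff of two votes are all hypothetical.

```python
from collections import Counter

def rank_weight(selections):
    """Weight each gene by how many selectors reported it, then rank.

    selections: dict mapping a selector name to the genes it flagged.
    Returns (gene, weight) pairs sorted by descending weight.
    """
    votes = Counter()
    for genes in selections.values():
        votes.update(set(genes))  # one vote per selector per gene
    # Break ties by gene name so the ordering is deterministic.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical output of three feature selection algorithms.
selections = {
    "selector_a": ["g1", "g2", "g3"],
    "selector_b": ["g1", "g3", "g4"],
    "selector_c": ["g1", "g5"],
}
ranked = rank_weight(selections)

# Keep genes reported by at least two selectors (illustrative cutoff).
top_genes = [gene for gene, weight in ranked if weight >= 2]
print(ranked)
print(top_genes)
```

Counting agreement across selectors favors genes that are robust to the choice of relevance measure, which is one plausible reading of why combining several filters could yield a smaller gene set.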
International Journal of Computer Science & Engineering Survey, 2011
DNA microarray technology has modernized biological research in such a way that scientists can now measure the expression levels of thousands of genes simultaneously in a single experiment. Gene expression profiles, which represent the state of a cell at a molecular level, have great potential as a medical diagnosis tool. But compared to the number of genes involved, available training data sets generally have a fairly small sample size for classification. These training data limitations constitute a challenge to certain classification methodologies. Feature selection techniques can be used to extract the marker genes that influence classification accuracy, effectively eliminating unwanted noisy and redundant genes. This paper presents a review of feature selection techniques that have been employed in microarray-based cancer classification, and also of the predominant role of SVM in cancer classification.