Identification of Differentially Expressed Genes Using Deep Learning in Bioinformatics

Analyzing RNA-Seq Gene Expression Data Using Deep Learning Approaches for Cancer Classification

Applied Sciences, 2022

Ribonucleic acid sequencing (RNA-Seq) analysis is particularly useful for obtaining insights into differentially expressed genes. However, it is challenging because the data are high-dimensional. Such analysis is a tool for finding underlying patterns in data, e.g., cancer-specific biomarkers. In the past, analyses were performed on RNA-Seq data pertaining to the same cancer class as positive and negative samples, i.e., without samples of other cancer types. To perform multiple cancer type classification and to find differentially expressed genes, data for multiple cancer types need to be analyzed. Several repositories offer RNA-Seq data for various cancer types. In this paper, data from the Mendeley data repository for five cancer types are analyzed. As a first step, RNA-Seq values are converted to 2D images using normalization and zero padding. In the next step, relevant features are extracted and selected using Deep Learning (DL). In the last phase, classification is pe...
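The first step described above (normalization followed by zero padding into a 2D image) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not say which normalization is used, so min-max scaling to [0, 1], a square row-major layout, and the function name `expression_to_image` are all assumptions.

```python
import math


def expression_to_image(values):
    """Min-max normalize a 1D expression vector and zero-pad it into a square grid.

    Assumptions (not stated in the abstract): min-max scaling, square image,
    row-major placement of genes.
    """
    if not values:
        raise ValueError("expression vector must not be empty")
    lo, hi = min(values), max(values)
    span = hi - lo
    # Scale to [0, 1]; a constant vector maps to all zeros to avoid dividing by zero.
    scaled = [(v - lo) / span if span else 0.0 for v in values]
    # Smallest side length whose square holds every gene: ceil(sqrt(n)).
    side = math.isqrt(len(scaled) - 1) + 1
    # Zero padding fills the cells that have no gene assigned.
    padded = scaled + [0.0] * (side * side - len(scaled))
    return [padded[r * side:(r + 1) * side] for r in range(side)]


# Five hypothetical expression values become a 3x3 image with four padded cells.
image = expression_to_image([2.0, 4.0, 6.0, 10.0, 6.0])
for row in image:
    print(row)
```

One consequence of this particular choice is that a padded cell is indistinguishable from the gene with the lowest expression, since both map to 0.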

DEEP LEARNING-BASED CANCER CLASSIFICATION FOR MICROARRAY DATA: A SYSTEMATIC REVIEW

Journal of Theoretical and Applied Information Technology, 2021

Deep neural networks are robust techniques that have recently been used extensively to build cancer classification models from different types of data. Microarray gene expression datasets are now considered an essential source of data for cancer classification. However, because the number of samples is small compared to the high dimensionality of microarray data, many machine learning techniques have failed to identify the most relevant and informative genes. Deep learning is therefore in demand because of its ability to automatically discover complex relationships between features with high accuracy and performance. The current study aims to survey state-of-the-art deep neural network architectures and how they can make use of microarray data. Several deep neural network architectures, such as CNN, DNN, RNN, DBN, DBM and DAE, have been built to suit the different learning processes (supervised, unsupervised and semi-supervised). The CNN is found to be the most common neural network architecture used in the medical field because of its robustness and high performance in cancer classification. Results indicate that choosing a suitable deep neural network architecture and its hyperparameters is one of the main difficulties researchers face in designing models for cancer prediction and classification, because there is no particular rule that ensures high prediction accuracy.

Artificial Intelligence Technique for Gene Expression by Tumor RNA-Seq Data: A Novel Optimized Deep Learning Approach

IEEE Access, 2020

Cancer is one of the most feared and aggressive diseases and is responsible for more than 9 million deaths worldwide. Staging cancer early increases the chances of recovery. One staging technique is RNA sequence analysis. Recent advances in the efficiency and accuracy of artificial intelligence techniques and optimization algorithms have facilitated the analysis of human genomics. This paper introduces a novel optimized deep learning approach based on binary particle swarm optimization with decision tree (BPSO-DT) and convolutional neural network (CNN) to classify different types of cancer based on tumor RNA sequence (RNA-Seq) gene expression data. The cancer types investigated in this research are kidney renal clear cell carcinoma (KIRC), breast invasive carcinoma (BRCA), lung squamous cell carcinoma (LUSC), lung adenocarcinoma (LUAD) and uterine corpus endometrial carcinoma (UCEC). The proposed approach consists of three phases. The first phase is preprocessing, which first optimizes the high-dimensional RNA-Seq data by selecting only optimal features using BPSO-DT and then converts the optimized RNA-Seq data to 2D images. The second phase is augmentation, which increases the original dataset of 2086 samples to 5 times its size. The augmentation techniques were selected to have the least impact on the features of the images. This phase helps to overcome the overfitting problem and trains the model to achieve better accuracy. The third phase is the deep CNN architecture. In this phase, an architecture of two main convolutional layers for feature extraction and two fully connected layers is introduced to classify the 5 different types of cancer according to the availability of images in the dataset. The results and performance metrics such as recall, precision and F1 score show that the proposed approach achieved an overall testing accuracy of 96.90%.
The comparative results are introduced, and the proposed method outperforms those in related works in terms of testing accuracy for 5 classes of cancer. Moreover, the proposed approach is less complex and consumes less memory. INDEX TERMS: Cancer, RNA sequence, deep convolutional neural network, gene expression data.
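The feature selection idea in the first phase can be illustrated with a basic binary particle swarm optimization loop. This is a sketch under stated assumptions, not the paper's BPSO-DT: the paper scores a feature subset with a decision tree, whereas here the score is supplied by a hypothetical `toy_fitness` function so that the example stays self-contained. The swarm parameters (`w`, `c1`, `c2`, `v_max`) are illustrative defaults, not values from the paper.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=10, n_iters=30,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Maximize fitness(mask) over 0/1 feature masks with a basic binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best mask seen by each particle
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # best mask seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))  # clamp the velocity
                # Sigmoid transfer: the velocity sets the probability that the bit is 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Hypothetical stand-in for decision-tree accuracy: reward two "informative"
# features and charge a small penalty for every selected feature.
INFORMATIVE = {1, 4}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


mask, score = binary_pso(toy_fitness, n_features=8)
print(mask, score)
```

To move closer to the setup the abstract describes, `toy_fitness` would be replaced by a function that trains a decision tree on the selected columns and returns its validation accuracy.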

A review on current and future prospective of Cancer Classification through Deep Learning

Research & Reviews in Biotechnology & Biosciences, 2019

Cancer is the second leading cause of death in the world, after heart disease. The name cancer refers to more than a thousand diseases characterized by the uncontrolled growth and replication of cells. For this reason, the use of microarray datasets together with machine learning methods for cancer analysis is increasing in current research. Classification is one of the most widely used data mining techniques; it builds a model that describes and distinguishes data classes so that the class of unseen instances can be predicted. In machine learning, features are chosen manually for a classifier. With deep learning, the feature extraction and modelling steps are automatic. Deep learning is one of the most significant branches of machine learning; it requires a computing system to perform calculations iteratively in order to identify patterns by itself. Deep learning uses training data to discover underlying patterns, build models and make predictions based on the best-fit model. In recent decades, there has been growing interest in addressing cancer classification with deep learning, owing to the revival of neural networks and connectionism brought about by the integration of the latest advances in parallel processing enabled by coprocessors. Here we review deep learning for classification in bioinformatics and present examples of current research. Additionally, we discuss the working principles of deep learning and convolutional neural networks to provide a useful and comprehensive perspective. This paper presents three works, DeepGen, SDAE and Enhance Feature learning, with a brief description of each study. We believe that this review will provide valuable insights and serve as a starting point for researchers applying deep learning approaches to classification of gene expression datasets.

A comparative study of machine learning and deep learning algorithms to classify cancer types based on microarray gene expression data

PeerJ Computer Science

Cancer classification is a topic of major interest in medicine since it allows accurate and efficient diagnosis and facilitates a successful outcome in medical treatments. Previous studies have classified human tumors using large-scale RNA profiling and supervised Machine Learning (ML) algorithms to construct a molecular-based classification of carcinoma cells from breast, bladder, adenocarcinoma, colorectal, gastro esophagus, kidney, liver, lung, ovarian, pancreas, and prostate tumors. These datasets are collectively known as the 11_tumor database. Although this database has been used in several works in the ML field, no comparative studies of different algorithms can be found in the literature. On the other hand, advances in both hardware and software technologies have fostered considerable improvements in the precision of solutions that use ML, such as Deep Learning (DL). In this study, we compare the most widely used algorithms in classical ML and DL to classify the tumors des...

Deep gene selection method to select genes from microarray datasets for cancer classification

BMC Bioinformatics, 2019

Background: Microarray datasets consist of complex and high-dimensional samples and genes, and generally the number of samples is much smaller than the number of genes. Due to this data imbalance, gene selection is a demanding task for microarray expression data analysis. Results: The gene set selected by DGS has shown superior performance in cancer classification. DGS has a high capability of reducing the number of genes in the original microarray datasets. The experimental comparisons with other representative and state-of-the-art gene selection methods also showed that DGS achieved the best performance in terms of the number of selected genes, classification accuracy, and computational cost. Conclusions: We provide an efficient gene selection algorithm that can select relevant genes which are significantly sensitive to the samples’ classes. With few discriminative genes and less time cost, the proposed algorithm achieved high prediction accuracy on several public microarr...

Convolutional Neural Network Approach to Predict Tumor Samples Using Gene Expression Data

2021

Cancer threatens millions of people each year and its early diagnosis is still a challenging task. Early diagnosis is one of the major ways to tackle the disease and lower the mortality rate. Advancements in deep learning approaches and the availability of biological data offer applications that can facilitate the diagnosis and characterization of cancer. Here, we aimed to provide a new perspective on cancer diagnosis using a deep learning approach on gene expression data. In this study, RNA-Seq data from patients with approximately 30 different types of cancer from The Cancer Genome Atlas (TCGA) study, and normal tissue RNA-Seq data from GTEx, were used. The input data for training was transformed to RGB format and the training was carried out with a Convolutional Neural Network (CNN). The trained algorithm is able to predict cancer with 97% accuracy using gene expression data. In conclusion, our study shows that the deep learning approach and biological data have huge potential in the diagnosis and identification of tumor samples.

Deep Learning Based Tumor Type Classification Using Gene Expression Data

Differential analysis occupies the most significant portion of the standard practices of RNA-Seq analysis. However, the conventional method matches tumor samples to normal samples from the same tumor type. The output of such a method fails to differentiate tumor types because it lacks knowledge of other tumor types. The Pan-Cancer Atlas provides abundant information on 33 prevalent tumor types, which can be used as prior knowledge to generate tumor-specific biomarkers. In this paper, we embedded the high-dimensional RNA-Seq data into 2-D images and used a convolutional neural network to classify the 33 tumor types. The final accuracy was 95.59%, higher than that of another paper applying the GA/KNN method to the same dataset. Based on the idea of Guided Grad-CAM, we generated a significance heat-map of all the genes for each class. By doing functional analysis on the genes with high intensities in the heat-maps, we validated ...
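Once a per-class heat-map exists, picking the genes "with high intensities" amounts to mapping image cells back to gene indices and ranking them. The sketch below assumes genes were written into the image in row-major order and that any trailing cells are padding; neither detail is stated in the abstract. The function name, the gene names and the intensities are hypothetical, and computing the heat-map itself is outside the scope of this example.

```python
def top_genes_from_heatmap(heatmap, gene_names, k=3):
    """Return the k (gene, intensity) pairs with the highest heat-map intensity.

    Assumes gene i occupies cell i of the row-major flattened image, so cells
    beyond len(gene_names) are padding and are ignored.
    """
    flat = [value for row in heatmap for value in row]
    # zip stops at the shorter sequence, which drops the padded cells.
    scored = list(zip(gene_names, flat))
    # The sort is stable, so ties keep their original gene order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 2x2 heat-map for one class; the last cell is padding.
heatmap = [[0.1, 0.9],
           [0.4, 0.0]]
genes = ["GENE_A", "GENE_B", "GENE_C"]
top = top_genes_from_heatmap(heatmap, genes, k=2)
print(top)
```

The resulting gene list is what would then be passed to a functional analysis step such as the one the abstract mentions.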

Research and Implementation of Cancer Gene Data Classification Based on Deep Learning

Journal of Software Engineering and Applications

Cancer has become a cause of concern in recent years. Cancer genomics is currently a key research direction in the fields of genetic biology and biomedicine. This paper analyzes gene data for 5 different types of cancer (breast, kidney, colon, lung and prostate) using machine learning methods, with the goal of building a robust classification model that identifies each type of cancer early, thereby reducing mortality.

A hybrid gene selection model for molecular breast cancer classification using a deep neural network

International Journal of Applied Pattern Recognition, 2021

Microarray-based gene expression profiling plays a dominant role in a better understanding of breast cancer. Given the large quantity of data, a powerful technique is required to understand and extract the required information. The molecular subtype is one such important piece of information about breast cancer and is crucial in defining its treatment strategy. This manuscript formulates a deep neural network-based model for molecular classification of breast cancer. The proposed model uses pre-processing steps along with a hybrid of filter- and wrapper-based feature selection to extract relevant genes. The extracted genes are evaluated using various machine learning approaches, where it is observed that the selected features are successful in solving this multiclass problem. Using the proposed hybrid model, we achieved the highest accuracy on six microarray datasets. The model performs very well in terms of sensitivity, F-measure, specificity, MCC and recall. Hence, the deep neural network is identified as the most efficient classifier, performing very well on all the selected microarray gene expression datasets for a range of selected genes.
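The evaluation metrics named in this abstract have standard definitions in terms of confusion-matrix counts, sketched below for a single class. Note that sensitivity and recall are the same quantity, TP / (TP + FN). For a multiclass problem such as this one, the counts would typically be taken per class in a one-vs-rest fashion; that choice, the function name and the example counts are assumptions, not details from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Sensitivity (recall), specificity, precision, F-measure and MCC from counts."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0   # also called recall
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = (2 * precision * sensitivity / (precision + sensitivity)
                 if precision + sensitivity else 0.0)
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "mcc": mcc,
    }


# Hypothetical counts for one subtype versus the rest.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC is often preferred on imbalanced subtype data because it uses all four cells of the confusion matrix, whereas accuracy can look high when one class dominates.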