Classification Mining SNPs from Leukaemia Cancer Dataset Using Linear Classifier with ACO

Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation in humans, comprising nearly 1/1,000th of the typical human genome. SNPs offer the most complete information for genome-wide association studies. We propose a methodology to quickly detect true SNPs in a publicly available leukemia cancer database. Much research has focused on genetic models that identify genes able to predict disease status. However, increasing the number of SNPs generates a large number of combined genetic outcomes to be tested. Classification is a data mining technique that predicts group membership for data instances. Ant colony optimization (ACO) is a probabilistic technique for computational problems that can be reduced to finding good paths through graphs. In this paper, a linear classifier combined with ACO is analyzed on a leukemia cancer dataset, and its performance is compared with that of a support vector machine (SVM) in terms of accuracy, sensitivity, and specificity. The experimental results show that the linear classifier with ACO distinguishes cancer from normal samples with a maximum accuracy of 73.20%, sensitivity of 69.21%, and specificity of 65%, whereas the SVM achieves 70.00% accuracy, 65.20% sensitivity, and 63.53% specificity.
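The three measures reported in this abstract can be computed from the counts of a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code: the function names and the toy labels (1 = cancer, 0 = normal) are assumptions made purely for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity (recall on positives), and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Toy labels invented for illustration: 1 = cancer, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Sensitivity is the fraction of cancer samples that are detected, and specificity is the fraction of normal samples that are correctly left unflagged, which is why the abstract reports both alongside accuracy.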

Performance Evaluation of Data Mining Techniques Using Cancer Dataset

IJRAR, 2019

In recent years data mining (DM) has attracted great attention in the healthcare industry and in society as a whole. This research work focuses on creating clusters from two cancer datasets and analyzing the performance of partition-based algorithms. Two partition-based algorithms, Kmeans Plus and Affinity Propagation, are implemented. A comparative analysis of the clustering algorithms is carried out on two datasets, Colon and Leukemia. The performance of the algorithms is measured by the number of correctly classified clusters and the average accuracy. The Affinity Propagation algorithm is efficient for clustering the cancer datasets. The outcome of this work is suitable for analyzing the behavior of cancer in the oncology departments of cancer centers. The ultimate goal of this research is to find out which type of dataset and algorithm is most suitable for the analysis of cancer data.

Introduction

Data mining is one of the most important areas of research and is used pragmatically in domains such as finance, education, clinical research, healthcare, and agriculture, with the aim of discovering useful information in large datasets. This research uses different data mining techniques to cluster medical data. Data mining tasks can be categorized into two types: supervised and unsupervised. Supervised tasks have datasets that contain both explanatory and dependent variables, and the objective is to discover the associations between them. Unsupervised tasks, on the other hand, have datasets that contain only explanatory variables, and the objective is to explore the data and generate hypotheses about its hidden structures. Clustering is one of the most common unsupervised data mining methods for exploring the hidden structures embedded in a dataset. Clustering is the process of grouping abstract objects into classes of similar objects. A cluster of data objects can be treated as one group. Cluster analysis first partitions the data into groups based on similarity and then assigns labels to the groups. The main advantage of clustering over classification is that it is adaptable to changes and helps single out useful features that distinguish different groups.
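The two-step idea described here (partition first, then label the groups) and the accuracy measure mentioned in the abstract can be sketched as follows. This is an illustrative assumption about how "correctly classified" might be scored, not the paper's procedure: each cluster is given the majority label of its members, and accuracy is the fraction of samples whose cluster label matches their true label. The cluster ids and the "tumor"/"normal" labels are hypothetical toy data.

```python
from collections import Counter, defaultdict


def label_clusters(cluster_ids, true_labels):
    """Map each cluster id to the most common true label among its members."""
    members = defaultdict(list)
    for cid, label in zip(cluster_ids, true_labels):
        members[cid].append(label)
    return {cid: Counter(labels).most_common(1)[0][0]
            for cid, labels in members.items()}


def clustering_accuracy(cluster_ids, true_labels):
    """Fraction of samples whose cluster's majority label equals their own."""
    mapping = label_clusters(cluster_ids, true_labels)
    correct = sum(mapping[cid] == label
                  for cid, label in zip(cluster_ids, true_labels))
    return correct / len(true_labels)


# Hypothetical output of a clustering algorithm on eight samples.
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 0]
true_labels = ["tumor", "tumor", "normal", "normal",
               "normal", "normal", "tumor", "tumor"]

mapping = label_clusters(cluster_ids, true_labels)
accuracy = clustering_accuracy(cluster_ids, true_labels)
print(mapping, accuracy)
```

The cluster assignments would come from whichever algorithm is being evaluated; the scoring step itself does not depend on that choice.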

Mining of Important Informative Genes and Classifier Construction for Cancer Dataset

International Journal on Soft Computing, 2012

Microarray is a useful technique for measuring the expression of thousands of genes or more simultaneously. One of the challenges in classifying cancer from high-dimensional gene expression data is selecting a minimal number of relevant genes that maximizes classification accuracy. Because of the distinct characteristics inherent to specific cancerous gene expression profiles, developing flexible and robust gene identification methods is fundamental. Many gene selection methods, along with their corresponding classifiers, have been proposed. In the proposed method, a single gene with high class-discrimination capability is selected, and classification rules for cancer are generated from gene expression profiles. The method first computes an importance factor for each gene of the experimental cancer dataset by counting the number of linguistic terms (defined in terms of different discrete quantities) with high class-discrimination capability according to their degree of dependency on the classes. Initial important genes are then selected according to their high importance factor and form the initial reduct. Next, the traditional k-means clustering algorithm is applied to each selected gene of the initial reduct, and the misclassification error of each individual gene is computed. The final reduct is formed by selecting the most important genes, namely those with the lowest misclassification errors. A classifier is then constructed from decision rules induced by the selected important (single) genes of the training dataset to classify cancerous and non-cancerous samples of the experimental test dataset. The proposed method is tested on four publicly available cancer gene expression datasets. In most cases, accurate classification outcomes are obtained using just the important (single) genes, and genes that are highly correlated with the pathogenesis of cancer are identified. To demonstrate the robustness of the proposed method, its outcomes (correctly classified instances) are also compared with those of some well-known existing classifiers.
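The step in which k-means is applied to each candidate gene and genes are ranked by misclassification error can be sketched as below. This is only a rough illustration under stated assumptions: it covers the per-gene clustering step alone (not the importance factor or the rule induction), it uses a plain two-cluster Lloyd iteration with the minimum and maximum as deterministic starting centers, and the gene names and expression values are invented.

```python
def two_means_1d(values, iterations=20):
    """Split 1-D values into two clusters with Lloyd's algorithm (k = 2)."""
    lo, hi = min(values), max(values)  # deterministic initial centers
    assign = [0] * len(values)
    for _ in range(iterations):
        assign = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        group0 = [v for v, a in zip(values, assign) if a == 0]
        group1 = [v for v, a in zip(values, assign) if a == 1]
        if not group0 or not group1:
            break
        new_lo = sum(group0) / len(group0)
        new_hi = sum(group1) / len(group1)
        if (new_lo, new_hi) == (lo, hi):  # centers stopped moving
            break
        lo, hi = new_lo, new_hi
    return assign


def misclassification_error(assign, labels):
    """Error under the better of the two cluster-to-class mappings (0/1 labels)."""
    mismatches = sum(a != y for a, y in zip(assign, labels))
    return min(mismatches, len(labels) - mismatches) / len(labels)


def rank_genes(expression, labels):
    """Return (gene, error) pairs sorted from lowest to highest error."""
    errors = {gene: misclassification_error(two_means_1d(vals), labels)
              for gene, vals in expression.items()}
    return sorted(errors.items(), key=lambda item: item[1])


# Invented expression values for six samples; labels: 0 = normal, 1 = cancer.
expression = {
    "gene_a": [0.1, 0.2, 0.15, 0.9, 1.0, 0.95],
    "gene_b": [0.2, 0.9, 0.1, 0.7, 0.3, 0.8],
    "gene_c": [0.1, 0.2, 0.8, 0.9, 1.0, 0.85],
}
labels = [0, 0, 0, 1, 1, 1]
ranking = rank_genes(expression, labels)
print(ranking)
```

In this toy data `gene_a` separates the two classes perfectly on its own, which is the kind of single gene the abstract says would be kept in the final reduct.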

Identifying Subtypes of Cancer Using Genomic Data by Applying Data Mining Techniques

Int. J. Nat. Comput. Res., 2019

This article studies genomic structures and the identification of cancer types from them. It is divided into six parts. The first part introduces cancer, the types of cancer, and how cancer arises. The second part covers the genomic study, how cancer relates to it, and which features are used. The third part describes the software the authors used to study these genomic structures, the datasets used, and the final output of the study. The fourth part presents the proposed algorithm. The fifth part covers data preprocessing and clustering, for which different algorithms are used. The sixth part presents the results and the conclusion, along with future scope. The genomic data used in this article is taken from the Cancer Genome Atlas data portal, which is freely available. Imputation techniques are applied to fill in missing values, and important features are extracted. Different cluster...

On the adaption of data mining technology to categorize cancer diseases

International Journal Artificial Intelligent and Informatics, 2022

Along with data mining, tools and software have emerged to aid in mining the vast and growing amount of data and in accessing the knowledge held in databases. These tools facilitate work in most scientific disciplines, including the sciences and library and information science. Accordingly, data mining has become an effective technique for obtaining knowledge, with the basic goal of discovering hidden facts contained in databases through multiple technologies, including artificial intelligence, statistical analysis, and data modeling. Medical data mining is considered one of the most important tools in medicine, especially for exploring and understanding health conditions from the records of former patients. In addition, data mining helps not only in categorizing cancer but also in taking the necessary measures. With cancer spreading at high rates around the world, the need has arisen for smart methods that can predict the disease. Ap...

Effective Data Mining Technique for Classification Cancers via Mutations in Gene using Neural Network

Prediction plays an important role in enabling effective protection against cancer and its therapy or treatment. Predicting mutations in a gene requires diagnosis and classification based on a whole database (a sufficiently large dataset) to reach sufficient accuracy. The tumor suppressor p53 is implicated in approximately fifty percent of all human tumors because of mutations that occur in the TP53 gene within cells. This paper therefore focuses on tumor protein p53. The problem is that several primitive databases (e.g., Excel genome and protein databases) contain datasets of the TP53 gene and its tumor protein p53; these databases are rich and cover all mutations and the diseases (cancers) they cause, but they cannot predict or diagnose cancer. In other words, the big datasets lack an efficient data mining method that can predict and diagnose the mutation and classify the patient's cancer. The goal of this paper is to arrive at a data mining technique that employs a neural network built on these big datasets. It also offers user-friendly, flexible prediction and effective cancer classification in order to overcome the drawbacks of previous techniques. The proposed technique uses two approaches. The first applies bioinformatics tools such as BLAST and CLUSTALW to determine whether malignant mutations are present. The second applies data mining with a neural network, for which 12 of the 53 fields in the TP53 gene database are selected. One of these 12 fields (the gene location field) did not exist in the TP53 gene database, so it was added to the database for training and testing the back-propagation algorithm in order to classify the specific types of cancer. Feed-forward back propagation supports this data mining method with a training rate of 1 and a mean square error (MSE) of 0.00000000000001. This technique allows the type of cancer to be classified quickly, accurately, and easily.

Classification and Prediction of Disease Classes using Gene Microarray Data

Integrated Intelligent Research, 2012

In 1999, T. R. Golub first presented an idea for classifying cancer at the molecular level, which boosted research in cancer diagnosis to a whole new level. Researchers began to analyze the disease at the genetic level with the help of microarray databases, and many new algorithms were designed to classify different types of cancer. The objective of this paper is to present a tool designed exclusively to predict and classify leukemia into its types. The leukemia dataset published by Golub is used for this purpose. The first step is to identify the most significant genes causing cancer from the training set. These selected genes are then used to build a classifier based on decision rules and, eventually, to predict the type of leukemia. This decision-rule classifier is found to work with an accuracy of 94%. The algorithm is quite simple in terms of complexity. It is possible to use a minimal number of genes for classification rather than a large set of genes. The genes that are responsible for the prognosis of cancer are the ones mainly selected for designing the classifier.

Study on Data Mining Techniques for Cancer Prediction System

Cancer occurs when changes called mutations take place in genes that control cell growth. The mutations allow the cells to divide and multiply in an uncontrolled, chaotic way. As the cells multiply, they produce copies that become increasingly abnormal. In the majority of cases, the cell copies eventually form a tumor. Cancer is a leading cause of death in the world. The most commonly diagnosed cancers worldwide are breast, lung, and blood cancers. The prognosis of different cancers is extremely variable. Several cancers are curable with early detection and treatment. Cancers that are aggressive at a later stage may be more difficult to cure. Knowledge discovery in databases (KDD), which includes data mining techniques, has been used in healthcare. In this study we discuss various data mining techniques that have been applied to breast, lung, and blood cancer. We focus on current research that uses data mining to improve the identification of risk factors, diagnosis, and prognosis for breast, lung, and blood cancers.

Performance Analysis and Evaluation of Different Data Mining Algorithms used for Cancer Classification

Data mining classification algorithms have been successfully applied in recent years to predict cancer from gene expression data. The microarray is a powerful diagnostic tool that can generate information on the expression of all the human genes in a cell at once. Various classification algorithms can be applied to such microarray data to devise methods that predict the occurrence of a tumor. However, the accuracy of such methods differs according to the classification algorithm used, and identifying the best algorithm among all those available is a challenging task. In this study, we make a comprehensive comparative analysis of 14 different classification algorithms and evaluate their performance on 3 different cancer datasets. The results indicate that no classifier outperformed all the others in accuracy across all 3 datasets. Most of the algorithms performed better as the size of the dataset increased. We recommend that users not stick to one particular classification method but instead evaluate different classification algorithms and select the best-performing one.
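The recommendation to evaluate several classifiers per dataset, rather than fixing one in advance, can be illustrated with a small comparison harness. This is a sketch under loud assumptions: the three classifiers (majority class, 1-nearest neighbor, nearest centroid) are simple stand-ins chosen for brevity, not the 14 algorithms of the study, leave-one-out accuracy is an assumed evaluation protocol, and both toy datasets are invented.

```python
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def majority_class(train_x, train_y, x):
    """Baseline: always predict the most frequent training label."""
    return Counter(train_y).most_common(1)[0][0]


def one_nearest_neighbor(train_x, train_y, x):
    """Predict the label of the closest training sample."""
    best = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i], x))
    return train_y[best]


def nearest_centroid(train_x, train_y, x):
    """Predict the class whose mean feature vector is closest to x."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


def loo_accuracy(classifier, xs, ys):
    """Leave-one-out accuracy: hold out each sample once and predict it."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += classifier(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)


def compare(classifiers, datasets):
    """Return {dataset: {classifier: leave-one-out accuracy}}."""
    return {dname: {cname: loo_accuracy(clf, xs, ys)
                    for cname, clf in classifiers.items()}
            for dname, (xs, ys) in datasets.items()}


classifiers = {
    "majority": majority_class,
    "1nn": one_nearest_neighbor,
    "centroid": nearest_centroid,
}
# Invented toy datasets: (feature vectors, class labels).
datasets = {
    "toy_a": ([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
               [1.0, 0.9], [0.8, 1.1], [0.9, 1.0]], [0, 0, 0, 1, 1, 1]),
    "toy_b": ([[0.0], [0.1], [2.0], [2.1], [1.0], [1.1]], [0, 0, 0, 0, 1, 1]),
}
results = compare(classifiers, datasets)
for dname, scores in results.items():
    print(dname, scores, "best:", max(scores, key=scores.get))
```

Even on these tiny examples the relative order of the majority baseline and the nearest-centroid rule flips between the two datasets, which is the kind of dataset dependence that motivates comparing several algorithms before choosing one.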

Classification Analysis of Genetic Variations In Cancer Diagnosis By Multiclass Classifiers

2021

Machine learning in medical imaging is among the most disruptive technologies in decades. It is emerging not only as a way to identify cancerous tumors at an earlier stage but also to detect and classify lesions, analyze data, reconstruct images, and more. Cancer is a heterogeneous disease consisting of many different subclasses. The early diagnosis and prognosis of a cancer type have become a necessity in cancer research because they facilitate the subsequent clinical management of patients. Researchers, healthcare organizations, and biomedical and bioinformatics companies look forward to improving clinical outcomes for cancer patients and those who may not by using diagnostic and prognostic biomarkers. The challenge, however, is distinguishing the mutations that contribute to tumor growth (drivers) from the neutral mutations (passengers). Currently, this interpretation of genetic mutations is done manually, based on evidence from text-based clinical literature. This article focuses on analyzing the performance of multiclass classification algorithms on genetic features.