Identifying Subtypes of Cancer Using Genomic Data by Applying Data Mining Techniques
Related papers
Data Mining Techniques in Cancer Research Area
2014
In this paper we present an analysis of predicting the survivability rate of breast cancer patients from different attributes using data mining techniques. The data used are real patient data. The preprocessed data set contains all twelve fields available in the database. We have investigated data mining techniques:
Study on Data Mining Techniques for Cancer Prediction System
Cancer occurs when changes called mutations take place in genes that control cell growth. The mutations permit the cells to divide and multiply in an uncontrolled way. As the cells multiply, they produce copies that become increasingly abnormal. In the majority of cases, the cell copies eventually form a tumor. Cancer is a leading cause of death in the world. The most commonly diagnosed cancers in the world are those of the breast, lung, and blood. The prognosis of different cancers is extremely variable. Several cancers are curable with early detection and treatment. Cancers that are aggressive or detected at a later stage may be more difficult to cure. Knowledge Discovery in Databases (KDD), which includes data mining techniques, has been used in healthcare. In this study paper we discuss various data mining techniques that have been utilized for breast cancer, lung cancer, and blood cancer. We focus on current research that uses the data mining approach to improve the identification of risk factors, the diagnosis, and the prognosis of breast, lung, and blood cancers.
Oncological Analysis Using Data Mining
Research in Computing Science, 2016
Data mining is a technique that involves the application of specific algorithms, which generate a list of patterns from large volumes of information. These patterns are useful for decision making in a wide range of fields, such as the detection of patterns in diseases. This paper proposes a cancer analysis by means of data mining techniques, applied to a sample of 3365 cases, using association algorithms such as Apriori and the J48 classification algorithm. The data were obtained from clinical reports of patients from the Palliative Care Unit of the Hospital Regional 1o de Octubre. Once the data mining process was completed, it was possible to identify some of the most relevant types of cancer, age as a characteristic factor in the emergence of the disease, and the sex affected.
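The association-mining step named in the abstract above can be illustrated with a minimal Apriori sketch. This is not the paper's implementation: the records, attribute names (`sex=F`, `age>=60`, `dx=breast`) and the `min_count` threshold are hypothetical toy values chosen only to show how frequent itemsets are grown level by level.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs anywhere.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets are frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

# Hypothetical patient records, each reduced to a set of categorical attributes.
records = [
    {"sex=F", "age>=60", "dx=breast"},
    {"sex=F", "age>=60", "dx=breast"},
    {"sex=M", "age>=60", "dx=lung"},
    {"sex=F", "age<60", "dx=breast"},
    {"sex=M", "age<60", "dx=lung"},
]

frequent = apriori(records, min_count=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Using an absolute count instead of a fractional support keeps the threshold comparison free of floating-point rounding; a fractional support can be obtained by dividing each count by `len(records)`.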
Performance Evaluation of Data Mining Techniques Using Cancer Dataset
IJRAR, 2019
In recent years, data mining (DM) has attracted great attention in the healthcare industry and in society as a whole. This research work creates clusters from two cancer datasets and analyzes the performance of partition-based algorithms. Two partition-based algorithms, namely Kmeans Plus and Affinity Propagation, are implemented. A comparative analysis of the clustering algorithms is carried out on two different datasets, Colon and Leukemia. The performance of the algorithms is measured by the correctly classified clusters and the average accuracy. The Affinity Propagation algorithm is efficient for clustering the cancer datasets. The outcome of this work is suitable for analyzing the behavior of cancer in the oncology departments of cancer centers. The ultimate goal of this research work is to find out which type of dataset and algorithm is most suitable for the analysis of cancer data.
Introduction
Data mining is one of the most important areas of research and is used pragmatically in different domains such as finance, education, clinical research, healthcare, and agriculture, with the aim of discovering useful information from large amounts of data. This research uses different data mining techniques to cluster medical data. Data mining tasks can be categorized into two types: supervised tasks and unsupervised tasks. Supervised tasks have datasets that contain both explanatory variables and dependent variables. The objective is to discover the associations between the explanatory and dependent variables. Unsupervised tasks, on the other hand, have datasets that contain only the explanatory variables, and the objective is to explore the data and generate hypotheses about its hidden structures. Clustering is one of the most common unsupervised data mining methods for exploring the hidden structures embedded in a dataset. Clustering is the process of grouping abstract objects into classes of similar objects.
A cluster of data objects can be treated as one group. In cluster analysis, the data set is first partitioned into groups based on data similarity, and labels are then assigned to the groups. The main advantage of clustering over classification is that it is adaptable to changes and helps single out useful features that distinguish different groups.
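The partition-then-label procedure described above can be sketched in a few lines of plain Python. The sketch assumes that "Kmeans Plus" refers to k-means with k-means++ seeding, which the abstract does not confirm; the two-dimensional points are hypothetical stand-ins for expression profiles, and Affinity Propagation is not shown.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance from the nearest chosen center.
    Assumes the data contain at least k distinct points."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Hypothetical samples: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
labels, centers = kmeans(points, k=2, seed=0)
print(labels, centers)
```

The integer labels are arbitrary cluster identifiers; comparing them with known classes (for example, tumor versus normal) is what an accuracy measure for clustering would then be computed on.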
A Survey on Cancer Prediction Using Data Mining Techniques
2018
1 Head of the Department, Department of Computer Application & Information Technology, Kaamathenu Arts & Science College, Sathyamangalam, Tamilnadu, India. 2 Research Scholar, Department of Computer Application & Information Technology, Kaamathenu Arts & Science College, Sathyamangalam, Tamilnadu, India.
Abstract: Conventional semi-supervised clustering approaches have several shortcomings, such as (1) not fully utilizing all useful must-link and cannot-link constraints, (2) not considering how to deal with high dimensional data with noise, and (3) not fully addressing the need to use an adaptive process to further improve the performance of the algorithm. In this paper, we first propose the transitive closure based constraint propagation approach, which makes use of the transitive closure operator and the affinity propagation to address the first li...
Comparative Study of Recent Trends on Cancer Disease Prediction using Data Mining Techniques
International Journal of Database Theory and Application, 2016
Technological advancements have spread into several application domains to solve various problems. One such technological area is data mining. It has shown its significance and potential in the health care industry as a guiding and decision-making component. Its potential for unveiling new trends in health care organizations has proved its importance for everyone associated with this area. It is an important and encouraging area of research that aims to extract information from large data sets. Advances in data mining research have made it a key player in the health care field. Good analytical techniques are essential for detecting the valuable information hidden in health industry data. This survey paper presents the importance and usefulness of different data mining techniques, such as classification, clustering, Decision Tree, and Naive Bayes, in the health domain. It studies and compares the different data mining techniques used for predicting cancer from clinical datasets, along with the accuracy they achieve.
An Overview on Data Mining Approach on Breast Cancer data
2013
This paper gives a current overview of the use of data mining techniques on breast cancer data. It also reviews the data mining studies in the medical domain that researchers have already carried out. In this paper we apply classification techniques to breast cancer data using data mining software. A huge number of medical records are stored in databases. Data are produced from different sources and continuously stored in repositories. These databases are complicated to analyze. Data mining is a relatively new field of research whose major objective is to acquire knowledge from large amounts of data.
An Integrated Cancer Prediction System Using Data Mining Techniques
2018
Cancer identification and prediction are a huge challenge for researchers. The use of various data mining techniques has revolutionized the whole process of cancer diagnosis and prognosis. We propose an integrated system based on a combination of data mining techniques, such as the analytical hierarchy process, rule-based association, and classification, that helps predict the patient's disease status. Cancer risk can be discovered by analyzing and identifying various factors and symptoms of the patient before recommending treatments. The main aim of our system is to help oncologists and medical practitioners diagnose the patient by analyzing available data and relevant information.
On the adaption of data mining technology to categorize cancer diseases
International Journal Artificial Intelligent and Informatics, 2022
Along with data mining, tools and software have emerged to aid in mining the vast and growing amount of data and to access the knowledge held in databases. These tools facilitate work in most scientific disciplines, including the sciences and library and information science. Accordingly, data mining has become an effective technique for obtaining knowledge, with the basic goal of discovering hidden facts contained in databases through the use of multiple technologies that include artificial intelligence, statistical analysis, and data modeling. Medical data mining is considered one of the most important tools used in the field of medicine, especially for exploring and understanding health conditions from the records of former patients. In addition, data mining helps not only in categorizing cancer but also in taking the necessary measures. With cancer spreading at high rates around the world, the need arose for smart methods that are able to predict the disease. Ap...
Classification algorithms from data mining have been successfully applied in recent years to predict cancer from gene expression data. The microarray is a powerful diagnostic tool that can capture gene expression information for all the human genes in a cell at once. Various classification algorithms can be applied to such microarray data to devise methods that predict the occurrence of a tumor. However, the accuracy of such methods differs according to the classification algorithm used. Identifying the best classification algorithm among all those available is a challenging task. In this study, we have made a comprehensive comparative analysis of 14 different classification algorithms, and their performance has been evaluated using 3 different cancer data sets. The results indicate that none of the classifiers outperformed all others in terms of accuracy across all 3 data sets. Most of the algorithms performed better as the size of the data set increased. We recommend that users not stick to one particular classification method; instead, they should evaluate different classification algorithms and select the best-performing one.
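The recommendation to evaluate several classifiers on each data set can be sketched as a small comparison loop. The following is only an illustration under stated assumptions: the three classifiers (nearest centroid, 1-NN, 3-NN), the leave-one-out protocol, and both toy data sets are hypothetical choices, not the 14 algorithms or the 3 cancer data sets used in the study.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_centroid(train_X, train_y, x):
    """Predict the class whose mean training vector is closest to x."""
    groups = {}
    for row, label in zip(train_X, train_y):
        groups.setdefault(label, []).append(row)
    centroids = {label: [sum(col) / len(rows) for col in zip(*rows)]
                 for label, rows in groups.items()}
    return min(centroids, key=lambda label: sq_dist(x, centroids[label]))

def knn(k):
    """Build a k-nearest-neighbour predictor using majority vote."""
    def predict(train_X, train_y, x):
        nearest = sorted(zip(train_X, train_y), key=lambda pair: sq_dist(x, pair[0]))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict

def loo_accuracy(predict, X, y):
    """Leave-one-out accuracy: hold out each sample once and predict it from the rest."""
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)

classifiers = {"nearest_centroid": nearest_centroid, "1-NN": knn(1), "3-NN": knn(3)}

# Hypothetical two-feature data sets standing in for expression profiles.
datasets = {
    "separable": ([(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.1), (4.9, 5.3), (5.2, 4.8)],
                  ["normal", "normal", "normal", "tumor", "tumor", "tumor"]),
    "noisy": ([(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.0),
               (5.0, 5.0), (5.1, 4.9), (4.8, 5.2), (2.9, 3.1)],
              ["normal", "normal", "normal", "tumor",
               "tumor", "tumor", "tumor", "normal"]),
}

# results[dataset][classifier] = leave-one-out accuracy
results = {name: {clf: loo_accuracy(fn, X, y) for clf, fn in classifiers.items()}
           for name, (X, y) in datasets.items()}
for name, scores in results.items():
    print(name, scores)
```

Keeping the classifiers in a dictionary makes it cheap to add another method and rerun the same evaluation on every data set, which is the habit the abstract argues for.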