Breast Cancer Detection using Decision Tree and K-Nearest Neighbour Classifiers

Decision Tree and K-Nearest Neighbor (KNN) Algorithms in Breast Cancer Diagnosis: A Relative Performance Evaluation

Transstellar journals, 2022

Databases are growing rapidly in every field of knowledge, including the health sector, as technology evolves and becomes part of daily work, and data mining models have become necessary to make sense of this mass of information. This integration of technology has made it possible to record a larger number of attributes for the same study unit and to expand the variety of formats in which data are recorded and stored. Because of this exponential growth, the classical statistical techniques used by experts and researchers can no longer fully reveal the information underlying a data set, which makes it necessary to introduce new analysis techniques such as data mining. Given this scenario, it is of interest to identify how much additional information mining models offer beyond an exploratory analysis of the data, and to determine whether any mining model offers additional information at all. To this end, a case study is carried out on two data sets, each of which seeks to determine the malignancy of a mass detected in a patient's breast from characteristics measured on the mass. The aim is to support timely decision-making and to reduce procedures that can be costly and unnecessary for both the patient and the service provider, since experience shows that 70% of the biopsies performed on the basis of mammography results are unnecessary. Because of the large number of patients, these data must be analysed rapidly so that disease is discovered as early as possible.

Classification and feature selection of breast cancer data based on decision tree algorithm

Studies in Informatics and Control, 2003

Medical information systems have received a lot of research attention in the past. As a result of advances in hardware and software technologies, medical information systems have changed from performing only record-keeping functions to offering more decision-oriented functionality. Large collections of medical data are a valuable resource from which potentially new and useful knowledge can be discovered through data mining. Data mining is an increasingly popular field that uses statistical, visualization, machine learning, and other data manipulation and knowledge extraction techniques to gain insight into the relationships and patterns hidden in the data. It is very useful if the results of data mining can be communicated to humans in an understandable way. In this paper, we introduce an efficient symbolic machine learning algorithm to identify the important breast cancer attributes needed for interpretation. The proposed technique is based on an inductive decision tree learning algorithm that combines low complexity with high transparency and accuracy. In addition, among all features, we use only the subset that leads to the best performance. The proposed technique is evaluated on real data of 699 samples, which are used to build the decision tree. Evaluation shows that the rate of correct classification of new cases is high.
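
The abstract does not spell out its induction algorithm, so the following is only a minimal sketch of generic inductive decision tree learning, assuming numeric features, binary threshold splits and information gain as the split criterion (ID3/C4.5-style). The function names (entropy, best_split, build_tree, predict), the max_depth limit and the six toy rows are hypothetical and are not the paper's 699-sample data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature, threshold, gain) for the split with the highest information gain."""
    base = entropy(labels)
    best = (None, None, 0.0)
    for f in range(len(rows[0])):
        # Candidate thresholds: midpoints between consecutive distinct values,
        # so both sides of every candidate split are non-empty.
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if gain > best[2]:
                best = (f, t, gain)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a binary tree recursively; leaves hold the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    f, t, _ = best_split(rows, labels)
    if f is None:  # no split reduces entropy
        return majority
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the thresholds from the root down to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical toy data: two cytology-style scores per sample.
rows = [[1, 1], [2, 1], [1, 2], [8, 7], [9, 9], [7, 8]]
labels = ["benign"] * 3 + ["malignant"] * 3
tree = build_tree(rows, labels)
print(predict(tree, [2, 2]))  # benign
print(predict(tree, [8, 8]))  # malignant
```

On these toy rows a single split on the first feature at 4.5 already separates the classes, so the tree stops at depth one; the printed features and thresholds are what make such a tree interpretable.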

Brief review of classification algorithm on breast cancer

Journal of Physics: Conference Series, 2019

Breast cancer is a malignant neoplasm, an abnormal growth of breast tissue that differs from the surrounding tissue. It has become the second leading cause of cancer death, after lung cancer. Several studies have diagnosed breast cancer patients using various types of algorithms. This paper presents a brief review of breast cancer detection using data mining classification algorithms. The C4.5, Naive Bayes and K-Nearest Neighbour algorithms are applied to a breast cancer data set. The outputs are a decision tree, models and computations. This study aims to evaluate the performance of these three data mining classification algorithms.
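
C4.5 differs from plain information-gain tree induction mainly in dividing the information gain of an attribute by its split information, which penalises attributes with many distinct values. The sketch below computes this gain ratio for one categorical attribute; the function names and the toy values are illustrative assumptions, not taken from the reviewed paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """C4.5-style gain ratio of a categorical attribute with respect to the labels."""
    total = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: parent entropy minus weighted entropy of the partitions.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum(len(g) / total * math.log2(len(g) / total) for g in groups.values())
    return info_gain / split_info if split_info > 0 else 0.0


labels = ["benign", "benign", "malignant", "malignant"]
print(gain_ratio(["low", "low", "high", "high"], labels))  # 1.0
print(gain_ratio(["id1", "id2", "id3", "id4"], labels))    # 0.5
```

An attribute with a unique value per sample (such as a patient ID) separates the classes just as perfectly as a genuine two-valued attribute, but its higher split information halves its gain ratio, so C4.5 prefers the genuine attribute.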

Breast Cancer Classification using Decision Tree Algorithms

International Journal of Advanced Computer Science and Applications

Cancer is a major health issue that affects individuals all over the world. This disease has claimed the lives of many people and will continue to do so in the future. Breast cancer has recently surpassed cervical cancer as the most frequent cancer among women in both industrialized and developing countries, and it is now the second leading cause of cancer mortality among women. A large number of women die of this disease each year. Breast cancer is significantly easier to treat if caught early. This paper introduces a decision tree-based data mining technique for the early detection of breast cancer with the highest accuracy, which helps patients recover. Breast growths are classed as benign (unable to penetrate surrounding tissue) or malignant (able to infiltrate adjacent tissue). Two experiments were included in the study. The primary study uses 10 breast cancer samples from the Kaggle archive, whereas the follow-up study uses 286 breast cancer samples from the same pool. The Decision Tree's accuracy was 100% in the first trial and 97.9% in the follow-up trial. These findings justify the use of the proposed machine learning-based Decision Tree classifier in pre-evaluating patients for triage and decision-making prior to the availability of data.
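
The abstract does not state how accuracy was estimated. One common protocol, especially with small samples, is k-fold cross-validation, in which every sample is used for testing exactly once. The sketch below is a generic version of that idea, not the paper's procedure; the one-feature data and the nearest-centroid threshold rule (fit_centroids, predict_centroids) are hypothetical stand-ins for a real decision tree.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit, predict, k=5, seed=0):
    """Mean test accuracy over k folds; `fit` and `predict` are caller-supplied."""
    scores = []
    for test in k_fold_indices(len(X), k, seed):
        test_set = set(test)
        train = [i for i in range(len(X)) if i not in test_set]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_centroids(X, y):
    """Hypothetical 1-D rule: threshold halfway between the two class means."""
    b = [x[0] for x, lab in zip(X, y) if lab == "benign"]
    m = [x[0] for x, lab in zip(X, y) if lab == "malignant"]
    return (sum(b) / len(b) + sum(m) / len(m)) / 2


def predict_centroids(threshold, x):
    # Assumes malignant samples have larger feature values.
    return "benign" if x[0] < threshold else "malignant"


X = [[1], [2], [3], [7], [8], [9]]
y = ["benign"] * 3 + ["malignant"] * 3
print(cross_val_accuracy(X, y, fit_centroids, predict_centroids, k=3))  # 1.0
```

Averaging over folds gives every sample one turn in the test set, which makes the estimate depend less on a single lucky or unlucky split.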

Classification of Breast Cancer Tissues using Decision Tree Algorithms

Healthcare data today are enormous, complex and diverse because they contain data of many different types, and extracting knowledge from them is essential. Data mining techniques can be used for this purpose, building models from healthcare datasets. The classification of breast cancer patients is currently a demanding research challenge for many researchers. To build a classification model for cancer patients, we used four classification algorithms, J48, REPTree, RandomForest and RandomTree, and tested them on a dataset taken from the UCI Machine Learning Repository. The main aim of this paper is to classify patients as benign (not cancer) or malignant (cancer) based on diagnostic measurements included in the dataset.
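
RandomForest combines many trees, each grown on a bootstrap sample with a random subset of features considered at each split, and predicts by majority vote. The sketch below shows that idea in a deliberately simplified form, assuming depth-one trees (decision stumps) split by Gini impurity; it is not Weka's RandomForest implementation, and all names (fit_stump, fit_forest, predict_forest), parameter values and toy data are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(rows, labels, features):
    """Best single-threshold split over `features`, by lowest weighted Gini impurity.

    Returns (feature, threshold, left_label, right_label).
    """
    best = None
    for f in features:
        for t in sorted(set(r[f] for r in rows)):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not right:  # t is the maximum value, so this is not a real split
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:
        # Every sampled feature is constant: always predict the majority class.
        majority = Counter(labels).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Bagged stumps, each trained on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) row indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


rows = [[1, 1], [2, 1], [1, 2], [8, 7], [9, 9], [7, 8]]
labels = ["benign"] * 3 + ["malignant"] * 3
forest = fit_forest(rows, labels)
print(predict_forest(forest, [1, 1]), predict_forest(forest, [9, 9]))
```

Because each stump sees a different bootstrap sample and feature, individual stumps can err, but the vote across all of them is more stable than any single one.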

Diagnosis of Breast Cancer using Decision Tree Data Mining Technique

Cancer is a major issue around the world. It is a disease that is fatal in many cases; it has affected the lives of many people and will continue to affect the lives of many more. Breast cancer is today the second leading cause of cancer deaths in women, and in recent years it has become the most common cancer among women in both the developed and the developing world. Every year 40,000 women die from this disease, which is one woman every 13 minutes.
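
The 13-minute figure follows from spreading 40,000 deaths evenly over the minutes of a (non-leap) year:

```latex
\frac{365 \times 24 \times 60\ \text{min}}{40\,000\ \text{deaths}}
  = \frac{525\,600\ \text{min}}{40\,000\ \text{deaths}}
  \approx 13.1\ \text{min per death}
```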

Breast Cancer Prediction using KNN, SVM, Logistic Regression and Decision Tree

International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022

The number of deaths caused by breast cancer is increasing sharply every year. It is the most frequent of all cancers and a major cause of death in women worldwide. Any advance in the prediction and diagnosis of cancer is of capital importance for a healthy life. Consequently, high accuracy in cancer prediction is important for improving treatment and patient survival. Machine learning techniques can make a large contribution to the prediction and early diagnosis of breast cancer; they have become a research hotspot and have proved to be strong techniques. In this study, we applied five machine learning algorithms, Support Vector Machine (SVM), Random Forest, Logistic Regression, Decision Tree (C4.5) and K-Nearest Neighbours (KNN), to the Breast Cancer Wisconsin Diagnostic dataset, and then evaluated and compared the performance of these classifiers. The main objective of this research paper is to predict and diagnose breast cancer using machine learning algorithms and to find out which is the most effective with respect to the confusion matrix, accuracy and precision. Support Vector Machine outperformed all other classifiers and achieved the highest accuracy (97.2%). All the work was done in the Anaconda environment using the Python programming language and the Scikit-learn library.
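
Accuracy and precision are both read off the confusion matrix. The plain-Python sketch below shows how, treating malignant as the positive class (an assumption; the abstract does not say which class is positive). The helper names and toy labels are hypothetical; in a scikit-learn workflow like the one described, sklearn.metrics.confusion_matrix, accuracy_score and precision_score compute the same quantities.

```python
def confusion_matrix(y_true, y_pred, positive="malignant"):
    """Counts (tp, fp, fn, tn), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # malignant correctly flagged
            else:
                fp += 1  # benign wrongly flagged as malignant
        elif t == positive:
            fn += 1      # malignant case missed
        else:
            tn += 1      # benign correctly cleared
    return tp, fp, fn, tn


def metrics(y_true, y_pred, positive="malignant"):
    """Accuracy, precision and recall (sensitivity) derived from the counts."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


y_true = ["malignant", "malignant", "benign", "benign", "benign"]
y_pred = ["malignant", "benign", "malignant", "benign", "benign"]
print(confusion_matrix(y_true, y_pred))  # (1, 1, 1, 2)
print(metrics(y_true, y_pred))  # {'accuracy': 0.6, 'precision': 0.5, 'recall': 0.5}
```

Accuracy alone can hide missed malignant cases; recall (sensitivity) exposes them directly, which is why studies like this one also report confusion-matrix-based measures.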

A Comparative Analysis of Methods for Detecting and Diagnosing Breast Cancer Based on Data Mining

Journal of Artificial Intelligence and Metaheuristics, 2023

Breast cancer is a significant public health concern worldwide, and early detection is crucial for its treatment. Although breast cancer has been extensively studied, there is still room for improvement in its classification accuracy. This study aims to improve the classification accuracy of breast cancer by applying information gain feature selection and machine learning techniques to the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The information gain method is used to reduce the number of features, and machine learning algorithms such as support vector machine (SVM), naive Bayes (NB) and the C4.5 decision tree are employed for breast cancer classification. The study also compares the classifiers by accuracy. The proposed model achieves the maximum classification accuracy (100%), with a weighted average precision of 100% and recall of 100%, using a C4.5 decision tree, while SVM achieves an accuracy of 98.42%, with a weighted average precision of 98.17% and recall of 98.58%. The NB algorithm attains an accuracy of 96%, with a weighted average precision of 18.57% and recall of 50%. The proposed model's results are compared with similar studies and demonstrate significant progress, indicating new opportunities for breast cancer detection.
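
Information gain feature selection scores each feature by how much knowing its value reduces the entropy of the class label, then keeps the top-ranked features. A minimal sketch, assuming features that are already discretised (WDBC's features are continuous and would first need binning, which is omitted here); the function names, the value of k and the toy rows are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(column, labels):
    """Entropy of the labels minus their weighted entropy within each feature value."""
    total = len(labels)
    groups = {}
    for v, y in zip(column, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_top_k(rows, labels, k):
    """Indices of the k features with the highest information gain (ties by index)."""
    scores = [(information_gain([r[f] for r in rows], labels), f) for f in range(len(rows[0]))]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [f for _, f in scores[:k]]


# Feature 0 separates the classes, feature 1 is noise, feature 2 is partly informative.
rows = [["lo", "a", "x"], ["lo", "b", "x"], ["hi", "a", "x"], ["hi", "b", "y"]]
labels = ["benign", "benign", "malignant", "malignant"]
print(select_top_k(rows, labels, 2))  # [0, 2]
```

The selected column indices would then be used to train the classifier (C4.5, SVM or NB) on the reduced feature set.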

Effective K-nearest neighbor classifications for Wisconsin breast cancer data sets

Journal of the Chinese Institute of Engineers, 2019

Machine learning algorithms are in demand today for the early prediction of the signs and symptoms of breast cancer. One of these algorithms is K-nearest neighbor (KNN), which classifies data by measuring the distances between data points. The performance of KNN depends on the number of neighbors considered, known as the K value. This study explores KNN performance with various distance functions and K values to find an effective KNN. The Wisconsin breast cancer (WBC) and Wisconsin diagnostic breast cancer (WDBC) datasets from the UC Irvine machine learning repository were used as our main data sources. The experiments on each dataset consisted of three iterations. The first iteration used no feature selection. The second used L1-norm-based feature selection from a linear support vector classifier model, and the third used Chi-square-based feature selection. Several evaluation metrics, including accuracy, the receiver operating characteristic (ROC) curve with the area under the curve (AUC), and sensitivity, were used to assess the implemented techniques. The results indicated that the technique involving Chi-square-based feature selection achieved the highest accuracy with the Canberra or Manhattan distance functions for both datasets. The optimal K values for these distance functions ranged from 1 to 9. This study indicated that, with an appropriate choice of the K value and distance function in KNN, Chi-square-based feature selection for the WBC datasets gives the highest accuracy rate compared with existing models.
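
The sketch below shows a KNN classifier with interchangeable distance functions, including the Manhattan and Canberra distances named in the abstract, plus a leave-one-out loop for comparing K values. It is a generic illustration under stated assumptions (plain Python, unweighted majority vote, hypothetical toy data and function names), not the study's implementation, and it omits the feature selection steps.

```python
import math
from collections import Counter


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canberra(a, b):
    # Coordinates where both values are zero contribute 0 (avoids 0/0).
    return sum(abs(x - y) / (abs(x) + abs(y)) for x, y in zip(a, b) if abs(x) + abs(y) > 0)


def knn_predict(train_X, train_y, query, k=3, distance=manhattan):
    """Majority label among the k training points closest to `query`."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: distance(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_accuracy(X, y, k, distance):
    """Leave-one-out accuracy of KNN for one (k, distance) combination."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k, distance) == y[i]
    return hits / len(X)


X = [[1, 1], [2, 1], [1, 2], [8, 7], [9, 9], [7, 8]]
y = ["benign"] * 3 + ["malignant"] * 3
print(knn_predict(X, y, [2, 2], k=3, distance=canberra))  # benign
for k in (1, 3, 5):
    print(k, loo_accuracy(X, y, k, manhattan))
# 1 1.0
# 3 1.0
# 5 0.0
```

On the toy data, K = 3 classifies every held-out point correctly, while K = 5 misclassifies all of them: with only five remaining points, the opposite class always holds the majority of the vote. This is a deliberately extreme illustration of why the K value has to be tuned together with the distance function.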