Breast Cancer Diagnosis on Three Different Datasets Using Multi-Classifiers

2012

This paper presents a comparison among five classifiers: decision tree (J48), Multi-Layer Perceptron (MLP), Naive Bayes (NB), Sequential Minimal Optimization (SMO), and instance-based K-Nearest Neighbor (IBK). They are evaluated on three breast cancer databases (Wisconsin Breast Cancer (WBC), Wisconsin Diagnosis Breast Cancer (WDBC) and Wisconsin Prognosis Breast Cancer (WPBC)) using classification accuracy and the confusion matrix under 10-fold cross-validation. We also introduce a fusion at classification level between these classifiers to find the most suitable multi-classifier approach for each data set. The experimental results show that the fusion of MLP and J48 with PCA is superior to the other classifiers on the WBC data set. PCA is used on the WBC dataset as a feature reduction transformation that combines a set of correlated features. The selected attributes are: Uniformity of Cell Size, Mitoses, Clump thickness, Bare Nu...
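
The abstract does not say which rule is used to fuse the classifiers, so the following is only a minimal sketch of one common option, majority voting over predicted labels. The function name `majority_vote`, the tie-breaking rule and the sample predictions are assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def majority_vote(predictions_per_classifier):
    """Fuse label predictions from several classifiers by majority vote.

    predictions_per_classifier: list of equal-length label lists,
    one list per classifier. Ties go to the classifier listed first
    (an assumption for this sketch).
    """
    fused = []
    for votes in zip(*predictions_per_classifier):
        counts = Counter(votes)
        top = max(counts.values())
        # Keep the first label, in classifier order, that reaches the top count.
        fused.append(next(label for label in votes if counts[label] == top))
    return fused


# Hypothetical predictions from three classifiers on five samples.
mlp = ["benign", "malignant", "benign", "malignant", "benign"]
j48 = ["benign", "benign", "benign", "malignant", "malignant"]
nb = ["malignant", "malignant", "benign", "malignant", "malignant"]

fused = majority_vote([mlp, j48, nb])
print(fused)
```

With only two classifiers, as in the MLP and J48 fusion, every disagreement is a tie, so a real system would more likely combine class probabilities than hard labels.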

Breast Cancer Detection using Decision Tree and K-Nearest Neighbour Classifiers

Iraqi Journal of Science

Data mining plays an important role in healthcare for discovering hidden relationships in big datasets, especially in breast cancer diagnostics, as breast cancer is a leading cause of death worldwide. In this paper two algorithms, decision tree and K-Nearest Neighbour, are applied to diagnose breast cancer grade in order to reduce the risk to patients. In the decision tree with feature selection, the Gini index gives an accuracy of 87.83%, while entropy gives an accuracy of 86.77%. In both cases, Age appeared as the most effective parameter, particularly when Age<49.5, while Ki67 appeared as the second most effective parameter. For K-Nearest Neighbour, the value of K was selected on the basis of minimum error rate and maximum test accuracy, giving an accuracy of 86.24% with the Euclidean distance metric. From these models, it seems that Breast Cancer Grade 2 is the most prevalent type. For the fu...
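
The two split criteria named in the abstract can be illustrated with a short sketch. Only the threshold 49.5 comes from the abstract; the ages, the grades and the helper names `gini`, `entropy` and `split_impurity` are made up for illustration and do not reproduce the paper's data or results.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_impurity(values, labels, threshold, impurity):
    """Size-weighted impurity of the two sides of the test `value < threshold`."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    return sum(len(part) / n * impurity(part) for part in (left, right) if part)


# Hypothetical patient ages and tumour grades (not from the paper).
ages = [35, 42, 48, 51, 60, 67]
grades = [2, 2, 2, 3, 3, 2]

print("Gini before split:   ", gini(grades))
print("Gini after Age<49.5: ", split_impurity(ages, grades, 49.5, gini))
print("Entropy before split:", entropy(grades))
print("Entropy after split: ", split_impurity(ages, grades, 49.5, entropy))
```

A decision tree learner picks the feature and threshold that lower the chosen impurity the most, which is how an attribute such as Age can end up at the root under both criteria.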

Brief review of classification algorithm on breast cancer

Journal of Physics: Conference Series, 2019

Breast cancer is a malignant neoplasm, an abnormal growth of breast tissue that differs from the surrounding tissue. It has become the second most common cause of cancer death after lung cancer. Several studies have sought to diagnose breast cancer patients using various types of algorithms. This paper presents a brief review of breast cancer detection using data mining classification algorithms. The C4.5, Naive Bayes and K-Nearest Neighbour algorithms are applied to a breast cancer data set. The outputs are decision trees, models and computations. This study aims to evaluate the performance of these three data mining classification algorithms.

COMPARATIVE ANALYSIS OF CLASSIFICATION APPROACHES FOR BREAST CANCER

IAEME PUBLICATION, 2019

Breast cancer is one of the most common diseases among women in Africa and worldwide. Accurate and early diagnosis is a very significant step in treatment. However, it is not easy because of uncertainties in detecting breast cancer. Machine learning helps us to extract information and knowledge on the basis of past experience and to detect hard-to-perceive patterns in large and noisy datasets. This paper compares and analyzes the performance of four machine learning algorithms, namely Decision Tree (DT), Logistic Regression (LR), Naïve Bayes (NB), and K-Nearest Neighbors (KNN), for detecting breast cancer. The data set used for the comparison was the original Wisconsin breast cancer data set from UCI. The results show that Logistic Regression performs best, with a classification accuracy of 96.93%.

Classification of Breast Cancer Tissues using Decision Tree Algorithms

Healthcare data are now enormous, complex and diverse because they contain data of many different types, and extracting knowledge from them is essential. Data mining techniques can be used for this purpose by building models from healthcare datasets. At present, the classification of breast cancer patients is a demanding research challenge for many researchers. To build a classification model for cancer patients, we used four classification algorithms (J48, REPTree, RandomForest, and RandomTree) and tested them on a dataset taken from UCI. The main aim of this paper is to classify each patient as benign (not cancer) or malignant (cancer), based on the diagnostic measurements in the dataset.

DECISION TREE AND K-NEAREST NEIGHBOR (KNN) ALGORITHMS IN BREAST CANCER DIAGNOSIS: A RELATIVE PERFORMANCE EVALUATION

Transstellar journals, 2022

Databases are growing at an accelerated pace in every field of knowledge, including the health sector, as technology evolves, develops and becomes integrated into daily work. Data mining models have become necessary to meet the need created by this mass of information. This integration has made it possible to record a larger number of attributes for the same study unit and to expand the variety of formats in which data are recorded and stored. Because of this exponential growth, the classic statistical techniques used by experts and researchers cannot fully reveal the information underlying a data set, making it necessary to introduce new analysis techniques such as data mining models. Given this scenario, it is of interest to identify how much additional information mining models offer over exploratory data analysis, and to determine whether any mining model offers additional information at all. To this end, a case study is carried out on two data sets, each of which seeks to determine the malignancy of a mass detected in the patient's breast from characteristics measured on the mass. The goal is to support timely decision-making and to reduce procedures that can be costly and unnecessary for both the user and the service provider, since experience shows that 70% of the biopsies performed on the basis of mammography results are unnecessary. Because of the large number of patients, it is critical to analyse these data rapidly in order to discover disease as early as possible.

Biological and Tumor Markers in Early Prediction Phase of Breast Cancer Using Classification and Regression Tree: Sebha Oncology Center as a Case study

2022 IEEE 2nd International Maghreb Meeting of the Conference on Sciences and Techniques of Automatic Control and Computer Engineering (MI-STA)

In recent years, many studies have emerged on developing cancer screening tools based on blood-based biological and tumor markers. These have achieved promising results in detecting Breast Cancer (BC), especially in the early stages of the disease. This study investigates the most relevant biological and tumor markers and implements various machine learning classification techniques for BC prediction. For this purpose, data on BC were collected from the Sebha Oncology Center (SOC) through their routine blood tests. The correlation coefficient analysis shows that the most effective biological markers that may be used as cancer predictors are: Cancer Antigen (CA-15.3), Carcinoembryonic Antigen (CEA), White Blood Cells (WBC), Red Blood Cells (RBC), and Albumin (ALB). Four classifier models, namely Classification and Regression Tree (CART), K-Nearest Neighbors (KNN), Naive Bayes (NB), and Logistic Regression (LG), were trained with 10-fold cross-validation to evaluate the classifiers’ accuracy. The results show that CART outperforms the other classifiers in terms of accuracy, precision, recall, and F-measure. Moreover, the obtained outcomes confirm that our research provides a significant contribution to BC prediction, which can ultimately assist SOC doctors in improving their cancer diagnosis.
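
The four measures named in the abstract can all be derived from the counts of a binary confusion matrix. The sketch below uses the standard definitions. The function name `binary_metrics` and the label vectors are hypothetical and are not taken from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical ground truth and predictions (1 = cancer, 0 = no cancer).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Recall is the same quantity as the true positive rate, so in a screening setting it measures how many actual cancer cases the classifier catches.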

COMPARATIVE STUDY ON DIFFERENT CLASSIFICATION TECHNIQUES FOR BREAST CANCER DATASET

Breast cancer is one of the most common cancers among women in the world. Early detection of breast cancer is essential in reducing loss of life. Data mining is the process of analyzing massive data and summarizing it into useful knowledge. The role of data mining approaches is growing rapidly, and classification techniques in particular are a very effective way of classifying data, which is essential in the decision-making process of medical practitioners. This study compares different data mining classifiers on a breast cancer database, using classification accuracy with and without feature selection techniques. Feature selection increases the accuracy of a classifier because it eliminates irrelevant attributes. The experiment shows that feature selection enhances the accuracy of all three classifiers, reduces the Mean Standard Error (MSE) and increases the Receiver Operating Characteristics (ROC) value.

ANALYSIS OF BREAST CANCER CLASSIFICATION USING VARIOUS ALGORITHMS

IEEE, 2022

Breast cancer has become one of the most common types of tumors in recent years. The patient's chances of survival improve if the disease is detected early. The most common categorization for breast cancer is binary (benign/malignant), which allows pathologists to reach a systematic and objective prognosis. The WBCD (Wisconsin Breast Cancer Diagnosis) dataset is used to build the classifier. Before classification, the dataset is preprocessed and explored using various techniques. Four machine learning techniques (k-fold cross-validation, pipelining, principal component analysis, and hyperparameter optimization) are compared and analyzed in the project. These techniques are applied to the development of eight classifiers that must distinguish between benign and malignant breast tumors: the AdaBoost, K-Neighbors, decision tree, random forest, SVM, logistic regression, Gaussian NB, and gradient boosting classifiers. Of these, the SVM classifier obtained the highest accuracy on the dataset, 99.1%.
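
As a rough illustration of the k-fold cross-validation mentioned in the abstract, the sketch below splits sample indices into k folds so that each sample is used for testing exactly once. The function name `k_fold_indices`, the seed and the sample count are assumptions for this example. The abstract does not describe how the authors built their folds.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical data set of 25 samples split into 10 folds.
splits = list(k_fold_indices(25, 10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(fold_number, len(train), len(test))
```

A model is fitted on each `train` part and scored on the matching `test` part, and the k scores are averaged to give the reported accuracy.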

IRJET- Comparative Analysis of Classification Algorithms on Breast Cancer Dataset

IRJET, 2021

Breast cancer is one of the leading diseases among women today. It is caused by the abnormal growth of cells in breast tissue. Early detection of breast cancer and of breast cancer recurrence can save the patient. Classification algorithms are used to classify whether a cancer is recurrent or non-recurrent. This paper presents a comparative analysis of three classification algorithms: K-Nearest Neighbors (KNN), Naïve Bayes and Random Forest. When these algorithms are applied to the breast cancer dataset in the WEKA data mining tool to classify recurrent and non-recurrent breast cancer, KNN has the highest True Positive rate (0.738) and Random Forest has the lowest (0.696). The K-Nearest Neighbors classifier performs best, and the second most accurate classifier is Naïve Bayes, with 71.6783% of instances correctly classified.