Biological and Tumor Markers in Early Prediction Phase of Breast Cancer Using Classification and Regression Tree: Sebha Oncology Center as a Case Study

In recent years, many studies have developed cancer screening tools based on blood-based biological and tumor markers. These tools have achieved promising results in detecting Breast Cancer (BC), especially in the early stages of the disease. This study investigates the most relevant biological and tumor markers and applies several machine learning classification techniques to BC prediction. For this purpose, BC data were collected from routine blood tests at the Sebha Oncology Center (SOC). Correlation coefficient analysis shows that the most effective markers for use as cancer predictors are Cancer Antigen (CA-15.3), Carcinoembryonic Antigen (CEA), White Blood Cells (WBC), Red Blood Cells (RBC), and Albumin (ALB). Four classifier models, namely Classification and Regression Tree (CART), K-Nearest Neighbors (KNN), Naive Bayes (NB), and Logistic Regression (LG), were trained and evaluated using 10-fold cross-validation. The results show that CART outperforms the other classifiers in terms of accuracy, precision, recall, and F-measure. Moreover, the outcomes indicate that this research makes a significant contribution to BC prediction, which can ultimately assist SOC doctors in improving their cancer diagnosis.
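
The evaluation protocol described above (10-fold cross-validation scored by accuracy, precision, recall, and F-measure) can be illustrated with a minimal sketch. This is not the study's implementation: the data are synthetic stand-ins for the five markers rather than SOC records, only a simple K-Nearest Neighbors classifier is shown instead of CART, and all function names (`k_fold_indices`, `classification_metrics`, `knn_predict`, `cross_validate`, `make_synthetic_data`) are hypothetical. It uses only the Python standard library.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F-measure for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training rows (squared Euclidean)."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validate(X, y, k_folds=10, k_neighbors=3, seed=0):
    """Predict each fold from the other folds, then score pooled predictions."""
    y_true, y_pred = [], []
    for held_out in k_fold_indices(len(X), k_folds, seed):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(X)) if i not in held]
        for i in held_out:
            y_true.append(y[i])
            y_pred.append(knn_predict(train_X, train_y, X[i], k_neighbors))
    return classification_metrics(y_true, y_pred)


def make_synthetic_data(n_per_class=50, n_features=5, seed=42):
    """Synthetic data only: five Gaussian features standing in for
    CA-15.3, CEA, WBC, RBC, and ALB. Not real patient measurements."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(2.0 * label, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_data()
    print(cross_validate(X, y))
```

The sketch pools the held-out predictions from all ten folds before computing the metrics; averaging per-fold scores is an equally common choice and may give slightly different numbers. Comparing several classifiers, as the study does, would amount to swapping `knn_predict` for each model while keeping the same folds.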