Dimensionality Reduction of Unbalanced Datasets: Principal Component Analysis
2021 Asian Conference on Innovation in Technology (ASIANCON), 2021
Abstract
In this digital world, sharing information is easy and cost-effective, which results in a large amount of high-dimensional data in a variety of domains such as healthcare and finance. Data from the healthcare domain is used for disease diagnosis with Machine Learning (ML) models. The data set is the heart of an ML model, but the model's performance will not be satisfactory if the data set is unbalanced. An important point is that sensitivity (true positive rate) and specificity (true negative rate) can be used as performance measures along with accuracy. For ML-based healthcare systems, sensitivity plays a vital role. To handle an unbalanced data set, the primary step is data preprocessing, such as feature selection and feature extraction. The proposed method is a feature extraction method, Principal Component Analysis (PCA). Experiments were carried out on the Pima Diabetic data set, and both accuracy and sensitivity were calculated. The results show that PCA is a suitable option for dimensionality reduction and that it also helps improve the performance of such systems.
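The reason the abstract pairs accuracy with sensitivity and specificity can be illustrated with a small sketch. The labels below are hypothetical toy data (90 negatives, 10 positives), not the Pima data set, and the function names `confusion_counts` and `rates` are illustrative assumptions rather than anything taken from the paper. The sketch assumes a binary task in which label 1 is the positive (diseased) class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def rates(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity (TPR) and specificity (TNR)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    total = tp + fn + tn + fp
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Hypothetical unbalanced labels: 90 negative cases, 10 positive cases.
y_true = [0] * 90 + [1] * 10
# A trivial "model" that always predicts the majority (negative) class.
y_pred = [0] * 100

scores = rates(y_true, y_pred)
print(scores)
```

On this toy input the majority-class predictor reaches high accuracy while missing every positive case, which is why sensitivity is the more informative measure for a diagnostic setting of the kind described above.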
Swati Narwane