manpreet kaur - Academia.edu
Papers by manpreet kaur
Dirty water is the world's biggest health risk. When rainwater runs off roads into rivers, it picks up toxic chemicals, dirt, trash, and disease-carrying organisms along the way. Many water resources lack basic protections, making them vulnerable to pollution from factory farms and industrial plants. A classification model is therefore needed to describe the quality of the water environment. This paper applies data mining classification methods to water quality assessment. It compares the accuracy of five classifiers (NB, MLP, J48, SMO, and IBk), evaluated with 10-fold cross-validation, on a water quality dataset from the Kinta River, Perak, Malaysia, in order to find the most suitable classifier for the dataset. The selected attributes were DO Sat, DO Mgl, BOD Mgl, COD Mgl, TS Mgl, DO Index, AN Index, SS Index, Class, and Degree of pollution. The data consisted of 166 instances and were obtained from the East Coast Environmental Research Institute (ESERI) of Universiti Sultan Zainal Abidin (UniSZA). MLP and IBk performed better than the other classifiers on the Kinta River dataset, both reaching the highest accuracy of 91.57%. In future work, we will propose a multi-classifier approach that fuses these classifiers at the classification level to achieve higher accuracy.
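The 10-fold cross-validation protocol mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the authors' WEKA setup: the data is synthetic (166 rows to match the stated instance count), a plain 1-nearest-neighbor rule stands in for IBk, and the function names `k_fold_indices`, `nearest_neighbor_predict`, and `cross_val_accuracy` are hypothetical.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_neighbor_predict(train_X, train_y, x):
    """Return the label of the training row closest to x (squared Euclidean)."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]

def cross_val_accuracy(X, y, k=10, seed=0):
    """Hold out each fold once, train on the rest, and pool the correct counts."""
    correct = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        correct += sum(
            nearest_neighbor_predict(train_X, train_y, X[i]) == y[i] for i in fold
        )
    return correct / len(X)

# Synthetic stand-in data (assumption): 166 rows, two features, two classes
# whose feature means are far apart.
rng = random.Random(42)
X, y = [], []
for i in range(166):
    label = i % 2
    X.append([rng.gauss(label * 10.0, 1.0), rng.gauss(label * 10.0, 1.0)])
    y.append(label)

folds = k_fold_indices(len(X))
accuracy = cross_val_accuracy(X, y)
print(f"10-fold CV accuracy: {accuracy:.4f}")
```

Because every instance is held out exactly once, the pooled accuracy is simply the number of correct predictions divided by the dataset size. On 166 instances, a reported 91.57% is consistent with 152 correct predictions (152/166 ≈ 0.9157).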
This paper presents a comparison of five classifiers, namely decision tree (J48), Multi-Layer Perceptron (MLP), Naive Bayes (NB), Sequential Minimal Optimization (SMO), and instance-based k-nearest neighbor (IBK), on three breast cancer databases: Wisconsin Breast Cancer (WBC), Wisconsin Diagnosis Breast Cancer (WDBC), and Wisconsin Prognosis Breast Cancer (WPBC). The classifiers are compared by classification accuracy and confusion matrix, using 10-fold cross-validation. We also introduce a fusion at the classification level between these classifiers to find the most suitable multi-classifier approach for each dataset. On the WBC dataset, the experimental results show that the fusion of MLP and J48 with PCA is superior to the other classifiers. PCA is used on the WBC dataset as a feature reduction transformation that combines a set of correlated features. The selected attributes are Uniformity of Cell Size, Mitoses, Clump Thickness, Bare Nuclei, Single Epithelial Cell Size, Marginal Adhesion, Bland Chromatin, and Class. On the WDBC dataset, classification using SMO alone, or using the fusion of SMO and MLP or of SMO and IBK, is superior to the other classifiers. On the WPBC dataset, classification using the fusion of MLP, J48, SMO, and IBK is superior to the other classifiers. All experiments were conducted in the WEKA data mining tool.
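As an illustration of fusion at the classification level and of the confusion matrix used for evaluation, here is a minimal sketch. The abstract does not state which combination rule was used, so simple majority voting is an assumption; the prediction lists are made-up toy values, and the helper names `majority_vote` and `confusion_matrix` are hypothetical.

```python
from collections import Counter

def majority_vote(predictions_per_classifier):
    """Fuse label predictions: one equally long list of labels per classifier."""
    fused = []
    for votes in zip(*predictions_per_classifier):
        # Pick the label that most classifiers predicted for this instance.
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# Toy predictions (assumption): each classifier makes one mistake,
# but on a different instance.
actual = ["benign", "benign", "malignant", "malignant", "benign", "malignant"]
clf_a = ["benign", "malignant", "malignant", "malignant", "benign", "benign"]
clf_b = ["benign", "benign", "malignant", "benign", "benign", "malignant"]
clf_c = ["malignant", "benign", "malignant", "malignant", "benign", "malignant"]

fused = majority_vote([clf_a, clf_b, clf_c])
matrix = confusion_matrix(actual, fused, ["benign", "malignant"])
print(fused)
print(matrix)
```

In this toy case the three classifiers err on different instances, so the vote corrects each individual error. Real classifiers often make correlated mistakes, which is why the best-performing combination can differ between datasets.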
Classification of imbalanced datasets is one of the most popular and challenging problems for researchers today. This paper proposes a two-step approach to improve class prediction on an imbalanced breast cancer dataset. The approach consists of two main techniques: 1) using feature selection to filter out unimportant features from the dataset; and 2) using over-sampling to bring the size of the minority class close to that of the majority class. Three classification algorithms were applied: an artificial neural network (MLP), a decision tree (C4.5), and Naïve Bayes. The results indicated that C4.5 was the most suitable classifier for this dataset, giving the highest accuracy of 83.80%.
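The over-sampling step can be sketched as below. The abstract does not say which over-sampling technique was used, so random duplication of minority rows is an assumption here; the data is a toy example, and `random_oversample` is a hypothetical helper name.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels

# Toy imbalanced data (assumption): 8 majority rows, 2 minority rows.
rows = [[float(i)] for i in range(10)]
labels = ["majority"] * 8 + ["minority"] * 2

new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

When over-sampling is combined with cross-validation, it is usually applied to the training folds only, since duplicated minority rows that leak into a held-out fold can inflate the measured accuracy.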