Md. Alamin Talukder | International University of Business Agriculture and Technology (IUBAT)
Books by Md. Alamin Talukder
Elsevier - International-Journal-of-Cognitive-Computing-in-Engineering, 2021
Diabetes is a very common disease affecting individuals worldwide. It increases the risk of long-term complications, including heart disease and kidney failure, among others. People might live longer and lead healthier lives if the disease is detected early. Supervised machine learning models trained with appropriate datasets can aid in diagnosing diabetes at the primary stage. The goal of this work is to find effective machine-learning-based classifier models for detecting diabetes in individuals using clinical data. The machine learning algorithms trained with several datasets in this article include Decision Tree (DT), Naive Bayes (NB), k-nearest neighbor (KNN), Random Forest (RF), Gradient Boosting (GB), Logistic Regression (LR), and Support Vector Machine (SVM). We have applied efficient pre-processing techniques, including label encoding and normalization, that improve the accuracy of the models. Further, using various feature selection approaches, we have identified and prioritized a number of risk factors. Extensive experiments have been conducted to analyze the performance of the model using two different datasets. Our model is compared with some recent studies, and the results show that the proposed model improves accuracy by 2.71% to 13.13%, depending on the dataset and the adopted ML algorithm. Finally, the machine learning algorithm showing the highest accuracy is selected for further development. We integrate this model into a web application using the Python Flask web development framework. The results of this study suggest that an appropriate preprocessing pipeline on clinical data, combined with ML-based classification, may predict diabetes accurately and efficiently.
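The two pre-processing steps named in the abstract above, label encoding and normalization, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the abstract does not say which normalization was used, so min-max scaling is an assumption, and the function names and toy values are hypothetical.

```python
def label_encode(values):
    """Map each distinct category to an integer, assigned in sorted order."""
    mapping = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def min_max_normalize(column):
    """Rescale a numeric column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical toy records, not taken from the paper's datasets.
gender = ["female", "male", "female"]
glucose = [85.0, 140.0, 195.0]

encoded, mapping = label_encode(gender)   # categories become integers
scaled = min_max_normalize(glucose)       # values rescaled to [0, 1]
```

Scaling features to a common range matters mostly for distance-based or margin-based classifiers such as KNN and SVM, which are among the algorithms the abstract lists.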
Papers by Md. Alamin Talukder
Expert Systems With Applications, 2022
Cancer is a fatal disease caused by a combination of genetic diseases and a variety of biochemical abnormalities. Lung and colon cancer have emerged as two of the leading causes of death and disability in humans. The histopathological detection of such malignancies is usually the most important component in determining the best course of action. Early detection of the ailment on either front considerably decreases the likelihood of mortality. Machine learning and deep learning techniques can be utilized to speed up such cancer detection, allowing researchers to study a large number of patients in a much shorter amount of time and at a lower cost. In this research work, we introduce a hybrid ensemble feature extraction model to efficiently identify lung and colon cancer. It integrates deep feature extraction and ensemble learning with high-performance filtering for cancer image datasets. The model is evaluated on the histopathological (LC25000) lung and colon datasets. According to the study findings, our hybrid model can detect lung, colon, and combined lung and colon cancer with accuracy rates of 99.05%, 100%, and 99.30%, respectively. The study's findings show that our proposed strategy significantly outperforms existing models. Thus, these models could be applicable in clinics to support doctors in the diagnosis of cancers.
Journal of Information Security and Applications, 2022
Network intrusion detection systems (NIDSs) play an important role in computer network security. There are several detection mechanisms, among which anomaly-based automated detection significantly outperforms the others. Amid the sophistication and growing number of attacks, dealing with large amounts of data is a recognized issue in the development of anomaly-based NIDSs. However, do current models meet the needs of today's networks in terms of required accuracy and dependability? In this research, we propose a new hybrid model that combines machine learning and deep learning to increase detection rates while securing dependability. Our proposed method ensures efficient pre-processing by combining SMOTE for data balancing and XGBoost for feature selection. We compared our developed method to various machine learning and deep learning algorithms in order to find a more efficient algorithm to implement in the pipeline. Furthermore, we chose the most effective model for network intrusion detection based on a set of benchmarked performance analysis criteria. Our method produces excellent results when tested on two datasets, with an accuracy of 99.99% on KDDCUP'99 and 100% on CIC-MalMem-2022, and no overfitting or Type-1 and Type-2 issues.
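The data-balancing step mentioned above relies on SMOTE, which creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below shows only that core interpolation idea in plain Python. It is an illustration under assumptions, not the paper's implementation: the function name, the neighbour count, and the toy points are hypothetical.

```python
import math
import random


def smote_sample(minority, k, rng):
    """Create one synthetic point by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    base = rng.choice(minority)
    others = [p for p in minority if p is not base]
    # Keep the k closest minority points by Euclidean distance.
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # position along the segment, in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]


# Hypothetical minority-class feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
synthetic = smote_sample(minority, k=2, rng=random.Random(0))
```

Because each synthetic point lies on a segment between two real minority samples, it stays inside the region the minority class already occupies instead of simply duplicating existing rows.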
Computers in Biology and Medicine, Dec 31, 2023
Journal of big data, Feb 22, 2024
International journal of cognitive computing in engineering, Jun 1, 2024
FEBS open bio, May 23, 2024
International journal of cognitive computing in engineering, May 1, 2024
The rapid growth of internet users and social networking sites presents significant challenges for entrepreneurs and marketers. Understanding the evolving behavioral and psychological patterns across consumer demographics is crucial for adapting business models effectively. In particular, the emergence of new firms targeting adolescents and future generations underscores the importance of comprehending online consumer behavior and communication dynamics. To tackle these challenges, we introduce a Machine Learning-based Digital Native Market Segmentation designed to cater specifically to the interests of digital natives. Leveraging an open-access prototype dataset from social networking sites (SNS), our study employs a variety of clustering techniques, including K-means, MiniBatch K-means, AGNES, and Fuzzy C-means, to uncover hidden interests of teenage consumers from SNS data. Through rigorous evaluation of these clustering approaches with default parameters, we identify the optimal number of clusters and group consumers with similar tastes effectively. Our findings provide actionable insights into business impact and critical patterns driving future marketing growth. In our experiment, we systematically evaluate various clustering techniques, and notably, K-means clustering outperforms the others, demonstrating strong segmentation ability in the digital market. Specifically, it achieves silhouette scores of 63.90% and 58.06% for 2 and 3 clusters, respectively.
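The silhouette score used above to compare clusterings can be computed directly from its definition: for each point, a is the mean distance to the other points in its own cluster, b is the smallest mean distance to any other cluster, and the point's score is (b - a) / max(a, b). The following is a minimal sketch of that standard definition in plain Python, with hypothetical one-dimensional toy data. It assumes Euclidean distance and at least two clusters, and it is not the authors' evaluation code.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # Mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two well-separated groups on a line.
points = [[0.0], [1.0], [10.0], [11.0]]
labels = [0, 0, 1, 1]
score = silhouette_score(points, labels)
```

The coefficient ranges from -1 to 1, so a value reported as a percentage, such as 63.90%, presumably corresponds to 0.6390 on this scale.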
arXiv (Cornell University), Mar 17, 2024
In today's digital age, our dependence on IoT (Internet of Things) and IIoT (Industrial IoT) systems, which facilitate sensitive activities such as banking transactions and the exchange of personal data, enterprise data, and legal documents, has grown immensely. Cyberattackers consistently exploit weak security measures and tools. The Network Intrusion Detection System (IDS) acts as a primary tool against such cyber threats. However, machine learning-based IDSs, when trained on specific attack patterns, often misclassify newly emerging cyberattacks. Moreover, the limited availability of attack instances for training a supervised learner and the ever-evolving nature of cyber threats further complicate the matter. This emphasizes the need for an adaptable IDS framework capable of recognizing and learning from unfamiliar or unseen attacks over time. In this research, we propose a one-class classification-driven IDS structured on two tiers. The first tier distinguishes between normal activities and attacks or threats, while the second tier determines whether the detected attack is known or unknown. Within this second tier, we also embed a multi-classification mechanism coupled with a clustering algorithm. This model not only identifies unseen attacks but also clusters them and uses them for retraining. This enables our model to be future-proofed, capable of evolving with emerging threat patterns. By leveraging one-class classifiers (OCC) at the first level, our approach bypasses the need for attack samples, addressing data imbalance and zero-day attack concerns, and OCC at the second level can effectively separate unknown attacks from known attacks. Our methodology and evaluations indicate that the presented framework exhibits promising potential for real-world deployments.
arXiv (Cornell University), Mar 17, 2024
arXiv (Cornell University), Mar 17, 2024
Brain-related diseases are more sensitive than other diseases due to several factors, including the complexity of surgical procedures, high costs, and other challenges. Alzheimer's disease is a common brain disorder that causes memory loss and the shrinking of brain cells. Early detection is critical for providing proper treatment to patients. However, identifying Alzheimer's at an early stage by manually examining CT or MRI scans is challenging. Therefore, researchers have explored computer-aided systems that employ Machine Learning and Deep Learning methodologies, which entail training on datasets to detect Alzheimer's disease. This study aims to present a hybrid model that combines a CNN model's feature extraction capabilities with an LSTM model's detection capabilities. The hybrid model applies transfer learning with VGG16 to extract features from MRI images. The LSTM detects features between the convolution layer and the fully connected layer. The output layer of the fully connected layer uses the softmax function. The hybrid model was trained on the ADNI dataset. The trial findings revealed that the model achieved an accuracy of 98.8%, a sensitivity of 100%, and a specificity of 76%. The proposed hybrid model outperforms its contemporary CNN counterparts.
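The three figures reported above are standard confusion-matrix metrics: accuracy is the share of all correct predictions, sensitivity is the share of actual positives that are detected, and specificity is the share of actual negatives that are correctly rejected. The sketch below computes them for a binary case. The binary framing, the function name, and the counts are illustrative assumptions and are not taken from the paper's results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall), and specificity from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
```

Because accuracy pools both classes, it can stay high while specificity is much lower when negative cases are a small share of the test set, which is why the three values are usually reported together.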
arXiv (Cornell University), Feb 22, 2024
arXiv (Cornell University), 2024
Cybersecurity has emerged as a critical global concern. Intrusion Detection Systems (IDS) play a critical role in protecting interconnected networks by detecting malicious actors and activities. Machine Learning (ML)-based behavior analysis within the IDS has considerable potential for detecting dynamic cyber threats, identifying abnormalities, and identifying malicious conduct within the network. However, as the volume of data grows, dimension reduction becomes an increasingly difficult task when training ML models. To address this, our paper introduces a novel ML-based network intrusion detection model that uses Random Oversampling (RO) to address data imbalance, Stacking Feature Embedding based on clustering results, and Principal Component Analysis (PCA) for dimension reduction, and that is specifically designed for large and imbalanced datasets. The model's performance is carefully evaluated using three cutting-edge benchmark datasets: UNSW-NB15, CIC-IDS-2017, and CIC-IDS-2018. On the UNSW-NB15 dataset, our trials show that the RF and ET models achieve accuracy rates of 99.59% and 99.95%, respectively. Furthermore, on the CIC-IDS-2017 dataset, the DT, RF, and ET models reach 99.99% accuracy, while the DT and RF models obtain 99.94% accuracy on CIC-IDS-2018. These results consistently outperform the state of the art, indicating significant progress in the field of network intrusion detection. This achievement demonstrates the efficacy of the suggested methodology, which can be used in practice to accurately monitor and identify network traffic intrusions, thereby blocking possible threats.
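Random Oversampling, the balancing step named above, duplicates randomly chosen samples of the smaller classes until every class is as large as the biggest one. The sketch below illustrates that idea in plain Python. It is a hedged illustration rather than the paper's code: the function name, the toy data, and the choice to balance every class up to the majority size are assumptions.

```python
import random
from collections import Counter


def random_oversample(samples, labels, rng):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for lab, n in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == lab]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))  # sampled with replacement
            out_labels.append(lab)
    return out_samples, out_labels


# Hypothetical imbalanced toy data: four normal flows, one attack flow.
samples = [[0], [1], [2], [3], [9]]
labels = ["normal", "normal", "normal", "normal", "attack"]
X, y = random_oversample(samples, labels, random.Random(0))
```

Oversampling is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.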
arXiv (Cornell University), Nov 27, 2023
arXiv (Cornell University), Feb 17, 2024
Leaf diseases pose a significant threat to crop production and profitability, particularly in the case of tomatoes, which are extensively cultivated worldwide. Early detection of these diseases is crucial to prevent significant crop losses. In this study, we propose a novel deep learning-based approach for disease detection and prediction in tomato plant leaves. Convolutional neural networks (CNNs) have emerged as the most effective deep learning algorithm for image classification tasks. Leveraging the power of CNNs, we employed a CNN architecture to detect and identify diseases in tomato leaf samples. The suitability of CNNs for detection and prediction tasks makes them an ideal choice for this study. Our dataset comprised 6,926 images of both healthy and diseased tomato plants obtained from the PlantVillage dataset. By training our CNN model on this dataset, we achieved a promising test accuracy of 98.39%. This high accuracy demonstrates the effectiveness of our approach in accur...
Intelligent Systems with Applications
Genes
Biomarker-based cancer identification and classification tools are widely used in bioinformatics and machine learning fields. However, the high dimensionality of microarray gene expression data poses a challenge for identifying important genes in cancer diagnosis. Many feature selection algorithms optimize cancer diagnosis by selecting optimal features. This article proposes an ensemble rank-based feature selection method (EFSM) and an ensemble weighted average voting classifier (VT) to overcome this challenge. The EFSM uses a ranking method that aggregates features from individual selection methods to efficiently discover the most relevant and useful features. The VT combines support vector machine, k-nearest neighbor, and decision tree algorithms to create an ensemble model. The proposed method was tested on three benchmark datasets and compared to existing built-in ensemble models. The results show that our model achieved higher accuracy, with 100% for leukaemia, 94.74% for colon...