Usman Naseem - Academia.edu

Conference Presentations by Usman Naseem

Deep AutoEncoder-Decoder Framework for Semantic Segmentation of Brain Tumor, 2019

Accurate segmentation of brain tumors is a critical component of cancer diagnosis, treatment, and outcome evaluation. It consists of identifying the different types of tumor tissue in brain MRI images and separating them from normal tissue. Recently, pathway CNNs have been used for semantic segmentation; however, they are computationally expensive. Building upon the success of SegNet, in this paper we present different architectures of the SegNet encoder and decoder based on pixel-wise classification. The model performs nonlinear upsampling. End-to-end training and the small number of trainable parameters make the computation more efficient than in other deep learning architectures. We performed semantic segmentation on the Figshare brain tumor MRI dataset and achieved state-of-the-art results (99.93% global accuracy) in comparison to traditional CNN models.
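
As a hedged illustration of the global accuracy metric quoted above for pixel-wise classification, the sketch below computes the fraction of pixels whose predicted label matches the ground truth. The function name `global_accuracy` and the tiny masks are hypothetical; they are not taken from the paper or from the Figshare dataset.

```python
def global_accuracy(predicted, truth):
    """Fraction of pixels whose predicted class equals the ground-truth class."""
    if len(predicted) != len(truth):
        raise ValueError("masks must have the same number of rows")
    correct = 0
    total = 0
    for pred_row, true_row in zip(predicted, truth):
        if len(pred_row) != len(true_row):
            raise ValueError("rows must have the same length")
        for p, t in zip(pred_row, true_row):
            correct += int(p == t)
            total += 1
    if total == 0:
        raise ValueError("masks are empty")
    return correct / total


# Hypothetical 3x4 label masks: 0 = background, 1 = tumor.
truth = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
predicted = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],  # one tumor pixel missed in this row
    [0, 0, 0, 1],
]
accuracy = global_accuracy(predicted, truth)
print(accuracy)
```

Because background pixels usually dominate a brain MRI slice, a high global accuracy can coexist with weak tumor-class performance, so per-class metrics are often reported alongside it.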

Papers by Usman Naseem

Text Mining of Stocktwits Data for Predicting Stock Prices

Applied System Innovation, Feb 17, 2021

This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY

BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition

Background: In recent years, with the growing volume of biomedical documents and advances in natural language processing algorithms, research on biomedical named entity recognition (BioNER) has increased exponentially. However, BioNER is challenging because, in the biomedical domain, (i) training data are often limited, (ii) an entity can refer to multiple types and concepts depending on its context, and (iii) there is heavy reliance on acronyms that are sub-domain specific. Existing BioNER approaches often neglect these issues and directly adopt state-of-the-art (SOTA) models trained on general corpora, which often yields unsatisfactory results.
Results: We propose biomedical ALBERT (A Lite Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), or BioALBERT, an effective domain-specific pre-trained language model trained on a huge biomedical corpus and designed to capture biomedical context-dependent NER. We adopted the self-supervised loss function used in ALBERT, which targets modelling inter-sentence coherence, to better learn context-dependent representations, and incorporated parameter-reduction strategies to minimise memory usage and improve training time in BioNER. In our experiments, BioALBERT outperformed comparative SOTA BioNER models on eight biomedical NER benchmark datasets with four different entity types. Performance increased for (i) disease type corpora by 7.47% (NCBI-disease) and 10.63% (BC5CDR-disease); (ii) drug-chem type corpora by 4.61% (BC5CDR-Chem) and 3.89% (BC4CHEMD); (iii) gene-protein type corpora by 12.25% (BC2GM) and 6.42% (JNLPBA); and (iv) species type corpora by 6.19% (LINNAEUS) and 23.71% (Species-800), leading to state-of-the-art results.
Conclusions: The performance of the proposed model on four different biomedical entity types shows that it is robust and generalizable in recognizing biomedical entities in text. We trained four different variants of BioALBERT, which are available to the research community for use in future research.

Incorporating Embedding to Topic Modeling for More Effective Short Text Analysis

Companion Proceedings of the ACM Web Conference 2023

Graph-Based Hierarchical Attention Network for Suicide Risk Detection on Social Media

Companion Proceedings of the ACM Web Conference 2023

Coherent Topic Modeling for Creative Multimodal Data on Social Media

Proceedings of the ACM Web Conference 2023

Effective Classification of Synovial Sarcoma Cancer Using Structure Features and Support Vectors

Computers, Materials & Continua

In this research work, we propose a medical image analysis framework with two possible outcomes: whether or not Synovial Sarcoma (SS) is the cell structure for cancer. Within this framework, the histopathology images are decomposed into a third-level sub-band using a two-dimensional Discrete Wavelet Transform. Subsequently, structure features (SFs) are extracted from this sub-band image representation and the distribution of wavelet coefficients using Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA). These SFs are used as inputs to a Support Vector Machine (SVM) classifier. The PCA + SVM, ICA + SVM, and LDA + SVM combinations, each with a Radial Basis Function (RBF) kernel, are compared to identify the best classification results. Furthermore, data collected from various histopathological centres via the Internet of Things (IoT) are stored and shared using blockchain technology across secure IoT devices. Because of this, the minimum and maximum values of the kernel parameter are adjusted and updated periodically for industrial application in device calibration. Consequently, these resolutions provide an example of a technique for training and testing cancer cell structure prognosis methods on spindle-shaped cell (SSC) histopathological imaging databases. Cross-validation performance is evaluated with the receiver operating characteristic (ROC) curve, and significant differences in classification performance between the techniques are analyzed.
The LDA + SVM combination has proven essential for future intelligent SS cancer detection, and it offers excellent classification accuracy, sensitivity, and specificity.

Vision-Language Transformer for Interpretable Pathology Visual Question Answering

IEEE Journal of Biomedical and Health Informatics

Pathology visual question answering (PathVQA) attempts to correctly answer medical questions presented with pathology images. Despite its great promise in healthcare, the technology is still in its early stages, with low overall accuracy. This is because it requires both high-level and low-level interactions between the image (vision) and the question (language) to generate an answer. Existing methods treat vision and language features independently and therefore cannot capture these interactions. Further, these methods do not interpret the retrieved answers, which remain obscure to humans. Model interpretability to justify the retrieved answers has remained largely unexplored, yet it has become important for engendering users' trust in a retrieved answer by providing insight into the model's prediction. Motivated by these gaps, we introduce an interpretable transformer-based Path-VQA (TraP-VQA), in which we embed the transformer's encoder layers with vision (image) features extracted using a CNN and language (question) features extracted using CNNs and a domain-specific language model (LM). A decoder layer of the transformer is then embedded to upsample the encoded features for the final PathVQA prediction. Our experiments showed that TraP-VQA outperformed state-of-the-art comparative methods on the public PathVQA dataset. Further, our ablation study presents the capability of each component of our transformer-based vision-language model. Finally, we demonstrate the interpretability of TraP-VQA by presenting visualizations of both the text and the images used to explain the reason for a retrieved answer in PathVQA.

Heart Disease Diagnosis Using the Brute Force Algorithm and Machine Learning Techniques

Computers, Materials & Continua, 2022

Heart disease is one of the leading causes of death in the world today. Prediction of heart disease is a prominent topic in clinical data processing, and early diagnosis is an important field of medical research because it can increase patient survival rates. There are many studies on the prediction of heart disease, but limited work has been done on the selection of features. Feature selection is one of the best techniques for the diagnosis of heart disease. In this research paper, we find optimal features using the brute-force algorithm, and machine learning techniques are used to improve the accuracy of heart disease prediction. For performance evaluation, accuracy, sensitivity, and specificity are used with split and cross-validation techniques. The proposed technique is evaluated on three heart disease datasets with different numbers of records and is found to have superior performance. The optimized features selected by the brute-force algorithm are used as input to machine learning algorithms such as Support Vector Machine (SVM), Random Forest (RF), K Nearest Neighbor (KNN), and Naive Bayes (NB). The proposed technique achieved 97% accuracy with Naive Bayes through split validation and 95% accuracy with Random Forest through cross-validation. Naive Bayes and Random Forest are found to outperform the other classification approaches in terms of accuracy. Compared with the results of the existing study, the proposed technique is found to be better than other state-of-the-art methods. Therefore, our proposed approach plays an important role in the selection of important features and the automatic detection of heart disease.
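
The brute-force feature selection described above can be sketched as an exhaustive search over feature subsets, scoring each subset with a classifier on a held-out split. This is a minimal sketch under stated assumptions: the nearest-centroid classifier is a simple stand-in for the SVM, RF, KNN, and NB models named in the abstract, and the helper names and toy records are hypothetical rather than taken from the paper's datasets.

```python
from itertools import combinations


def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest_centroid_accuracy(train, test, feature_idx):
    """Fit a nearest-centroid classifier on the chosen features; return test accuracy."""
    def project(x):
        return [x[i] for i in feature_idx]

    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append(project(x))
    centroids = {y: centroid(rows) for y, rows in by_label.items()}

    def predict(x):
        px = project(x)
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(px, centroids[y])),
        )

    hits = sum(predict(x) == y for x, y in test)
    return hits / len(test)


def brute_force_select(train, test, n_features):
    """Score every non-empty feature subset; return the best (subset, accuracy)."""
    best_subset, best_score = None, -1.0
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            score = nearest_centroid_accuracy(train, test, subset)
            if score > best_score:  # strict '>' keeps the earliest, smallest subset on ties
                best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical records: feature 0 separates the classes, features 1 and 2 are noise.
train = [
    ([1.0, 5.0, 9.0], 0),
    ([1.2, 1.0, 2.0], 0),
    ([3.0, 5.2, 1.0], 1),
    ([3.2, 0.8, 8.0], 1),
]
test = [
    ([0.9, 4.0, 1.0], 0),
    ([1.3, 1.0, 9.0], 0),
    ([2.9, 4.5, 9.0], 1),
    ([3.3, 1.2, 1.5], 1),
]
best_subset, best_score = brute_force_select(train, test, n_features=3)
print(best_subset, best_score)
```

An exhaustive search evaluates 2^n - 1 subsets for n features, so it is only practical when the number of candidate features is small.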

A Multimodal Framework for the Identification of Vaccine Critical Memes on Twitter

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining

A novel multiple kernel fuzzy topic modeling technique for biomedical data

BMC Bioinformatics

Background: Text mining in the biomedical field has received much attention and is regarded as an important research area, since a lot of biomedical data is in text format. Topic modeling is one of the popular text mining methods used to discover hidden semantic structures, so-called topics. However, discovering topics from biomedical data is a challenging task due to its sparsity, redundancy, and unstructured format. Methods: In this paper, we propose a novel multiple kernel fuzzy topic modeling (MKFTM) technique for biomedical text mining, using fusion probabilistic inverse document frequency and a multiple kernel fuzzy c-means clustering algorithm. In detail, the proposed fusion probabilistic inverse document frequency method is used to estimate the weights of global terms, while MKFTM generates frequencies of local and global terms with bag-of-words. In addition, principal component analysis is applied to eliminate higher-order negative effects on term weights. Res...

A New Method for Centrality Measurement Using Generalized Fuzzy Graphs

Discrete Dynamics in Nature and Society

The fuzzy graph is the foundation of many real-world structures, such as networking and picture planning. Generalized fuzzy graphs (GFGs) are well suited to avoiding certain constraints of fuzzy graphs. Social networks are a useful and powerful way to link people worldwide. The central persons in a social network denote its leading people, and different centrality measures have been established over the years. In this article, we evaluate centrality in a social network using a generalized fuzzy graph. An application that detects the central person in an online group, such as a WhatsApp group, using generalized fuzzy graphs is described. A few important results are also established in this study.
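
To make the idea of a central person concrete, the sketch below computes a weighted degree centrality on an ordinary fuzzy graph, where each edge carries a membership value in [0, 1]. This is an assumption made for illustration only: the abstract does not define the paper's centrality measure for generalized fuzzy graphs, and the function name, people, and membership values are hypothetical.

```python
def fuzzy_degree_centrality(edges):
    """Sum the membership values of the edges incident to each node."""
    scores = {}
    for u, v, membership in edges:
        if not 0.0 <= membership <= 1.0:
            raise ValueError("edge membership must lie in [0, 1]")
        scores[u] = scores.get(u, 0.0) + membership
        scores[v] = scores.get(v, 0.0) + membership
    return scores


# Hypothetical online group: membership stands for strength of interaction.
edges = [
    ("Ana", "Ben", 0.9),
    ("Ana", "Cal", 0.6),
    ("Ana", "Dee", 0.4),
    ("Ben", "Cal", 0.3),
    ("Cal", "Dee", 0.2),
]
scores = fuzzy_degree_centrality(edges)
central = max(scores, key=scores.get)  # node with the largest total membership
print(central, round(scores[central], 2))
```

Here the most central member is the one with the largest total interaction strength; other measures, such as closeness or betweenness, would need path strengths as well.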

An Automatic Detection of Breast Cancer Diagnosis and Prognosis Based on Machine Learning Using Ensemble of Classifiers

IEEE Access

Breast cancer (BC) is the second most prevalent type of cancer leading to death among women, and its mortality rate is very high. Its effects can be reduced if it is diagnosed early: early detection greatly improves the prognosis and likelihood of recovery, as it may encourage prompt surgical care for patients. It is therefore vital to have a system that enables the healthcare industry to detect breast cancer quickly and accurately. Machine learning (ML) is widely used in BC pattern classification because of its advantages in modelling critical feature detection from complex BC datasets. In this paper, we propose a system for automatic detection of BC diagnosis and prognosis using an ensemble of classifiers. First, we review various ML algorithms, including ANN, and ensembles of different classifiers for automatic BC diagnosis and prognosis detection. We then present and compare various ensemble models and other variants of the tested ML-based models, with and without an up-sampling technique, on two benchmark datasets. We also study the effect of using balanced class weights on the prognosis dataset and compare its performance with the others. The results show that the ensemble method outperformed other state-of-the-art methods and achieved 98.83% accuracy. Because of its high performance, the proposed system is of great importance to the medical industry and the relevant research community.
INDEX TERMS: Healthcare system, machine learning, breast cancer, ensemble learning, cancer diagnoses.
I. INTRODUCTION: Breast cancer is one of the most dangerous and prevalent cancers among women, causing the deaths of large numbers of women worldwide. Breast cancer accounts for 8.4% of diagnosed cancers and 6.6% of cancer-related deaths worldwide, according to a World Health Organization (WHO) report [1]. Breast cancer accounted for 15.9% of all reported cancers...
The associate editor coordinating the review of this manuscript and approving it for publication was Ahmed Farouk.
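
One common way to build an ensemble of classifiers, as discussed in the abstract above, is hard majority voting over the labels predicted by each base model. The sketch below assumes that scheme; the abstract does not state which combination rule the paper uses, and the function name and the three sets of predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample by majority vote."""
    if not predictions:
        raise ValueError("need at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same samples")
    combined = []
    for votes in zip(*predictions):
        # most_common(1) yields the label with the highest count; with equal
        # counts, the label encountered first among the votes is returned.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three base classifiers: "M" = malignant, "B" = benign.
clf_a = ["M", "B", "B", "M", "B"]
clf_b = ["M", "M", "B", "B", "B"]
clf_c = ["B", "B", "B", "M", "M"]
ensemble = majority_vote([clf_a, clf_b, clf_c])
print(ensemble)
```

Using an odd number of base classifiers avoids ties for a two-class problem such as benign versus malignant.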

A Proposed Framework for the Security of Financial Systems

Indian Journal of Science and Technology, 2019

Objectives: This study proposes a framework for enhancing the security level of eBanking, or online banking, systems. Methods/Statistical Analysis: The financial system is considered a major critical infrastructure of the community, society, and country. In addition, the number of breaching attacks carried out via the Internet is constantly rising. It is therefore recommended to give such systems more protection against malicious activities. In this regard, this study gives a comprehensive overview of security challenges and security attacks. In addition, a security framework is proposed to enhance the security level of the systems discussed. Findings: The proposed framework is divided into two categories: the first covers Network Standards for the Security of System, and the second is based on the architecture of the system. In Network Standards for the Security of System, two standards are defined, one for wired connections and the other for wireless connections. The second part, the most significant section of the paper, comprises three stages: stage one covers basic security requirements, stage two defines the parameters of wireless security design, and stage three deploys security algorithms and techniques to provide a secure network.

RHMD: A Real-World Dataset for Health Mention Classification on Reddit

IEEE Transactions on Computational Social Systems

Benchmarking for Public Health Surveillance tasks on Social Media with a Domain-Specific Pretrained Language Model

Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP

User-generated text on social media enables health workers to keep track of information, identify possible outbreaks, forecast disease trends, monitor emergency cases, and ascertain disease awareness and response to official health correspondence. This exchange of health information on social media has been regarded as an attempt to enhance public health surveillance (PHS). Despite its potential, the technology is still in its early stages and is not ready for widespread application. Advancements in pretrained language models (PLMs) have facilitated the development of several domain-specific PLMs and a variety of downstream applications. However, there are no PLMs for social media tasks involving PHS. We present and release PHS-BERT, a transformer-based PLM, to identify tasks related to public health surveillance on social media. We compared and benchmarked the performance of PHS-BERT on 25 datasets from different social media platforms related to 7 different PHS tasks. Compared with existing PLMs that are mainly evaluated on limited tasks, PHS-BERT achieved state-of-the-art performance on all 25 tested datasets, showing that our PLM is robust and generalizable in the common PHS tasks. By making PHS-BERT available, we aim to help the community reduce computational cost and to introduce new baselines for future work across various PHS-related tasks.

Incorporating Medical Knowledge to Transformer-based Language Models for Medical Dialogue Generation

Proceedings of the 21st Workshop on Biomedical Language Processing

Identification of Disease or Symptom terms in Reddit to Improve Health Mention Classification

Proceedings of the ACM Web Conference 2022

In user-generated text, such as on social media platforms and online forums, people often use disease or symptom terms in ways other than to describe their health. In data-driven public health surveillance, the health mention classification (HMC) task aims to identify posts where users are discussing health conditions rather than using disease and symptom terms for other reasons. Existing computational research typically studies health mentions only on Twitter, with limited coverage of disease or symptom terms, and ignores user behavior information and the other ways people use disease or symptom terms. To advance HMC research, we present the Reddit health mention dataset (RHMD), a new dataset of multi-domain Reddit data for HMC. RHMD consists of 10,015 manually labeled Reddit posts that mention 15 common disease or symptom terms and are annotated with four labels: personal health mentions, non-personal health mentions, figurative health mentions, and hyperbolic health mentions. With RHMD, we propose HMC-NET, which combines target keyword (disease or symptom term) identification and user behavior hierarchically to improve HMC. Experimental results demonstrate that the proposed approach outperforms state-of-the-art methods with an F1-score of 0.75 (an increase of 11% over the state of the art) and show that our new dataset poses a strong challenge to existing HMC methods.
CCS CONCEPTS • Applied computing → Health informatics; • Computing methodologies → Natural language processing.

Automatic COVID-19 Lung Infection Segmentation through Modified Unet Model

Journal of Healthcare Engineering

The coronavirus (COVID-19) pandemic has had a terrible impact on human lives globally, with far-reaching consequences for the health and well-being of many people around the world. Up to 10 January 2022, 305.9 million people worldwide had tested positive for COVID-19 and 5.48 million had died from it. CT scans can be used as an alternative to time-consuming RT-PCR testing for COVID-19. This research work proposes a segmentation approach for identifying ground-glass opacity (GGO), or the region of interest (ROI), caused by coronavirus in CT images, using a modified Unet model to classify the region of interest at the pixel level. The problem with segmentation is that GGO often appears indistinguishable from healthy lung in the initial stages of COVID-19. To cope with this, an increased set of weights is used in the contracting and expanding Unet paths, and an improved convolutional module is added in order to establish the connection between the encoder a...

Hybrid Words Representation for the classification of low quality text

University of Technology, Sydney, 2020

Acknowledgment: Foremost, I would like to express my sincere gratitude to my supervisor, Professor Longbing Cao, for his continuous support of my master's research degree and for his patience, motivation, enthusiasm, and immense knowledge. His guidance helped me throughout the research and the writing of this thesis. I could not have imagined having a better advisor and mentor during my study. I would also like to thank my co-supervisor, A/Professor Kaska Musial-Gabrys, for providing me with continuous support throughout my study and research. Without her professional guidance and persistent help, this thesis would not have been possible. Each of my supervisors helped, supported, and guided me through my research at UTS exceptionally and unforgettably. I shall always remain thankful to my supervisors. I thank my fellow lab-mates in the Advanced Analytics Institute (AAi) for the stimulating discussions, guidance, and support, and for all the fun we have had in the last two years. I am thankful to the office staff at the School of Computer Science, UTS, in particular Margot, Janet, Teraesa, and Reshma. All the team members I met within the School provided a conducive research environment. I am also thankful to the staff at the Graduate Research School at UTS, who always welcomed my questions and queries and resolved my problems in a timely manner. Last but not least, I would like to thank my family for their unconditional support, both financial and emotional, throughout my master's studies.
Dedication: My thesis is dedicated to the people who have supported my goals, inspired me, and challenged me academically to make it to this day. Reflecting on the path that led me to this day, after spending more than eight years in industry and having a very comfortable life with a good job, it was a very hard decision for me to come back to student life with a limited income. But I decided to take on this challenge and thought maybe I can do it. I pushed myself hard throughout this period. Today, I submit my master's by research degree at UTS and change my earlier 'maybe' into 'yes'. For this day, first and foremost credit goes to my parents. My father taught me to make decisions and always encouraged and supported my bold decisions. In all the bad times, he encouraged and supported me. I wish I could share this moment with my mother (late). Whatever I am today is only because of her prayers. Thanks to my siblings, Salman and Haadia, for always being a great source of support and encouragement, and to my grandfather (late), who always prayed for my success. I still remember that I used to call my grandfather and ask him to pray for me whenever I felt down. Without the support of my wife, Huda, I would never have been able to complete my master's degree. Aashir and Mahd, I owe you a lot. I could not give you more time in your childhood, as I came home late at night on weekdays and spent most of my weekends in the library doing my research; you were the real owners of this time. Huda, I am eternally thankful to you for taking up all the responsibilities and letting me focus on my research. Thank you. Thank you to my father-in-law and mother-in-law, as well as my wife's siblings, who offered well wishes for my studies and prayed for my success.

Research paper thumbnail of Deep AutoEncoder-Decoder Framework for Semantic Segmentation of Brain Tumor

Deep AutoEncoder-Decoder Framework for Semantic Segmentation of Brain Tumor, 2019

Accurate segmentation of brain tumor is a critical component for diagnosis of cancer, treatment a... more Accurate segmentation of brain tumor is a critical component for diagnosis of cancer, treatment and evaluation of outcome. It consist of identification of different types of tumor tissues from normal brain MRI images. Recently, pathway CNNs have been used for semantic segmentation, however are computationally expensive. Build upon success of SegNet, in this paper, we presented different architectures of SegNeT encoder and decoder based on pixel-wise classification. Nonlinear up sampling are performed by the model. The end to end training and small number of parameters used for the training makes the computational process more higher than other deep learning architectures. We performed the semantic segmentation on the MRI brain tumor Figshare-dataset and achieved the state of the arts results (99.93% global accuracy) in comparison to traditional CNN models.

Research paper thumbnail of Text Mining of Stocktwits Data for Predicting Stock Prices

Applied system innovation, Feb 17, 2021

This article is an open access article distributed under the terms and conditions of the Creative... more This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY

Research paper thumbnail of BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition

Background: In recent years, with the growing amount of biomedical documents, coupled with advanc... more Background: In recent years, with the growing amount of biomedical documents, coupled with advancement in natural language processing algorithms, the research on biomedical named entity recognition (BioNER) has increased exponentially. However, BioNER research is challenging as NER in the biomedical domain are: (i) often restricted due to limited amount of training data, (ii) an entity can refer to multiple types and concepts depending on its context and, (iii) heavy reliance on acronyms that are sub-domain specific. Existing BioNER approaches often neglect these issues and directly adopt the state-of-the-art (SOTA) models trained in general corpora which often yields unsatisfactory results. Results: We propose biomedical ALBERT (A Lite Bidirectional Encoder Representations from Transformers for Biomedical Text Mining)-bioALBERT-an effective domain-specific pre-trained language model trained on huge biomedical corpus designed to capture biomedical context-dependent NER. We adopted self-supervised loss function used in ALBERT that targets on modelling inter-sentence coherence to better learn context-dependent representations and incorporated parameter reduction strategies to minimise memory usage and enhance the training time in BioNER. In our experiments, BioALBERT outperformed comparative SOTA BioNER models on eight biomedical NER benchmark datasets with four different entity types. The performance is increased for; (i) disease type corpora by 7.47% (NCBI-disease) and 10.63% (BC5CDR-disease); (ii) drug-chem type corpora by 4.61% (BC5CDR-Chem) and 3.89% (BC4CHEMD); (iii) gene-protein type corpora by 12.25% (BC2GM) and 6.42% (JNLPBA); and (iv) species type corpora by 6.19% (LINNAEUS) and 23.71% (Species-800) is observed which leads to a state-of-the-art results. 
Conclusions: The performance of proposed model on four different biomedical entity types shows that our model is robust and generalizable in recognizing biomedical entities in text. We trained four different variants of BioALBERT models which are available for the research community to be used in future research.

Research paper thumbnail of Incorporating Embedding to Topic Modeling for More Effective Short Text Analysis

Companion Proceedings of the ACM Web Conference 2023

Research paper thumbnail of Graph-Based Hierarchical Attention Network for Suicide Risk Detection on Social Media

Companion Proceedings of the ACM Web Conference 2023

Research paper thumbnail of Coherent Topic Modeling for Creative Multimodal Data on Social Media

Proceedings of the ACM Web Conference 2023

Effective Classification of Synovial Sarcoma Cancer Using Structure Features and Support Vectors

Computers, Materials & Continua

In this research work, we propose a medical image analysis framework that makes a two-way decision on whether or not the cancer cell structure is Synovial Sarcoma (SS). Within this framework, the histopathology images are decomposed into third-level sub-bands using a two-dimensional Discrete Wavelet Transform. Subsequently, structure features (SFs) are extracted from this sub-band image representation and the distribution of its wavelet coefficients using Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA). These SFs are used as inputs to a Support Vector Machine (SVM) classifier. The PCA + SVM, ICA + SVM, and LDA + SVM combinations, each with a Radial Basis Function (RBF) kernel, are then compared to identify the best classification results. Furthermore, data collected from various histopathological centres via the Internet of Things (IoT) are stored and shared using blockchain technology across secure IoT devices, covering a wide range of image distributions. Because of this, the minimum and maximum values of the kernel parameter are adjusted and updated periodically, for the purpose of industrial application in device calibration. The results serve as an example of a technique for training and testing cancer cell structure prognosis methods on spindle-shaped cell (SSC) histopathological imaging databases. Cross-validation performance is evaluated with the receiver operating characteristic (ROC) curve, and significant differences in classification performance between the techniques are analyzed. The LDA + SVM combination offers excellent classification accuracy, sensitivity, and specificity, and is presented as a promising basis for intelligent SS cancer detection in the future.
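
The three-level wavelet decomposition mentioned in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not name the wavelet family, so a Haar wavelet with averaging normalisation is assumed here, and the function names (`haar_dwt2`, `wavelet_decompose`, `subband_features`) are hypothetical. The PCA/ICA/LDA and SVM stages are not shown.

```python
def haar_dwt2(image):
    """One level of a 2D Haar transform (assumed wavelet, averaging normalisation).

    `image` is a list of rows with even height and width.
    Returns four half-size bands: approximation, left-right difference,
    top-bottom difference, and diagonal difference.
    """
    approx, col_diff, row_diff, diag = [], [], [], []
    for r in range(0, len(image), 2):
        a_row, c_row, r_row, d_row = [], [], [], []
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((a + b + d + e) / 4)  # local average of the 2x2 block
            c_row.append((a - b + d - e) / 4)  # left column minus right column
            r_row.append((a + b - d - e) / 4)  # top row minus bottom row
            d_row.append((a - b - d + e) / 4)  # diagonal difference
        approx.append(a_row)
        col_diff.append(c_row)
        row_diff.append(r_row)
        diag.append(d_row)
    return approx, col_diff, row_diff, diag


def wavelet_decompose(image, levels=3):
    """Apply haar_dwt2 repeatedly to the approximation band."""
    approx, details = image, []
    for _ in range(levels):
        approx, col_diff, row_diff, diag = haar_dwt2(approx)
        details.append((col_diff, row_diff, diag))
    return approx, details


def subband_features(band):
    """Mean and variance of the coefficients in one sub-band."""
    values = [v for row in band for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


# Usage: an 8x8 horizontal ramp shrinks to a single coefficient after 3 levels.
ramp = [[float(c) for c in range(8)] for _ in range(8)]
approx3, details = wavelet_decompose(ramp, levels=3)
```

In practice a library such as PyWavelets would normally replace the hand-written transform; the pure-Python version is used here only to keep the sketch self-contained.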

Vision-Language Transformer for Interpretable Pathology Visual Question Answering

IEEE Journal of Biomedical and Health Informatics

Pathology visual question answering (PathVQA) attempts to correctly answer medical questions presented with pathology images. Despite its great potential in healthcare, the technology is still in its early stages, with low overall accuracy. This is because it requires both high- and low-level interactions on both the image (vision) and question (language) to generate an answer. Existing methods treat vision and language features independently and therefore cannot capture these high- and low-level interactions. Further, these methods fail to interpret the retrieved answers, which remain obscure to humans. Model interpretability to justify the retrieved answers has remained largely unexplored, yet it has become important for engendering users' trust in a retrieved answer by providing insight into the model prediction. Motivated by these gaps, we introduce an interpretable transformer-based Path-VQA (TraP-VQA), where we embed transformers' encoder layers with vision (image) features extracted using a CNN and language (question) features extracted using CNNs and a domain-specific language model (LM). A decoder layer of the transformer is then embedded to upsample the encoded features for the final PathVQA prediction. Our experiments show that TraP-VQA outperforms state-of-the-art comparative methods on the public PathVQA dataset. Further, our ablation study presents the capability of each component of our transformer-based vision-language model. Finally, we demonstrate the interpretability of TraP-VQA by presenting visualizations of both the text and the images used to explain the reason for a retrieved answer in PathVQA.

Heart Disease Diagnosis Using the Brute Force Algorithm and Machine Learning Techniques

Computers, Materials & Continua, 2022

Heart disease is one of the leading causes of death in the world today. Prediction of heart disease is a prominent topic in clinical data processing, and early diagnosis is an important area of medical research for increasing patient survival rates. There are many studies on the prediction of heart disease, but limited work has been done on the selection of features. Feature selection is one of the best techniques for the diagnosis of heart diseases. In this research paper, we find optimal features using the brute-force algorithm, and machine learning techniques are used to improve the accuracy of heart disease prediction. For performance evaluation, accuracy, sensitivity, and specificity are used with split and cross-validation techniques. The proposed technique is evaluated on three different heart disease datasets with different numbers of records and is found to have superior performance. The optimized features selected by the brute-force algorithm are used as input to machine learning algorithms such as Support Vector Machine (SVM), Random Forest (RF), K Nearest Neighbor (KNN), and Naive Bayes (NB). The proposed technique achieved 97% accuracy with Naive Bayes through split validation and 95% accuracy with Random Forest through cross-validation. Naive Bayes and Random Forest are found to outperform the other classification approaches in terms of accuracy. Compared with the results of existing studies, the proposed technique is found to perform better than other state-of-the-art methods. Therefore, our proposed approach plays an important role in the selection of important features and the automatic detection of heart disease.
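
As a rough illustration of brute-force feature selection as described in the abstract above, the sketch below scores every non-empty feature subset and keeps the best one. Everything specific here is an assumption made for self-containment: the function names are hypothetical, the scorer is a toy nearest-centroid rule evaluated on the training rows (the paper itself reports SVM, RF, KNN, and NB with split and cross-validation), and the four-row dataset is invented.

```python
from itertools import combinations


def nearest_centroid_accuracy(rows, labels, feature_idx):
    """Toy scorer: training-set accuracy of a nearest-centroid rule
    restricted to the features in `feature_idx`."""
    classes = sorted(set(labels))
    centroids = {}
    for cls in classes:
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(m[i] for m in members) / len(members)
                          for i in feature_idx]
    correct = 0
    for row, y in zip(rows, labels):
        point = [row[i] for i in feature_idx]
        pred = min(classes,
                   key=lambda cls: sum((p - q) ** 2
                                       for p, q in zip(point, centroids[cls])))
        correct += pred == y
    return correct / len(rows)


def brute_force_select(rows, labels, score):
    """Evaluate all 2**n - 1 non-empty feature subsets and return the best.

    Ties keep the first subset found, which favours smaller subsets
    because subsets are enumerated in order of increasing size.
    """
    n_features = len(rows[0])
    best_subset, best_score = None, float("-inf")
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            current = score(rows, labels, subset)
            if current > best_score:
                best_subset, best_score = subset, current
    return best_subset, best_score


# Invented data: feature 0 separates the classes, feature 1 is noise.
rows = [[1.0, 9.0], [1.2, 1.0], [3.0, 1.2], [3.2, 9.1]]
labels = [0, 0, 1, 1]
best_subset, best_score = brute_force_select(rows, labels, nearest_centroid_accuracy)
```

The search is exponential in the number of features, so it is only practical for small feature sets such as those in typical tabular clinical datasets.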

A Multimodal Framework for the Identification of Vaccine Critical Memes on Twitter

Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining

A novel multiple kernel fuzzy topic modeling technique for biomedical data

BMC Bioinformatics

Background: Text mining in the biomedical field has received much attention and is regarded as an important research area, since a lot of biomedical data is in text format. Topic modeling is one of the popular text mining techniques used to discover hidden semantic structures, so-called topics. However, discovering topics from biomedical data is a challenging task due to its sparsity, redundancy, and unstructured format. Methods: In this paper, we propose a novel multiple kernel fuzzy topic modeling (MKFTM) technique using fusion probabilistic inverse document frequency and a multiple kernel fuzzy c-means clustering algorithm for biomedical text mining. In detail, the proposed fusion probabilistic inverse document frequency method is used to estimate the weights of global terms, while MKFTM generates frequencies of local and global terms with bag-of-words. In addition, principal component analysis is applied to eliminate higher-order negative effects for term weights. Res...

A New Method for Centrality Measurement Using Generalized Fuzzy Graphs

Discrete Dynamics in Nature and Society

The fuzzy graph is the foundation of many actual structures such as networking, picture planning, and so on. Generalized fuzzy graphs (GFG) are ideal for avoiding certain constraints of fuzzy graphs. Social networks are a useful and powerful way to link people worldwide. A central person in a social network denotes one of its leading people. Different centrality measures have been established over the years. In this article, we evaluate the centrality of a social network through a generalized fuzzy graph. An application that detects the central person in any online group, such as a WhatsApp group, is described using generalized fuzzy graphs. A few important results are also established in this study.

An Automatic Detection of Breast Cancer Diagnosis and Prognosis Based on Machine Learning Using Ensemble of Classifiers

IEEE Access

Breast cancer (BC) is the second most prevalent type of cancer leading to death among women, and its mortality rate is very high. Its effects can be reduced if it is diagnosed early: early detection greatly improves the prognosis and likelihood of recovery, as it may enable prompt surgical care for patients. It is therefore vital to have a system that enables the healthcare industry to detect breast cancer quickly and accurately. Machine learning (ML) is widely used in BC pattern classification because of its advantages in modelling critical feature detection from complex BC datasets. In this paper, we propose a system for automatic detection of BC diagnosis and prognosis using an ensemble of classifiers. First, we review various ML algorithms, including ANN, and ensembles of different classifiers for automatic BC diagnosis and prognosis detection. We then present and compare various ensemble models and other variants of the tested ML-based models, with and without an up-sampling technique, on two benchmark datasets. We also study the effects of using balanced class weights on the prognosis dataset and compare the resulting performance with the others. The results show that the ensemble method outperforms other state-of-the-art methods and achieves 98.83% accuracy. Because of its high performance, the proposed system is of great importance to the medical industry and the relevant research community.

Index terms: Healthcare system, machine learning, breast cancer, ensemble learning, cancer diagnoses.

I. Introduction: Breast cancer is one of the most dangerous and prevalent cancers among women, causing the deaths of large numbers of women worldwide. Breast cancer accounts for 8.4% of diagnosed cancers and 6.6% of cancer-related deaths worldwide, according to a World Health Organization (WHO) report [1]. Breast cancer accounted for 15.9% of all reported cancers

The associate editor coordinating the review of this manuscript and approving it for publication was Ahmed Farouk.
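
One simple way to combine an ensemble of classifiers, as the abstract above describes, is hard majority voting over the per-classifier predictions. The abstract does not say which combination rule the authors used, so the voting rule, the function name `majority_vote`, and the example labels below are all assumptions for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several classifiers by hard voting.

    `predictions` is a list with one label list per classifier; all label
    lists must cover the same samples in the same order. Using an odd
    number of classifiers on a two-class problem avoids ties.
    """
    voted = []
    for sample_votes in zip(*predictions):
        label, _count = Counter(sample_votes).most_common(1)[0]
        voted.append(label)
    return voted


# Invented predictions from three hypothetical classifiers
# ("M" = malignant, "B" = benign) on four samples.
clf_a = ["M", "B", "B", "M"]
clf_b = ["M", "M", "B", "B"]
clf_c = ["B", "M", "B", "M"]
ensemble = majority_vote([clf_a, clf_b, clf_c])
```

With real models, scikit-learn's `VotingClassifier` offers the same idea (plus soft voting over predicted probabilities) without hand-written code.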

A Proposed Framework for the Security of Financial Systems

Indian Journal of Science and Technology, 2019

Objectives: This study proposes a framework for enhancing the security level of eBanking, or online banking, systems. Methods/Statistical Analysis: The financial system is considered a major critical infrastructure of a community, society, and country. In addition, the number of breach attacks carried out via the Internet is constantly rising. It is therefore recommended to better secure such systems against malicious activities. In this regard, this study gives a comprehensive overview of security challenges and security attacks, and proposes a security framework to enhance the security level of the systems discussed. Findings: The proposed framework is divided into two categories: the first covers Network Standards for the Security of System, and the second is based on the architecture of the system. Under Network Standards for the Security of System, two standards are defined, one for wired connections and the other for wireless connections. The second part is the most significant section of the paper and comprises three stages. Stage one comprises basic security requirements, stage two defines the parameters of wireless security design, and stage three deploys security algorithms and techniques to provide a secure network.

RHMD: A Real-World Dataset for Health Mention Classification on Reddit

IEEE Transactions on Computational Social Systems

Benchmarking for Public Health Surveillance tasks on Social Media with a Domain-Specific Pretrained Language Model

Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP

User-generated text on social media enables health workers to keep track of information, identify possible outbreaks, forecast disease trends, monitor emergency cases, and ascertain disease awareness and response to official health correspondence. This exchange of health information on social media has been regarded as an attempt to enhance public health surveillance (PHS). Despite its potential, the technology is still in its early stages and is not ready for widespread application. Advancements in pretrained language models (PLMs) have facilitated the development of several domain-specific PLMs and a variety of downstream applications. However, there are no PLMs for social media tasks involving PHS. We present and release PHS-BERT, a transformer-based PLM, to identify tasks related to public health surveillance on social media. We compared and benchmarked the performance of PHS-BERT on 25 datasets from different social media platforms related to 7 different PHS tasks. Compared with existing PLMs that are mainly evaluated on limited tasks, PHS-BERT achieved state-of-the-art performance on all 25 tested datasets, showing that our PLM is robust and generalizable across the common PHS tasks. By making PHS-BERT available, we aim to help the community reduce computational cost and to introduce new baselines for future work across various PHS-related tasks.

Incorporating Medical Knowledge to Transformer-based Language Models for Medical Dialogue Generation

Proceedings of the 21st Workshop on Biomedical Language Processing

Identification of Disease or Symptom terms in Reddit to Improve Health Mention Classification

Proceedings of the ACM Web Conference 2022

In user-generated text, such as on social media platforms and online forums, people often use disease or symptom terms in ways other than to describe their health. In data-driven public health surveillance, the health mention classification (HMC) task aims to identify posts where users are discussing health conditions rather than using disease and symptom terms for other reasons. Existing computational research typically studies health mentions only on Twitter, with limited coverage of disease or symptom terms, and ignores user behavior information and the other ways people use disease or symptom terms. To advance HMC research, we present the Reddit health mention dataset (RHMD), a new dataset of multi-domain Reddit data for HMC. RHMD consists of 10,015 manually labeled Reddit posts that mention 15 common disease or symptom terms and are annotated with four labels: personal health mentions, non-personal health mentions, figurative health mentions, and hyperbolic health mentions. With RHMD, we propose HMC-NET, which combines target keyword (disease or symptom term) identification and user behavior hierarchically to improve HMC. Experimental results demonstrate that the proposed approach outperforms state-of-the-art methods with an F1-Score of 0.75 (an increase of 11% over the state-of-the-art) and show that our new dataset poses a strong challenge to existing HMC methods.
CCS CONCEPTS • Applied computing → Health informatics; • Computing methodologies → Natural language processing.

Automatic COVID-19 Lung Infection Segmentation through Modified Unet Model

Journal of Healthcare Engineering

The coronavirus (COVID-19) pandemic has had a terrible impact on human lives globally, with far-reaching consequences for the health and well-being of many people around the world. Statistically, 305.9 million people worldwide tested positive for COVID-19, and 5.48 million people died due to COVID-19 up to 10 January 2022. CT scans can be used as an alternative to time-consuming RT-PCR testing for COVID-19. This research work proposes a segmentation approach for identifying ground glass opacity (GGO), or the region of interest (ROI), in CT images of coronavirus infection, using a modified Unet structure to classify the region of interest at the pixel level. The problem with segmentation is that the GGO often appears indistinguishable from a healthy lung in the initial stages of COVID-19, and so, to cope with this, an increased set of weights in the contracting and expanding Unet paths and an improved convolutional module are added in order to establish the connection between the encoder a...

Hybrid Words Representation for the classification of low quality text

University of Technology, Sydney, 2020

Acknowledgment: Foremost, I would like to express my sincere gratitude to my supervisor, Professor Longbing Cao, for the continuous support of my master's research degree, and for his patience, motivation, enthusiasm, and immense knowledge. His guidance helped me throughout the research and writing of this thesis. I could not have imagined having a better advisor and mentor during my study. I would also like to thank my co-supervisor, A/Professor Kaska Musial-Gabrys, for providing me with continuous support throughout my study and research. Without her professional guidance and persistent help, this thesis would not have been possible. Each of my supervisors helped, supported, and guided me through my research at UTS exceptionally and unforgettably. I shall always remain thankful to my supervisors. I thank my fellow lab-mates in the Advanced Analytics Institute (AAi) for the stimulating discussions, guidance, and support, and for all the fun we have had in the last two years. I am thankful to the office staff at the School of Computer Science, UTS, in particular Margot, Janet, Teraesa and Reshma. All the team members I met within the School provided a conducive research environment. I am also thankful to the staff at the Graduate Research School at UTS, who always welcomed my questions and queries and resolved my problems in a timely manner. Last but not least, I would like to thank my family for their unconditional support, both financial and emotional, throughout my master's studies.

Dedication: My thesis is dedicated to the people who have supported my goals, inspired me and challenged me academically to make it to this day. Reflecting on the path that led me to this day, after spending more than eight years in industry and having a very comfortable life with a good job, it was a very hard decision for me to come back to student life with a limited income. But I decided to take on this challenge and thought maybe I can do it. I pushed myself hard throughout this period. Today, I submit my master by research degree at UTS and change my earlier 'maybe' into 'yes'. For this day, first and foremost credit goes to my parents. My father taught me to make decisions and always encouraged and supported my bold decisions. In all the bad times, he encouraged and supported me. I wish I could share this moment with my late mother. Whatever I am today is only because of her prayers. Thanks to my siblings, Salman and Haadia, for always remaining a great source of support and encouragement. My late grandfather always prayed for my success. I still remember that I used to call my grandfather and ask him to pray for me whenever I felt down. Without the support of my wife Huda, I would never have been able to complete my master's degree. Aashir and Mahd, I owe you a lot. I could not give you more time in your childhood because I came home late at night on weekdays and spent most of my weekends in the library doing my research; you were the real owners of this time. Huda, I am eternally thankful to you for taking up all the responsibilities and letting me focus on my research. Thank you. Thank you to my father-in-law and mother-in-law, as well as my wife's siblings, who offered well wishes for my studies and prayed for my success.

Early Identification of Depression Severity Levels on Reddit Using Ordinal Classification

Proceedings of the ACM Web Conference 2022

User-generated text on social media is a promising avenue for public health surveillance and has been actively explored for its feasibility in the early identification of depression. Existing methods for identifying depression have shown promising results; however, they all treat the identification as a binary classification problem. To date, there has been little effort towards identifying users' depression severity levels, and existing work disregards the inherent ordinal nature of these fine-grained levels. This paper aims at the early identification of depression severity levels from social media data. To accomplish this, we built a new dataset of Reddit posts based on the inherent ordinal nature of depression severity levels, using clinical depression standards. The posts were classified into 4 depression severity levels covering the clinical depression standards on social media. Accordingly, we reformulate the early identification of depression as an ordinal classification task over clinical depression standards, such as Beck's Depression Inventory and the Depressive Disorder Annotation scheme, to identify depression severity levels. With these, we propose a hierarchical attention method optimized to factor in the increasing depression severity levels through a soft probability distribution. We experimented with two datasets (a public dataset with more than one post from each user and our own dataset with a single post per user) of real-world Reddit posts that were classified according to questionnaires built by clinical experts, and demonstrated that our method outperforms state-of-the-art models. Finally, we conclude by analyzing the minimum number of posts required to identify the depression severity level, followed by a discussion of the empirical and practical considerations of our study.
CCS CONCEPTS • Applied computing → Health informatics; • Computing methodologies → Natural language processing.
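
The abstract above mentions encoding ordered severity levels through a soft probability distribution but does not give its form. The sketch below shows one common way to build such targets, assumed here purely for illustration: a softmax over the negative absolute distance between each level and the true level. The function names and the sharpness parameter `alpha` are hypothetical.

```python
import math


def soft_ordinal_labels(true_level, num_levels, alpha=1.0):
    """Soft target distribution over ordered levels 0..num_levels-1.

    Levels closer to `true_level` receive more probability mass.
    `alpha` (an assumed parameter) controls how sharply the mass decays.
    """
    scores = [-alpha * abs(level - true_level) for level in range(num_levels)]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(predicted_probs, target_probs):
    """Cross-entropy between a predicted distribution and a soft target."""
    return -sum(t * math.log(p)
                for p, t in zip(predicted_probs, target_probs) if t > 0)


# Usage with 4 severity levels, as in the abstract; the true level is 0.
target = soft_ordinal_labels(true_level=0, num_levels=4)
near_miss = [0.1, 0.7, 0.1, 0.1]  # model favours the adjacent level 1
far_miss = [0.1, 0.1, 0.1, 0.7]   # model favours the most distant level 3
loss_near = soft_cross_entropy(near_miss, target)
loss_far = soft_cross_entropy(far_miss, target)
```

Unlike one-hot targets, which penalise every wrong level equally, this kind of target gives the near miss a smaller loss than the far miss, which is the property an ordinal formulation is after.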