Zunera Jalil | Air University, Islamabad
Papers by Zunera Jalil
ACM Transactions on Multimedia Computing, Communications, and Applications, Apr 13, 2023
Recent advances in artificial intelligence have led to deepfake images, enabling users to replace a real face with a fabricated one. Deepfake images have recently been used to malign public figures, politicians, and even average citizens. Realistic deepfake images have been used to stir political dissatisfaction, blackmail, propagate false news, and even carry out bogus terrorist attacks. Thus, distinguishing real images from fakes has become more challenging. To address these issues, this study employs transfer learning and data augmentation techniques to classify deepfake images. For experimentation, 190,335 RGB-resolution deepfake and real images and image augmentation methods are used to prepare the dataset. The experiments use the deep learning models convolutional neural network (CNN), Inception V3, Visual Geometry Group (VGG19), and VGG16 with a transfer learning approach. Essential evaluation metrics (accuracy, precision, recall, F1-score, confusion matrix and AUC-ROC curve score) a...
Scientific Reports
With time, numerous online communication platforms have emerged that allow people to express themselves, increasing the dissemination of toxic language, such as racism, sexual harassment, and other negative behaviors that are not accepted in polite society. As a result, toxic language identification in online communication has emerged as a critical application of natural language processing. Numerous academic and industrial researchers have recently studied toxic language identification using machine learning algorithms. However, nontoxic comments that include particular identity descriptors, such as Muslim, Jewish, White, and Black, were assigned unrealistically high toxicity ratings by several machine learning models. This research analyzes and compares modern deep learning algorithms for multilabel toxic comment classification. We explore two scenarios: the first is a multilabel classification of religious toxic comments, and the second is a multilabel classification of ...
Wireless Communications and Mobile Computing
The network intrusion detection system (NIDS) is a significant research milestone in information security. A NIDS can scan and analyze the network to detect an attack or anomaly, which may be a continuing intrusion or an intrusion that has just occurred. During the pandemic, cybercriminals realized that home networks harbored vulnerabilities due to a lack of security and computational limitations. A fundamental difficulty in NIDS design is providing an effective, robust, lightweight, and rapid framework to perform real-time intrusion detection. This research proposes an efficient, functional cybersecurity approach based on machine/deep learning algorithms to detect anomalies using a lightweight network-based IDS. A lightweight, real-time, network-based anomaly detection system can be used to secure connected IoT devices. The UNSW-NB15 dataset is used to evaluate the proposed approach, DeepNet, and to compare its results with other state-of-the-art techniques. For the classifica...
2022 2nd International Conference on Artificial Intelligence (ICAI)
Computational Intelligence and Neuroscience
Coronavirus (COVID-19) is a highly severe infection caused by the severe acute respiratory coronavirus 2 (SARS-CoV-2). The polymerase chain reaction (PCR) test is essential to confirm COVID-19 infection, but it has certain limitations: reagents are scarce, the test is time-consuming, and it requires expert clinicians. Clinicians suggest that the PCR test is not a reliable automated COVID-19 patient detection system. This study proposes a machine learning-based approach to evaluate the role of PCR in COVID-19 detection. We collect real data containing 603 COVID-19 samples from the Pakistan Institute of Medical Sciences (PIMS) Hospital in Islamabad, Pakistan, during the third COVID-19 wave. The experiments are separated into two sets. The first set comprises 24 features, including PCR test results, whereas the second comprises 24 features without the PCR test. The findings demonstrate that the decision tree achieves the best detection rate for positive and negative COVID-19...
Frontiers in Public Health, 2022
The Internet of Things (IoT) involves a set of devices that aid in achieving a smart environment. IoT-oriented healthcare systems provide monitoring services for patients' data and help take immediate steps in an emergency. Currently, machine learning-based techniques are adopted to ensure security and other non-functional requirements in smart healthcare systems. However, no attention is given to classifying the non-functional requirements from requirement documents. The manual process of classifying the non-functional requirements from documents is error-prone and laborious. Missing non-functional requirements in the Requirement Engineering (RE) phase results in an IoT-oriented healthcare system with compromised security and performance. In this research, an experiment is performed in which non-functional requirements are classified from the IoT-oriented healthcare system's requirement document. The machine learning algorithms considered for classification are Logistic Re...
2021 International Conference on Cyber Warfare and Security (ICCWS)
Electronic mail has been in use for decades, and more than four billion users access their emails using different domains and servers. Emails are considered an official way of communication in remote working modes and in online businesses. Email labeling can reduce the amount of effort needed to manage this communication. So far, email classification has sorted emails into categories such as Spam, Non-spam, Junk, social media, etc. However, email classification that takes into account the types of cybercrimes committed through email has not been done. Emails can be labeled as Spam, Phishing, fraudulent, harassing, bullying, or general/normal. This identification is one of the most challenging tasks for both email service providers and consumers. Several spam identification models have previously been proposed and tested, but very limited work has been done so far on the multi-class classification of emails, that is, classification into more than the two classes spam and ham. In this paper, we propose a solution to classify emails into four classes: fraudulent, suspicious, harassment, and normal. A deep learning approach named Long Short-Term Memory (LSTM) with stratified sampling has been used to identify the email classes. An effort has also been made to balance the input dataset using over-sampling methods. The proposed model obtained a classification accuracy of more than 90% with stratified sampling only and more than 95% after applying data balancing techniques to the dataset.
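The two data-preparation steps named in this abstract, stratified sampling and over-sampling, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which over-sampling method was used, so plain random duplication is assumed here, and the function names, class sizes, and 25% test fraction are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split indices so every class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def random_oversample(indices, labels, seed=0):
    """Duplicate minority-class indices at random until all classes match the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced


# Hypothetical, deliberately imbalanced label set using the four classes from the abstract.
labels = ["normal"] * 40 + ["fraudulent"] * 12 + ["suspicious"] * 8 + ["harassment"] * 4
train_idx, test_idx = stratified_split(labels)
balanced_train = random_oversample(train_idx, labels)
train_counts = Counter(labels[i] for i in balanced_train)
```

Only the training indices are over-sampled in this sketch, so duplicated emails cannot leak into the held-out part.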
Scientific Reports
With time, textual data is proliferating, primarily through the publication of articles. With this rapid increase in textual data, anonymous content is also increasing. Researchers are searching for alternative strategies to identify the author of an unknown text. There is a need to develop a system to identify the actual author of unknown texts based on a given set of writing samples. This study presents a novel approach based on ensemble learning, DistilBERT, and conventional machine learning techniques for authorship identification. The proposed approach extracts the valuable characteristics of the author using a count vectorizer and bi-gram term frequency-inverse document frequency (TF-IDF). An extensive and detailed dataset, “All the news”, is used in this study for experimentation. The dataset is divided into three subsets (article1, article2, and article3). We limit the scope of the dataset and select ten authors in the first scope and 20 authors in the second scope for exp...
IEEE
Email has long been used as a reliable, secure, and formal mode of communication. With fast and secure communication technologies, reliance on email has increased as well. The massive increase in email data has made managing emails a major challenge. So far, emails can be classified and grouped based on sender, size, and date. However, there is a need to detect and classify emails based on their contents. Several approaches have been used in the past for content-based classification of emails as Spam or Non-Spam. In this paper, we propose a multi-label email classification approach to organize emails. An efficient classification method is proposed for forensic investigations of massive email data (e.g., a disk image of an email server). This method would help investigators in email-related crime investigations. A comparative study of machine learning algorithms identified Logistic Regression as the method that achieves the highest accuracy compared to Naive Bayes, Stochastic Gradient Descent, Random Forest, and Support Vector Machine. Experiments conducted on benchmark data sets showed that Logistic Regression performs best, with an accuracy of 91.9% with bi-gram features.
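As a small illustration of what "bi-gram features" can look like, the Python sketch below counts adjacent word pairs in a message. It assumes word-level bigrams and whitespace tokenization, neither of which the abstract specifies, and the function names and sample text are hypothetical. The resulting counts are the kind of input a classifier such as Logistic Regression could be trained on.

```python
from collections import Counter


def word_bigrams(text):
    """Return adjacent word pairs from a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def bigram_counts(text):
    """Map each word bigram to the number of times it occurs in the text."""
    return Counter(word_bigrams(text))


# Hypothetical email body used only for illustration.
sample = "please verify your account now to keep your account active"
features = bigram_counts(sample)
```

Unlike single-word counts, bigrams keep a little word order, so "your account" is counted as a unit rather than as two unrelated tokens.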
IEEE Access PP(99), 2021
Artificial Intelligence (AI) in combination with the Internet of Things (IoT), called AIoT, is an emerging trend in industrial applications and is capable of intelligent decision-making with self-driven analytics. With their extensive usage in diverse scenarios, IoT devices generate bulk data that attackers exploit to disrupt normal operations and services. Hence, there is a pressing need for proactive data analyses that can prevent cyber-attacks and crimes. To investigate crimes involving electronic mail (email), analysis of both the header and the email body is required, since the semantics of communication help to identify the source of potential evidence. With the continued growth of data shared via emails, investigators now face the daunting challenge of extracting the required semantic information from bulk email, which delays the investigation process. This gives criminals an edge in erasing the footprints of their malicious acts. The existing keyword-based search and filtering techniques often return extraneous, short-sequence emails and skip meaningful information. To overcome this limitation, we designed a novel, efficient approach called SeFACED that uses a Long short-term memory (LSTM) based Gated Recurrent Neural Network (GRU) for multiclass email classification. SeFACED caters not only to short sequences but also to long dependencies of 1000+ characters. SeFACED focuses on tuning the LSTM-based GRU parameters to attain the best performance, which is assessed by comparing it with traditional Machine Learning (ML) and Deep Learning (DL) models and state-of-the-art studies on the subject. Experimental results on self-extended benchmark datasets show that SeFACED effectively outperforms existing methods while keeping the classification process robust and reliable.
Lecture Notes in Electrical Engineering, 2022
Mathematical Biosciences and Engineering, 2021
Spam is any form of annoying and unsolicited digital communication sent in bulk, and it may contain offensive content that spreads viruses and cyber-attacks. The voluminous increase in spam has necessitated developing more reliable and robust artificial intelligence-based anti-spam filters. Besides text, an email sometimes contains multimedia content such as audio, video, and images. However, text-centric email spam filtering employing text classification techniques remains today's preferred choice. In this paper, we show that text pre-processing techniques nullify the detection of malicious contents in an obscure communication framework. We use the SpamAssassin corpus with and without text pre-processing and examine it using machine learning (ML) and deep learning (DL) algorithms to classify emails as ham or spam. The proposed DL-based approach consistently outperforms ML models. In the first stage, using pre-processing techniques, the long short-term memory (LSTM) model achieves the...
2021 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/IOP/SCI), 2021