Naeem Seliya - Academia.edu
Papers by Naeem Seliya
Zenodo (CERN European Organization for Nuclear Research), Nov 6, 2021
A fully automated, self-driving car can perceive its environment, determine the optimal route, and drive unaided by human intervention for the entire journey. Connected autonomous vehicles (CAVs) have the potential to drastically reduce accidents, travel time, and the environmental impact of road travel. Such technology includes the use of several sensors, various algorithms, interconnected network connections, and multiple auxiliary systems. CAVs have been subjected to attacks by malicious users to gain/deny control of one or more of their various systems. Data security and data privacy are one such area of CAVs that has been targeted via different types of attacks. The scope of this study is to present good background knowledge of issues pertaining to different attacks in the context of data security and privacy, as well as to present a detailed review and analysis of eight very recent studies on the broad topic of security- and privacy-related attacks. Methodologies including Blockchain, Named Data Networking, Intrusion Detection System, Cognitive Engine, Adversarial Objects, and others have been investigated in the literature, and problem- and context-specific models have been proposed by their respective authors.
IGI Global eBooks, 2007
Data mining and machine learning have numerous practical applications across several domains, especially for classification and prediction problems. This chapter involves a data mining and machine learning problem in the context of software quality modeling and estimation. Software measurements and software fault (defect) data have been used in the development of models that predict
arXiv (Cornell University), Oct 14, 2021
Physicians provide expert opinion to legal courts on the medical state of patients, including determining if a patient is likely to have permanent or non-permanent injuries or ailments. An independent medical examination (IME) report summarizes a physician's medical opinion about a patient's health status based on the physician's expertise. IME reports contain private and sensitive information (Personally Identifiable Information, or PII) that needs to be removed or randomly encoded before further research work can be conducted. In our study, the IME is an orthopedic surgeon from a private practice in the United States. The goal of this research is to perform named entity recognition (NER) to identify and subsequently remove/encode PII from IME reports prepared by the physician. We apply the NER toolkits of OpenNLP and spaCy, two freely available natural language processing platforms, and compare their precision, recall, and f-measure performance at identifying five categories of PII across trials of randomly selected IME reports using each model's common default parameters. We find that both platforms achieve high performance (f-measure > 0.9) at de-identification and that a spaCy model trained with a 70-30 train-test data split is most performant.
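The precision, recall, and f-measure comparison mentioned in this abstract can be illustrated with a minimal sketch. It assumes that predicted and gold PII entities are represented as `(start, end, label)` tuples and that only exact matches count as correct; both the representation and the function name `precision_recall_f1` are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted entity spans against gold spans using exact matching.

    Both arguments are iterables of (start, end, label) tuples; this
    representation is an assumption made for illustration only.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    # Precision: fraction of predicted entities that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold entities that were found.
    recall = true_positives / len(gold) if gold else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up spans for illustration; not data from the study.
gold_spans = [(0, 5, "NAME"), (10, 20, "DATE"), (30, 35, "PHONE"), (40, 50, "ADDRESS")]
predicted_spans = [(0, 5, "NAME"), (10, 20, "DATE"), (30, 35, "PHONE"), (60, 65, "NAME")]
p, r, f = precision_recall_f1(predicted_spans, gold_spans)
```

With three of four predictions correct and three of four gold entities found, precision, recall, and f-measure all equal 0.75 in this toy case.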
arXiv (Cornell University), Apr 26, 2022
Human activity recognition using deep learning techniques has become increasingly popular because of its high effectiveness at recognizing complex tasks, as well as its relatively low cost compared to more traditional machine learning techniques. This paper surveys some state-of-the-art human activity recognition models that are based on deep learning architectures and have layers containing Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), or a mix of more than one type for a hybrid system. The analysis outlines how the models are implemented to maximize their effectiveness and some of the potential limitations they face.
arXiv (Cornell University), Jul 27, 2022
In recent years, social media has grown to become a major source of information for many online users. This has given rise to the spread of misinformation through deepfakes. Deepfakes are videos or images that replace one person's face with another computer-generated face, often that of a more recognizable person in society. With the recent advances in technology, a person with little technological experience can generate these videos. This enables them to mimic a powerful figure in society, such as a president or celebrity, creating the potential danger of spreading misinformation and other nefarious uses of deepfakes. To combat this online threat, researchers have developed models that are designed to detect deepfakes. This study looks at various deepfake detection models that use deep learning algorithms to combat this looming threat. This survey focuses on providing a comprehensive overview of the current state of deepfake detection models and the unique approaches many researchers take to solving this problem. The benefits, limitations, and suggestions for future work will be thoroughly discussed throughout this paper.
International Journal of Computer Science and Information Technology, Oct 31, 2021
A fully automated, self-driving car can perceive its environment, determine the optimal route, and drive unaided by human intervention for the entire journey. Connected autonomous vehicles (CAVs) have the potential to drastically reduce accidents, travel time, and the environmental impact of road travel. Such technology includes the use of several sensors, various algorithms, interconnected network connections, and multiple auxiliary systems. CAVs have been subjected to attacks by malicious users to gain/deny control of one or more of their various systems. Data security and data privacy are one such area of CAVs that has been targeted via different types of attacks. The scope of this study is to present good background knowledge of issues pertaining to different attacks in the context of data security and privacy, as well as to present a detailed review and analysis of eight very recent studies on the broad topic of security- and privacy-related attacks. Methodologies including Blockchain, Named Data Networking, Intrusion Detection System, Cognitive Engine, Adversarial Objects, and others have been investigated in the literature, and problem- and context-specific models have been proposed by their respective authors.
Journal of Computer Science, May 1, 2022
As modern cities continue to develop, smart devices are being used to improve citizens' lives. As these devices become more sophisticated, the amount of information that they collect increases. A Smart City is a city that uses a large amount of generated data to constantly improve the services offered to the people that live there. Data mining techniques can be used to sift through the data and mine out meaningful patterns. Our research project focused on seven different disciplines within a Smart City, surveying the current state of research in each category. Smart Transportation is focused on decreasing congestion, increasing efficiency in public transportation, and improving the safety of pedestrians. Smart Healthcare is focused on modern healthcare monitoring systems and ambulance dispatch services. Smart Energy is focused on decreasing energy consumption and promoting green energy through WiFi thermostat optimization and a smart electric grid. Smart City Utilities is focused on creating algorithms to improve waste collection techniques and air quality. Smart City Planning is focused on land use and the placement of green spaces. Smart Networks and Privacy is focused on secure networks. Lastly, the Smart IoT (Internet of Things) Application is focused on next-generation networks like 5G.
IGI Global eBooks, Jan 18, 2011
Data mining and machine learning have numerous practical applications across several domains, especially for classification and prediction problems. This chapter involves a data mining and machine learning problem in the context of software quality modeling and estimation. Software measurements and software fault (defect) data have been used in the development of models that predict
Journal of computer sciences and applications, Oct 29, 2021
Recent advancements in technology now allow for the generation of massive quantities of data. There is a growing need to transmit this data faster and more securely so that it cannot be accessed by malicious individuals. Edge computing has emerged in previous research as a method capable of improving data transmission times and security before the data ends up in the cloud. Edge computing has an impressive transmission speed based on fifth-generation (5G) communication, which transmits data with low latency and high bandwidth. While edge computing is sufficient to extract important features from the raw data, so that large amounts of data requiring excessive bandwidth need not be transmitted, cloud computing is used for the computational processes required for developing algorithms and modeling the data. Edge computing also improves the quality of the user experience by saving time and integrating quality of life (QoL) features. QoL features are important for the healthcare sector because they help provide real-time feedback of data produced by healthcare devices back to patients for a faster recovery. Edge computing has better energy efficiency, can reduce the electricity cost, and in turn can help people reduce their living expenses. This paper will take a detailed look into edge computing applications around Internet of Things (IoT) devices, smart city infrastructure, and benefits to healthcare.
2021 International Conference on Electrical, Computer and Energy Technologies (ICECET), Dec 9, 2021
Mouse dynamics has grown in popularity as a novel, irreproducible behavioral biometric. Datasets which contain general, unrestricted mouse movements from users are sparse in the current literature. The Balabit mouse dynamics dataset, produced in 2016, was made for a data science competition and, despite some of its shortcomings, is considered to be the first publicly available mouse dynamics dataset. Collecting mouse movements in a dull, administrative manner, as Balabit does, may unintentionally homogenize data and is also not representative of real-world application scenarios. This paper presents a novel mouse dynamics dataset that has been collected while 10 users play the video game Minecraft on a desktop computer. Binary Random Forest (RF) classifiers are created for each user to detect differences between a specific user's movements and an imposter's movements. Two evaluation scenarios are proposed to evaluate the performance of these classifiers; one scenario outperformed previous works in all evaluation metrics, reaching average accuracy rates of 92%, while the other scenario successfully reported reduced instances of false authentications of imposters.
2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), Dec 1, 2022
arXiv (Cornell University), Jul 27, 2022
As technology grows and evolves rapidly, it is increasingly clear that mobile devices are more commonly used for sensitive matters than ever before. Continuous user authentication is sought after because single-factor or multifactor authentication may only initially validate a user, which doesn't help if an impostor can bypass this initial validation. The field of touch dynamics emerges as a clear way to non-intrusively collect data about a user and their behaviors in order to develop and make imperative security-related decisions in real time. In this paper we present a novel dataset consisting of tracking 25 users playing two mobile games, Snake.io and Minecraft, each for 10 minutes, along with their relevant gesture data. From this data, we ran machine learning binary classifiers, namely Random Forest and K-Nearest Neighbor, to attempt to authenticate whether a sample of a particular user's actions was genuine. Our strongest model returned an average accuracy of roughly 93% for both games, showing touch dynamics can differentiate users effectively and is a feasible consideration for authentication schemes.
arXiv (Cornell University), May 7, 2022
The amount of secure data being stored on mobile devices has grown immensely in recent years. However, the security measures protecting this data have stayed static, with few improvements being made to the vulnerabilities of current authentication methods such as physiological biometrics or passwords. Instead of these methods, behavioral biometrics has recently been researched as a solution to these vulnerable authentication methods. In this study, we aim to contribute to the research being done on behavioral biometrics by creating and evaluating a user authentication scheme using behavioral biometrics. The behavioral biometrics used in this study include touch dynamics and phone movement, and we evaluate the performance of different single-modal and multi-modal combinations of the two biometrics. Using two publicly available datasets, BioIdent and Hand Movement Orientation and Grasp (H-MOG), this study uses seven common machine learning algorithms to evaluate performance. The algorithms used in the evaluation include Random Forest, Support Vector Machine, K-Nearest Neighbor, Naive Bayes, Logistic Regression, Multilayer Perceptron, and Long Short-Term Memory Recurrent Neural Networks, with accuracy rates reaching as high as 86%.
Journal of Computer and Communications
Computer and Information Science
The amount of secure data being stored on mobile devices has grown immensely in recent years. However, the security measures protecting this data have stayed static, with few improvements being made to the vulnerabilities of current authentication methods such as physiological biometrics or passwords. Instead of these methods, behavioral biometrics has recently been researched as a solution to these vulnerable authentication methods. In this study, we aim to contribute to the research being done on behavioral biometrics by creating and evaluating a user authentication scheme using behavioral biometrics. The behavioral biometrics used in this study include touch dynamics and phone movement, and we evaluate the performance of different single-modal and multi-modal combinations of the two biometrics. Using two publicly available datasets, BioIdent and Hand Movement Orientation and Grasp (H-MOG), this study uses seven common machine learning algorithms to evaluate performance. The algo...
Big Data Technologies and Applications, 2016
Big Data Analytics and Deep Learning are two high-focus areas of data science. Big Data has become important as many organizations, both public and private, have been collecting massive amounts of domain-specific information, which can contain useful information about problems such as national intelligence, cyber security, fraud detection, marketing, and medical informatics. Companies such as Google and Microsoft are analyzing large volumes of data for business analysis and decisions, impacting existing and future technology. Deep Learning algorithms extract high-level, complex abstractions as data representations through a hierarchical learning process. Complex abstractions are learnt at a given level based on relatively simpler abstractions formulated in the preceding level in the hierarchy. A key benefit of Deep Learning is the analysis and learning of massive amounts of unsupervised data, making it a valuable tool for Big Data Analytics where raw data is largely unlabeled and un-categorized. In the present study, we explore how Deep Learning can be utilized for addressing some important problems in Big Data Analytics, including extracting complex patterns from massive volumes of data, semantic indexing, data tagging, fast information retrieval, and simplifying discriminative tasks. We also investigate some aspects of Deep Learning research that need further exploration to incorporate specific challenges introduced by Big Data Analytics, including streaming data, high-dimensional data, scalability of models, and distributed computing. We conclude by presenting insights into relevant future works by posing some questions, including defining data sampling criteria, domain adaptation modeling, defining criteria for obtaining useful data abstractions, improving semantic indexing, semi-supervised learning, and active learning.
2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), 2015
Using automated methods of labeling tweet sentiment, large volumes of tweets can be labeled and used to train classifiers. Millions of tweets could be used to train a classifier; however, doing so is computationally expensive. Thus, it is valuable to establish how many tweets should be utilized to train a classifier, since using additional instances with no gain in performance is a waste of resources. In this study, we seek to find out how many tweets are needed before no significant improvements are observed for sentiment analysis when adding additional instances. We train and evaluate classifiers using C4.5 decision tree, Naïve Bayes, 5-Nearest Neighbor, and Radial Basis Function Network, with seven datasets varying from 1000 to 243,000 instances. Models are trained using four runs of 5-fold cross validation. Additionally, we conduct statistical tests to verify our observations and examine the impact of limiting features using frequency. All learners were found to improve with dataset size, with Naïve Bayes being the best performing learner. We found that Naïve Bayes did not significantly benefit from using more than 81,000 instances. To the best of our knowledge, this is the first study to investigate how learners scale with respect to dataset size with results verified using statistical tests and multiple models trained for each learner and dataset size. Additionally, we investigated using feature frequency to greatly reduce data grid size with either a small increase or decrease in classifier performance depending on choice of learner.
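The "four runs of 5-fold cross validation" procedure named in this abstract can be sketched as follows. This is an illustrative outline only: the function name `repeated_kfold_indices`, the seed, and the interleaved fold assignment are assumptions, not details from the paper, and the study's actual tooling is not stated here.

```python
import random


def repeated_kfold_indices(n_instances, n_folds=5, n_runs=4, seed=0):
    """Yield (run, fold, train_indices, test_indices) for repeated k-fold CV.

    Each run reshuffles the instance order, so the four runs use different
    partitions of the same dataset (hypothetical helper for illustration).
    """
    rng = random.Random(seed)
    for run in range(n_runs):
        order = list(range(n_instances))
        rng.shuffle(order)
        # Deal the shuffled indices into n_folds roughly equal folds.
        folds = [order[i::n_folds] for i in range(n_folds)]
        for held_out in range(n_folds):
            test_idx = folds[held_out]
            train_idx = [
                idx
                for fold_number, fold in enumerate(folds)
                if fold_number != held_out
                for idx in fold
            ]
            yield run, held_out, train_idx, test_idx


# 4 runs x 5 folds gives 20 train/test splits per learner and dataset size.
splits = list(repeated_kfold_indices(20))
```

Averaging a metric over all 20 splits gives one performance estimate per learner and dataset size, which is the kind of repeated measurement that statistical tests can then compare.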
2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)
In recent years, social media has grown to become a major source of information for many online users. This has given rise to the spread of misinformation through deepfakes. Deepfakes are videos or images that replace one person's face with another computer-generated face, often that of a more recognizable person in society. With the recent advances in technology, a person with little technological experience can generate these videos. This enables them to mimic a powerful figure in society, such as a president or celebrity, creating the potential danger of spreading misinformation and other nefarious uses of deepfakes. To combat this online threat, researchers have developed models that are designed to detect deepfakes. This study looks at various deepfake detection models that use deep learning algorithms to combat this looming threat. This survey focuses on providing a comprehensive overview of the current state of deepfake detection models and the unique approaches many researchers take to solving this problem. The benefits, limitations, and suggestions for future work will be thoroughly discussed throughout this paper.
2022 Asia Conference on Algorithms, Computing and Machine Learning (CACML)
In recent years, the amount of secure information being stored on mobile devices has grown exponentially. However, current security schemas for mobile devices such as physiological biometrics and passwords are not secure enough to protect this information. Behavioral biometrics have been heavily researched as a possible solution to this security deficiency for mobile devices. This study aims to contribute to this innovative research by evaluating the performance of a multi-modal behavioral biometric based user authentication scheme using touch dynamics and phone movement. This study uses a fusion of two popular publicly available datasets: the Hand Movement Orientation and Grasp (HMOG) dataset and the BioIdent dataset. This study evaluates our model's performance using three common machine learning algorithms, Random Forest, Support Vector Machine, and K-Nearest Neighbor, reaching accuracy rates as high as 82%, with each algorithm performing respectively for all success metrics reported.
Zenodo (CERN European Organization for Nuclear Research), Nov 6, 2021
A fully automated, self-driving car can perceive its environment, determine the optimal route, an... more A fully automated, self-driving car can perceive its environment, determine the optimal route, and drive unaided by human intervention for the entire journey. Connected autonomous vehicles (CAVs) have the potential to drastically reduce accidents, travel time, and the environmental impact of road travel. Such technology includes the use of several sensors, various algorithms, interconnected network connections, and multiple auxiliary systems. CAVs have been subjected to attacks by malicious users to gain/deny control of one or more of its various systems. Data security and data privacy is one such area of CAVs that has been targeted via different types of attacks. The scope of this study is to present a good background knowledge of issues pertaining to different attacks in the context of data security and privacy, as well present a detailed review and analysis of eight very recent studies on the broad topic of security and privacy related attacks. Methodologies including Blockchain, Named Data Networking, Intrusion Detection System, Cognitive Engine, Adversarial Objects, and others have been investigated in the literature and problem-and context-specific models have been proposed by their respective authors.
IGI Global eBooks, 2007
Data mining and machine learning have numerous practical applications across several domains, esp... more Data mining and machine learning have numerous practical applications across several domains, especially for classification and prediction problems. This chapter involves a data mining and machine learning problem in the context of software quality modeling and estimation. Software measurements and software fault (defect) data have been used in the development of models that predict
arXiv (Cornell University), Oct 14, 2021
Physicians provide expert opinion to legal courts on the medical state of patients, including det... more Physicians provide expert opinion to legal courts on the medical state of patients, including determining if a patient is likely to have permanent or non-permanent injuries or ailments. An independent medical examination (IME) report summarizes a physician's medical opinion about a patient's health status based on the physician's expertise. IME reports contain private and sensitive information (Personally Identifiable Information or PII) that needs to be removed or randomly encoded before further research work can be conducted. In our study the IME is an orthopedic surgeon from a private practice in the United States. The goal of this research is to perform named entity recognition (NER) to identify and subsequently remove/encode PII information from IME reports prepared by the physician. We apply the NER toolkits of OpenNLP and spaCy, two freely available natural language processing platforms, and compare their precision, recall, and f-measure performance at identifying five categories of PII across trials of randomly selected IME reports using each model's common default parameters. We find that both platforms achieve high performance (f-measure > 0.9) at de-identification and that a spaCy model trained with a 70-30 train-test data split is most performant.
arXiv (Cornell University), Apr 26, 2022
Human activity recognition using deep learning techniques has become increasing popular because o... more Human activity recognition using deep learning techniques has become increasing popular because of its high effectivity with recognizing complex tasks, as well as being relatively low in costs compared to more traditional machine learning techniques. This paper surveys some state-of-the-art human activity recognition models that are based on deep learning architecture and has layers containing Convolution Neural Networks (CNN), Long Short-Term Memory (LSTM), or a mix of more than one type for a hybrid system. The analysis outlines how the models are implemented to maximize its effectivity and some of the potential limitations it faces.
arXiv (Cornell University), Jul 27, 2022
In the recent years, social media has grown to become a major source of information for many onli... more In the recent years, social media has grown to become a major source of information for many online users. This has given rise to the spread of misinformation through deepfakes. Deepfakes are videos or images that replace one person's face with another computer-generated face, often a more recognizable person in society. With the recent advances in technology, a person with little technological experience can generate these videos. This enables them to mimic a power figure in society, such as a president or celebrity, creating the potential danger of spreading misinformation and other nefarious uses of deepfakes. To combat this online threat, researchers have developed models that are designed to detect deepfakes. This study looks at various deepfake detection models that use deep learning algorithms to combat this looming threat. This survey focuses on providing a comprehensive overview of the current state of deepfake detection models and the unique approaches many researchers take to solving this problem. The benefits, limitations, and suggestions for future work will be thoroughly discussed throughout this paper.
International Journal of Computer Science and Information Technology, Oct 31, 2021
A fully automated, self-driving car can perceive its environment, determine the optimal route, an... more A fully automated, self-driving car can perceive its environment, determine the optimal route, and drive unaided by human intervention for the entire journey. Connected autonomous vehicles (CAVs) have the potential to drastically reduce accidents, travel time, and the environmental impact of road travel. Such technology includes the use of several sensors, various algorithms, interconnected network connections, and multiple auxiliary systems. CAVs have been subjected to attacks by malicious users to gain/deny control of one or more of its various systems. Data security and data privacy is one such area of CAVs that has been targeted via different types of attacks. The scope of this study is to present a good background knowledge of issues pertaining to different attacks in the context of data security and privacy, as well present a detailed review and analysis of eight very recent studies on the broad topic of security and privacy related attacks. Methodologies including Blockchain, Named Data Networking, Intrusion Detection System, Cognitive Engine, Adversarial Objects, and others have been investigated in the literature and problem-and context-specific models have been proposed by their respective authors.
Journal of Computer Science, May 1, 2022
As modern cities continue to develop, smart devices are being used to improve citizens' lives. As... more As modern cities continue to develop, smart devices are being used to improve citizens' lives. As these devices become more sophisticated, the amount of information that they collect increases. A Smart City is a city that uses a large amount of generated data to constantly improve the services offered to the people that live there. Data mining techniques can be used to sift through the data and mine out meaningful patterns. Our research project focused on seven different disciplines within a Smart City, surveying the current state of research in each category. Smart Transportation is focused on decreasing congestion, increasing efficiency in public transportation, and improving the safety of pedestrians. Smart Healthcare is focused on modern healthcare monitoring systems and ambulance dispatch services. Smart Energy is focused on decreasing energy consumption and promoting green energy through WiFi thermostat optimization and a smart electric grid. Smart City Utilities is focused on creating algorithms to improve waste collection techniques and air quality. Smart City Planning is focused on land use and the placement of green spaces. Smart Networks and Privacy is focused on secure networks. Lastly, the Smart IoT (Internet of Things) Application is focused on next-generation networks like 5G.
IGI Global eBooks, Jan 18, 2011
Data mining and machine learning have numerous practical applications across several domains, esp... more Data mining and machine learning have numerous practical applications across several domains, especially for classification and prediction problems. This chapter involves a data mining and machine learning problem in the context of software quality modeling and estimation. Software measurements and software fault (defect) data have been used in the development of models that predict
Journal of computer sciences and applications, Oct 29, 2021
Recent advancements in technology now allow for the generation of massive quantities of data. The... more Recent advancements in technology now allow for the generation of massive quantities of data. There is a growing need to transmit this data faster and more securely such that it cannot be accessed by malicious individuals. Edge computing has emerged in previous research as a method capable of improving data transmission times and security before the data ends up in the cloud. Edge computing has an impressive transmission speed based on fifth generation (5G) communication which transmits data with low latency and high bandwidth. While edge computing is sufficient to extract important features from the raw data to prevent large amounts of data requiring excessive bandwidth to be transmitted, cloud computing is used for the computational processes required for developing algorithms and modeling the data. Edge computing also improves the quality of the user experience by saving time and integrating quality of life (QoL) features. QoL features are important for the healthcare sector by helping to provide real-time feedback of data produced by healthcare devices back to patients for a faster recovery. Edge computing has better energy efficiency, can reduce the electricity cost, and in turn help people reduce their living expenses. This paper will take a detailed look into edge computing applications around Internet of Things (IoT) devices, smart city infrastructure, and benefits to healthcare.
2021 International Conference on Electrical, Computer and Energy Technologies (ICECET), Dec 9, 2021
Mouse dynamics has grown in popularity as a novel, irreproducible behavioral biometric. Datasets which contain general, unrestricted mouse movements from users are sparse in the current literature. The Balabit mouse dynamics dataset, produced in 2016, was made for a data science competition and despite some of its shortcomings, is considered to be the first publicly available mouse dynamics dataset. Collecting mouse movements in a dull, administrative manner, as Balabit does, may unintentionally homogenize data and is also not representative of real-world application scenarios. This paper presents a novel mouse dynamics dataset that has been collected while 10 users play the video game Minecraft on a desktop computer. Binary Random Forest (RF) classifiers are created for each user to detect differences between a specific user's movements and an imposter's movements. Two evaluation scenarios are proposed to evaluate the performance of these classifiers; one scenario outperformed previous works in all evaluation metrics, reaching average accuracy rates of 92%, while the other scenario successfully reported reduced instances of false authentications of imposters.
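The per-user binary setup described in the abstract above can be illustrated with a minimal sketch. It only shows how samples might be relabeled as genuine versus imposter for each user; the user IDs, feature values, and the helper name `binary_labels_for_user` are hypothetical and are not taken from the paper, whose actual features and pipeline are not described here.

```python
def binary_labels_for_user(samples, target_user):
    """Relabel (user_id, features) samples: 1 = genuine user, 0 = imposter."""
    return [(features, 1 if user == target_user else 0)
            for user, features in samples]

# Toy data: (user_id, feature_vector). The values are made up for illustration.
samples = [
    ("u1", [0.20, 1.5]),
    ("u2", [0.90, 0.3]),
    ("u1", [0.25, 1.4]),
    ("u3", [0.50, 0.8]),
]

# One binary dataset per user; each would train that user's own classifier.
users = sorted({user for user, _ in samples})
per_user = {u: binary_labels_for_user(samples, u) for u in users}
```

Each entry of `per_user` could then be passed to a binary classifier such as a Random Forest, one model per user.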
2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), Dec 1, 2022
arXiv (Cornell University), Jul 27, 2022
As technology grows and evolves rapidly, it is increasingly clear that mobile devices are more commonly used for sensitive matters than ever before. Continuous user authentication is needed because single-factor or multifactor authentication may only validate a user initially, which does not help if an impostor can bypass this initial validation. The field of touch dynamics emerges as a clear way to non-intrusively collect data about a user and their behaviors in order to make imperative security-related decisions in real time. In this paper we present a novel dataset consisting of tracking 25 users playing two mobile games, Snake.io and Minecraft, each for 10 minutes, along with their relevant gesture data. From this data, we ran machine learning binary classifiers, namely Random Forest and K-Nearest Neighbor, to attempt to authenticate whether a sample of a particular user's actions was genuine. Our strongest model returned an average accuracy of roughly 93% for both games, showing touch dynamics can differentiate users effectively and is a feasible consideration for authentication schemes.
arXiv (Cornell University), May 7, 2022
The amount of secure data being stored on mobile devices has grown immensely in recent years. However, the security measures protecting this data have stayed static, with few improvements made to the vulnerabilities of current authentication methods such as physiological biometrics or passwords. Instead of these methods, behavioral biometrics has recently been researched as a solution to these vulnerable authentication methods. In this study, we aim to contribute to the research being done on behavioral biometrics by creating and evaluating a user authentication scheme using behavioral biometrics. The behavioral biometrics used in this study include touch dynamics and phone movement, and we evaluate the performance of different single-modal and multi-modal combinations of the two biometrics. Using two publicly available datasets, BioIdent and Hand Movement Orientation and Grasp (H-MOG), this study uses seven common machine learning algorithms to evaluate performance. The algorithms used in the evaluation include Random Forest, Support Vector Machine, K-Nearest Neighbor, Naive Bayes, Logistic Regression, Multilayer Perceptron, and Long Short-Term Memory Recurrent Neural Networks, with accuracy rates reaching as high as 86%.
Journal of Computer and Communications
Computer and Information Science
The amount of secure data being stored on mobile devices has grown immensely in recent years. However, the security measures protecting this data have stayed static, with few improvements made to the vulnerabilities of current authentication methods such as physiological biometrics or passwords. Instead of these methods, behavioral biometrics has recently been researched as a solution to these vulnerable authentication methods. In this study, we aim to contribute to the research being done on behavioral biometrics by creating and evaluating a user authentication scheme using behavioral biometrics. The behavioral biometrics used in this study include touch dynamics and phone movement, and we evaluate the performance of different single-modal and multi-modal combinations of the two biometrics. Using two publicly available datasets, BioIdent and Hand Movement Orientation and Grasp (H-MOG), this study uses seven common machine learning algorithms to evaluate performance. The algo...
Big Data Technologies and Applications, 2016
Big Data Analytics and Deep Learning are two high-focus areas of data science. Big Data has become important as many organizations, both public and private, have been collecting massive amounts of domain-specific information, which can contain useful information about problems such as national intelligence, cyber security, fraud detection, marketing, and medical informatics. Companies such as Google and Microsoft are analyzing large volumes of data for business analysis and decisions, impacting existing and future technology. Deep Learning algorithms extract high-level, complex abstractions as data representations through a hierarchical learning process. Complex abstractions are learnt at a given level based on relatively simpler abstractions formulated in the preceding level in the hierarchy. A key benefit of Deep Learning is the analysis and learning of massive amounts of unsupervised data, making it a valuable tool for Big Data Analytics where raw data is largely unlabeled and un-categorized. In the present study, we explore how Deep Learning can be utilized for addressing some important problems in Big Data Analytics, including extracting complex patterns from massive volumes of data, semantic indexing, data tagging, fast information retrieval, and simplifying discriminative tasks. We also investigate some aspects of Deep Learning research that need further exploration to incorporate specific challenges introduced by Big Data Analytics, including streaming data, high-dimensional data, scalability of models, and distributed computing. We conclude by presenting insights into relevant future works by posing some questions, including defining data sampling criteria, domain adaptation modeling, defining criteria for obtaining useful data abstractions, improving semantic indexing, semi-supervised learning, and active learning.
2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), 2015
Using automated methods of labeling tweet sentiment, large volumes of tweets can be labeled and used to train classifiers. Millions of tweets could be used to train a classifier; however, doing so is computationally expensive. Thus, it is valuable to establish how many tweets should be utilized to train a classifier, since using additional instances with no gain in performance is a waste of resources. In this study, we seek to find out how many tweets are needed before no significant improvements are observed for sentiment analysis when adding additional instances. We train and evaluate classifiers using C4.5 decision tree, Naïve Bayes, 5 Nearest Neighbor and Radial Basis Function Network, with seven datasets varying from 1000 to 243,000 instances. Models are trained using four runs of 5-fold cross validation. Additionally, we conduct statistical tests to verify our observations and examine the impact of limiting features using frequency. All learners were found to improve with dataset size, with Naïve Bayes being the best performing learner. We found that Naïve Bayes did not significantly benefit from using more than 81,000 instances. To the best of our knowledge, this is the first study to investigate how learners scale with respect to dataset size with results verified using statistical tests and multiple models trained for each learner and dataset size. Additionally, we investigated using feature frequency to greatly reduce data grid size with either a small increase or decrease in classifier performance depending on choice of learner.
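The "four runs of 5-fold cross validation" mentioned in the abstract above can be sketched as an index-splitting routine. This is a minimal illustration under stated assumptions, not the study's implementation: the function name `repeated_kfold`, the seed, and the sample count of 1000 (borrowed from the smallest dataset size mentioned) are chosen here for illustration, and no stratification is applied.

```python
import random

def repeated_kfold(n_samples, k=5, runs=4, seed=0):
    """Yield (run, fold, train_idx, test_idx) for `runs` repetitions of k-fold CV."""
    rng = random.Random(seed)
    for run in range(runs):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # reshuffle so each run partitions the data differently
        folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
        for f, test_idx in enumerate(folds):
            # Training set is every fold except the held-out one.
            train_idx = [j for g, fold in enumerate(folds) if g != f for j in fold]
            yield run, f, train_idx, test_idx

splits = list(repeated_kfold(n_samples=1000))
```

With these settings each learner would be trained and evaluated 20 times per dataset size, which gives multiple performance estimates to feed into statistical tests.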
2022 9th International Conference on Soft Computing & Machine Intelligence (ISCMI)
In recent years, social media has grown to become a major source of information for many online users. This has given rise to the spread of misinformation through deepfakes. Deepfakes are videos or images that replace one person's face with another computer-generated face, often that of a more recognizable person in society. With the recent advances in technology, a person with little technological experience can generate these videos. This enables them to mimic a powerful figure in society, such as a president or celebrity, creating the potential danger of spreading misinformation and other nefarious uses of deepfakes. To combat this online threat, researchers have developed models that are designed to detect deepfakes. This study looks at various deepfake detection models that use deep learning algorithms to combat this looming threat. This survey focuses on providing a comprehensive overview of the current state of deepfake detection models and the unique approaches many researchers take to solving this problem. The benefits, limitations, and suggestions for future work will be thoroughly discussed throughout this paper.
2022 Asia Conference on Algorithms, Computing and Machine Learning (CACML)
In recent years, the amount of secure information being stored on mobile devices has grown exponentially. However, current security schemas for mobile devices such as physiological biometrics and passwords are not secure enough to protect this information. Behavioral biometrics have been heavily researched as a possible solution to this security deficiency for mobile devices. This study aims to contribute to this innovative research by evaluating the performance of a multi-modal behavioral biometric based user authentication scheme using touch dynamics and phone movement. This study uses a fusion of two popular publicly available datasets: the Hand Movement Orientation and Grasp (HMOG) dataset and the BioIdent dataset. This study evaluates our model's performance using three common machine learning algorithms: Random Forest, Support Vector Machine, and K-Nearest Neighbor, reaching accuracy rates as high as 82%, with each algorithm performing respectively for all success metrics reported.