Vidya Saikrishna - Academia.edu
Papers by Vidya Saikrishna
2022 IEEE International Conference on Data Mining Workshops (ICDMW), Nov 1, 2022
Graph learning has developed rapidly in recent years and has found numerous applications in diverse fields. Among the main associated challenges are the volume and complexity of graph data, and graph learning models suffer from an inability to learn graph information efficiently. To compensate for this shortcoming, physics-informed graph learning (PIGL) is emerging. PIGL incorporates physics rules while performing graph learning, which brings substantial benefits. This paper presents a systematic review of PIGL methods. We begin by introducing a unified framework of graph learning models and then examine existing PIGL methods in relation to this framework. We also discuss several future challenges for PIGL. This survey is expected to stimulate innovative research and development activities pertaining to PIGL.
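As a rough illustration of what "incorporating physics rules" into graph learning can mean, the sketch below applies one commonly cited physical rule, heat diffusion on a graph (dX/dt = -L X, with L = D - A the graph Laplacian), as a propagation step. This example is not taken from the survey; the function names `laplacian` and `diffuse` and the step size `alpha` are hypothetical choices for illustration only.

```python
from typing import List

def laplacian(adj: List[List[float]]) -> List[List[float]]:
    """Return L = D - A for a symmetric adjacency matrix A (hypothetical helper)."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def diffuse(features: List[float], adj: List[List[float]],
            alpha: float = 0.1, steps: int = 10) -> List[float]:
    """Explicit Euler steps x <- x - alpha * L x of the graph heat equation.

    Each step moves a node's value toward its neighbours' values, the way heat
    flows from hot to cold; the total "heat" (sum of values) is conserved.
    """
    lap = laplacian(adj)
    x = list(features)
    n = len(x)
    for _ in range(steps):
        lx = [sum(lap[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [xi - alpha * lxi for xi, lxi in zip(x, lx)]
    return x
```

On a three-node path graph, `diffuse([3.0, 0.0, 0.0], path, 0.1, 500)` spreads the initial value until every node is close to the mean of 1.0. A small `alpha` keeps the explicit scheme stable.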
2022 IEEE 19th India Council International Conference (INDICON)
IEEE Transactions on Neural Networks and Learning Systems
String searching algorithms, sometimes called string matching algorithms, are an important class of string algorithms that try to find a place where one or several strings (also called patterns) are found within a larger string or text.[11] String matching is a classical problem in computer science. In this paper we explore the diverse fields in which string matching plays a prominent role and provides solutions to many problems. The fields explored include network intrusion detection, bioinformatics, plagiarism detection, information security, pattern recognition, document matching and text mining. We discuss how string matching helps solve the problems in each of these fields. String matching algorithms can be categorized...
Computer Science and Information Systems
The outbreak of the COVID-19 pandemic has affected lives and socio-economic development around the world. The pandemic has motivated researchers from different domains to find effective solutions to diagnose, prevent, and estimate the pandemic and to relieve its adverse effects. Numerous COVID-19 datasets have been built in these studies and made available to the public. These datasets can be used for disease diagnosis and case prediction, accelerating solutions to problems caused by the pandemic. To help researchers understand the various COVID-19 datasets, we examine them and provide an overview. We organise the majority of these datasets into three categories based on their applications, i.e., time-series, knowledge-base, and media-based datasets. Organising COVID-19 datasets into appropriate categories can help researchers focus on methodology rather than on the datasets. In addition, applications and COVID-19 datasets suffer from a series of prob...
This thesis examines the problem of learning Probabilistic Finite State Machines from text data and applies it to text classification. Probabilistic Finite State Machines capture regularities and patterns in text data very effectively, and this feature is combined with the ability to compress data using the Minimum Message Length principle. Several approaches are developed and applied to two-class classification scenarios, such as classifying spam and non-spam emails in the Enron spam datasets and predicting individuals in the Activities of Daily Living datasets. The approaches produce significant results and outperform existing classification methods.
String matching is the problem of finding all occurrences of a given pattern in a large text, where both are sequences of characters drawn from a finite alphabet. The multiple-pattern string matching problem involves detecting all occurrences of every pattern of a multiple-pattern set in the text. The Shift OR algorithm, which we call the Standard Shift OR algorithm, uses the concept of bit parallelism to perform approximate string matching: besides detecting the correct matches, it also reports some false matches. In other words, the algorithm behaves as a filter. In this paper, a modification of the Standard Shift OR algorithm is proposed that improves its filtering efficiency by using consecutive N-grams of the patterns of the multiple-pattern set. The proposed method reads N characters of the text at once, compared with a single character in the Standard Shift OR algorithm. The number of false matches r...
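Why a multiple-pattern Shift OR behaves as a filter, and why N-grams help, can be seen in a small sketch. It assumes equal-length patterns whose masks are superimposed into one table, so a window may mix positions from different patterns and be reported as a false match; running the same automaton over overlapping q-grams instead of single characters makes such mixing rarer. This is a sketch in the spirit of the abstract, not the authors' implementation; `qgrams`, `build_masks` and `shift_or_filter` are hypothetical names.

```python
from typing import Dict, List, Tuple

def qgrams(s: str, q: int) -> List[str]:
    """Overlapping q-grams of s: s[0:q], s[1:q+1], ..."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]

def build_masks(patterns: List[str], q: int) -> Tuple[Dict[str, int], int]:
    """Superimpose equal-length patterns into one Shift OR mask table over q-grams.

    Bit j of masks[g] is 0 if q-gram g occurs at q-gram position j in ANY pattern.
    """
    m = len(patterns[0])
    assert all(len(p) == m for p in patterns), "sketch assumes equal-length patterns"
    assert 1 <= q <= m, "q must not exceed the pattern length"
    k = m - q + 1                      # number of q-grams per pattern
    full = (1 << k) - 1
    masks: Dict[str, int] = {}
    for p in patterns:
        for j, g in enumerate(qgrams(p, q)):
            masks[g] = masks.get(g, full) & ~(1 << j)
    return masks, k

def shift_or_filter(text: str, patterns: List[str],
                    q: int = 1) -> Tuple[List[int], List[int]]:
    """Return (candidate start positions, verified start positions)."""
    masks, k = build_masks(patterns, q)
    full = (1 << k) - 1
    state = full                       # all ones: no prefix matched yet
    candidates: List[int] = []
    for i, g in enumerate(qgrams(text, q)):
        state = ((state << 1) | masks.get(g, full)) & full
        if (state & (1 << (k - 1))) == 0:   # highest bit 0: possible match
            candidates.append(i - (k - 1))  # q-gram index equals text start
    m = len(patterns[0])
    pattern_set = set(patterns)
    verified = [s for s in candidates if text[s:s + m] in pattern_set]
    return candidates, verified
```

With patterns `["ab", "cd"]` and text `"adcbab"`, single characters (`q=1`) yield candidates `[0, 2, 4]`, of which only position 4 survives verification ("ad" and "cb" are false matches), while 2-grams (`q=2`) yield the single candidate `[4]`.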
2019 Cybersecurity and Cyberforensics Conference (CCC), 2019
A Finite State Machine (FSM) is a mathematical model of computation that can effectively model a sequence of words or tokens. A grammar representing a collection of tokens in a finite alphabet might contain regularities that are not fully captured by a deterministic formal grammar. Therefore, the simple FSM model is extended to include some probabilistic structure in the grammar, yielding a Probabilistic Finite State Machine (PFSM). We extend earlier work on inferring PFSMs using the Bayesian information-theoretic Minimum Message Length (MML) principle to the case of inferring hierarchical PFSMs (HPFSMs). An HPFSM consists of an outer PFSM whose states can internally contain a PFSM (or, recursively, an HPFSM). The alphabet of each such internally contained PFSM can be smaller than that of the complete HPFSM. HPFSMs can often represent the behaviour of a PFSM more concisely, and MML's ability to deal with both discrete structures and continuous probabilities makes it well suited to this more general inference. We present an empirical comparison on pseudo-random data sets.
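The quantity MML trades off can be made concrete with a small sketch: given a PFSM, the cost of encoding a token sequence is the sum of -log2 P(token | state) along the path the machine follows. The sketch assumes a hypothetical dictionary representation of the machine and computes only the "data given model" part of the message; the cost of stating the machine itself, which MML also counts, is omitted here.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: pfsm[state][symbol] = (next_state, probability).
PFSM = Dict[str, Dict[str, Tuple[str, float]]]

def code_length_bits(pfsm: PFSM, start: str, tokens: List[str]) -> float:
    """Bits needed to encode `tokens` with the PFSM: sum of -log2 P(symbol | state).

    Only the data part of an MML two-part message; the model cost is left out.
    """
    state, bits = start, 0.0
    for tok in tokens:
        if tok not in pfsm[state]:
            return math.inf            # the machine cannot emit this token here
        nxt, p = pfsm[state][tok]
        bits -= math.log2(p)
        state = nxt
    return bits
```

For a machine where state `S` emits `a` (stay in `S`) or `b` (go to `T`) with probability 0.5 each, and `T` emits `a` with probability 0.25, the sequence `a b a` costs 1 + 1 + 2 = 4 bits.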
2021 International Joint Conference on Neural Networks (IJCNN), 2021
Metaphor, which often conveys and evokes sentiment, plays an important role in human communication. Numerous approaches to the sentiment analysis of metaphors have therefore gained attention in natural language processing (NLP). These approaches focus primarily on linguistic features and text rather than on information and data from other modalities. However, visual features such as facial expressions also play an important role in expressing sentiment. In this paper, we present a novel neural network approach to the sentiment analysis of metaphorical expressions that combines linguistic and visual features, which we refer to as the multimodal model approach. For this purpose, we create a Chinese dataset containing textual data from metaphorical sentences along with visual data of synchronized facial images. The experimental results indicate that our multimodal model outperforms several other linguistic and visual models, as well as the state-of-the-art methods. The contribution lies in the novelty of the approach and in the creation of a new, sizeable, and scarce dataset with linguistic and synchronized facial-expression image data. The dataset is particularly useful for languages other than English, and the approach addresses one of the most challenging NLP issues: sentiment analysis in metaphor.
2016 Fifth International Conference on Eco-friendly Computing and Communication Systems (ICECCS), 2016
Text classification is the task of assigning predefined categories to text documents and is a common machine learning problem. Statistical text classification, which uses machine learning methods to learn classification rules, is known to be particularly successful in this regard. In this research we reformulate the text classification problem with a sound methodology based on a statistical data compression technique, the Minimum Message Length (MML) principle. To model the data sequence we use Probabilistic Finite State Automata (PFSAs). We propose two approaches for text classification using MML-PFSAs and test both on the Enron spam dataset. The results of our empirical evaluation are reported in terms of the well-known classification measures recall, precision, accuracy and error. The results indicate good classification accuracy, comparable with that of state-of-the-art classifiers.
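The general idea of classifying by compression can be sketched as: build one sequence model per class, then assign a document to the class whose model encodes it in the fewest bits. The sketch below swaps in a deliberately simple model, a first-order Markov chain over tokens (a PFSA whose states are the previous token) with add-one smoothing, in place of MML-inferred PFSAs; it also ignores the model-cost part of the message. `BigramCoder` and `classify` are hypothetical names.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set

class BigramCoder:
    """First-order Markov model over tokens (states = previous token), add-one smoothed.

    A stand-in for an inferred PFSA; unseen tokens share one reserved 'unknown' slot.
    """

    def __init__(self, docs: List[List[str]], vocab: Set[str]):
        self.vocab_size = len(vocab) + 1   # +1 reserves probability mass for unknowns
        self.counts: Dict[str, Counter] = defaultdict(Counter)
        for doc in docs:
            prev = "<s>"                   # start state
            for tok in doc:
                self.counts[prev][tok] += 1
                prev = tok

    def bits(self, doc: List[str]) -> float:
        """Code length of `doc` in bits: sum of -log2 P(token | previous token)."""
        total, prev = 0.0, "<s>"
        for tok in doc:
            row = self.counts.get(prev, Counter())
            p = (row[tok] + 1) / (sum(row.values()) + self.vocab_size)
            total -= math.log2(p)
            prev = tok
        return total

def classify(doc: List[str], coders: Dict[str, BigramCoder]) -> str:
    """Pick the class whose model compresses the document best (fewest bits)."""
    return min(coders, key=lambda label: coders[label].bits(doc))
```

The vocabulary should be shared across classes so that code lengths from different models are comparable.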
2015 Eighth International Conference on Advances in Pattern Recognition (ICAPR), 2015
MML (Minimum Message Length) has emerged as a powerful tool for the inductive inference of discrete, continuous and hybrid structures. The Probabilistic Finite State Automaton (PFSA) is one such discrete structure that needs to be inferred for classes of problems in computer science, including artificial intelligence, pattern recognition and data mining. MML has also served as a viable tool for many classes of problems in machine learning, including both supervised and unsupervised learning; classification is the most common among them. This research offers a two-part solution: the first part focuses on inferring the best PFSA using MML, and the second part addresses the classification problem of spam detection. Using the best PFSA inferred in the first part, the spam detection approach was tested with MML on the publicly available Enron spam dataset. The filter was evaluated on various performance measures such as precision and recall. The evaluation also took into account the cost of misclassification, in terms of weighted accuracy rate and weighted error rate. The results of our empirical evaluation indicate a classification accuracy of around 93%, which outperforms well-known established spam filters.
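Weighted accuracy and weighted error account for the fact that losing a legitimate email is usually worse than letting a spam through. The sketch below assumes the λ-weighted definitions common in the spam-filtering literature, in which every legitimate message counts λ times; the abstract does not state the exact definition or the λ values used, and `weighted_rates` is a hypothetical name.

```python
from typing import Tuple

def weighted_rates(n_ll: int, n_ls: int, n_sl: int, n_ss: int,
                   lam: float = 9.0) -> Tuple[float, float]:
    """Return (weighted accuracy, weighted error), assuming lambda-weighting.

    n_ll: legitimate kept as legitimate    n_ls: legitimate flagged as spam
    n_sl: spam let through as legitimate   n_ss: spam caught as spam
    Each legitimate message is counted `lam` times.
    """
    n_legit = n_ll + n_ls
    n_spam = n_sl + n_ss
    denom = lam * n_legit + n_spam
    w_acc = (lam * n_ll + n_ss) / denom
    w_err = (lam * n_ls + n_sl) / denom
    return w_acc, w_err
```

With 90/10 legitimate messages kept/flagged and 5/95 spams passed/caught, λ = 1 gives 0.925 and 0.075 (plain accuracy and error), while λ = 9 gives 0.905 and 0.095, because the ten false positives now weigh more.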
International Journal of Computer Applications, 2013
Spam refers to unsolicited, unwanted and inappropriate bulk email. Spam filtering has become important because spam consumes a lot of network bandwidth, overloads email servers and reduces the productivity of the global economy. Content-based spam filtering is accomplished with the help of multiple-pattern string matching algorithms. Traditionally, the Aho-Corasick algorithm, which constructs a trie of the spam keywords, was used to filter spam. Its time and space performance degrade as the trie grows with the number of spam keywords. To counterbalance this loss of time and space, a bit-parallel multiple-pattern string matching algorithm based on the Shift OR method is used. The method acts as a filter performing approximate string matching, which implies that some of the matches it detects are false and require verification. The proposed method for filtering spam uses a combination of Shift AND and OR operations. It works directly on spam keywords of equal size, whereas for keywords of unequal size a new equal-size grouping method is proposed. Both methods improve on the Aho-Corasick algorithm in space complexity and also behave as efficient filters, reducing the number of false matches produced by the Shift OR method.
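The equal-size grouping idea can be sketched as follows: keywords of unequal length are split into groups of equal length, each group is scanned with one superimposed Shift AND automaton, and every candidate is verified against the group's keywords. This is an illustrative sketch under those assumptions, not the paper's exact algorithm; `group_by_length`, `shift_and_group` and `find_spam_keywords` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def group_by_length(keywords: List[str]) -> Dict[int, List[str]]:
    """Split keywords into groups whose members all have the same length."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for kw in keywords:
        groups[len(kw)].append(kw)
    return dict(groups)

def shift_and_group(text: str, group: List[str]) -> Set[Tuple[int, str]]:
    """Superimposed Shift AND over equal-length keywords; candidates are verified."""
    m = len(group[0])
    masks: Dict[str, int] = defaultdict(int)
    for kw in group:
        for j, ch in enumerate(kw):
            masks[ch] |= 1 << j        # bit j set if ch occurs at position j in some keyword
    high = 1 << (m - 1)
    words = set(group)
    state, hits = 0, set()
    for i, ch in enumerate(text):
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & high:               # candidate: verify to discard false matches
            start = i - m + 1
            if text[start:i + 1] in words:
                hits.add((start, text[start:i + 1]))
    return hits

def find_spam_keywords(text: str, keywords: List[str]) -> Set[Tuple[int, str]]:
    """Run one Shift AND scan per equal-length group and merge the verified hits."""
    hits: Set[Tuple[int, str]] = set()
    for group in group_by_length(keywords).values():
        hits |= shift_and_group(text, group)
    return hits
```

For example, `find_spam_keywords("get free cash now", ["free", "cash", "win"])` scans the 4-letter group and the 3-letter group separately and reports `{(4, "free"), (9, "cash")}`.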
International Journal of Computer Applications, 2013
String matching is the problem of finding all occurrences of a given pattern in a large text, where both are sequences of characters drawn from a finite alphabet. This problem is fundamental in computer science and underlies many applications, such as text retrieval, symbol manipulation, computational biology, data mining, and network security. Bit parallelism is used to increase the processing speed of string matching algorithms. The Standard Shift OR algorithm is used to perform approximate string matching; it is a filter that reports false matches besides detecting the correct ones. To improve the efficiency of the basic Shift OR algorithm by reducing the number of false matches it reports alongside the correct matches, the proposed Shift OR with consecutive q-grams has been implemented. Instead of reading a single character at a time, this algorithm reads q characters at once. Extensive experiments have been carried out, and the results are compared with the basic version of the Shift OR algorithm. The number of false matches is reduced considerably; the gain is due to the improved filtering efficiency caused by q-grams.
International Journal of …, 2012
Bit parallelism exploits bit-level parallelism in hardware to perform operations. It is a technique used to solve the string matching problem when the length of the pattern to be searched for is less than or equal to the word size of the system. It is a technique that takes ...
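The word-size condition comes from the fact that a bit-parallel matcher keeps one bit per pattern position in a single machine word. A minimal sketch of the classic exact single-pattern Shift-Or algorithm shows this: each text character updates all pattern positions at once with one shift and one OR. Python integers are unbounded, so the `word_bits` limit below is an assumption that only mimics a fixed-width register; `shift_or` is a hypothetical name.

```python
from typing import List

def shift_or(text: str, pattern: str, word_bits: int = 64) -> List[int]:
    """Exact single-pattern Shift-Or; returns start positions of all occurrences.

    Bit j of `state` is 0 when pattern[0..j] matches the text ending at the
    current character, so the whole state fits in one word if len(pattern) <= word_bits.
    """
    m = len(pattern)
    if m == 0 or m > word_bits:
        raise ValueError("sketch handles 1 <= len(pattern) <= word_bits")
    full = (1 << m) - 1
    masks = {}
    for j, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, full) & ~(1 << j)   # clear bit j where ch occurs
    state, out = full, []
    for i, ch in enumerate(text):
        state = ((state << 1) | masks.get(ch, full)) & full
        if not (state & (1 << (m - 1))):              # top bit 0: full match ends at i
            out.append(i - m + 1)
    return out
```

For instance, `shift_or("abracadabra", "abra")` returns `[0, 7]`, and overlapping occurrences are all reported.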
... Fields Vidya SaiKrishna, Prof. Akhtar Rasool and Dr. Nilay Khare ... (reduction of search results set size). These text mining approaches reduce the search result set size via data abstraction techniques, by and large. 5.6 String matching Based Video Retrieval ...
arXiv, 2022
Graph learning has developed rapidly in recent years and has found numerous applications in diverse fields. Among the main associated challenges are the volume and complexity of graph data. Much research has focused on preserving graph data in a low-dimensional space, yet graph learning models struggle to maintain the original graph information. To compensate for this inability, physics-informed graph learning (PIGL) is emerging. PIGL incorporates physics rules while performing graph learning, which opens up numerous possibilities. This paper presents a systematic review of PIGL methods. We begin by introducing a unified framework of graph learning models and then examine existing PIGL methods in relation to this framework. We also discuss several future challenges for PIGL. This survey is expected to stimulate innovative research and development activities pertaining to PIGL.