Vijay Raghavan | University of Louisiana at Lafayette

Papers by Vijay Raghavan

Association mining based approach to analyze COVID-19 response and case growth in the United States

Scientific Reports, 2021

Containing the COVID-19 pandemic while balancing the economy has proven to be quite a challenge for the world. We still have limited understanding of which combination of policies has been most effective in flattening the curve, given the challenges of the dynamic and evolving nature of the pandemic, lack of quality data, etc. This paper introduces a novel data mining-based approach to understand the effects of different non-pharmaceutical interventions in containing the COVID-19 infection rate. We used the association rule mining approach to perform descriptive data mining on publicly available data for 50 states in the United States to understand the similarities and differences among various policies and underlying conditions that led to transitions between different infection growth curve phases. We used a multi-peak logistic growth model to label the different phases of the infection growth curve. The common trends in the data were analyzed with respect to lockdowns, face mask mandat...
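As a rough illustration of the association rule mining mentioned in the abstract above, the sketch below computes support, confidence, and lift for a single rule over a handful of records. The records, the item names (`lockdown`, `mask_mandate`, `growth_slowed`), and the function name `rule_metrics` are invented for illustration; they are not taken from the paper's data or code.

```python
from typing import FrozenSet, List, Tuple


def rule_metrics(
    transactions: List[FrozenSet[str]],
    antecedent: FrozenSet[str],
    consequent: FrozenSet[str],
) -> Tuple[float, float, float]:
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    # Count transactions containing the antecedent, the consequent, and both.
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical records: each set lists the policies and outcome seen in one state-period.
records = [
    frozenset({"lockdown", "mask_mandate", "growth_slowed"}),
    frozenset({"lockdown", "growth_slowed"}),
    frozenset({"mask_mandate"}),
    frozenset({"lockdown", "mask_mandate"}),
]

support, confidence, lift = rule_metrics(
    records, frozenset({"lockdown"}), frozenset({"growth_slowed"})
)
print(support, confidence, lift)
```

A lift above 1 would suggest that the antecedent and consequent co-occur more often than expected if they were independent; with only four made-up records, the numbers here carry no empirical meaning.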

Visual Analytics of Time Evolving Large-scale Graphs

Development of a TSR-Based Method for Protein 3-D Structural Comparison With Its Applications to Protein Classification and Motif Discovery

Frontiers in Chemistry, 2021

Development of protein 3-D structural comparison methods is important in understanding protein functions. At the same time, developing such a method is very challenging. In the last 40 years, ever since the development of the first automated structural method, ~200 papers have been published using different representations of structures. The existing methods can be divided into five categories: sequence-, distance-, secondary structure-, geometry-based, and network-based structural comparisons. Each has its uniqueness, but also limitations. We have developed a novel method where the 3-D structure of a protein is modeled using the concept of Triangular Spatial Relationship (TSR), where triangles are constructed with the Cα atoms of a protein as vertices. Every triangle is represented using an integer, which we denote as a “key.” A key is computed using the length, angle, and vertex labels based on a rule-based formula, which ensures assignment of the same key to identical TSRs across protei...

Information Retrieval

Chapman and Hall/CRC eBooks, Sep 29, 2004

Lectures: Monday 14-16 in E-523. Course materials: http://isweb.uni-koblenz.de (teaching). Examination: Oral exam at the end of the semester. Outline: Motivation and Overview; Text Processing and Analysis; Link Analysis and Authority Ranking; Top-K Query Processing and Indexing; Advanced IR Models; Multimedia Retrieval; Automatic Classification; Clustering and Graph Mining; Peer-to-Peer Technologies; Information Extraction; Data Warehouses and OLAP; Ontologies and Semantic Web.

Cognitive Computing: Theory and Applications, Volume 35

Cognitive Computing: Theory and Applications, written by internationally renowned experts, focuses on cognitive computing and its theory and applications, including the use of cognitive computing to manage renewable energy, the environment, and other scarce resources, machine learning models and algorithms, biometrics, Kernel Based Models for transductive learning, neural networks, graph analytics in cyber security, data driven speech recognition, and analytical platforms to study the brain-computer interface. Comprehensively presents the various aspects of statistical methodology. Discusses a wide variety of diverse applications and recent developments. Contributors are internationally renowned experts in their respective areas.

Data structures for selective association mining

Traditional association mining algorithms, like Apriori, generate all frequent itemsets existing within the dataset. However, only a very small fraction of this massive volume of frequent itemsets is interesting to the user. Therefore, such algorithms waste a lot of time and resources uncovering itemsets that are insignificant. The objective of this dissertation is to introduce data structures that can be used to support selective association mining, which is an association mining approach that generates only itemsets containing items of user interest. The first data structure introduced for selective association mining is the itemset tree. The performance of the itemset tree can be improved by reordering the items. Five different distributions are used to determine which of them performs best. Two of these distributions are extracted from the structure of a clustering algorithm known as UNIMEM. As the performance of the distributions from UNIMEM is evaluated, UNIMEM shows potential for selective association mining. However, the algorithm cannot be used directly, and our proposed modified version of the conceptual tree of UNIMEM is called ISE-Tree. Experiments show that ISE-Tree performs better than the itemset tree, and we have successfully introduced a new, improved data structure for selective association mining.
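To make the idea of selective association mining concrete, here is a minimal brute-force sketch that counts only itemsets containing at least one item of user interest. It is not the itemset tree or ISE-Tree described in the abstract; the function name `selective_frequent_itemsets`, the `max_size` cap, and the sample baskets are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def selective_frequent_itemsets(
    transactions: Iterable[Set[str]],
    items_of_interest: Set[str],
    min_support: int,
    max_size: int = 3,
) -> Dict[FrozenSet[str], int]:
    """Count only itemsets that contain at least one item of interest."""
    counts: Counter = Counter()
    for transaction in transactions:
        # Transactions without any item of interest cannot contribute.
        if not (transaction & items_of_interest):
            continue
        items = sorted(transaction)
        for size in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, size):
                # Skip subsets that contain no item of interest.
                if items_of_interest.intersection(combo):
                    counts[frozenset(combo)] += 1
    return {itemset: c for itemset, c in counts.items() if c >= min_support}


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "butter"},
]
result = selective_frequent_itemsets(baskets, {"milk"}, min_support=2)
print(result)
```

Enumerating subsets per transaction is exponential in transaction size, which is why the sketch caps itemset size; a dedicated index structure such as the ones the abstract describes is meant to avoid this kind of repeated scanning.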

Identifying Minimum-Sized Influential Vertices on Large-Scale Weighted Graphs: A Big Data Perspective

Weighted graphs can be used to model any data sets composed of entities and relationships. Social networks, concept networks, and document networks are among the types of data that can be abstracted as weighted graphs. Identifying minimum-sized influential vertices (MIV) in a weighted graph is an important task in graph mining with valuable commercial applications. Although different algorithms for this task have been proposed, processing web-scale weighted graphs remains challenging. In this chapter, we propose a highly scalable algorithm for identifying MIV on large-scale weighted graphs using the MapReduce framework. The proposed algorithm starts by identifying an individual zone for every vertex in the graph using an α-cut fuzzy set. This approximation allows the whole graph to be divided into multiple subgraphs that can be processed independently. Then, for each subgraph, a MapReduce-based greedy algorithm can be designed to identify the minimum-sized influential vertices for the whole graph.

Analyzing Structure of Terrorist Networks by Using Graph Metrics

All criminal networks are social networks with multiple channels of communication and collaboration between their members. In this paper, we analyze different types of criminal networks with respect to metrics commonly used in social network analysis literature. We focus mostly on two types of networks: cocaine trading and terrorist activities. We also include a legal organization's network for comparison. Our findings reveal that there are significant differences in terms of some of these metrics between different types of criminal networks. These differences, in turn, may help security forces to identify unknown networks or substructures in very large networks as potential criminal networks.
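The abstract above does not list which social network analysis metrics were compared. As a hedged illustration only, the sketch below computes three commonly used ones (density, degree centrality, and the local clustering coefficient) on an undirected graph stored as an adjacency dictionary. The choice of metrics, the function names, and the toy network are all assumptions.

```python
from typing import Dict, Set

Graph = Dict[str, Set[str]]


def density(graph: Graph) -> float:
    """Fraction of possible undirected edges that are present."""
    n = len(graph)
    if n < 2:
        return 0.0
    # Each undirected edge appears in two adjacency sets.
    edges = sum(len(neighbors) for neighbors in graph.values()) / 2
    return edges / (n * (n - 1) / 2)


def degree_centrality(graph: Graph) -> Dict[str, float]:
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def local_clustering(graph: Graph, node: str) -> float:
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    neighbors = list(graph[node])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbors[j] in graph[neighbors[i]]
    )
    return links / (k * (k - 1) / 2)


# Hypothetical toy network: a triangle (a, b, c) with one peripheral member d.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(density(toy), degree_centrality(toy), local_clustering(toy, "a"))
```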

An enhancement of Boolean retrieval systems based on term co-occurrence frequencies / Un amélioration des systèmes d’information basé sur la cooccurrence de termes

Proceedings of the annual conference of CAIS, Mar 26, 2022

An Ontology-Based Architecture for Providing Insights in Wireless Networks Domain

Ontology-based approaches have been explored in several domains for knowledge representation and improving accuracy. However, ontology-based approaches for assisting a decision maker by delivering a concrete plan from analyzing the insights extracted from an ontology have not received much attention. Insights-as-a-service is a technology that aids a decision maker by providing a concrete action plan, involving a comparative analysis of patterns derived from the data and the extraction of insights from such an analysis. In this paper, we propose an ontology-based architecture for mining insights within the Wireless Network Ontology (WNO), an ontology generated for the wireless network domain for delivering better wireless network performance. We present and illustrate: (i) the major components of the architecture together with the algorithms used for summarizing the network performance profiles in the form of rank tables, and (ii) how the insight rules (the action plan) are extracted from these tables. By utilizing the proposed approach, an actionable plan for assisting the decision maker can be obtained as domain knowledge is incorporated in the system. Experimental results on a wireless network dataset show that the proposed model provides an optimal action plan for a wireless network to improve its performance by encoding data-driven rules into the ontology and suggesting changes to its current network configuration.

Sub-event detection from tweets

Social media plays an important role in communication between people in recent times. This includes information about news and events that are currently happening. Most of the research on event detection concentrates on identifying events from social media information. These models assume an event to be a single entity and treat it as such during the detection process. This assumption ignores that the composition of an event changes as new information is made available on social media. To capture the change in information over time, we extend an already existing Event Detection at Onset algorithm to study the evolution of an event over time. We introduce the concept of an event life cycle model that tracks various key events in the evolution of an event. The proposed unsupervised sub-event detection method uses a threshold-based approach to identify relationships between sub-events over time. These related events are mapped to an event life cycle to identify sub-events. We evaluate the proposed sub-event detection approach on a large-scale Twitter corpus.

Automatic and Semi-Automatic Techniques for Image Annotation

IGI Global eBooks, Jan 18, 2011

When retrieving images, users may find it easier to express the desired semantic content with keywords than visual features. Accurate keyword retrieval can only occur when images are completely and accurately described. This can be achieved either through laborious manual effort or ...

Deep Similarity-Enhanced K Nearest Neighbors

The k Nearest Neighbors (KNN) algorithm has been widely applied in various supervised learning tasks due to its simplicity and effectiveness. However, the quality of KNN decision making is directly affected by the quality of the neighborhoods in the modeling space. Efforts have been made to map data to a better feature space either implicitly with kernel functions, or explicitly through learning linear or nonlinear transformations. However, all these methods use pre-determined distance or similarity functions, which may limit their learning capacity. In this paper, we propose a novel deep learning architecture, which is called the Deep Similarity-Enhanced K Nearest Neighbors (DSE-KNN), to learn an optimized similarity function of the data directly towards the goal of optimizing the KNN decision making. In other words, the type of similarity function that is used in our method is not pre-determined but rather learned to map data to a high-dimensional feature space where the accuracy of the KNN decision making is maximized. Experimental results show that DSE-KNN outperforms other common machine learning methods on classifying different types of disease datasets and predicting daily price direction of different stock ETFs.
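For readers unfamiliar with the baseline the abstract contrasts itself with, the sketch below shows plain KNN classification with a pre-determined similarity function passed in as a parameter. It does not implement DSE-KNN or any learned similarity; it only marks the place where a fixed function such as cosine similarity is plugged in. The names `knn_predict` and `cosine_similarity` and the toy training points are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Similarity = Callable[[Vector, Vector], float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """A fixed, pre-determined similarity function."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_predict(
    train: List[Tuple[Vector, str]],
    query: Vector,
    k: int,
    similarity: Similarity = cosine_similarity,
) -> str:
    """Majority vote among the k training points most similar to the query."""
    ranked = sorted(train, key=lambda pair: similarity(pair[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-class training set.
train = [
    ((1.0, 0.1), "A"),
    ((0.9, 0.2), "A"),
    ((0.1, 1.0), "B"),
    ((0.2, 0.8), "B"),
]
print(knn_predict(train, (1.0, 0.0), k=3))
```

Because `similarity` is just a callable, any other function with the same signature can be substituted without touching the voting logic, which is the design point the sketch is meant to show.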

Multiple Kernel Learning and Automatic Subspace Relevance Determination for High-dimensional Neuroimaging Data

arXiv (Cornell University), Jun 2, 2017

Intrusion Detection Using Payload Embeddings

IEEE Access, 2021

Online mining for association rules and collective anomalies in data streams

2017 IEEE International Conference on Big Data (Big Data), 2017

When analyzing streaming data, the results can depreciate in value faster than the analysis can be completed and results deployed. This is certainly the case in the area of anomaly detection, where detecting a potential problem as it is occurring (or in the early stages) can permit corrective behavior. However, most anomaly detection methods focus on point anomalies, whilst many fraudulent behaviors could be detected only through collective analysis of sequences of data in practice. Moreover, anomaly detection systems often stop at detecting anomalies; they typically do not provide information about how the features (attributes) of anomalies relate to each other or to those in normal states. The goal of this research is to create a distributed system that allows for the detection of collective anomalies from streaming data, and to provide a richer context of information about the anomalies besides their presence. To accomplish this, we (a) re-engineered an online sequence anomaly detection algorithm and (b) designed new algorithms for targeted association mining to run on a streaming, distributed environment. Our experiments, conducted on both synthetic and real-world data sets, demonstrated that the proposed framework is able to achieve near real-time response in detecting anomalies and extracting information pertaining to the anomalies.

Impact of data mining on database security

Digitized resources are growing at a rapid pace. One of the challenges facing the computer science community is the development of techniques and tools to discover new and useful information from large collections of data. There are a number of basic issues associated with this challenge and many are still unresolved. This situation has led to the emergence of a new area of study called "Knowledge Discovery in Databases" (KDD). KDD is comprised of researchers from a variety of fields, including statistics, pattern recognition, artificial intelligence, machine learning and databases. Recent efforts of KDD researchers have focused primarily on issues surrounding the individual steps of the discovery process. Those issues not directly related to the discovery process have received much less attention. One such issue is the impact of this new technology on database security. In particular, the security threat presented by classification learning methods. Providing safeguards a...

Itemset Trees for Targeted

Divergence Based Non-Negative Matrix Factorization for top-N Recommendations

Proceedings of the 52nd Hawaii International Conference on System Sciences, 2019

Detection of event onset using Twitter

2016 International Joint Conference on Neural Networks (IJCNN), 2016

Social media generates information about news and events in real time. Given the vast amount of data available and the rate of information propagation, reliably identifying events is a challenge. Most state-of-the-art techniques are post hoc techniques that detect an event after it has happened. Our goal is to detect the onset of an event as it is happening using the user-generated information from Twitter streams. To achieve this goal, we use a discriminative model to identify change in the pattern of conversations over time. We use a topic evolution model to find credible events and eliminate random noise that is prevalent in many of the event detection models. The simplicity of the proposed model allows events to be detected quickly and efficiently, permitting discovery of events within minutes from the start of conversation about those events on Twitter. Our model is evaluated on a large-scale Twitter corpus to detect events in real time. The proposed model is also tested on other datasets to detect change over longer periods of time. The results indicate we can detect real events within 3 to 8 minutes of their first appearing, with a lower degree of noise compared to other methods.

Research paper thumbnail of Association mining based approach to analyze COVID-19 response and case growth in the United States

Scientific Reports, 2021

Containing the COVID-19 pandemic while balancing the economy has proven to be quite a challenge f... more Containing the COVID-19 pandemic while balancing the economy has proven to be quite a challenge for the world. We still have limited understanding of which combination of policies have been most effective in flattening the curve; given the challenges of the dynamic and evolving nature of the pandemic, lack of quality data etc. This paper introduces a novel data mining-based approach to understand the effects of different non-pharmaceutical interventions in containing the COVID-19 infection rate. We used the association rule mining approach to perform descriptive data mining on publicly available data for 50 states in the United States to understand the similarity and differences among various policies and underlying conditions that led to transitions between different infection growth curve phases. We used a multi-peak logistic growth model to label the different phases of infection growth curve. The common trends in the data were analyzed with respect to lockdowns, face mask mandat...

Bookmarks Related papers MentionsView impact

Research paper thumbnail of Visual Analytics of Time Evolving Large-scale Graphs

Bookmarks Related papers MentionsView impact

Research paper thumbnail of Development of a TSR-Based Method for Protein 3-D Structural Comparison With Its Applications to Protein Classification and Motif Discovery

Frontiers in Chemistry, 2021

Development of protein 3-D structural comparison methods is important in understanding protein fu... more Development of protein 3-D structural comparison methods is important in understanding protein functions. At the same time, developing such a method is very challenging. In the last 40 years, ever since the development of the first automated structural method, ~200 papers were published using different representations of structures. The existing methods can be divided into five categories: sequence-, distance-, secondary structure-, geometry-based, and network-based structural comparisons. Each has its uniqueness, but also limitations. We have developed a novel method where the 3-D structure of a protein is modeled using the concept of Triangular Spatial Relationship (TSR), where triangles are constructed with the Cα atoms of a protein as vertices. Every triangle is represented using an integer, which we denote as “key,” A key is computed using the length, angle, and vertex labels based on a rule-based formula, which ensures assignment of the same key to identical TSRs across protei...

Bookmarks Related papers MentionsView impact

Research paper thumbnail of Information Retrieval

Chapman and Hall/CRC eBooks, Sep 29, 2004

Lectures: Monday 14-16 in E-523 Course materials: http://isweb.uni-koblenz.de (teaching) Examinat... more Lectures: Monday 14-16 in E-523 Course materials: http://isweb.uni-koblenz.de (teaching) Examination: Oral exam at the end of the semester Outline Motivation and Overview Text processing and analysis Link Analysis and Authority Ranking information Top-K Query Processing and Indexing search Advanced IR Models Multimedia Retrieval Automatic Classification Clustering and Graph Mining Peer-to-Peer Technologies information Information Extraction organization Data Warehouses and OLAP Ontologies and Semantic Web

Bookmarks Related papers MentionsView impact

Research paper thumbnail of Cognitive Computing: Theory and Applications, Volume 35

Cognitive Computing: Theory and Applications, written by internationally renowned experts, focuse... more Cognitive Computing: Theory and Applications, written by internationally renowned experts, focuses on cognitive computing and its theory and applications, including the use of cognitive computing to manage renewable energy, the environment, and other scarce resources, machine learning models and algorithms, biometrics, Kernel Based Models for transductive learning, neural networks, graph analytics in cyber security, neural networks, data driven speech recognition, and analytical platforms to study the brain-computer interface. Comprehensively presents the various aspects of statistical methodologyDiscusses a wide variety of diverse applications and recent developmentsContributors are internationally renowned experts in their respective areas

Bookmarks Related papers MentionsView impact

Research paper thumbnail of Data structures for selective association mining

Traditional association mining algorithms, like Apriori, generate all frequent itemsets existing ... more Traditional association mining algorithms, like Apriori, generate all frequent itemsets existing within the dataset. However, only a very small fraction of this massive volume of frequent itemsets is interesting to the user. Therefore, such algorithms waste a lot of time and resources to uncover itemsets that are insignificant. The objective of this dissertation is to introduce data structure that can be used to support selective association mining, which is an association mining algorithm that generates only itemsets containing items of user interest. The first data structure introduced for selective association mining is itemset tree. The performance of itemset tree can be improved by reordering the items. Five different distributions are used to determine which of these distributions performed the best. Two of these distributions are extracted from the structure of a clustering algorithm known as UNIMEM. As the performance of the distributions from UNIMEM are evaluated, UNIMEM shows potential for selective association mining. However, the algorithm cannot be used directly and our proposed modified version of the conceptual tree of UNIMEM is called ISE-Tree. Experiments show that ISE-Tree performs better than itemset tree and we have successfully introduced a new improved data structure for selective association mining.

Bookmarks Related papers MentionsView impact

Research paper thumbnail of Identifying Minimum-Sized Influential Vertices on Large-Scale Weighted Graphs: A Big Data Perspective

Weighted graphs can be used to model any data sets composed of entities and relationships. Social... more Weighted graphs can be used to model any data sets composed of entities and relationships. Social networks, concept networks, and document networks are among the types of data that can be abstracted as weighted graphs. Identifying minimum-sized influential vertices (MIV) in a weighted graph is an important task in graph mining that gains valuable commercial applications. Although different algorithms for this task have been proposed, it remains challenging for processing web-scale weighted graph. In this chapter, we propose a highly scalable algorithm for identifying MIV on large-scale weighted graph using the MapReduce framework. The proposed algorithm starts with identifying an individual zone for every vertex in the graph using an α-cut fuzzy set. This approximation allows to divide the whole graph into multiple subgraphs that can be processed independently. Then, for each subgraph, a MapReduce-based greedy algorithm can be designed to identify the minimum-sized influential vertices for the whole graph.

Bookmarks Related papers MentionsView impact

Research paper thumbnail of Analyzing Structure of Terrorist Networks by Using Graph Metrics

All criminal networks are social networks with multiple channels of communication and collaborati... more All criminal networks are social networks with multiple channels of communication and collaboration between their members. In this paper, we analyze different types of criminal networks with respect to metrics commonly used in social network analysis literature. We focus mostly on two types of networks: cocaine trading and terrorist activities. We also include a legal organization's network for comparison. Our findings reveal that there are significant differences in terms of some of these metrics between different types of criminal networks. These differences, in turn, may help security forces to identify unknown networks or substructures in very large networks as potential criminal networks.

Bookmarks Related papers MentionsView impact

Research paper thumbnail of An enhancement of Boolean retrieval systems based on term co-occurrence frequencies / Un amélioration des systèmes d’information basé sur la cooccurrence de termes

Proceedings of the annual conference of CAIS, Mar 26, 2022

Bookmarks Related papers MentionsView impact

Research paper thumbnail of An Ontology-Based Architecture for Providing Insights in Wireless Networks Domain

Ontology-based approaches have been explored in several domains for knowledge representation and ... more Ontology-based approaches have been explored in several domains for knowledge representation and improving accuracy. However, ontology-based approaches for assisting a decision maker by delivering a concrete plan from analyzing the insights extracted from an ontology, have not received much attention. Insights-as-a-service is a technology that aids a decision maker by providing a concrete action plan, involving a comparative analysis of patterns derived from the data and the extraction of insights from such an analysis. In this paper, we propose an ontology-based architecture for mining insights within the Wireless Network Ontology (WNO), an ontology generated for the wireless network domain for delivering better wireless network performance. We present and illustrate: (i) the major components of the architecture together with the algorithms used for summarizing the network performance profiles in the form of rank tables, and (ii) how the insight rules (the action plan) are extracted from these tables. By utilizing the proposed approach, an actionable plan for assisting the decision maker can be obtained as domain knowledge is incorporated in the system. Experimental results on a wireless network dataset show that the proposed model provides an optimal action plan for a wireless network to improve its performance by encoding data-driven rules into the ontology and suggesting changes to its current network configuration.

Bookmarks Related papers MentionsView impact

Research paper thumbnail of Sub-event detection from tweets

Social media plays an important role in communication between people in recent times. This includ... more Social media plays an important role in communication between people in recent times. This includes information about news and events that are currently happening. Most of the research on event detection concentrates on identifying events from social media information. These models assume an event to be a single entity and treat it as such during the detection process. This assumption ignores that the composition of an event changes as new information is made available on social media. To capture the change in information over time, we extend an already existing Event Detection at Onset algorithm to study the evolution of an event over time. We introduce the concept of an event life cycle model that tracks various key events in the evolution of an event. The proposed unsupervised sub-event detection method uses a threshold-based approach to identify relationships between sub-events over time. These related events are mapped to an event life cycle to identify sub-events. We evaluate the proposed sub-event detection approach on a large-scale Twitter corpus.

Bookmarks Related papers MentionsView impact

Research paper thumbnail of Automatic and Semi-Automatic Techniques for Image Annotation

IGI Global eBooks, Jan 18, 2011

When retrieving images, users may find it easier to express the desired semantic content with key... more When retrieving images, users may find it easier to express the desired semantic content with keywords than visual features. Accurate keyword retrieval can only occur when images are completely and accurately described. This can be achieved either through laborious manual effort or ...

Bookmarks Related papers MentionsView impact

Research paper thumbnail of Deep Similarity-Enhanced K Nearest Neighbors

The k Nearest Neighbors (KNN) algorithm has been widely applied in various supervised learning ta... more The k Nearest Neighbors (KNN) algorithm has been widely applied in various supervised learning tasks due to its simplicity and effectiveness. However, the quality of KNN decision making is directly affected by the quality of the neighborhoods in the modeling space. Efforts have been made to map data to a better feature space either implicitly with kernel functions, or explicitly through learning linear or nonlinear transformations. However, all these methods use pre-determined distance or similarity functions, which may limit their learning capacity. In this paper, we propose a novel deep learning architecture, which is called the Deep Similarity-Enhanced K Nearest Neighbors (DSE-KNN), to learn an optimized similarity function of the data directly towards the goal of optimizing the KNN decision making. In other words, the type of similarity function that is used in our method is not pre-determined but rather learned to map data to a high-dimensional feature space where the accuracy of the KNN decision making is maximized. Experimental results show that DSE-KNN outperforms other common machine learning methods on classifying different types of disease datasets and predicting daily price direction of different stock ETFs.

Multiple Kernel Learning and Automatic Subspace Relevance Determination for High-dimensional Neuroimaging Data

arXiv (Cornell University), Jun 2, 2017

Intrusion Detection Using Payload Embeddings

IEEE Access, 2021

Online mining for association rules and collective anomalies in data streams

2017 IEEE International Conference on Big Data (Big Data), 2017

When analyzing streaming data, the results can depreciate in value faster than the analysis can be completed and results deployed. This is certainly the case in the area of anomaly detection, where detecting a potential problem as it is occurring (or in the early stages) can permit corrective behavior. However, most anomaly detection methods focus on point anomalies, whilst in practice many fraudulent behaviors can be detected only through collective analysis of sequences of data. Moreover, anomaly detection systems often stop at detecting anomalies; they typically do not provide information about how the features (attributes) of anomalies relate to each other or to those in normal states. The goal of this research is to create a distributed system that allows for the detection of collective anomalies from streaming data, and to provide a richer context of information about the anomalies besides their presence. To accomplish this, we (a) re-engineered an online sequence anomaly detection algorithm and (b) designed new algorithms for targeted association mining to run in a streaming, distributed environment. Our experiments, conducted on both synthetic and real-world data sets, demonstrated that the proposed framework is able to achieve near real-time response in detecting anomalies and extracting information pertaining to the anomalies.

Impact of data mining on database security

Digitized resources are growing at a rapid pace. One of the challenges facing the computer science community is the development of techniques and tools to discover new and useful information from large collections of data. There are a number of basic issues associated with this challenge and many are still unresolved. This situation has led to the emergence of a new area of study called "Knowledge Discovery in Databases" (KDD). KDD draws researchers from a variety of fields, including statistics, pattern recognition, artificial intelligence, machine learning and databases. Recent efforts of KDD researchers have focused primarily on issues surrounding the individual steps of the discovery process. Those issues not directly related to the discovery process have received much less attention. One such issue is the impact of this new technology on database security, in particular the security threat presented by classification learning methods. Providing safeguards a...

Itemset Trees for Targeted

Divergence Based Non-Negative Matrix Factorization for top-N Recommendations

Proceedings of the 52nd Hawaii International Conference on System Sciences, 2019

Detection of event onset using Twitter

2016 International Joint Conference on Neural Networks (IJCNN), 2016

Social media generates information about news and events in real time. Given the vast amount of data available and the rate of information propagation, reliably identifying events is a challenge. Most state-of-the-art techniques are post hoc techniques that detect an event after it has happened. Our goal is to detect the onset of an event as it is happening, using the user-generated information from Twitter streams. To achieve this goal, we use a discriminative model to identify change in the pattern of conversations over time. We use a topic evolution model to find credible events and eliminate the random noise that is prevalent in many event detection models. The simplicity of the proposed model allows events to be detected quickly and efficiently, permitting discovery of events within minutes of the start of conversation about them on Twitter. Our model is evaluated on a large-scale Twitter corpus to detect events in real time. The proposed model is also tested on other datasets to detect change over longer periods of time. The results indicate that we can detect real events within 3 to 8 minutes of their first appearing, with a lower degree of noise compared to other methods.
