Vijay Raghavan - Profile on Academia.edu

Papers by Vijay Raghavan

Scientific Reports, 2021

Containing the COVID-19 pandemic while balancing the economy has proven to be quite a challenge for the world. We still have limited understanding of which combination of policies has been most effective in flattening the curve, given the dynamic and evolving nature of the pandemic, the lack of quality data, and similar challenges. This paper introduces a novel data mining-based approach to understanding the effects of different non-pharmaceutical interventions in containing the COVID-19 infection rate. We used association rule mining to perform descriptive data mining on publicly available data for the 50 U.S. states, to understand the similarities and differences among the policies and underlying conditions that led to transitions between phases of the infection growth curve. We used a multi-peak logistic growth model to label the different phases of the infection growth curve. The common trends in the data were analyzed with respect to lockdowns, face mask mandat...
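The abstract names a multi-peak logistic growth model but does not give its exact form or its phase labels (the text is truncated). A common way to write such a model, assumed here, is a sum of logistic curves, one per wave: C(t) = Σ_k K_k / (1 + exp(-r_k (t - t_k))). The Python sketch below evaluates that curve and attaches a hypothetical two-way phase label ("accelerating" vs. "decelerating") from the sign of its discrete second difference; the paper's actual phase scheme may differ.

```python
import math

def multi_peak_logistic(t, waves):
    """Cumulative infections at time t as a sum of logistic waves.

    waves: list of (K, r, t0) = (final size, growth rate, midpoint day).
    """
    return sum(K / (1.0 + math.exp(-r * (t - t0))) for K, r, t0 in waves)

def growth_phase(t, waves, dt=1.0):
    """Hypothetical phase label from the sign of the discrete second difference."""
    f_prev = multi_peak_logistic(t - dt, waves)
    f_now = multi_peak_logistic(t, waves)
    f_next = multi_peak_logistic(t + dt, waves)
    return "accelerating" if f_next - 2 * f_now + f_prev > 0 else "decelerating"

# Made-up parameters for two waves, for illustration only.
waves = [(1000, 0.2, 50), (3000, 0.1, 150)]
print(growth_phase(40, waves), growth_phase(60, waves))
```

Before a wave's midpoint t_k its logistic term is convex, and after it the term is concave, which is why the sign of the second difference separates the two labels.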

Frontiers in Chemistry, 2021

Development of protein 3-D structural comparison methods is important for understanding protein functions, but developing such a method is very challenging. In the 40 years since the first automated structural comparison method was developed, ~200 papers have been published using different representations of structures. The existing methods can be divided into five categories: sequence-, distance-, secondary structure-, geometry-, and network-based structural comparisons. Each has unique strengths but also limitations. We have developed a novel method in which the 3-D structure of a protein is modeled using the concept of Triangular Spatial Relationship (TSR), where triangles are constructed with the Cα atoms of a protein as vertices. Every triangle is represented by an integer, which we denote as a “key.” A key is computed from the length, angle, and vertex labels using a rule-based formula, which ensures assignment of the same key to identical TSRs across protei...
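The abstract says a key is computed from length, angle, and vertex labels by a rule-based formula, but does not give the formula. The sketch below is a hypothetical stand-in, not the authors' TSR key: it bins the longest side and the largest interior angle of the Cα triangle, encodes the sorted residue labels, and packs the three codes into one integer, so the same triangle gets the same key regardless of vertex order. The bin widths (`length_bin`, `angle_bin`) and the packing scheme are assumptions.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter residue codes used as vertex labels

def triangle_key(coords, labels, length_bin=1.0, angle_bin=10.0):
    """Hypothetical integer key for a (non-degenerate) triangle of three C-alpha atoms.

    coords: three (x, y, z) tuples; labels: three one-letter residue codes.
    The key depends only on the multiset of labels, the binned longest side,
    and the binned largest interior angle, so vertex order does not matter.
    """
    pairs = [(coords[0], coords[1]), (coords[1], coords[2]), (coords[0], coords[2])]
    a, b, c = sorted(math.dist(p, q) for p, q in pairs)
    # The largest angle is opposite the longest side c (law of cosines);
    # clamp to [-1, 1] to guard against floating-point rounding.
    cos_c = max(-1.0, min(1.0, (a * a + b * b - c * c) / (2 * a * b)))
    angle = math.degrees(math.acos(cos_c))
    i, j, k = sorted(AMINO_ACIDS.index(x) for x in labels)
    label_code = (i * 20 + j) * 20 + k            # 0 .. 7999
    length_code = min(int(c // length_bin), 999)  # 0 .. 999
    angle_code = int(angle // angle_bin)          # 0 .. 18 with 10-degree bins
    return (label_code * 1000 + length_code) * 100 + angle_code

print(triangle_key([(0, 0, 0), (3.8, 0, 0), (1.0, 3.5, 0)], "GAL"))
```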

Information Retrieval

Chapman and Hall/CRC eBooks, Sep 29, 2004

Lectures: Monday 14-16 in E-523. Course materials: http://isweb.uni-koblenz.de (teaching). Examination: oral exam at the end of the semester. Outline: Motivation and Overview; Text Processing and Analysis; Link Analysis and Authority Ranking; Top-K Query Processing and Indexing; Advanced IR Models; Multimedia Retrieval; Automatic Classification; Clustering and Graph Mining; Peer-to-Peer Technologies; Information Extraction; Data Warehouses and OLAP; Ontologies and Semantic Web.

Cognitive Computing: Theory and Applications, Volume 35

Cognitive Computing: Theory and Applications, written by internationally renowned experts, focuses on cognitive computing and its theory and applications, including the use of cognitive computing to manage renewable energy, the environment, and other scarce resources; machine learning models and algorithms; biometrics; kernel-based models for transductive learning; neural networks; graph analytics in cyber security; data-driven speech recognition; and analytical platforms to study the brain-computer interface. Comprehensively presents the various aspects of statistical methodology. Discusses a wide variety of diverse applications and recent developments. Contributors are internationally renowned experts in their respective areas.

Data structures for selective association mining

Traditional association mining algorithms, like Apriori, generate all frequent itemsets in the dataset. However, only a very small fraction of this massive volume of frequent itemsets is interesting to the user, so such algorithms waste a lot of time and resources uncovering insignificant itemsets. The objective of this dissertation is to introduce data structures that support selective association mining, that is, association mining that generates only itemsets containing items of user interest. The first data structure introduced for selective association mining is the itemset tree, whose performance can be improved by reordering the items. Five different distributions are compared to determine which performs best; two of them are extracted from the structure of a clustering algorithm known as UNIMEM. Evaluating the UNIMEM-derived distributions shows that UNIMEM has potential for selective association mining. However, the algorithm cannot be used directly, so we propose a modified version of UNIMEM's conceptual tree, called ISE-Tree. Experiments show that ISE-Tree performs better than the itemset tree, and we have successfully introduced a new, improved data structure for selective association mining.
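The abstract describes the itemset tree and ISE-Tree only at a high level. As a minimal sketch of the underlying idea, that a tree built over sorted transactions can answer a targeted support query ("how many transactions contain these items of interest?") without enumerating every frequent itemset, here is an uncompressed prefix tree in Python. It is a simplified relative of the itemset tree, not the dissertation's itemset tree or ISE-Tree, and it assumes items are mutually comparable so transactions can be sorted.

```python
class PrefixTreeNode:
    def __init__(self):
        self.children = {}  # item -> PrefixTreeNode
        self.count = 0      # transactions whose sorted path passes through this node

class TargetedSupportTree:
    """Uncompressed prefix tree over sorted transactions."""

    def __init__(self):
        self.root = PrefixTreeNode()
        self.n = 0  # total transactions inserted

    def insert(self, transaction):
        self.n += 1
        node = self.root
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, PrefixTreeNode())
            node.count += 1

    def support(self, query):
        """Number of inserted transactions that contain every item in query."""
        q = sorted(set(query))
        if not q:
            return self.n
        return self._count(self.root, q, 0)

    def _count(self, node, q, i):
        total = 0
        for item, child in node.children.items():
            if item == q[i]:
                # q[0..i] all lie on this path; stop once the whole query is matched.
                total += child.count if i + 1 == len(q) else self._count(child, q, i + 1)
            elif item < q[i]:
                total += self._count(child, q, i)  # q[i] may still appear deeper
            # item > q[i]: paths are sorted, so q[i] cannot appear below; prune.
        return total

tree = TargetedSupportTree()
for t in [["bread", "milk"], ["bread", "butter", "milk"], ["butter"]]:
    tree.insert(t)
print(tree.support(["bread", "milk"]))
```

Because each node's count covers every transaction below it, the query stops descending as soon as the last target item is matched, and subtrees that start with an item larger than the next target item are skipped.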

Identifying Minimum-Sized Influential Vertices on Large-Scale Weighted Graphs: A Big Data Perspective

Weighted graphs can model any dataset composed of entities and relationships. Social networks, concept networks, and document networks are among the types of data that can be abstracted as weighted graphs. Identifying minimum-sized influential vertices (MIV) in a weighted graph is an important graph mining task with valuable commercial applications. Although different algorithms for this task have been proposed, processing web-scale weighted graphs remains challenging. In this chapter, we propose a highly scalable algorithm for identifying MIV in large-scale weighted graphs using the MapReduce framework. The proposed algorithm starts by identifying an individual zone for every vertex in the graph using an α-cut fuzzy set. This approximation allows the whole graph to be divided into multiple subgraphs that can be processed independently. Then, for each subgraph, a MapReduce-based greedy algorithm can be designed to identify the minimum-sized influential vertices for the whole graph.
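The abstract outlines two steps: build an α-cut zone for each vertex, then run a greedy selection. The single-machine Python sketch below shows one plausible reading of those steps, without MapReduce. It assumes, hypothetically, that influence along a path is the product of edge weights in (0, 1], that a vertex's zone is every vertex it reaches with strength at least α, and that the goal is a small set of vertices whose zones together cover the graph, chosen with the standard greedy set-cover heuristic (an approximation, not a guaranteed minimum). The chapter's actual fuzzy membership function and greedy criterion may differ.

```python
import heapq

def alpha_cut_zone(graph, source, alpha):
    """Vertices whose strongest-path influence from `source` is >= alpha.

    Assumption: influence along a path is the product of its edge weights in (0, 1],
    so a Dijkstra-style max-product search finds the strongest path.
    """
    best = {source: 1.0}
    heap = [(-1.0, source)]
    while heap:
        neg_strength, u = heapq.heappop(heap)
        strength = -neg_strength
        if strength < best[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            s = strength * w
            if s >= alpha and s > best.get(v, 0.0):
                best[v] = s
                heapq.heappush(heap, (-s, v))
    return set(best)

def greedy_influential_vertices(graph, alpha):
    """Greedy set cover: repeatedly pick the vertex whose zone covers the most
    still-uncovered vertices until every vertex is covered."""
    vertices = set(graph) | {v for edges in graph.values() for v, _ in edges}
    zones = {v: alpha_cut_zone(graph, v, alpha) for v in vertices}
    uncovered = set(vertices)
    chosen = []
    while uncovered:
        # Iterating in sorted order makes tie-breaking deterministic.
        pick = max(sorted(vertices), key=lambda v: len(zones[v] & uncovered))
        chosen.append(pick)
        uncovered -= zones[pick]
    return chosen

# Toy directed graph: vertex -> list of (neighbor, weight).
graph = {
    "a": [("b", 0.9), ("c", 0.8)],
    "b": [("d", 0.9)],
    "e": [("d", 0.5)],
}
print(greedy_influential_vertices(graph, 0.7))  # ['a', 'e']
```

Here "a" reaches b, c, and d (via b, strength 0.81) above α = 0.7, while "e" reaches d only at 0.5, so "e" must be picked to cover itself.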

Analyzing Structure of Terrorist Networks by Using Graph Metrics

All criminal networks are social networks with multiple channels of communication and collaboration between their members. In this paper, we analyze different types of criminal networks with respect to metrics commonly used in the social network analysis literature. We focus mostly on two types of networks: cocaine trading and terrorist activities. We also include a legal organization's network for comparison. Our findings reveal significant differences in some of these metrics between different types of criminal networks. These differences, in turn, may help security forces identify unknown networks, or substructures in very large networks, as potential criminal networks.
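The abstract does not list which social network analysis metrics were used. As an illustration only, the Python sketch below computes three widely used ones (density, degree centrality, and the average local clustering coefficient) for an undirected network given as an edge list. Vertices that appear in no edge are not represented, and self-loops and duplicate edges are ignored.

```python
from itertools import combinations

def network_metrics(edges):
    """Basic SNA metrics for an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    degree_centrality = {v: len(nb) / (n - 1) for v, nb in adj.items()} if n > 1 else {}
    clustering = {}
    for v, nb in adj.items():
        k = len(nb)
        if k < 2:
            clustering[v] = 0.0
            continue
        # Count edges among v's neighbors, then divide by the possible k(k-1)/2.
        links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        clustering[v] = 2 * links / (k * (k - 1))
    avg_clustering = sum(clustering.values()) / n if n else 0.0
    return {"nodes": n, "edges": m, "density": density,
            "degree_centrality": degree_centrality, "avg_clustering": avg_clustering}

print(network_metrics([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]))
```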

Proceedings of the annual conference of CAIS, Mar 26, 2022

Most commercial information retrieval systems employ the standard Boolean retrieval strategy to respond to user queries. One of the major problems in this context is that such systems cannot provide ranked output. Furthermore, few studies have attempted to incorporate dependencies between index terms into the retrieval process. In this presentation, a scheme is proposed by which the Boolean retrieval strategy can be enhanced using dependencies based on term co-occurrence frequencies. Experiments performed on two experimental collections demonstrate that the retrieval performance of the proposed scheme is significantly better than that of the standard approach. If the preprocessing time required to determine the term-term dependencies is ignored, the computing time to process a query in the proposed environment is O(1), i.e., independent of the number of terms in the query.
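The abstract does not specify how co-occurrence frequencies become dependency values. A minimal Python sketch of that preprocessing step, assuming (hypothetically) that the dependency between two terms is the Dice coefficient of their document co-occurrence, 2·co(a, b) / (df(a) + df(b)):

```python
from collections import Counter
from itertools import combinations

def term_dependencies(docs):
    """Dice-coefficient dependency for every pair of terms that co-occur in a document.

    docs: iterable of token lists. Keys are alphabetically ordered term pairs.
    """
    df = Counter()  # document frequency of each term
    co = Counter()  # number of documents containing both terms of a pair
    for doc in docs:
        terms = sorted(set(doc))
        df.update(terms)
        co.update(combinations(terms, 2))
    return {pair: 2 * c / (df[pair[0]] + df[pair[1]]) for pair, c in co.items()}

docs = [["boolean", "retrieval", "ranking"], ["boolean", "retrieval"], ["ranking", "terms"]]
deps = term_dependencies(docs)
print(deps[("boolean", "retrieval")])
```

The table is built once over the collection, ahead of query time; how the scheme then uses these values to rank Boolean results is not described in the abstract.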

An Ontology-Based Architecture for Providing Insights in Wireless Networks Domain

Ontology-based approaches have been explored in several domains for knowledge representation and for improving accuracy. However, ontology-based approaches that assist a decision maker by delivering a concrete plan derived from insights extracted from an ontology have not received much attention. Insights-as-a-service is a technology that aids a decision maker by providing a concrete action plan, based on a comparative analysis of patterns derived from the data and the extraction of insights from that analysis. In this paper, we propose an ontology-based architecture for mining insights within the Wireless Network Ontology (WNO), an ontology generated for the wireless network domain, in order to deliver better wireless network performance. We present and illustrate (i) the major components of the architecture, together with the algorithms used to summarize network performance profiles as rank tables, and (ii) how insight rules (the action plan) are extracted from these tables. Because domain knowledge is incorporated into the system, the proposed approach yields an actionable plan for assisting the decision maker. Experimental results on a wireless network dataset show that the proposed model provides an optimal action plan for improving a wireless network's performance by encoding data-driven rules into the ontology and suggesting changes to its current network configuration.
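The abstract mentions summarizing network performance profiles as rank tables and extracting insight rules from them, without giving the algorithms. The Python sketch below is a hypothetical, ontology-free illustration: it ranks the values of one configuration attribute by the mean of a performance metric, then emits a rule recommending the top-ranked value when the current configuration differs. The attribute `channel_width` and the metric `throughput` are made-up examples, not fields of the WNO.

```python
from collections import defaultdict

def rank_table(profiles, attribute, metric):
    """Rank the values of `attribute` by the mean of `metric`, best first."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for p in profiles:
        sums[p[attribute]] += p[metric]
        counts[p[attribute]] += 1
    means = {value: sums[value] / counts[value] for value in sums}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)

def insight_rule(table, attribute, current_value):
    """Suggest switching to the top-ranked value, or None if already there."""
    best_value, best_score = table[0]
    if best_value == current_value:
        return None
    return f"change {attribute} from {current_value} to {best_value} (mean {best_score:.1f})"

profiles = [
    {"channel_width": 20, "throughput": 40},
    {"channel_width": 40, "throughput": 70},
    {"channel_width": 20, "throughput": 50},
    {"channel_width": 40, "throughput": 60},
]
table = rank_table(profiles, "channel_width", "throughput")
print(insight_rule(table, "channel_width", 20))
```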

Sub-event detection from tweets

Social media has come to play an important role in communication between people, including sharing information about news and events as they happen. Most research on event detection concentrates on identifying events from social media information. These models assume an event to be a single entity and treat it as such during the detection process, ignoring that the composition of an event changes as new information becomes available on social media. To capture this change in information over time, we extend an existing Event Detection at Onset algorithm to study the evolution of an event over time. We introduce the concept of an event life cycle model that tracks various key events in the evolution of an event. The proposed unsupervised sub-event detection method uses a threshold-based approach to identify relationships between sub-events over time. These related events are mapped to an event life cycle to identify sub-events. We evaluate the proposed sub-event detection approach on a large-scale Twitter corpus.
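The abstract states that a threshold-based approach links sub-events over time but does not define the similarity measure. A minimal Python sketch, assuming each time window's sub-events are represented as term sets and that two sub-events in consecutive windows are linked when their Jaccard similarity reaches a threshold:

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def link_subevents(windows, threshold):
    """Link sub-events across consecutive windows.

    windows: list of time windows, each a list of term sets (one per sub-event).
    Returns ((t-1, i), (t, j)) pairs whose similarity is >= threshold.
    """
    links = []
    for t in range(1, len(windows)):
        for i, prev in enumerate(windows[t - 1]):
            for j, cur in enumerate(windows[t]):
                if jaccard(prev, cur) >= threshold:
                    links.append(((t - 1, i), (t, j)))
    return links

windows = [
    [{"storm", "landfall", "coast"}],
    [{"storm", "landfall", "flooding"}, {"concert", "tickets"}],
    [{"flooding", "rescue", "storm"}],
]
print(link_subevents(windows, threshold=0.4))
```

Chains of links such as storm, then landfall and flooding, then rescue are the kind of sequence an event life cycle model could then organize.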

Automatic and Semi-Automatic Techniques for Image Annotation

IGI Global eBooks, Jan 18, 2011

When retrieving images, users may find it easier to express the desired semantic content with keywords than with visual features. Accurate keyword retrieval can only occur when images are completely and accurately described. This can be achieved either through laborious manual effort or ...

Deep Similarity-Enhanced K Nearest Neighbors

The k Nearest Neighbors (KNN) algorithm has been widely applied in various supervised learning tasks due to its simplicity and effectiveness. However, the quality of KNN decision making is directly affected by the quality of the neighborhoods in the modeling space. Efforts have been made to map data to a better feature space, either implicitly with kernel functions or explicitly by learning linear or nonlinear transformations. However, all these methods use predetermined distance or similarity functions, which may limit their learning capacity. In this paper, we propose a novel deep learning architecture, called Deep Similarity-Enhanced K Nearest Neighbors (DSE-KNN), that learns an optimized similarity function directly toward the goal of optimizing KNN decision making. In other words, the similarity function used in our method is not predetermined but learned, mapping data to a high-dimensional feature space in which the accuracy of KNN decision making is maximized. Experimental results show that DSE-KNN outperforms other common machine learning methods at classifying different types of disease datasets and predicting the daily price direction of different stock ETFs.
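DSE-KNN learns its similarity function with a deep network whose architecture the abstract does not describe. The Python sketch below shows only the surrounding KNN decision step with the similarity passed in as a function, so a learned similarity could replace the fixed placeholder used here (negative Euclidean distance, an assumption for illustration).

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k, similarity):
    """Majority vote among the k training points most similar to `query`."""
    ranked = sorted(range(len(train_x)),
                    key=lambda i: similarity(train_x[i], query), reverse=True)
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

def neg_euclidean(a, b):
    # Placeholder similarity; DSE-KNN would learn this function instead.
    return -math.dist(a, b)

train_x = [(0, 0), (0, 1), (5, 5), (6, 5)]
train_y = ["healthy", "healthy", "disease", "disease"]
print(knn_predict(train_x, train_y, (0.5, 0.5), k=3, similarity=neg_euclidean))
```

Keeping the similarity as a parameter is the point of the design: the voting logic stays fixed while the notion of "neighbor" is swapped out.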

arXiv (Cornell University), Jun 2, 2017

Alzheimer's disease is a major cause of dementia. Its diagnosis requires accurate biomarkers that are sensitive to disease stages. In this respect, we regard probabilistic classification as a method of designing a probabilistic biomarker for disease staging. Probabilistic biomarkers naturally support the interpretation of decisions and the evaluation of the uncertainty associated with them. In this paper, we obtain probabilistic biomarkers via Gaussian Processes. Gaussian Processes enable probabilistic kernel machines that offer flexible means of accomplishing Multiple Kernel Learning. Exploiting this flexibility, we propose a new variation of Automatic Relevance Determination and tackle the challenges of high dimensionality through multiple kernels. Our results demonstrate that the Gaussian Process models are competitive with or better than the well-known Support Vector Machine in classification performance, even in the case of single kernel learning. Extending the basic scheme to Multiple Kernel Learning, we improve the efficacy of the Gaussian Process models and their interpretability in terms of the known anatomical correlates of the disease. For instance, the disease pathology starts in and around the hippocampus and entorhinal cortex; through the use of Gaussian Processes and Multiple Kernel Learning, we have automatically and efficiently identified those portions of the neuroimaging data. In addition to being interpretable, our Gaussian Process models are competitive with recent deep learning solutions under similar settings.
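The paper's variation of Automatic Relevance Determination is not described in the abstract. For reference, the Python sketch below implements the standard ARD squared-exponential kernel, k(x, x') = σ² exp(-½ Σ_d (x_d - x'_d)² / ℓ_d²), in which a large length-scale ℓ_d makes feature d nearly irrelevant, plus a weighted sum of such kernels over feature groups, one common form of Multiple Kernel Learning. Treating the groups as, say, brain regions is an assumption for illustration; this is a kernel only, not a full Gaussian Process classifier.

```python
import math

def ard_se_kernel(x, y, lengthscales, variance=1.0):
    """Standard ARD squared-exponential kernel with one length-scale per feature."""
    sq = sum(((a - b) / l) ** 2 for a, b, l in zip(x, y, lengthscales))
    return variance * math.exp(-0.5 * sq)

def multi_kernel(x, y, groups, weights):
    """Weighted sum of ARD kernels, each applied to one group of features.

    groups: list of (feature_indices, lengthscales); weights: one weight per group.
    """
    total = 0.0
    for (idx, ls), w in zip(groups, weights):
        total += w * ard_se_kernel([x[i] for i in idx], [y[i] for i in idx], ls)
    return total

groups = [([0], [1.0]), ([1, 2], [1.0, 1.0])]
print(multi_kernel([1, 0, 2], [0, 0, 2], groups, [0.5, 0.5]))
```

Learning the length-scales and group weights from data is what would let such a model point to the feature groups that matter.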

IEEE Access, 2021

Attacks launched over the Internet often degrade or disrupt the quality of online services. Various Intrusion Detection Systems (IDSs), with or without prevention capabilities, have been proposed to defend networks or hosts against such attacks. While most of these IDSs extract features from packet headers to detect irregularities in network traffic, others use payloads alongside the headers. In this study, we propose a payload-based intrusion detection scheme, PayloadEmbeddings, that uses byte embeddings of network packet payloads. We employ a shallow neural network to generate vector representations of bytes and their corresponding payloads. Our feature extraction technique is coupled with the k-Nearest Neighbours (kNN) algorithm to classify packets as intrusive or non-intrusive. In our experiments, we evaluated 34 publicly available datasets and used ten distinct payload-based, labeled intrusion detection datasets to train and evaluate our approach. Our empirical results show that PayloadEmbeddings reaches between 75% and 99% accuracy across all datasets. Finally, we compare our approach with other state-of-the-art and traditional intrusion detection techniques. Our findings suggest that PayloadEmbeddings demonstrates significant advantages over the other techniques on most of the datasets. Index Terms: intrusion detection, payload embeddings, byte embeddings.
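The abstract does not say how byte vectors are combined into a payload vector. The Python sketch below assumes, hypothetically, that a payload's embedding is the mean of its byte vectors and that kNN ranks training payloads by cosine similarity. The `BYTE_VECTORS` table is a deterministic 2-D placeholder standing in for the vectors PayloadEmbeddings would learn with a shallow neural network.

```python
import math
from collections import Counter

# Hypothetical placeholder for learned byte embeddings (one 2-D vector per byte value).
BYTE_VECTORS = {b: (math.cos(b), math.sin(b)) for b in range(256)}

def payload_embedding(payload, vectors=BYTE_VECTORS):
    """Mean of the byte vectors of a payload (bytes); zero vector if empty."""
    if not payload:
        return (0.0, 0.0)
    xs = [vectors[b] for b in payload]  # iterating bytes yields ints 0..255
    return tuple(sum(dim) / len(xs) for dim in zip(*xs))

def cosine(a, b):
    na, nb = math.hypot(*a), math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def classify(payload, train, k=3):
    """kNN vote over (payload, label) training pairs, ranked by cosine similarity."""
    q = payload_embedding(payload)
    scored = sorted(train, key=lambda item: cosine(q, payload_embedding(item[0])), reverse=True)
    return Counter(label for _, label in scored[:k]).most_common(1)[0][0]

train = [(b"GET /index.html", "non-intrusive"), (b"\x90\x90\x90\x90", "intrusive")]
print(classify(b"\x90\x90\x90\x90", train, k=1))
```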

Online mining for association rules and collective anomalies in data streams

2017 IEEE International Conference on Big Data (Big Data), 2017

When analyzing streaming data, results can lose value faster than the analysis can be completed and the results deployed. This is certainly the case in anomaly detection, where detecting a potential problem as it occurs (or in its early stages) can permit corrective action. However, most anomaly detection methods focus on point anomalies, whereas in practice many fraudulent behaviors can be detected only through collective analysis of sequences of data. Moreover, anomaly detection systems often stop at detecting anomalies; they typically do not provide information about how the features (attributes) of anomalies relate to each other or to those in normal states. The goal of this research is to create a distributed system that detects collective anomalies in streaming data and provides richer context about the anomalies beyond their presence. To accomplish this, we (a) re-engineered an online sequence anomaly detection algorithm and (b) designed new algorithms for targeted association mining that run in a streaming, distributed environment. Our experiments, conducted on both synthetic and real-world datasets, demonstrated that the proposed framework achieves near-real-time response in detecting anomalies and extracting information pertaining to them.
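The abstract does not detail the targeted association mining algorithms. As a minimal, single-process sketch of using rules to explain anomalies, the Python class below updates counts incrementally as each record arrives and reports rules of the form attribute=value ⇒ anomaly with their support and confidence. Restricting rule consequents to the anomaly label is an assumption, and the paper's distributed streaming implementation is not reproduced.

```python
from collections import Counter

class TargetedRuleMiner:
    """Incrementally counts support and confidence of rules attr=value => anomaly."""

    def __init__(self):
        self.n = 0                           # records seen so far
        self.item_counts = Counter()         # (attr, value) -> occurrences
        self.item_anomaly_counts = Counter() # (attr, value) -> occurrences in anomalies

    def update(self, record, is_anomaly):
        self.n += 1
        for item in record.items():
            self.item_counts[item] += 1
            if is_anomaly:
                self.item_anomaly_counts[item] += 1

    def rules(self, min_support, min_confidence):
        """Rules meeting both thresholds, highest confidence first."""
        out = []
        for item, both in self.item_anomaly_counts.items():
            support = both / self.n
            confidence = both / self.item_counts[item]
            if support >= min_support and confidence >= min_confidence:
                out.append((item, support, confidence))
        return sorted(out, key=lambda r: (-r[2], -r[1], r[0]))

miner = TargetedRuleMiner()
stream = [
    ({"port": 22, "proto": "tcp"}, True),
    ({"port": 22, "proto": "tcp"}, True),
    ({"port": 80, "proto": "tcp"}, False),
    ({"port": 80, "proto": "udp"}, False),
]
for record, flagged in stream:
    miner.update(record, flagged)
print(miner.rules(min_support=0.25, min_confidence=0.8))
```

Because only counters are updated per record, rules can be queried at any point in the stream without rescanning past data.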

Impact of data mining on database security

Digitized resources are growing at a rapid pace. One of the challenges facing the computer science community is the development of techniques and tools to discover new and useful information in large collections of data. A number of basic issues are associated with this challenge, and many are still unresolved. This situation has led to the emergence of a new area of study called "Knowledge Discovery in Databases" (KDD). KDD draws researchers from a variety of fields, including statistics, pattern recognition, artificial intelligence, machine learning, and databases. Recent efforts of KDD researchers have focused primarily on issues surrounding the individual steps of the discovery process. Issues not directly related to the discovery process have received much less attention. One such issue is the impact of this new technology on database security, in particular the security threat posed by classification learning methods. Providing safeguards a...

Itemset Trees for Targeted

Proceedings of the 52nd Hawaii International Conference on System Sciences, 2019

Personalized top-N recommendation algorithms are among the most effective techniques for providing customized suggestions in information retrieval applications. Most current methods construct personalized recommendations based on loss functions such as pairwise ranking loss and point-wise recovery loss. In this paper, we propose a personalized top-N recommendation method based on non-negative matrix factorization with divergence as a point-wise ranking loss function. Our method finds latent factors in the existing data to improve recommendation predictions. We formulate the learning problem with regularized divergence as a constrained non-convex minimization problem and develop a projected gradient descent optimization algorithm to solve it. We evaluate our approach on six datasets related to personalized recommendation tasks, using root mean squared error (RMSE) and hit rate (HR). Our experimental results demonstrate improved RMSE and HR for most of the datasets.
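The abstract does not name the divergence. Assuming, hypothetically, the generalized Kullback-Leibler (I-)divergence over observed ratings, D = Σ over observed (i, j) of [r_ij log(r_ij / v_ij) - r_ij + v_ij] with v_ij = (WH)_ij, plus an L2 penalty λ(‖W‖² + ‖H‖²), each projected gradient descent step moves W and H against the gradient and then clips every entry to a small positive floor, keeping the factors non-negative. The pure-Python sketch below is illustrative only; the fixed learning rate and the floor `eps` are assumptions, not the paper's step-size rule.

```python
import math
import random

def kl_loss(R, W, H, lam):
    """Generalized KL divergence over observed entries (None = missing) plus L2 penalty."""
    rank = len(H)
    loss = 0.0
    for i, row in enumerate(R):
        for j, r in enumerate(row):
            if r is None:
                continue
            v = sum(W[i][k] * H[k][j] for k in range(rank))
            loss += (r * math.log(r / v) if r > 0 else 0.0) - r + v
    reg = sum(x * x for row in W for x in row) + sum(x * x for row in H for x in row)
    return loss + lam * reg

def nmf_kl_pgd(R, rank, steps=200, lr=0.01, lam=0.01, seed=0, eps=1e-9):
    """Projected gradient descent for non-negative W (m x rank) and H (rank x n)."""
    rng = random.Random(seed)
    m, n = len(R), len(R[0])
    W = [[rng.uniform(0.5, 1.5) for _ in range(rank)] for _ in range(m)]
    H = [[rng.uniform(0.5, 1.5) for _ in range(n)] for _ in range(rank)]
    for _ in range(steps):
        gW = [[2 * lam * W[i][k] for k in range(rank)] for i in range(m)]
        gH = [[2 * lam * H[k][j] for j in range(n)] for k in range(rank)]
        for i in range(m):
            for j in range(n):
                r = R[i][j]
                if r is None:
                    continue
                v = sum(W[i][k] * H[k][j] for k in range(rank))
                g = 1.0 - r / v  # derivative of r*log(r/v) - r + v with respect to v
                for k in range(rank):
                    gW[i][k] += g * H[k][j]
                    gH[k][j] += g * W[i][k]
        # Gradient step, then projection onto the (slightly positive) feasible set.
        W = [[max(eps, W[i][k] - lr * gW[i][k]) for k in range(rank)] for i in range(m)]
        H = [[max(eps, H[k][j] - lr * gH[k][j]) for j in range(n)] for k in range(rank)]
    return W, H

R = [[5, 3, None, 1], [4, None, None, 1], [1, 1, None, 5], [None, 1, 5, 4]]
W, H = nmf_kl_pgd(R, rank=2, steps=300)
print(kl_loss(R, W, H, 0.01))
```

Predicted scores for missing entries, (WH)_ij, could then be sorted per user to form a top-N list.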

Detection of event onset using Twitter

2016 International Joint Conference on Neural Networks (IJCNN), 2016

Social media generates information about news and events in real time. Given the vast amount of data available and the rate of information propagation, reliably identifying events is a challenge. Most state-of-the-art techniques are post hoc techniques that detect an event after it has happened. Our goal is to detect the onset of an event as it is happening, using user-generated information from Twitter streams. To achieve this goal, we use a discriminative model to identify changes in the pattern of conversations over time. We use a topic evolution model to find credible events and eliminate the random noise that is prevalent in many event detection models. The simplicity of the proposed model allows it to detect events quickly and efficiently, permitting discovery of events within minutes of the start of conversations about them on Twitter. Our model is evaluated on a large-scale Twitter corpus to detect events in real time. The proposed model is also tested on other datasets to detect change over longer periods of time. The results indicate that we can detect real events within 3 to 8 minutes of their first appearance, with a lower degree of noise than other methods.
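The abstract does not specify its discriminative model or topic evolution model. As a much simpler, hypothetical stand-in for identifying a change in the pattern of conversations over time, the Python sketch below compares term counts in the current time window with the previous one and flags terms whose smoothed growth ratio and raw count both exceed thresholds. The thresholds and whitespace tokenization are assumptions.

```python
from collections import Counter

def bursty_terms(prev_window, cur_window, min_count=3, ratio=3.0):
    """Terms whose count grew sharply from the previous window to the current one.

    Each window is a list of tweet strings. Growth uses +1 smoothing so terms
    unseen in the previous window do not divide by zero.
    """
    prev = Counter(w for tweet in prev_window for w in tweet.lower().split())
    cur = Counter(w for tweet in cur_window for w in tweet.lower().split())
    bursts = {}
    for term, c in cur.items():
        growth = c / (prev[term] + 1)
        if c >= min_count and growth >= ratio:
            bursts[term] = growth
    return sorted(bursts, key=lambda t: (-bursts[t], t))

prev = ["nice weather today", "coffee time"]
cur = ["earthquake downtown", "earthquake felt here", "big earthquake now", "coffee time"]
print(bursty_terms(prev, cur))
```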