Kumar Ramaiyer - Academia.edu
Papers by Kumar Ramaiyer
… and Statistics, 2009 …, May 17, 2009
Association analysis is one of the most popular analysis paradigms in data mining. In this paper, we present different types of association patterns and discuss some of their applications in bioinformatics. We present a case study showing the usefulness of association analysis-based techniques for pre-processing protein interaction networks. Finally, we discuss some of the challenges that need to be addressed to make association analysis-based techniques more applicable to bioinformatics.
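This abstract does not say which association measures the paper relies on. As a minimal illustration of the standard ones, the sketch below computes support, rule confidence, and h-confidence (the hyperclique measure) over transactions stored as Python sets. The function names and the toy transactions are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent: s(A u C) / s(A)."""
    s_a = support(antecedent, transactions)
    if s_a == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / s_a


def h_confidence(itemset, transactions):
    """h-confidence: s(X) divided by the largest single-item support in X."""
    max_single = max(support({item}, transactions) for item in itemset)
    if max_single == 0:
        return 0.0
    return support(itemset, transactions) / max_single


# Hypothetical toy data: each transaction is a set of items.
transactions = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}]
print(support({"A", "B"}, transactions))        # 0.5
print(confidence({"A"}, {"B"}, transactions))   # about 0.667
print(h_confidence({"A", "B"}, transactions))   # about 0.667
```

The h-confidence of an itemset equals the smallest confidence among the rules that have a single item as their antecedent. A high value therefore means that every item in the set strongly implies the others. This property makes the measure a natural filter for noisy pairwise data.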
Understanding extreme events, such as hurricanes or forest fires, is of paramount importance because of their adverse impacts on human beings. Such events often propagate in space and time. Predicting, even a few days in advance, which locations will be affected by an event's track could benefit society in many ways. Arguably, simulations from first principles, where underlying physics-based models are described by a system of equations, provide the least reliable predictions for variables characterizing the dynamics of these extreme events. Data-driven model building has recently emerged as a complementary approach that can learn the relationships between multiple historically observed or simulated spatio-temporal ancillary variables and the dynamic behavior of the extreme events of interest. While promising, the methodology for predictive learning from such complex data is still in its infancy. In this paper, we propose a dynamic networks-based methodology for in-advance prediction of the tracks of emerging extreme events. By associating a network model of the system with the known tracks, our method learns recurrent network motifs that can serve as discriminatory signatures for an event's behavioral class. We applied the method to classify hurricane tracks at their early formation stages in the West African region. It predicts whether a track will make landfall in the North Atlantic region at least 10-15 days in advance, with more than 90% accuracy under 10-fold cross-validation. To the best of our knowledge, no comparable data-driven methodology exists for this problem.
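The abstract does not specify which motifs are learned or how the network is built. The sketch below is a hedged illustration of what a motif-count signature could look like. It counts two common three-node directed motifs, the feed-forward loop and the directed 3-cycle, in each snapshot of a network given as a set of directed edges. The motif choice, the edge-set representation, and `motif_counts` are all assumptions.

```python
from itertools import permutations


def motif_counts(edges):
    """Count (non-induced) feed-forward loops and directed 3-cycles.

    `edges` is an iterable of (source, target) pairs; node labels must be
    mutually comparable so that each 3-cycle is counted exactly once.
    """
    edge_set = set(edges)
    nodes = sorted({n for edge in edge_set for n in edge})
    ffl = 0
    cycles = 0
    for a, b, c in permutations(nodes, 3):
        # Feed-forward loop: a -> b, b -> c, plus the shortcut a -> c.
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set:
            ffl += 1
        # Directed 3-cycle a -> b -> c -> a, counted only from its smallest node.
        if ((a, b) in edge_set and (b, c) in edge_set and (c, a) in edge_set
                and a < b and a < c):
            cycles += 1
    return {"feed_forward_loop": ffl, "three_cycle": cycles}


# Hypothetical snapshots of a small dynamic network.
snapshots = [
    {(0, 1), (1, 2), (0, 2)},          # contains one feed-forward loop
    {(0, 1), (1, 2), (2, 0), (2, 3)},  # contains one 3-cycle
]
signature = [motif_counts(s) for s in snapshots]
print(signature)
```

The per-snapshot counts form a feature sequence, and any classifier could consume it. The brute-force enumeration is cubic in the number of nodes, so it suits only small illustrative graphs.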
2012 IEEE 12th International Conference on Data Mining Workshops, 2012
A dynamic physical system often undergoes phase transitions in response to fluctuations induced on system parameters. For example, hurricane activity is the climate system's response initiated by a liquid-vapor phase transition associated with non-linearly coupled fluctuations in the ocean and the atmosphere. Because our quantitative knowledge of highly non-linear dynamic systems is very limited, scientists often resort to linear regression techniques such as Least Absolute Deviation (LAD). These techniques learn the non-linear system's response (e.g., hurricane activity) from its observed or simulated parameters (e.g., temperature, precipitable water, pressure). While insightful, such models still offer limited predictability, and alternatives intended to capture non-linear behavior, such as Stepwise Regression, are often controversial. In this paper, we hypothesize that one of the primary reasons for the lack of predictability is treating an inherently multi-phase system as phaseless. To bridge this gap, we propose a hybrid approach that first predicts which phase the system is in and then estimates the magnitude of the system's response using a regression model optimized for that phase. Our approach is designed for systems that can be characterized by multivariate spatio-temporal data from observations, simulations, or both.
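The two-stage idea of predicting the phase and then applying that phase's own regression model can be sketched as follows. This is not the paper's implementation; the sketch makes three assumptions:

- A nearest-centroid rule stands in for the phase classifier.
- Each phase gets its own LAD model, fit by iteratively reweighted least squares (IRLS).
- All function names are hypothetical.

The sketch requires NumPy.

```python
import numpy as np


def fit_lad(X, y, iters=50, eps=1e-8):
    """Least Absolute Deviation fit via IRLS; returns [w_1, ..., w_d, bias]."""
    A = np.column_stack([X, np.ones(len(X))])
    weights = np.ones(len(y))
    coef = np.zeros(A.shape[1])
    for _ in range(iters):
        root_w = np.sqrt(weights)
        coef, *_ = np.linalg.lstsq(A * root_w[:, None], y * root_w, rcond=None)
        residuals = y - A @ coef
        # Weights of 1/|r| make the weighted squared loss approximate sum(|r|).
        weights = 1.0 / np.maximum(np.abs(residuals), eps)
    return coef


def fit_hybrid(X, y, phases):
    """Per-phase centroids (stage-1 classifier) and per-phase LAD models."""
    centroids, models = {}, {}
    for p in np.unique(phases):
        mask = phases == p
        centroids[p] = X[mask].mean(axis=0)
        models[p] = fit_lad(X[mask], y[mask])
    return centroids, models


def predict_hybrid(x, centroids, models):
    """Stage 1: nearest-centroid phase; stage 2: that phase's regression."""
    phase = min(centroids, key=lambda p: np.linalg.norm(x - centroids[p]))
    coef = models[phase]
    return phase, float(x @ coef[:-1] + coef[-1])


# Hypothetical two-phase data: y = 2x + 1 in phase 0, y = -3x + 50 in phase 1.
X = np.array([[0.0], [0.5], [1.0], [10.0], [10.5], [11.0]])
phases = np.array([0, 0, 0, 1, 1, 1])
y = np.where(phases == 0, 2 * X[:, 0] + 1, -3 * X[:, 0] + 50)
centroids, models = fit_hybrid(X, y, phases)
print(predict_hybrid(np.array([10.2]), centroids, models))
```

IRLS is used here only because it needs nothing beyond NumPy. A linear-programming formulation is the usual exact solver for LAD. Any phase classifier could replace the nearest-centroid rule without changing the second stage.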
Lecture Notes in Computer Science, 2007
We describe our project that marries data mining with Grid computing. Specifically, we focus on one data mining application, the Minnesota Intrusion Detection System (MINDS). MINDS uses a suite of data mining based algorithms to address different aspects of cyber security, including malicious activities such as denial-of-service (DoS) traffic, worms, policy violations, and insider abuse. MINDS has been operationally successful at detecting network intrusions in several real deployments. Sophisticated distributed cyber attacks can use a multitude of wide-area nodes, and against such attacks, combining the results of several MINDS instances can provide additional early alerts. We also describe a Grid service system that can deploy and manage multiple MINDS instances across a wide-area network.
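The abstract does not say how results from several MINDS instances are combined. One simple possibility is to flag a source only when at least `min_instances` sensors independently score it at or above an anomaly threshold. The sketch below is purely hypothetical: the report format, the threshold, and `combine_alerts` are assumptions, not part of MINDS.

```python
from collections import defaultdict


def combine_alerts(instance_reports, threshold, min_instances):
    """Return sources flagged by at least `min_instances` detector instances.

    Each report maps a source identifier (e.g. an IP address) to the anomaly
    score one instance assigned it; higher means more anomalous.
    """
    flagged_by = defaultdict(int)
    for report in instance_reports:
        for source, score in report.items():
            if score >= threshold:
                flagged_by[source] += 1
    return sorted(src for src, count in flagged_by.items() if count >= min_instances)


# Hypothetical reports from three sensors at different sites.
reports = [
    {"192.0.2.5": 0.91, "192.0.2.7": 0.20},
    {"192.0.2.5": 0.84, "198.51.100.9": 0.95},
    {"192.0.2.5": 0.73, "198.51.100.9": 0.40},
]
print(combine_alerts(reports, threshold=0.7, min_instances=2))  # ['192.0.2.5']
```

Requiring agreement across sites trades some sensitivity for fewer false alarms. Setting `min_instances` to 1 reduces the combination to a plain union of per-site alerts.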
Parallel Computing, 1998
The problem of simulating the motion of a set of bodies arises in a variety of domains, such as astrophysics, molecular dynamics, fluid dynamics, and high energy physics. The all-to-all nature of the interactions between bodies renders this problem extremely computation-intensive. Techniques based on hierarchical approximations have effectively reduced the complexity of this problem. Coupled with parallel processing, these techniques hold the promise of large-scale n-body simulations. In this paper, we present a spectrum of parallel formulations suited to different particle distributions. We first present a parallel formulation that uses a static partitioning of the domain and assignment of subdomains to processors. We demonstrate that this scheme delivers acceptable load balance and, coupled with two collective communication operations, yields good performance. We present a second parallel formulation that combines static decomposition of the domain with an assignment of subdomains to processors based on Morton ordering. This alleviates the load imbalance inherent in the first scheme. We generalize these schemes to dynamic domain decomposition coupled with a subtree assignment that tries to optimize the locality of processor subdomains. Unlike existing schemes, which ship data to the processors that need them, our schemes ship computation to the processors where the data reside. We present an experimental evaluation of our schemes on a 256-processor nCUBE2 and a 256-processor CM5. The evaluation is based on an astrophysical simulation of a variety of Gaussian and Plummer distributions of varying irregularity. We study the impact of parameters such as the multipole degree and the α-criterion on accuracy and parallel performance.
We demonstrate that our parallel formulations yield excellent performance and scale up to a large number of processors, making it possible to run realistic simulations with millions of particles. Furthermore, we show that as the accuracy of simulations is increased by …
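The second formulation above uses Morton (Z-order) ordering. This ordering maps cell coordinates to a one-dimensional key by interleaving their bits, so cells that are adjacent in key order tend to be close in space. The sketch below is a 2-D simplification; a 3-D version would interleave three coordinates. It sorts grid cells by Morton key and cuts the sequence into contiguous runs of roughly equal total load, one run per processor. The function names, the per-cell load estimate, and the cutting rule are assumptions; the paper's exact assignment rule may differ.

```python
def morton_key_2d(x, y, bits=16):
    """Interleave the bits of non-negative ints x and y (x in even positions)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def assign_cells(loads, n_procs):
    """Map each (x, y) cell to a processor by cutting the Morton order into
    contiguous runs of roughly equal total load."""
    total = sum(loads.values())
    if total <= 0:
        raise ValueError("total load must be positive")
    order = sorted(loads, key=lambda cell: morton_key_2d(*cell))
    owner, prefix = {}, 0.0
    for cell in order:
        # Place the cell by the midpoint of its share of the cumulative load.
        midpoint = prefix + loads[cell] / 2
        owner[cell] = min(int(midpoint * n_procs / total), n_procs - 1)
        prefix += loads[cell]
    return owner


# Hypothetical 4x4 grid with one unit of work per cell, split over 4 processors.
loads = {(x, y): 1 for x in range(4) for y in range(4)}
owner = assign_cells(loads, 4)
print(sorted(cell for cell, p in owner.items() if p == 0))
# [(0, 0), (0, 1), (1, 0), (1, 1)]: the lower-left 2x2 quadrant
```

The cuts follow cumulative load rather than cell count. As a result, a region dense with particles is spread over more processors, which is one way an ordering of this kind can reduce load imbalance.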