Ernst Gran - Academia.edu

Papers by Ernst Gran

Fast hybrid network reconfiguration for large-scale lossless interconnection networks

2016 IEEE 15th International Symposium on Network Computing and Applications (NCA), 2016

Reconfiguration of high-performance lossless interconnection networks is a cumbersome and time-consuming task. For that reason, reconfiguration in large networks is typically limited to situations where it is absolutely necessary, for instance when severe faults occur. At the same time, due to the shared and dynamic nature of modern cloud infrastructures, performance-driven reconfigurations are necessary to ensure efficient utilization of resources. In this work we present a scheme that allows for fast reconfiguration by limiting the task to the sub-parts of the network that can benefit from a local reconfiguration. Moreover, our method is able to use different routing algorithms for different sub-parts within the same subnet. We also present a Fat-Tree routing algorithm that reconfigures a network given a user-provided node ordering. Hardware experiments and large-scale simulation results show that we are able to reduce reconfiguration times by 50% to as much as 98.7% for very large topologies, while improving performance.
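
The idea of confining a reconfiguration to the sub-parts that actually changed can be illustrated with a small sketch. The data layout below (leaf switches mapped to an ordered list of attached nodes) and the function name are assumptions made for illustration only, not the paper's implementation.

```python
# Illustrative sketch only: limit reconfiguration to the parts of a fat-tree
# affected by a change in the user-provided node ordering. The dict-based
# topology model and the helper name are hypothetical.

def affected_leaf_switches(old_ordering, new_ordering):
    """Return leaf switches whose local node ordering changed and therefore
    need their routes recomputed; all other switches keep their routes."""
    affected = []
    for leaf, new_nodes in new_ordering.items():
        if old_ordering.get(leaf) != new_nodes:
            affected.append(leaf)
    return affected

# Example: only leaf "L2" sees a change, so only its sub-tree is rerouted.
old = {"L1": ["n1", "n2"], "L2": ["n3", "n4"], "L3": ["n5", "n6"]}
new = {"L1": ["n1", "n2"], "L2": ["n4", "n3"], "L3": ["n5", "n6"]}
print(affected_leaf_switches(old, new))  # -> ['L2']
```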

Adaptive Routing in InfiniBand Hardware

2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid)

Mobile Edge as Part of the Multi-Cloud Ecosystem: A Performance Study

2019 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), 2019

Cloud computing has revolutionised the development and deployment of applications by running them cost-effectively in remote data centres. With the increasing need for mobility and micro-services, particularly with the emerging 5G mobile broadband networks, there is also a strong demand for mobile edge computing (MEC). It enables applications to run in small cloud systems in close proximity to the user in order to minimise latencies. Both cloud computing and MEC have their own advantages and disadvantages. Combining these two computing paradigms in a unified multi-cloud platform has the potential of obtaining the best of both worlds. However, a comprehensive study is needed to evaluate the performance gains and the overheads that this combination imposes on real-world cloud applications. In this paper, we introduce a baseline performance evaluation in order to identify the fallacies and pitfalls of combining multiple cloud systems and MEC into a unified MEC-multi-cloud platform. For this purpose, we analyse the basic, application-independent performance metrics of average round-trip time (RTT) and average application payload throughput in a setup consisting of two private cloud systems and one public cloud. This baseline performance analysis confirms the feasibility of MEC-multi-cloud and provides guidelines for designing an autonomic resource provisioning solution, in the form of a proposed extension to our existing MELODIC middleware platform for multi-cloud applications.
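
A minimal sketch of the kind of application-independent baseline measurement described above: average round-trip time estimated from TCP connection-setup time to each cloud endpoint. The endpoint addresses are hypothetical placeholders, and a full study would also measure payload throughput.

```python
import socket, time

# Hypothetical endpoints standing in for the edge and remote cloud systems.
ENDPOINTS = [("edge.example.org", 80), ("cloud.example.org", 80)]

def avg_rtt_ms(host, port, samples=10):
    """Estimate RTT from TCP connection-setup time, averaged over samples."""
    total = 0.0
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=5):
            pass
        total += time.perf_counter() - start
    return total / samples * 1000.0

for host, port in ENDPOINTS:
    try:
        print(f"{host}: average RTT {avg_rtt_ms(host, port):.1f} ms")
    except OSError as exc:
        print(f"{host}: unreachable ({exc})")
```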

Combinando diferentes enfoques para el control de congestion en redes de interconexion de altas prestaciones (Combining different approaches for congestion control in high-performance interconnection networks)

Combining Congested-Flow Isolation and Injection Throttling in HPC Interconnection Networks

2011 International Conference on Parallel Processing, 2011

Existing congestion control mechanisms in interconnects can be divided into two general approaches. One is to throttle traffic injection at the sources that contribute to congestion, and the other is to isolate the congested traffic in specially designated resources. These two approaches have different, non-overlapping weaknesses. In this paper we present in detail a method that combines injection throttling and congested-flow isolation. Through simulation studies we first demonstrate the respective flaws of injection throttling and of flow isolation. Thereafter we show that our combined method extracts the best of both approaches: it reacts quickly to congestion, it is scalable, and it has good fairness properties with respect to the congested flows.
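
A toy, simulation-style sketch of the combined idea: when an output queue fills beyond a threshold, the flows feeding it are isolated in a dedicated congested-flow queue so they stop blocking other traffic, and their sources are asked to throttle injection. The threshold, queue model, and notification format are assumptions for illustration, not the mechanism evaluated in the paper.

```python
CONGESTION_THRESHOLD = 8  # packets queued at one output port

def handle_congestion(output_queues, isolation_queue, notifications):
    for port, packets in output_queues.items():
        if len(packets) > CONGESTION_THRESHOLD:
            hot_flows = {p["flow"] for p in packets}
            # Isolate: move packets of contributing flows to the special queue.
            isolation_queue.extend(p for p in packets if p["flow"] in hot_flows)
            output_queues[port] = [p for p in packets if p["flow"] not in hot_flows]
            # Throttle: tell each contributing source to slow down.
            for flow in hot_flows:
                notifications.append({"flow": flow, "action": "throttle"})

queues = {"out0": [{"flow": f"f{i % 2}"} for i in range(10)], "out1": []}
isolated, notes = [], []
handle_congestion(queues, isolated, notes)
print(len(isolated), "packets isolated;", notes)
```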

Efficient and Cost-Effective Hybrid Congestion Control for HPC Interconnection Networks

IEEE Transactions on Parallel and Distributed Systems, 2015

Interconnection networks are key components in high-performance computing (HPC) systems, and their performance strongly influences that of the overall system. However, at high load, congestion and its negative effects (e.g., head-of-line blocking) threaten the performance of the network, and thus of the entire system. Congestion control (CC) is therefore crucial to ensure an efficient utilization of the interconnection network during congestion situations. As one major trend is to reduce the effective wiring in interconnection networks in order to cut cost and power consumption, the network will operate very close to its capacity, making congestion control essential. Existing CC techniques can be divided into two general approaches. One is to throttle traffic injection at the sources that contribute to congestion, and the other is to isolate the congested traffic in specially designated resources. These approaches have different, non-overlapping weaknesses: injection throttling reacts slowly to congestion, while isolating traffic in special resources may lead the system to run out of those resources. In this paper we propose EcoCC, a new Efficient and Cost-Effective CC technique that combines injection throttling and congested-flow isolation to minimize their respective drawbacks and maximize overall system performance. This strategy is suitable for current commercial switch architectures, where it could be implemented without significant added complexity. Experimental results, using simulations under synthetic and real trace-based traffic patterns, show that the technique improves performance by up to 55 percent compared with some of the most successful congestion control techniques.
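
The source-side half of such a scheme can be sketched as a simple rate controller that reacts to congestion notifications. The AIMD-style policy and the parameter values below are illustrative assumptions, not EcoCC's actual throttling rules, which the paper defines in detail.

```python
class InjectionThrottle:
    """Toy injection-rate controller at a traffic source (illustrative only)."""

    def __init__(self, max_rate=1.0, recovery_step=0.05):
        self.rate = max_rate            # fraction of link bandwidth
        self.max_rate = max_rate
        self.recovery_step = recovery_step

    def on_congestion_notification(self):
        # Multiplicative decrease when the network reports congestion.
        self.rate = max(self.rate / 2.0, 0.05)

    def on_quiet_interval(self):
        # Additive recovery while no congestion is reported.
        self.rate = min(self.rate + self.recovery_step, self.max_rate)

t = InjectionThrottle()
t.on_congestion_notification()
print(t.rate)              # 0.5
for _ in range(5):
    t.on_quiet_interval()
print(round(t.rate, 2))    # 0.75
```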

First experiences with congestion control in InfiniBand hardware

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010

Exploring the Scope of the InfiniBand Congestion Control Mechanism

2012 IEEE 26th International Parallel and Distributed Processing Symposium, 2012

On the Relation between Congestion Control, Switch Arbitration and Fairness

2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2011

How Far Should We Look Back to Achieve Effective Real-Time Time-Series Anomaly Detection?

Anomaly detection is the process of identifying unexpected events or abnormalities in data, and it has been applied in many different areas such as system monitoring, fraud detection, healthcare, intrusion detection, etc. Providing real-time, lightweight, and proactive anomaly detection for time series with neither human intervention nor domain knowledge could be highly valuable, since it reduces human effort and enables appropriate countermeasures to be undertaken before a disastrous event occurs. To our knowledge, RePAD (Real-time Proactive Anomaly Detection algorithm) is a generic approach with all the abovementioned features. To achieve real-time and lightweight detection, RePAD utilizes Long Short-Term Memory (LSTM) to detect whether or not each upcoming data point is anomalous based on short-term historical data points. However, it is unclear how different amounts of historical data points affect the performance of RePAD. Therefore, in this paper, we investigate the impact of ...

A Fault-Tolerant Routing Strategy for KNS Topologies Based on Intermediate Nodes

Exascale computing systems are being built with thousands of nodes. The high number of components in these systems significantly increases the probability of failure. A key component for them is the interconnection network: if failures occur in the interconnection network, they may isolate a large fraction of the machine. For this reason, an efficient fault-tolerant mechanism is needed to keep the system interconnected, even in the presence of faults. A recently proposed topology for these large systems is the hybrid k-ary n-direct s-indirect (KNS) family, which provides optimal performance and connectivity at a reduced hardware cost. This paper presents a fault-tolerant routing methodology for the KNS topology that degrades performance gracefully in the presence of faults and tolerates a large number of faults without disabling any healthy computing node. In order to tolerate network failures, the methodology uses a simple mechanism: for any source-destination pair, if necessary, packets are forwarded to the destination node through a set of intermediate nodes (without being ejected from the network) with the aim of circumventing faults. The evaluation results show that the proposed methodology tolerates a large number of faults. For instance, it is able to tolerate more than 99.5% of fault combinations when there are ten faults in a 3-D network with 1,000 nodes using only one intermediate node, and more than 99.98% if two intermediate nodes are used. Furthermore, the methodology offers graceful performance degradation: as an example, performance degrades by only 1% for a 2-D network with 1,024 nodes and 1% faulty links.
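
The intermediate-node idea can be illustrated on a small 2-D mesh with deterministic XY routing: if the deterministic route from source to destination crosses a faulty link, look for a node I such that the routes source-to-I and I-to-destination are both fault-free. The mesh topology and routing function below are simplifications chosen for clarity; the paper defines the actual selection rules for the KNS topology.

```python
from itertools import product

def xy_route(src, dst):
    """Links visited by dimension-order (X first, then Y) routing."""
    links, (x, y) = [], src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y))); x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny))); y = ny
    return links

def route_ok(src, dst, faulty):
    return all((a, b) not in faulty and (b, a) not in faulty
               for a, b in xy_route(src, dst))

def pick_intermediate(size, src, dst, faulty):
    if route_ok(src, dst, faulty):
        return None  # the direct deterministic route is already fault-free
    for node in product(range(size), repeat=2):
        if node not in (src, dst) and route_ok(src, node, faulty) \
                and route_ok(node, dst, faulty):
            return node
    return "no intermediate node found"

faulty = {((1, 0), (2, 0))}                          # one broken X-link
print(pick_intermediate(3, (0, 0), (2, 0), faulty))  # routes around it via (0, 1)
```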

En studie av flytkontroll og bufferstørrelser i Ethernet som tett koblet nettverk (A study of flow control and buffer sizes in Ethernet as a tightly coupled network)

NorNet Core - A multi-homed research testbed

Keywords: Testbed, Multi-homing, Routing, Transport, Applications. Abstract: Over the last decade, the Internet has grown at a tremendous speed in both size and complexity. Nowadays, a large number of important services - for instance e-commerce, healthcare and many others - depend on the availability of the underlying network. Clearly, service interruptions due to network problems may have a severe impact. On the long way towards the Future Internet, the complexity will grow even further. Therefore, new ideas and concepts must be evaluated thoroughly, and particularly in realistic, real-world Internet scenarios, before they can be deployed for production networks. For this purpose, various testbeds - for instance PlanetLab, GpENI or G-Lab - have been established and are intensively used for research. However, all of these testbeds lack support for so-called multi-homing. Multi-homing denotes the connection of a site to multiple Internet service providers, in order to achieve redundancy....

DistTune: Distributed Fine-Grained Adaptive Traffic Speed Prediction for Growing Transportation Networks

Transportation Research Record: Journal of the Transportation Research Board

Over the past decade, many approaches have been introduced for traffic speed prediction. However, providing fine-grained, accurate, time-efficient, and adaptive traffic speed prediction for a growing transportation network, where the size of the network keeps increasing and new traffic detectors are constantly deployed, has not been well studied. To address this issue, this paper presents DistTune, based on long short-term memory (LSTM) and the Nelder-Mead method. When encountering an unprocessed detector, DistTune decides whether it should customize an LSTM model for this detector by comparing the detector with other processed detectors in terms of the normalized traffic speed patterns they have observed. If a similarity is found, DistTune directly shares an existing LSTM model with this detector to achieve time-efficient processing. Otherwise, DistTune customizes an LSTM model for the detector to achieve fine-grained prediction. To make DistTune even more time-efficient, DistTune performs on a clus...
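
The sharing decision described above can be sketched as a comparison of normalized speed patterns: reuse an existing model when a new detector's pattern is close to one already processed, otherwise train a customized model. The normalization, distance measure, and threshold below are assumptions; DistTune's actual similarity criterion and its Nelder-Mead tuning step are defined in the paper.

```python
import math

def normalize(series):
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in series]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def assign_model(new_speeds, processed, threshold=0.1):
    """processed maps detector id -> (normalized pattern, trained model)."""
    pattern = normalize(new_speeds)
    for det_id, (ref_pattern, model) in processed.items():
        if distance(pattern, ref_pattern) < threshold:
            return model  # share the existing LSTM model
    return "train a customized LSTM model for this detector"

processed = {"d1": (normalize([60, 55, 30, 28, 50, 62]), "lstm_d1")}
print(assign_model([58, 54, 31, 27, 49, 60], processed))   # -> 'lstm_d1'
print(assign_model([20, 80, 20, 80, 20, 80], processed))   # -> train a new model
```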

ReRe: A Lightweight Real-Time Ready-to-Go Anomaly Detection Approach for Time Series

2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC)

Anomaly detection is an active research topic in many different fields such as intrusion detection, network monitoring, system health monitoring, IoT healthcare, etc. However, many existing anomaly detection approaches require either human intervention or domain knowledge, and may suffer from high computation complexity, consequently hindering their applicability in real-world scenarios. Therefore, a lightweight, ready-to-go approach that is able to detect anomalies in real time is highly sought after. Such an approach could be easily and immediately applied to perform time series anomaly detection on any commodity machine, providing timely anomaly alerts and thereby enabling appropriate countermeasures to be undertaken as early as possible. With these goals in mind, this paper introduces ReRe, a Real-time Ready-to-go proactive Anomaly Detection algorithm for streaming time series. ReRe employs two lightweight Long Short-Term Memory (LSTM) models to predict and jointly determine whether or not an upcoming data point is anomalous based on short-term historical data points and two long-term self-adaptive thresholds. Experiments based on real-world time-series datasets demonstrate the good performance of ReRe in real-time anomaly detection without requiring human intervention or domain knowledge.

RePAD: Real-Time Proactive Anomaly Detection for Time Series

Advanced Information Networking and Applications

During the past decade, many anomaly detection approaches have been introduced in different fields such as network monitoring, fraud detection, and intrusion detection. However, they require an understanding of the data pattern and often need a long off-line period to build a model or network for the target data. Providing real-time and proactive anomaly detection for streaming time series without human intervention and domain knowledge is highly valuable, since it greatly reduces human effort and enables appropriate countermeasures to be undertaken before a disastrous damage, failure, or other harmful event occurs. However, this issue has not been well studied yet. To address it, this paper proposes RePAD, a Real-time Proactive Anomaly Detection algorithm for streaming time series based on Long Short-Term Memory (LSTM). RePAD utilizes short-term historical data points to predict and determine whether or not the upcoming data point is a sign that an anomaly is likely to happen in the near future. By dynamically adjusting the detection threshold over time, RePAD is able to tolerate minor pattern changes in time series and detect anomalies either proactively or on time. Experiments based on two time series datasets collected from the Numenta Anomaly Benchmark demonstrate that RePAD is able to proactively detect anomalies and provide early warnings in real time without human intervention and domain knowledge.
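
A compact sketch of the detection loop described above. To stay self-contained, the LSTM predictor is replaced by a trivial moving-average stand-in; the self-adaptive threshold (mean plus three standard deviations of recent prediction errors) follows the general idea of dynamic threshold adjustment, but the window sizes and error measure used here are illustrative assumptions rather than RePAD's exact formulation.

```python
import statistics

WINDOW = 5          # short-term history used for prediction
ERROR_HISTORY = 20  # recent errors used to adapt the threshold

def predict(history):
    return sum(history[-WINDOW:]) / WINDOW   # stand-in for the LSTM model

def detect(stream):
    errors, anomalies = [], []
    for i in range(WINDOW, len(stream)):
        predicted, actual = predict(stream[:i]), stream[i]
        err = abs(actual - predicted) / max(abs(actual), 1e-9)
        if len(errors) >= 3:
            recent = errors[-ERROR_HISTORY:]
            # Self-adaptive threshold recomputed from recent errors.
            threshold = statistics.mean(recent) + 3 * statistics.pstdev(recent)
            if err > threshold:
                anomalies.append(i)
        errors.append(err)
    return anomalies

data = [10, 11, 10, 12, 11, 10, 11, 12, 11, 60, 11, 10, 12, 11, 10]
print(detect(data))   # the spike at index 9 is flagged
```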

SALAD: Self-Adaptive Lightweight Anomaly Detection for Real-time Recurrent Time Series

2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC)

Distributed Fine-Grained Traffic Speed Prediction for Large-Scale Transportation Networks Based on Automatic LSTM Customization and Sharing

Euro-Par 2020: Parallel Processing

Improvements to the InfiniBand Congestion Control Mechanism

2016 IEEE 24th Annual Symposium on High-Performance Interconnects (HOTI)

A Self-Adaptive Network for HPC Clouds: Architecture, Framework, and Implementation

IEEE Transactions on Parallel and Distributed Systems
