Extra-adaptive robust online subspace tracker for anomaly detection from streaming networks (original) (raw)
Related papers
Streaming Anomaly Detection Using Randomized Matrix Sketching
Data is continuously being generated from sources such as machines, network traffic, application logs, etc. Timely and accurate detection of anomalies in massive data streams has important applications such as in preventing machine failures, intrusion detection, and dynamic load balancing. In this paper, we introduce a novel (unsupervised) anomaly detection framework which can be used to detect anomalies in a streaming fashion by making only one pass over the data while utilizing limited storage. We adapt ideas from matrix sketching to maintain, in a streaming model, a set of few orthogonal vectors that form a good approximate basis for all the observed data. Using this constructed orthogonal basis, anomalies in new incoming data are detected based on a simple reconstruction error test. We theoretically prove that our algorithm compares favorably with an offline approach based on expensive global singular value decomposition (SVD) updates. Additionally, we apply ideas from randomized low-rank matrix approximations to further speedup the algorithm. The experimental results show the effectiveness and efficiency of our approach over other popular scalable anomaly detection approaches.
Multivariate online anomaly detection using kernel recursive least squares
2007
High-speed backbones are regularly affected by various kinds of network anomalies, ranging from malicious attacks to harmless large data transfers. Different types of anomalies affect the network in different ways, and it is difficult to know a priori how a potential anomaly will exhibit itself in traffic statistics. In this paper we describe an online, sequential, anomaly detection algorithm, that is suitable for use with multivariate data. The proposed algorithm is based on the kernel version of the recursive least squares algorithm. It assumes no model for network traffic or anomalies, and constructs and adapts a dictionary of features that approximately spans the subspace of normal behaviour. The algorithm raises an alarm immediately upon encountering a deviation from the norm. Through comparison with existing block-based offline methods based upon Principal Component Analysis, we demonstrate that our online algorithm is equally effective but has much faster time-to-detection and lower computational complexity. We also explore minimum volume set approaches in identifying the region of normality.
No Free Lunch But A Cheaper Supper: A General Framework for Streaming Anomaly Detection
Expert Systems with Applications, 2020
In recent years, research interest in detecting anomalies in temporal streaming data has increased significantly. A variety of algorithms are being developed in the data mining community. They can be broadly divided into two categories, namely general-purpose and ad hoc ones. In most cases, general approaches assume a one-size-fits-all solution model, and strive to design a single "optimal" anomaly detector which can detect all anomalies in any domain. To date, there exists no universal method that has been shown to outperform the others across different anomaly types, use cases and datasets. In this paper, we propose SAFARI , a framework created by abstracting and unifying the fundamental tasks within the streaming anomaly detection. SAFARI provides a flexible and extensible anomaly detection procedure to overcome the limitations of one-size-fits-all solutions. Such abstraction helps to facilitate more elaborate algorithm comparisons by allowing us to isolate the effects of shared and unique characteristics of diverse algorithms on the performance. Using the framework, we have identified a research gap that motivated us to propose a novel learning strategy. We implemented twenty different anomaly detectors and conducted an extensive evaluation study, comparing their performances using real-world benchmark datasets with different properties. The results indicate that there is no single superior detector which works perfectly for every case, proving our hypothesis that "there is no free lunch" in the streaming anomaly detection world. Finally, we discuss the benefits and drawbacks of each method in-depth, drawing a set of conclusions and guidelines to guide future users of SAFARI.
Tensor-Based Online Network Anomaly Detection and Diagnosis
IEEE Access
This paper presents an online anomaly detection system capable of handling operational network traffic of large networks (such as an ISP). We also aim for an effective and practical diagnosis of anomalies diagnosis to produce actionable intelligence that enables automated response. To achieve these objectives, we use the following approaches. (1) We model the status of the network by a stream of tensors where each tensor cell contains a time series. (2) We detect anomalous tensors at discrete time steps using an unsupervised tensor representation learning model. (3) We produce actionable intelligence by diagnosing anomaly detection results and identifying the abnormal time series that are the most likely causes of each anomaly in the tensor. (4) We further analyze the traffic corresponding to each anomalous time series by an innovative method that extracts and isolates the attack traffic. (5) We provide solutions for streaming data anomaly detection challenges such as large volume, high velocity, seasonality, and concept drift. We apply our approach to the complete test set of UGR data to show its practicality and effectiveness. Not only can we detect and isolate most of the labelled attack traffic, but we also identify many organic attack activities in the UGR data. Our results on the complete UGR dataset show high detection and isolation rates for the labelled attacks in the dataset. We also report on additional organic attacks we detected that were originally labelled as background in the dataset. Our analysis shows that the isolated background traffic represents interesting and potentially malicious behavior and can provide invaluable insight for cyber-threat researchers. INDEX TERMS Anomaly detection, anomaly diagnosis, convolutional neural network, autoencoder, NetFlow.
Online Multivariate Anomaly Detection and Localization for High-dimensional Settings
ArXiv, 2019
This paper considers the real-time detection of anomalies in high-dimensional systems. The goal is to detect anomalies quickly and accurately so that the appropriate countermeasures could be taken in time, before the system possibly gets harmed. We propose a sequential and multivariate anomaly detection method that scales well to high-dimensional datasets. The proposed method follows a nonparametric, i.e., data-driven, and semi-supervised approach, i.e., trains only on nominal data. Thus, it is applicable to a wide range of applications and data types. Thanks to its multivariate nature, it can quickly and accurately detect challenging anomalies, such as changes in the correlation structure and stealth low-rate cyberattacks. Its asymptotic optimality and computational complexity are comprehensively analyzed. In conjunction with the detection method, an effective technique for localizing the anomalous data dimensions is also proposed. We further extend the proposed detection and local...
Detection and localization of change-points in high-dimensional network traffic data
The Annals of Applied Statistics, 2009
We propose a novel and efficient method, that we shall call TopRank in the following paper, for detecting change-points in high-dimensional data. This issue is of growing concern to the network security community since network anomalies such as Denial of Service (DoS) attacks lead to changes in Internet traffic. Our method consists of a data reduction stage based on record filtering, followed by a nonparametric change-point detection test based on U-statistics. Using this approach, we can address massive data streams and perform anomaly detection and localization on the fly. We show how it applies to some real Internet traffic provided by France-Télécom (a French Internet service provider) in the framework of the ANR-RNRT OSCAR project. This approach is very attractive since it benefits from a low computational load and is able to detect and localize several types of network anomalies. We also assess the performance of the TopRank algorithm using synthetic data and compare it with alternative approaches based on random aggregation.
Challenging the supremacy of traffic matrices in anomaly detection
2007
Multiple network-wide anomaly detection techniques proposed in the literature define an anomaly as a statistical outlier in aggregated network traffic. The most popular way to aggregate the traffic is as a Traffic Matrix, where the traffic is divided according to its ingress and egress points in the network. However, the reasons for choosing traffic matrices instead of any other formalism have not been studied yet. In this paper we compare three network-driven traffic aggregation formalisms: ingress routers, input links and origin-destination pairs (i.e. traffic matrices). Each formalism is computed on data collected from two research backbones. Then, a network-wide anomaly detection method is applied to each formalism. All anomalies are manually labeled, as a true or false positive. Our results show that the traffic aggregation level has a significant impact on the number of anomalies detected and on the false positive rate. We show that aggregating by OD pairs is indeed the most appropriate choice for the data sets and the detection method we consider. We correlate our observations with time series statistics in order to explain how aggregation impacts anomaly detection.
Anomaly detection in large evolving graphs
2017
ANOMALY detection plays a vital role in various application domains including network intrusion detection, environmental monitoring and road traffic analysis. However a major challenge in anomaly detection is how to mine datasets where objects possess causal/non-causal relationships such as friendship, citation and communication relationships. This type of relational data can be represented as a graph, and raises the challenges of how to extend anomaly detection to the domain of relational datasets such as graphs. Anomalies often provide valuable insights into the correlation between an abnormal pattern and a real world phenomenon. Therefore, there has been growing attention towards anomaly detection schemes in dynamic networks. Although these evolving networks impose a curse of dimensionality on the learning models, they usually contain structural properties that anomaly detection schemes can exploit. The major challenge is finding a feature extraction technique that preserves grap...
MemStream: Memory-Based Streaming Anomaly Detection
Proceedings of the ACM Web Conference 2022
Given a stream of entries over time in a multi-dimensional data setting where concept drift is present, how can we detect anomalous activities? Most of the existing unsupervised anomaly detection approaches seek to detect anomalous events in an offline fashion and require a large amount of data for training. This is not practical in real-life scenarios where we receive the data in a streaming manner and do not know the size of the stream beforehand. Thus, we need a data-efficient method that can detect and adapt to changing data trends, or concept drift, in an online manner. In this work, we propose MemStream, a streaming anomaly detection framework, allowing us to detect unusual events as they occur while being resilient to concept drift. We leverage the power of a denoising autoencoder to learn representations and a memory module to learn the dynamically changing trend in data without the need for labels. We prove the optimum memory size required for effective drift handling. Furthermore, MemStream makes use of two architecture design choices to be robust to memory poisoning. Experimental results show the effectiveness of our approach compared to stateof-the-art streaming baselines using 2 synthetic datasets and 11 real-world datasets. CCS CONCEPTS • Computing methodologies → Anomaly detection; Online learning settings.
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
Time series anomaly detection remains one of the most active areas of research in data mining. In spite of the dozens of creative solutions proposed for this problem, recent empirical evidence suggests that time series discords, a relatively simple twenty-year old distance-based technique, remains among the state-of-art techniques. While there are many algorithms for computing the time series discords, they all have limitations. First, they are limited to the batch case, whereas the online case is more actionable. Second, these algorithms exhibit poor scalability beyond tens of thousands of datapoints. In this work we introduce DAMP, a novel algorithm that addresses both these issues. DAMP computes exact left-discords on fast arriving streams, at up to 300,000 Hz using a commodity desktop. This allows us to find time series discords in datasets with trillions of datapoints for the first time. We will demonstrate the utility of our algorithm with the most ambitious set of time series anomaly detection experiments ever conducted. CCS CONCEPTS • Computing methodologies → Anomaly detection.