Incremental Methods for Detecting Outliers from Multivariate Data Stream
Related papers
Outlier Detection Methods and the Challenges for their Implementation with Streaming Data
Journal of Mobile Multimedia
Outlier detection is a widely studied problem with applications in many domains, for example transaction fraud, sharp rises and falls in the share market, sudden changes in weather, intrusion detection for digital security, and fraud detection in security design patterns in data. Data mining is the discipline of dealing with large amounts of data and selecting what is important. Outlier detection is a data mining procedure that identifies uncommon events and special cases. This paper discusses the fundamental concepts of outlier detection, the outlier types, and the challenges in their detection. It gives an in-depth presentation of outlier detection techniques, divided into three major categories: supervised, semi-supervised, and unsupervised. Special attention is given to unsupervised outlier detection: the existing algorithms and techniques in this category are elaborated in detail, and their advantages and shortcomings are summarized. The analyses of th...
Continuous outlier detection in data streams
Proceedings of the 2013 international conference on Management of data - SIGMOD '13, 2013
Anomaly detection is an important data mining task that aims to discover elements showing significant deviation from the expected behavior; such elements are termed outliers. One of the most widely employed criteria for determining whether an element is an outlier is based on the number of neighboring elements within a fixed distance (R), compared against a fixed threshold (k). Such outliers are referred to as distance-based outliers and are the focus of this work. In this demo, we show both an extensible framework for outlier detection algorithms and specific outlier detection algorithms for the demanding case where outlier detection is continuously performed over a data stream. More specifically, i) we demonstrate a novel flavor of an open-source, publicly available tool for Massive Online Analysis (MOA) that is endowed with capabilities to encapsulate algorithms that continuously detect outliers, and ii) we present four online outlier detection algorithms. Two of these algorithms have been designed by the authors of this demo, with a view to improving on key aspects related to outlier mining, such as running time, flexibility and space requirements.
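The distance-based criterion described in this abstract can be illustrated with a naive sketch. This is not one of the four algorithms from the demo and does not use the MOA API; it only restates the (R, k) definition over a count-based sliding window. The function names, the use of Euclidean distance, the "fewer than k neighbors within distance R" reading of the threshold, and the inclusive comparison are all assumptions made for illustration.

```python
from collections import deque
from math import dist


def distance_outliers(window, R, k):
    """Return the points in `window` that have fewer than k neighbors within distance R."""
    points = list(window)
    outliers = []
    for i, p in enumerate(points):
        # Count every other point in the window that lies within distance R of p.
        neighbors = sum(
            1 for j, q in enumerate(points) if i != j and dist(p, q) <= R
        )
        if neighbors < k:
            outliers.append(p)
    return outliers


def stream_outliers(stream, window_size, R, k):
    """Yield (new point, current outliers) after each arrival.

    The deque keeps only the most recent `window_size` points, so a point's
    status can change as neighbors arrive or expire.
    """
    window = deque(maxlen=window_size)
    for point in stream:
        window.append(point)
        yield point, distance_outliers(window, R, k)
```

Recomputing every neighbor count on each arrival costs quadratic time in the window size, which is exactly the kind of overhead that dedicated streaming algorithms try to avoid.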
Designing a Streaming Algorithm for Outlier Detection in Data Mining—An Incremental Approach
Sensors
Designing an algorithm for detecting outliers over streaming data has become an important task in many common applications, arising in areas such as fraud detection, network analysis, environment monitoring and so forth. Because real-time data may arrive in the form of streams rather than batches, properties such as concept drift, temporal context, transiency, and uncertainty need to be considered. In addition, data processing needs to be incremental, scalable, and able to run with limited memory resources. These facts create big challenges for existing outlier detection algorithms in terms of their accuracy when they are implemented in an incremental fashion, especially in the streaming environment. To address these problems, we first propose C_KDE_WR, which uses sliding window and kernel function to process the streaming data online, and reports its results demonstrating high throughput on handling real-time streaming data, implemented in a CUDA framework on Graphics Processing ...
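The combination of a sliding window and a kernel function mentioned in this abstract can be sketched in a generic way. The code below is not C_KDE_WR, whose details are not given in the truncated abstract; it is a plain univariate Gaussian kernel density estimate over the most recent values, with a hypothetical fixed `bandwidth` and density `threshold` chosen purely for illustration.

```python
from collections import deque
from math import exp, pi, sqrt


def gaussian_kde_density(x, window, bandwidth):
    """Average of Gaussian kernels centered on the window's values, evaluated at x."""
    if not window:
        return 0.0
    norm = 1.0 / (bandwidth * sqrt(2.0 * pi))
    total = sum(norm * exp(-0.5 * ((x - w) / bandwidth) ** 2) for w in window)
    return total / len(window)


def kde_stream_flags(stream, window_size, bandwidth, threshold):
    """Return one boolean per stream value: True if it is flagged as an outlier.

    Each value is scored against the window of values that arrived before it,
    and is only flagged once the window is full. The value is then added to
    the window, pushing out the oldest one.
    """
    window = deque(maxlen=window_size)
    flags = []
    for x in stream:
        density = gaussian_kde_density(x, window, bandwidth)
        flags.append(len(window) == window_size and density < threshold)
        window.append(x)
    return flags
```

For example, `kde_stream_flags([1.0, 1.1, 0.9, 1.0, 10.0, 1.05], window_size=3, bandwidth=0.5, threshold=0.05)` flags only the value 10.0. Note that this sketch lets a flagged value enter the window, so it can influence later scores; whether outliers should be excluded from the window is a design choice left open here.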
Parallel outlier detection in real time data streams
Information Sciences Letters, 2020
Outlier detection is one of the major problems in modern applications. Detecting outliers is especially difficult for streaming applications, as data can dynamically change in subtle ways following changes in the underlying infrastructure. Owing to the growth in the volume of data generated every second and in its velocity, detecting outliers in this kind of data becomes a very challenging task, and processing the whole data set at one time is impossible. In this paper we propose a parallel window-based local outlier detection (PWLOD) algorithm that detects outliers in real time using a sliding window and partitions each window among several processing nodes. Each processing node processes its portion of the window using the Local Outlier Factor algorithm and sends the results to the master node, which collects them and processes them to select the outliers. The experimental results show that the proposed algorithm has better execution time and accuracy than the state-of-the-art algorithms.
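The partition-then-merge flow described in this abstract might look roughly like the following. This is a sketch under several assumptions, not the PWLOD implementation: the abstract does not say how a window is partitioned or how the master selects outliers, so contiguous chunks and a fixed LOF `threshold` are used here, a thread pool stands in for the processing nodes, and LOF is computed by brute force on points assumed to be distinct, with each chunk holding more than k points.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist


def lof_scores(points, k):
    """Brute-force Local Outlier Factor for every point (assumes distinct points, len > k)."""
    n = len(points)
    # For each point: its k nearest neighbors as (distance, index), excluding itself.
    knn = [
        sorted((dist(points[i], points[j]), j) for j in range(n) if j != i)[:k]
        for i in range(n)
    ]
    k_dist = [nbrs[-1][0] for nbrs in knn]
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in knn[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [sum(lrd[j] for _, j in knn[i]) / (len(knn[i]) * lrd[i]) for i in range(n)]


def partitioned_lof_outliers(window, n_parts, k, threshold):
    """Split one window into chunks, score each chunk in a worker, merge at the 'master'."""
    size = -(-len(window) // n_parts)  # ceiling division
    parts = [window[i:i + size] for i in range(0, len(window), size)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        scored = list(pool.map(lambda part: list(zip(part, lof_scores(part, k))), parts))
    # Master step: collect all (point, score) pairs and keep the high-scoring ones.
    return [p for part in scored for p, score in part if score > threshold]
```

Scoring each chunk in isolation means a point is judged only against the neighbors that happen to share its chunk, so results can differ from LOF over the full window; how the actual algorithm handles this is not stated in the abstract.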
A New Approach for Detecting Outliers in Data Streams.
International Journal of Engineering Sciences & Research Technology, 2013
In recent years, data streams have become an increasingly important research area. A data stream is a continuous flow of data, and data stream mining, the process of extracting knowledge structures from continuous, rapid data records, can be considered a subfield of data mining. Data streams can be classified into two types: offline and online. Online data streams are used in a number of real-world applications, including network traffic monitoring, intrusion detection, and credit card and fraud detection, while offline data streams are used in reports based on web log streams. The data size is extremely large and potentially infinite, so it is not possible to store all the data; this leads to a mining challenge constrained by hardware and software limitations. Data mining techniques newly proposed for data streams include data stream clustering, data stream classification, frequent pattern techniques, sliding window techniques and so on. For outlier detection, data stream clustering is a highly desirable technique. The main objective of this research work is to perform clustering on data streams and to detect the outliers in them. Two clustering algorithms, FUZZY C-MEANS and CLARANS, are used for finding the outliers in data streams. Two performance factors, clustering accuracy and outlier detection accuracy, are used for analysis. The experimental results show that the CLARANS clustering algorithm is more accurate than FUZZY C-MEANS.
A Fast and Efficient Algorithm for Outlier Detection Over Data Streams
International Journal of Advanced Computer Science and Applications
Outlier detection over data streams is an important task in data mining. It has various applications such as fraud detection, public health, and computer network security. Many approaches have been proposed for outlier detection over data streams, such as distance-, clustering-, density-, and learning-based approaches. In this paper, we are interested in density-based outlier detection over data streams. Specifically, we propose an improvement of DILOF, a recent density-based algorithm. We observed that the main disadvantage of DILOF is its summarization method, which takes a lot of time and significantly degrades the algorithm's accuracy. Our new algorithm, called DILOF C, uses an efficient summarization method. Our performance study shows that DILOF C outperforms DILOF in terms of total response time and outlier detection accuracy.
A Survey on Outlier Detection Techniques in Dynamic Data Stream
International Journal of Latest Engineering and Management Research (IJLEMR), 2017
Outlier detection has significant importance in the data mining domain. Applications that process streaming data may contain many abnormal data points, or outliers, and require efficient outlier detection techniques to detect and analyze these abnormal patterns. Outlier detection is the process of detecting patterns in the data which do not adhere to normal behavior. These patterns are known by several terms such as anomalies, outliers, noise or inconsistent data. Detecting and analyzing abnormal data such as outliers is a wide research area with many applications. Finding and selecting an appropriate detection technique is essential. This survey presents the tools and techniques used for detecting outliers in data streams and attempts to classify the problems in outlier detection methods over data streams. The review of detection techniques gives an insight into further research opportunities in this domain.
Survey on Outlier Detection in Data Stream
International Journal of Computer Applications, 2016
Data mining provides a way of finding hidden and useful knowledge in large amounts of data. Usually we find information by looking for normal trends or distributions in the data, but sometimes a rare event or data object may provide information that is very interesting to us. Outlier detection is one of the tasks of data mining. It finds abnormal data points or sequences hidden in the dataset. A data stream is an unbounded sequence of data with explicit or implicit temporal context. A data stream is uncertain and dynamic in nature. Traditional outlier detection techniques for static data, which require the whole dataset for modelling, are not suitable for data streams because the whole data stream cannot be stored. Network intrusion detection, web click stream analysis, fraud detection, fault detection in machines, and sensor data analysis are some of the applications of data stream outlier detection. In this paper, we describe several issues in data stream outlier detection and the usual approaches or techniques for finding outliers in data streams.
Detection of Local Outlier over Dynamic Data Streams Using Efficient Partitioning Method
2009 WRI World Congress on Computer Science and Information Engineering, 2009
Outlier detection is the process of detecting data objects which are grossly different from or inconsistent with the remaining set of data. Some of its important applications in the field of data mining are fraud detection, customer behavior analysis, and intrusion detection. There are a number of good research algorithms for detecting outliers if the entire data is available and the algorithms can make more than a single pass to achieve the required results. Among the existing methods, LOF (Local Outlier Factor), a density-based method, is very efficient in detecting all forms of outliers. The LOF algorithm cannot be directly applied to a data stream, as the large number of nearest neighbor searches and of LOF and lrd (local reachability density) computations can make it highly inefficient for data streams. In this paper we propose a cluster-based partitioning algorithm which divides the stream into safe regions and candidate regions. In the second phase, we apply the LOF algorithm over these partitions separately, with a slight enhancement of the LOF computation over the candidate regions, to achieve accurate results in finding the most outstanding outliers. Several experiments on different datasets confirm that our technique can find better outliers at lower computational cost than direct LOF or the other enhancements proposed for LOF.
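For readers unfamiliar with the quantities named in this abstract, the standard LOF definitions are reproduced here as background. They come from the general LOF formulation, not from this paper's partitioning scheme. The notation is an assumption made for this note: d(p, o) is the distance between points p and o, k-distance(o) is the distance from o to its k-th nearest neighbor, and N_k(p) is the set of k nearest neighbors of p.

```latex
\begin{align*}
\text{reach-dist}_k(p, o) &= \max\{\, k\text{-distance}(o),\; d(p, o) \,\} \\
\mathrm{lrd}_k(p) &= \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \text{reach-dist}_k(p, o) \right)^{-1} \\
\mathrm{LOF}_k(p) &= \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}
\end{align*}
```

A value of LOF close to 1 means p is about as dense as its neighbors, while values well above 1 indicate an outlier. Every term requires nearest neighbor searches for p and for each of its neighbors, which is the cost the abstract identifies as prohibitive on a stream.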
A new approach for outlier detection in near real time
2010
Outlier detection methods have been suggested for a broad range of applications. They are essential in various monitoring tasks (e.g. mobile phone monitoring, credit card usage monitoring). The aim is to detect sudden changes in the usage pattern which may indicate deviations from normal usage. Parametric (statistical) and non-parametric methods, combined with univariate and multivariate methods, form most of the body of research in anomaly detection. In this paper a novel approach for outlier detection is proposed. To validate the new approach, we applied the method to voice traffic time series obtained from a real-life mobile network.