Faster and parameter-free discord search in quasi-periodic time series

A Parallel Approach to Discords Discovery in Massive Time Series Data

Computers, Materials & Continua, 2021

A discord is a refinement of the concept of an anomalous subsequence of a time series. As one of the topical issues in time series mining, discord discovery is applied in a wide range of real-world areas (medicine, astronomy, economics, climate modeling, predictive maintenance, energy consumption, etc.). In this article, we propose a novel parallel algorithm for discord discovery on a high-performance cluster with nodes based on many-core accelerators, for the case when the time series cannot fit in main memory. We assume that the time series is partitioned across the cluster nodes, and we achieve parallelization both among the cluster nodes and within a single node. Within a cluster node, the algorithm employs a set of matrix data structures to store and index the subsequences of a time series and to provide efficient vectorization of computations on the accelerator. At each node, the algorithm processes its own partition in two phases, namely candidate selection and discord refinement, with each phase requiring one linear scan through the partition. The local discords found are then combined into the global candidate set and transmitted to each cluster node. Next, each node refines the global candidate set over its own partition, resulting in the local true discord set. Finally, the global true discord set is constructed as the intersection of the local true discord sets. The experimental evaluation on a real computer cluster with real and synthetic time series shows high scalability of the proposed algorithm.
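The two phases named in this abstract can be illustrated with a minimal single-node sketch. It is not the authors' implementation: the range threshold `r`, the raw (non-normalized) Euclidean distance, the exclusion of overlapping subsequences, and the function names `select_candidates` and `refine` are all assumptions made for illustration, and the partitioning, vectorization, and cross-node intersection steps are omitted.

```python
import numpy as np

def select_candidates(ts, m, r):
    """Phase 1 (assumed form): one scan keeping subsequences with no neighbor within r so far."""
    ts = np.asarray(ts, dtype=float)
    cands = []
    for i in range(len(ts) - m + 1):
        s = ts[i:i + m]
        is_cand = True
        kept = []
        for j in cands:
            # Overlapping subsequences (|i - j| < m) are treated as trivial matches.
            if abs(i - j) >= m and np.linalg.norm(s - ts[j:j + m]) < r:
                is_cand = False  # s and candidate j are close, so neither can be a discord
            else:
                kept.append(j)
        cands = kept
        if is_cand:
            cands.append(i)
    return cands

def refine(ts, m, r, cands):
    """Phase 2 (assumed form): one scan that drops false positives.

    Returns a dict mapping each surviving start index to its
    nearest-neighbor distance, which is at least r.
    """
    ts = np.asarray(ts, dtype=float)
    nn = {c: np.inf for c in cands}
    for i in range(len(ts) - m + 1):
        s = ts[i:i + m]
        for c in list(nn):
            if abs(i - c) < m:
                continue
            d = np.linalg.norm(s - ts[c:c + m])
            if d < r:
                del nn[c]  # a neighbor within r exists: not a discord
            else:
                nn[c] = min(nn[c], d)
    return nn
```

In a multi-node setting, each node would presumably run `refine` on the shared global candidate set over its own partition, and a candidate would be reported only if it survives on every node, which corresponds to the intersection step described above.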

Wat: Finding top-k discords in time series database

2007

Finding discords in a time series database is an important problem in a great variety of applications, such as space shuttle telemetry, mechanical industry, biomedicine, and financial data analysis. However, most previous methods for this problem require too many parameter settings, which are difficult for users to choose. To our knowledge, the best known approach with comparatively few parameters still requires users to choose a word size for the compression of subsequences. In this paper, we propose an algorithm based on the Haar wavelet and an augmented trie to mine the top-K discords from a time series database, which can dynamically determine the word size for compression. Due to the characteristics of the Haar wavelet transform, our algorithm has greater pruning power than previous approaches. Experiments with annotated datasets attest to both the effectiveness and the efficiency of our algorithm.
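As a rough illustration of the transform this abstract relies on, the following sketch computes a Haar wavelet decomposition by repeated pairwise averaging and differencing. The unnormalized averaging convention, the power-of-two length requirement, and the function name `haar_transform` are assumptions; the paper's own normalization, symbolization, and trie construction are not shown.

```python
import numpy as np

def haar_transform(x):
    """Haar decomposition by pairwise averaging and differencing.

    Output order: [overall average, coarsest detail, ..., finest details].
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    details = []
    approx = x
    while len(approx) > 1:
        pairs = approx.reshape(-1, 2)
        details.append((pairs[:, 0] - pairs[:, 1]) / 2.0)  # half-differences
        approx = pairs.mean(axis=1)                         # pairwise averages
    # Details were collected finest-first; emit them coarsest-first.
    return np.concatenate([approx] + details[::-1])
```

Because the coefficients are ordered from coarse to fine, a prefix of the output describes the subsequence at a lower resolution, which is one plausible reason a wavelet representation lets the resolution be chosen incrementally rather than fixed in advance.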

Finding the most unusual time series subsequence: algorithms and applications

Knowledge and Information Systems, 2006

In this work we introduce the new problem of finding time series discords. Time series discords are subsequences of longer time series that are maximally different to all the rest of the time series subsequences. They thus capture the sense of the most unusual subsequence within a time series. While discords have many uses for data mining, they are particularly attractive as anomaly detectors because they only require one intuitive parameter (the length of the subsequence) unlike most anomaly detection algorithms that typically require many parameters. While the brute force algorithm to discover time series discords is quadratic in the length of the time series, we show a simple algorithm that is three to four orders of magnitude faster than brute force, while guaranteed to produce identical results. We evaluate our work with a comprehensive set of experiments on diverse data sources including electrocardiograms, space telemetry, respiration physiology, anthropological and video datasets.
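The quadratic brute-force baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the raw Euclidean distance, the rule that only non-overlapping subsequences are compared, and the function name `brute_force_discord` are assumptions, and the faster algorithm the abstract refers to is not shown.

```python
import numpy as np

def brute_force_discord(ts, m):
    """Return (start index, nearest-neighbor distance) of the top discord of length m."""
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    best_idx, best_dist = -1, -1.0
    for i in range(n):
        nearest = np.inf
        for j in range(n):
            # Skip overlapping subsequences so a subsequence is not matched to itself.
            if abs(i - j) >= m:
                d = np.linalg.norm(ts[i:i + m] - ts[j:j + m])
                nearest = min(nearest, d)
        # The discord is the subsequence whose nearest neighbor is farthest away.
        if np.isfinite(nearest) and nearest > best_dist:
            best_idx, best_dist = i, nearest
    return best_idx, best_dist
```

The two nested loops over all subsequence start positions are what make this approach quadratic in the length of the time series; the only parameter is the subsequence length `m`.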

From Cluster-Based Outlier Detection to Time Series Discord Discovery

Lecture Notes in Computer Science, 2015

Anomalous patterns, or discords, are a kind of outlier in time series. In this paper, we present a new approach to time series discord discovery based on cluster-based outlier detection. In this approach, subsequence candidates are first extracted from the time series using a segmentation method; these candidates are then transformed to the same length and passed to an appropriate clustering algorithm; finally, we identify discords using a measure suggested in the cluster-based outlier detection method of He et al. (2003). The experimental results show that our approach is much more efficient than the HOTSAX algorithm in detecting time series discords, while the anomalous patterns discovered by the two methods match each other perfectly.

A fast algorithm for complex discord searches in time series: HOT SAX Time

Applied Intelligence

Time series analysis is quickly proceeding towards long and complex tasks. In recent years, fast approximate algorithms for discord search have been proposed in order to compensate for the increasing size of the time series. It is more interesting, however, to find quick exact solutions. In this research, we improved HOT SAX by exploiting two main ideas: the warm-up process, and the similarity between sequences close in time. The resulting algorithm, called HOT SAX Time (HST), has been validated with real and synthetic time series, and successfully compared with HOT SAX, RRA, SCAMP, and DADD. The complexity of a discord search has been evaluated with a new indicator, the cost per sequence (cps), which allows one to compare searches on time series of different lengths. Numerical evidence suggests that two conditions are involved in determining the complexity of a discord search in a non-trivial way: the length of the discords, and the noise/signal ratio. In the case of complex searches, HST can be more than 100 times faster than HOT SAX, thus being at the forefront of the exact discord search.

DETECTING THE PERIODIC OUTLIER PATTERN USING TIME SERIES SEQUENCES

IJRCAR, 2014

Detecting the periodicity of outlier patterns might be more important in many sequences than the periodicity of regular, more frequent patterns. In this paper, I present the development of an enhanced suffix array-based, time-efficient algorithm for detecting unusual or outlier periodic patterns. The contributions include the development of a mathematical model to measure how unusual or surprising a pattern is compared with other patterns in the same data sequence, and the ability to detect periodic patterns that appear in a subsection of the series. An extensive comparative experimental evaluation of various aspects of the algorithm has been conducted, and it has been favorably compared with Info Miner.

A Survey on Periodicity Detection Techniques in Time Series Databases

International Journal of Engineering Research and Technology (IJERT), 2013

https://www.ijert.org/a-survey-on-periodicity-detection-techniques-in-time-series-databases
https://www.ijert.org/research/a-survey-on-periodicity-detection-techniques-in-time-series-databases-IJERTV2IS121040.pdf

In recent years, periodic patterns have gained much importance, and various periodicity detection algorithms have been developed. A time series database is a collection of data gathered at certain intervals to reflect certain behaviour of an entity. By analysing a time series database, we can find how frequently a particular pattern is present and count its occurrences. The temporal regularity of a pattern can be found using periodic pattern mining techniques. Periodic pattern mining can be used to find the periodicity in many real-life problems and can be used for prediction. Various periodicity detection algorithms are compared.

Efficient discovery of unusual patterns in time series

2006

The problem of finding a specified pattern in a time series database (i.e., query by content) has received much attention and is now a relatively mature field. In contrast, the important problem of enumerating all surprising or interesting patterns has received far less attention. This problem requires a meaningful definition of “surprise”, and an efficient search technique. All previous attempts at finding surprising patterns in time series use a very limited notion of surprise, and/or do not scale to massive datasets.

AN ALGORITHM FOR DISCOVERING SIMILAR SUBSEQUENCES IN TIME SERIES DATA USING CID (Complexity – Invariant Distance)

Discovering subsequences (motifs) in time series data has attracted the interest of researchers. Numerous algorithms that use a distance function or another (dis)similarity measure between two time series have been proposed. We present an algorithm to detect the subsequence (of length m) that is most frequently repeated in a time series (of length n). Repeated subsequences are detected dynamically by assigning the length (m) of the subsequence. The value of m is selected by the user according to some characteristics of the time series (e.g., seasonality, periodicity, etc.) or from a previous detailed analysis of that time series. The algorithm allows the user to choose between two (dis)similarity measures: Euclidean distance and CID (Complexity-Invariant Distance, proposed by Batista G. and Keogh E. (2013)). The proposed algorithm is tested on real world time series data and simulated time series in R...
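The CID measure named in this abstract scales the Euclidean distance by the ratio of the two series' complexity estimates, where complexity is the length of the line obtained by connecting consecutive points. The sketch below is written in Python rather than R, which the abstract mentions; the function names and the handling of constant (zero-complexity) series are assumptions made for illustration.

```python
import numpy as np

def complexity_estimate(x):
    """Complexity estimate: square root of the sum of squared consecutive differences."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.sum(np.diff(x) ** 2))

def cid(q, c):
    """Complexity-invariant distance between two equal-length series."""
    q = np.asarray(q, dtype=float)
    c = np.asarray(c, dtype=float)
    ed = np.linalg.norm(q - c)
    ce_q, ce_c = complexity_estimate(q), complexity_estimate(c)
    lo, hi = min(ce_q, ce_c), max(ce_q, ce_c)
    if lo == 0.0:
        # Assumption: two constant series get no correction; a constant series
        # compared with a non-constant one is treated as infinitely far apart.
        return ed if hi == 0.0 else np.inf
    return ed * (hi / lo)
```

For example, `cid([0, 1, 0, 1], [0, 2, 0, 2])` multiplies the Euclidean distance by 2, because the second series has twice the complexity estimate of the first.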

Related papers