Rare Time Series Motif Discovery from Unbounded Streams

Exact Discovery of Time Series Motifs

Time series motifs are pairs of individual time series, or subsequences of a longer time series, which are very similar to each other. As with their discrete analogues in computational biology, this similarity hints at structure which has been conserved for some reason and may therefore be of interest. Since time series motifs were formalized in 2002, dozens of researchers have used them for diverse applications in many different domains. Because the obvious algorithm for computing motifs is quadratic in the number of items, more than a dozen approximate algorithms to discover motifs have been proposed in the literature. In this work, for the first time, we show a tractable exact algorithm to find time series motifs. As we shall show through extensive experiments, our algorithm is up to three orders of magnitude faster than brute-force search in large datasets. We further show that our algorithm is fast enough to be used as a subroutine in higher-level data mining algorithms for anytime classification, near-duplicate detection and summarization, and we consider detailed case studies in domains as diverse as electroencephalograph interpretation and entomological telemetry data mining.
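
The quadratic baseline, and the kind of exact pruning that can avoid much of it, can be illustrated with a short Python sketch. This is not the paper's algorithm: it is a simplified, single-reference-point illustration of triangle-inequality lower bounds, and the function names (`brute_force_motif`, `pruned_motif`) are hypothetical. Each item is assumed to be an equal-length sequence compared with plain Euclidean distance.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_motif(series):
    """Quadratic baseline: compare every pair and return (distance, i, j)."""
    best = (math.inf, -1, -1)
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            d = euclidean(series[i], series[j])
            if d < best[0]:
                best = (d, i, j)
    return best


def pruned_motif(series, ref_index=0):
    """Exact motif pair, pruning with one (hypothetical) reference object.

    Triangle inequality: |d(ref, a) - d(ref, b)| <= d(a, b), so a pair whose
    lower bound is already >= the best-so-far distance cannot improve on it.
    """
    ref = series[ref_index]
    dist_to_ref = [euclidean(ref, s) for s in series]
    # Visit items in ascending order of their distance to the reference object.
    order = sorted(range(len(series)), key=lambda k: dist_to_ref[k])
    n = len(order)
    best = (math.inf, -1, -1)
    for offset in range(1, n):
        pruned_all = True
        for p in range(n - offset):
            a, b = order[p], order[p + offset]
            if dist_to_ref[b] - dist_to_ref[a] >= best[0]:
                continue  # lower bound rules this pair out
            pruned_all = False
            d = euclidean(series[a], series[b])
            if d < best[0]:
                best = (d, min(a, b), max(a, b))
        if pruned_all:
            # Lower bounds only grow with the offset, so every later pair is pruned too.
            break
    return best
```

Both functions return the same motif distance; how many pairs the reference ordering actually prunes depends on the data and on which reference object is chosen.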

Detecting time series motifs under uniform scaling

2007

Time series motifs are approximately repeated patterns found within the data. Such motifs have utility for many data mining algorithms, including rule-discovery, novelty-detection, summarization and clustering. Since the formalization of the problem and the introduction of efficient linear time algorithms, motif discovery has been successfully applied to many domains, including medicine, motion capture, robotics and meteorology.
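
Uniform scaling compares a pattern against stretched or shrunk versions of a candidate, so motifs that repeat at slightly different speeds can still match. The sketch below assumes nearest-index rescaling of a candidate prefix to the query length and a maximum scaling factor of 1.2; the factor and the names `rescale` and `uniform_scaling_distance` are illustrative assumptions, not details taken from the paper.

```python
import math


def rescale(seq, target_len):
    """Nearest-index resampling of seq to target_len points.

    Point j (1-based) of the result takes element ceil(j * p / target_len)
    (1-based) of seq, where p = len(seq). Integer ceiling avoids float error.
    """
    p = len(seq)
    return [seq[-(-(j * p) // target_len) - 1] for j in range(1, target_len + 1)]


def uniform_scaling_distance(query, candidate, max_scale=1.2):
    """Smallest Euclidean distance between query and a prefix of candidate
    rescaled to len(query), over prefix lengths in
    [len(query) / max_scale, len(query) * max_scale] (assumed search range).
    Returns (distance, best prefix length)."""
    m = len(query)
    lo = max(1, math.ceil(m / max_scale))
    hi = min(len(candidate), math.floor(m * max_scale))
    best, best_p = math.inf, None
    for p in range(lo, hi + 1):
        scaled = rescale(candidate[:p], m)
        d = math.sqrt(sum((q - c) ** 2 for q, c in zip(query, scaled)))
        if d < best:
            best, best_p = d, p
    return best, best_p
```

Because a prefix of exactly `len(query)` points is rescaled to itself, the uniform-scaling distance is never larger than the plain Euclidean distance to that prefix.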

Variable length motif discovery in time series data

IEEE Access, 2023

The detection of recurring behavioral patterns in time series data, also called motif discovery, is a crucial step for mining insights in complex time series data, especially in complex environments where manual monitoring is not feasible. However, current state-of-the-art algorithms fall short in their applicability in production environments (due to a static motif length, many user-defined parameters, returning only the best motif pair, etc.). In this paper, a variable-length motif discovery method based on the Matrix Profile is proposed, with a focus on industrial applicability. It works in noisy and periodic environments, returns only unique motifs (meaning that motifs of the same shape are grouped together as one) and requires only one distance matrix calculation. The method was benchmarked on synthetic data as well as on publicly available, real-world key performance indicator (KPI) data from telecom providers, and shows adequate accuracy in finding both short and long motifs in the same time series.
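
The Matrix Profile stores, for every subsequence, the distance to its nearest non-trivial neighbour; the smallest entries mark the top motif pair. The naive Python version below is for illustration only: it is far slower than published Matrix Profile algorithms, the exclusion-zone width of `m // 4` and the function names are assumptions, and it does not implement the variable-length method described in the abstract.

```python
import math


def znorm(x):
    """Z-normalize a sequence (a constant sequence maps to all zeros)."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [0.0] * len(x) if sd == 0 else [(v - mu) / sd for v in x]


def naive_matrix_profile(ts, m):
    """O(n^2 m) matrix profile of ts for subsequence length m.

    Returns (profile, index): profile[i] is the z-normalized Euclidean distance
    from subsequence i to its nearest neighbour outside the exclusion zone,
    and index[i] is that neighbour's start position.
    """
    subs = [znorm(ts[i:i + m]) for i in range(len(ts) - m + 1)]
    excl = max(1, m // 4)  # assumed exclusion zone against trivial matches
    profile, index = [], []
    for i, a in enumerate(subs):
        best, best_j = math.inf, -1
        for j, b in enumerate(subs):
            if abs(i - j) <= excl:
                continue
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index
```

The position of the minimum of `profile`, together with the matching entry of `index`, gives the top motif pair for the fixed length `m`.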

A novel clustering-based method for time series motif discovery under time warping measure

International Journal of Data Science and Analytics, 2017

The problem of time series motif discovery has attracted a lot of attention and is useful in many real-world applications. However, most of the methods proposed so far use Euclidean distance. One existing method, called MDTW_WedgeTree, discovers time series motifs under DTW distance, but it addresses the case in which the motif is the time series in a time series database that has the highest number of similar time series within a range r. To adapt this method to the case in which motifs are frequently occurring subsequences of a longer time series, we modify MDTW_WedgeTree into a new algorithm for discovering "subsequence" motifs in time series under DTW. The proposed method consists of a segmentation method that divides the time series into motif candidates and a BIRCH-based clustering that can efficiently cluster motif candidate subsequences under DTW distance. Experimental results show that the proposed method discovers "subsequence" motifs very efficiently on large time series datasets while achieving high accuracy.
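
DTW lets two subsequences align with local stretching before their point-wise differences are summed. The dynamic-programming sketch below shows only the distance itself; it is not MDTW_WedgeTree or the BIRCH-based clustering, and the optional Sakoe-Chiba band parameter `window` and the squared point cost are assumptions made for illustration.

```python
import math


def dtw_distance(a, b, window=None):
    """Classic DTW between sequences a and b.

    Uses squared point costs and an optional Sakoe-Chiba band of half-width
    `window` (widened to |len(a) - len(b)| so the end cell stays reachable).
    Returns the square root of the minimal cumulative cost.
    """
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of: vertical, horizontal, or diagonal step.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])
```

For example, `[0, 1, 2, 3]` and `[0, 1, 1, 2, 3]` have DTW distance 0 because the repeated `1` is absorbed by the warping path, whereas Euclidean distance is not even defined for sequences of different lengths.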

Detecting Subdimensional Motifs: An Efficient Algorithm for Generalized Multivariate Pattern Discovery

Seventh IEEE International Conference on Data Mining (ICDM 2007), 2007

Discovering recurring patterns in time series data is a fundamental problem for temporal data mining. This paper addresses the problem of locating subdimensional motifs in real-valued, multivariate time series, which requires the simultaneous discovery of sets of recurring patterns along with the corresponding relevant dimensions. While many approaches to motif discovery have been developed, most are restricted to categorical data, univariate time series, or multivariate data in which the temporal patterns span all of the dimensions. In this paper, we present an expected linear-time algorithm that addresses a generalization of multivariate pattern discovery in which each motif may span only a subset of the dimensions. To validate our algorithm, we discuss its theoretical properties and empirically evaluate it using several data sets including synthetic data and motion capture data collected by an on-body inertial sensor.
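
To make the notion of a subdimensional motif concrete, the sketch below takes two occurrences of a candidate motif and keeps only the dimensions on which they agree. This per-dimension threshold test is a hypothetical toy, not the expected linear-time algorithm from the paper; `relevant_dimensions` and `threshold` are illustrative names.

```python
import math


def relevant_dimensions(occ_a, occ_b, threshold):
    """Hypothetical relevance test for a candidate subdimensional motif.

    occ_a and occ_b are two occurrences given per dimension (occ[d][t]).
    Returns the indices of dimensions whose per-dimension Euclidean distance
    is at most `threshold`, i.e. the dimensions the motif plausibly spans.
    """
    dims = []
    for d, (xa, xb) in enumerate(zip(occ_a, occ_b)):
        dist = math.sqrt(sum((u - v) ** 2 for u, v in zip(xa, xb)))
        if dist <= threshold:
            dims.append(d)
    return dims
```

A motif spanning all dimensions would return every index; a subdimensional motif returns only a subset, with the remaining dimensions behaving independently between occurrences.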

Discriminative Motif Analysis for Time Series Classification

2018

Time series classification is one of the major tasks in the data mining community. Classification generally works on original real-valued time series or on transformed time series. The main issue in time series classification is the computational complexity of handling massive amounts of time series. In this work, time series are classified using motifs as feature vectors. Variable-length motifs are discovered on a symbolic representation with our proposed positional inverted index. We further investigated the candidate motifs with the Information Gain (IG) measure to identify their discriminative features. Then, the classification accuracy of motifs with and without their discriminative features on the UCR benchmark datasets is analyzed. In the experimental evaluation, motifs with their discriminative features performed better on 7 out of 11 datasets. Keywords: Motif discovery, Time series classification, Symbolic representation
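
A minimal sketch of discretizing subsequences into SAX words and recording where each word occurs, assuming a 4-letter alphabet with the standard Gaussian breakpoints and equal-size PAA segments. The dictionary from word to start positions is only a hypothetical simplification of the positional inverted index named above, and the information-gain step is not shown.

```python
import math
from bisect import bisect_right
from collections import defaultdict

# Standard N(0, 1) breakpoints for a 4-letter SAX alphabet (assumed alphabet size).
BREAKPOINTS = [-0.67, 0.0, 0.67]
ALPHABET = "abcd"


def znorm(x):
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [0.0] * len(x) if sd == 0 else [(v - mu) / sd for v in x]


def paa(x, segments):
    """Piecewise Aggregate Approximation (assumes len(x) % segments == 0)."""
    size = len(x) // segments
    return [sum(x[k * size:(k + 1) * size]) / size for k in range(segments)]


def sax_word(x, segments):
    """Z-normalize, average into segments, then map each mean to a letter."""
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, v)] for v in paa(znorm(x), segments))


def positional_inverted_index(ts, window, segments):
    """Hypothetical simplification: SAX word -> list of start positions."""
    index = defaultdict(list)
    for start in range(len(ts) - window + 1):
        index[sax_word(ts[start:start + window], segments)].append(start)
    return index
```

Words that collect several start positions are natural motif candidates, and recording positions lets overlapping or adjacent occurrences be filtered later.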

Constrained motif discovery in time series

New Generation Computing, 2009

The goal of motif discovery algorithms is to efficiently find unknown recurring patterns. In this paper we focus on motif discovery in time series. Most available algorithms cannot utilize domain knowledge in any way, which results in quadratic or at least super-linear time and space complexity. In this paper we define the Constrained Motif Discovery problem, which enables domain knowledge to be incorporated into the motif discovery process. The paper then provides two algorithms, called MCFull and MCInc, for efficiently solving the constrained motif discovery problem. We also show that most unconstrained motif discovery problems can be converted into constrained ones using a change-point detection algorithm. A novel change-point detection algorithm called the Robust Singular Spectrum Transform (RSST) is then introduced and compared to the traditional Singular Spectrum Transform using synthetic and real-world data sets. The results show that RSST achieves higher specificity and is better suited to finding constraints that convert unconstrained motif discovery problems into constrained ones solvable by MCFull and MCInc. We then compare the combination of RSST and MCFull or MCInc with two state-of-the-art motif discovery algorithms on a large set of synthetic time series. The results show that the proposed algorithms provided a four- to ten-fold increase in speed compared to the unconstrained motif discovery algorithms studied, without any loss of accuracy. RSST+MCFull is then used in a real-world human-robot interaction experiment to enable the robot to learn free hand gestures, actions, and their associations by watching humans and other robots interacting.
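
The speed-up from constraints comes from searching only a few candidate locations instead of every offset. The sketch below assumes the candidate start positions (for example, change points from a detector such as SST or RSST) are already available and simply brute-forces the best pair among them. It illustrates the idea only; it is not MCFull or MCInc, and `constrained_motif` and `min_gap` are hypothetical names.

```python
import math
from itertools import combinations


def constrained_motif(ts, m, candidate_starts, min_gap=None):
    """Best pair of length-m subsequences, comparing only the given candidate
    start positions instead of every offset. Returns (distance, i, j)."""
    min_gap = m if min_gap is None else min_gap  # default: forbid overlapping windows
    starts = sorted({s for s in candidate_starts if 0 <= s <= len(ts) - m})
    best = (math.inf, None, None)
    for i, j in combinations(starts, 2):  # i < j because starts is sorted
        if j - i < min_gap:
            continue  # skip trivial (overlapping) matches
        d = math.sqrt(sum((x - y) ** 2 for x, y in zip(ts[i:i + m], ts[j:j + m])))
        if d < best[0]:
            best = (d, i, j)
    return best
```

With k candidate positions the search costs O(k^2 m) instead of O(n^2 m) for all n offsets, which is where the benefit of good constraints comes from; a missed change point, however, means a motif starting there cannot be found.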

Motif and anomaly discovery of time series based on subseries join

Time series motifs are repeated similar subseries in one or multiple time series. Time series anomalies are unusual subseries in one or multiple time series. Finding motifs and finding anomalies in time series data are closely related problems, and both are useful in many domains, including medicine, motion capture, meteorology, and finance. This work presents a novel approach for both the motif discovery problem and the anomaly detection problem. This approach first uses a subseries join to obtain the similarity relationships among subseries of the time series data. The motif discovery and anomaly detection problems can then be converted into graph-theoretic problems solvable by existing graph-theoretic algorithms. Experiments demonstrate the effectiveness of the proposed approach in discovering motifs and anomalies in real-world time series data, and show that it is efficient at processing large time series datasets.
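
A subseries join can be read as a similarity graph: each subseries is a node, and an edge links two non-overlapping subseries within a distance threshold. In the sketch below, a highest-degree node is taken as a motif representative and a lowest-degree node as an anomaly candidate. This degree heuristic, the fixed `radius`, and the function names are assumptions, not the graph-theoretic algorithms used in the paper.

```python
import math


def subseries_join_graph(ts, m, radius):
    """Adjacency sets over start offsets of length-m subseries; an edge joins
    two non-overlapping subseries whose Euclidean distance is <= radius."""
    n = len(ts) - m + 1
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + m, n):  # j >= i + m keeps the pair non-overlapping
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(ts[i:i + m], ts[j:j + m])))
            if d <= radius:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def motif_and_anomaly(adj):
    """Hypothetical heuristic: most-connected node as motif, least-connected as anomaly."""
    motif = max(adj, key=lambda v: len(adj[v]))
    anomaly = min(adj, key=lambda v: len(adj[v]))
    return motif, anomaly
```

Once the join is expressed as a graph, richer graph queries (for example connected components or cliques) can replace the simple degree count without recomputing any distances.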

Ranking and significance of variable-length similarity-based time series motifs

The detection of very similar patterns in a time series, commonly called motifs, has received continuous and increasing attention from diverse scientific communities. In particular, recent approaches for discovering similar motifs of different lengths have been proposed. In this work, we show that such variable-length similarity-based motifs cannot be directly compared, and hence ranked, by their normalized dissimilarities. Specifically, we find that length-normalized motif dissimilarities still have intrinsic dependencies on the motif length, and that lowest dissimilarities are particularly affected by this dependency. Moreover, we find that such dependencies are generally non-linear and change with the considered data set and dissimilarity measure. Based on these findings, we propose a solution to rank those motifs and measure their significance. This solution relies on a compact but accurate model of the dissimilarity space, using a beta distribution with three parameters that de...

Discovery of Time-Series Motif from Multi-Dimensional Data Based on MDL Principle

Machine Learning, 2005

Recently, research on the efficient extraction of previously unknown, frequently appearing patterns in time-series data has received much attention. These patterns are called 'motifs'. Motifs are useful for various time-series data mining tasks. In this paper, we propose a motif discovery algorithm that extracts a motif representing a characteristic pattern of the given data, based on the Minimum Description Length (MDL) principle. In addition, the algorithm can extract motifs from multi-dimensional time-series data by using Principal Component Analysis (PCA). In the experimental evaluation, we show the efficiency of the motif discovery algorithm and the usefulness of the extracted motifs for various data mining tasks.
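
The PCA step can be pictured as projecting each multi-dimensional time step onto the first principal component, yielding a one-dimensional series for motif search. The sketch below uses plain power iteration on the sample covariance so it needs no third-party libraries; it assumes the leading eigenvalue is dominant and that the uniform starting vector is not orthogonal to the leading eigenvector. The MDL-based motif extraction itself is not shown, and `first_principal_component` is a hypothetical name.

```python
import math


def first_principal_component(X, iters=200):
    """Project rows of X (time steps x dimensions) onto the first principal
    component. Returns (projected 1-D series, unit component vector)."""
    n, d = len(X), len(X[0])
    means = [sum(row[k] for row in X) / n for k in range(d)]
    centered = [[row[k] - means[k] for k in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # unit starting vector for power iteration
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance along v; keep the current unit vector
        v = [x / norm for x in w]
    projected = [sum(r[k] * v[k] for k in range(d)) for r in centered]
    return projected, v
```

For data lying on a line, such as rows `[t, 2 * t]`, the component converges to the direction `(1, 2) / sqrt(5)` and the projection preserves the full shape of the series in one dimension.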