AN ALGORITHM FOR DISCOVERING SIMILAR SUBSEQUENCES IN TIME SERIES DATA USING CID (Complexity-Invariant Distance)

Abstract Discovering repeated subsequences (motifs) in time series data has attracted considerable research interest. Numerous algorithms have been proposed that rely on a distance function or another (dis)similarity measure between two time series. We present an algorithm that detects the subsequence of length m that is repeated most often in a time series of length n. Detection is carried out dynamically once the subsequence length m has been assigned. The user selects m according to characteristics of the time series (e.g., seasonality or periodicity) or from a previous detailed analysis of that series. The algorithm lets the user choose between two (dis)similarity measures: Euclidean distance and CID (Complexity-Invariant Distance, proposed by Batista G. and Keogh E. (2013)). The proposed algorithm is tested on real-world and simulated time series data in R...
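To make the two measures concrete, the following is a minimal sketch and not the authors' R implementation. It uses the standard CID definition: the Euclidean distance multiplied by a correction factor max(CE(Q), CE(C)) / min(CE(Q), CE(C)), where CE is the square root of the summed squared differences between consecutive points. The brute-force search, the function name `most_repeated_subsequence`, the `radius` match threshold, and the handling of flat subsequences are all assumptions introduced for illustration; the abstract does not specify how a repetition is counted.

```python
import math


def euclidean(q, c):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


def complexity_estimate(q):
    """CE(Q): square root of summed squared consecutive differences."""
    return math.sqrt(sum((q[i + 1] - q[i]) ** 2 for i in range(len(q) - 1)))


def cid(q, c):
    """Complexity-invariant distance: Euclidean distance times max(CE)/min(CE)."""
    ce_q, ce_c = complexity_estimate(q), complexity_estimate(c)
    lo, hi = min(ce_q, ce_c), max(ce_q, ce_c)
    if lo == 0.0:
        # Assumption: factor 1 if both sequences are flat, infinite if only one is.
        factor = 1.0 if hi == 0.0 else math.inf
    else:
        factor = hi / lo
    return euclidean(q, c) * factor


def most_repeated_subsequence(series, m, radius, dist=cid):
    """Brute-force sketch: return (start, match_starts) for the length-m window
    with the most non-overlapping matches within `radius` (hypothetical threshold).
    Returns (None, []) if no window has any match."""
    windows = [series[i:i + m] for i in range(len(series) - m + 1)]
    best_start, best_matches = None, []
    for i, w in enumerate(windows):
        matches, last = [], None
        for j, v in enumerate(windows):
            if abs(i - j) < m:
                continue  # skip windows overlapping the candidate itself
            if last is not None and j - last < m:
                continue  # keep counted matches mutually non-overlapping
            if dist(w, v) <= radius:
                matches.append(j)
                last = j
        if len(matches) > len(best_matches):
            best_start, best_matches = i, matches
    return best_start, best_matches


if __name__ == "__main__":
    demo = [0, 1, 0, 5, 5, 5, 0, 1, 0, 7, 7, 7, 0, 1, 0]
    print(most_repeated_subsequence(demo, m=3, radius=0.5, dist=cid))
    print(most_repeated_subsequence(demo, m=3, radius=0.5, dist=euclidean))
```

Passing `dist=euclidean` or `dist=cid` mirrors the choice between the two measures described in the abstract. The search compares every pair of windows, so its cost grows quadratically with n; it is meant only to show where the distance function plugs in.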