W-TSS: A Wavelet-Based Algorithm for Discovering Time Series Shapelets (original) (raw)

Time series shapelets: a novel technique that allows accurate, interpretable and fast classification

Data Mining and Knowledge Discovery, 2010

Classification of time series has been attracting great interest over the past decade. While dozens of techniques have been introduced, recent empirical evidence has strongly suggested that the simple nearest neighbor algorithm is very difficult to beat for most time series problems, especially for large-scale datasets. While this may be considered good news, given the simplicity of implementing the nearest neighbor algorithm, there are some negative consequences of this. First, the nearest neighbor algorithm requires storing and searching the entire dataset, resulting in a high time and space complexity that limits its applicability, especially on resource-limited sensors. Second, beyond mere classification accuracy, we often wish to gain some insight into the data and to make the classification result more explainable, which global characteristics of the nearest neighbor cannot provide. In this work we introduce a new time series primitive, time series shapelets, which addresses these limitations. Informally, shapelets are time series subsequences which are in some sense maximally representative of a class. We can use the distance to the shapelet, rather than the distance to the nearest neighbor to classify objects. As we shall show with extensive empirical evaluations in diverse domains, classification algorithms based on the time series shapelet primitives can be interpretable, more accurate, and significantly faster than state-of-the-art classifiers.

Efficient Shapelet Discovery for Time Series Classification

IEEE Transactions on Knowledge and Data Engineering, 2020

Time-series shapelets are discriminative subsequences, recently found effective for time series classification (TSC). It is evident that the quality of shapelets is crucial to the accuracy of TSC. However, major research has focused on building accurate models from some shapelet candidates. To determine such candidates, existing studies are surprisingly simple, e.g., enumerating subsequences of some fixed lengths, or randomly selecting some subsequences as shapelet candidates. The major bulk of computation is then on building the model from the candidates. In this paper, we propose a novel efficient shapelet discovery method, called BSPCOVER, to discover a set of high-quality shapelet candidates for model building. Specifically, BSPCOVER generates abundant candidates via Symbolic Aggregate approXimation with sliding window, then prunes identical and highly similar candidates via Bloom filters, and similarity matching, respectively. We next propose a p-Cover algorithm to efficiently determine discriminative shapelet candidates that maximally represent each time-series class. Finally, any existing shapelet learning method can be adopted to build a classification model. We have conducted extensive experiments with well-known time-series datasets and representative state-of-the-art methods. Results show that BSPCOVER speeds up the state-of-the-art methods by more than 70 times, and the accuracy is often comparable to or higher than existing works.

A Shapelet Transform for Time Series Classification

Proceedings of the 18th ACM …, 2012

The problem of time series classification (TSC), where we consider any real-valued ordered data a time series, presents a specific machine learning challenge as the ordering of variables is often crucial in finding the best discriminating features. One of the most promising recent approaches is to find shapelets within a data set. A shapelet is a time series subsequence that is identified as being representative of class membership. The original research in this field embedded the procedure of finding shapelets within a decision tree. We propose disconnecting the process of finding shapelets from the classification algorithm by proposing a shapelet transformation. We describe a means of extracting the k best shapelets from a data set in a single pass, and then use these shapelets to transform data by calculating the distances from a series to each shapelet. We demonstrate that transformation into this new data space can improve classification accuracy, whilst retaining the explanatory power provided by shapelets.

Classification of time series by shapelet transformation

Your article is protected by copyright and all rights are held exclusively by The Author(s). This e-offprint is for personal use only and shall not be self-archived in electronic repositories. If you wish to self-archive your article, please use the accepted manuscript version for posting on your own website. You may further deposit the accepted manuscript version in any repository, provided it is only made publicly available 12 months after official publication or later and provided acknowledgement is given to the original source of publication and a link is inserted to the published article on Springer's website. The link must be accompanied by the following text: "The final publication is available at link.springer.com".

Pattern Sampling for Shapelet-based Time Series Classification

ArXiv, 2021

Subsequence-based time series classification algorithms provide accurate and interpretable models, but training these models is extremely computation intensive. The asymptotic time complexity of subsequence-based algorithms remains a higher-order polynomial, because these algorithms are based on exhaustive search for highly discriminative subsequences. Pattern sampling has been proposed as an effective alternative to mitigate the pattern explosion phenomenon. Therefore, we employ pattern sampling to extract discriminative features from discretized time series data. A weighted trie is created based on the discretized time series data to sample highly discriminative patterns. These sampled patterns are used to identify the shapelets which are used to transform the time series classification problem into a feature-based classification problem. Finally, a classification model can be trained using any off-the-shelf algorithm. Creating a pattern sampler requires a small number of patterns...

ShapeNet: A Shapelet-Neural Network Approach for Multivariate Time Series Classification

Proceedings of the AAAI Conference on Artificial Intelligence

Time series shapelets are short discriminative subsequences that recently have been found not only to be accurate but also interpretable for the classification problem of univariate time series (UTS). However, existing work on shapelets selection cannot be applied to multivariate time series classification (MTSC) since the candidate shapelets of MTSC may come from different variables of different lengths and thus cannot be directly compared. To address this challenge, in this paper, we propose a novel model called ShapeNet, which embeds shapelet candidates of different lengths into a unified space for shapelet selection. The network is trained using cluster-wise triplet loss, which considers the distance between anchor and multiple positive (negative) samples and the distance between positive (negative) samples, which are important for convergence. We compute representative and diversified final shapelets rather than directly using all the embeddings for model building to avoid a la...

Shapelet Ensemble for Multi-dimensional Time Series

Proceedings of the 2015 SIAM International Conference on Data Mining, 2015

Time series shapelets are small subsequences that maximally differentiate classes of time series. Since the inception of shapelets, researchers have used shapelets for various data domains including anthropology and health care, and in the process suggested many efficient techniques for shapelet discovery. However, multi-dimensional time series data poses unique challenges to shapelet discovery that are yet to be solved. We show that an ensemble of shapelet-based decision trees on individual dimensions works better than shapelets defined over multiple dimensions. Generating a shapelet ensemble for multidimensional time series is computationally expensive. Most of the existing techniques prune shapelet candidates for speed. In this paper, we propose a novel technique for shapelet discovery that evaluates remaining candidates efficiently. Our algorithm uses a multi-length approximate index for time series data to efficiently find the nearest neighbors of the candidate shapelets. We employ a simple skipping technique for additional candidate pruning and a voting based technique to improve accuracy while retaining interpretability. Not only do we find a significant speed increase, our techniques enable us to efficiently discover shapelets on datasets with multi-dimensional and long time series such as hours of brain activity recordings. We demonstrate our approach on a biomedical dataset and find significant differences between patients with schizophrenia and healthy controls.

Ensembles of Randomized Time Series Shapelets Provide Improved Accuracy while Reducing Computational Costs

ArXiv, 2017

Shapelets are discriminative time series subsequences that allow generation of interpretable classification models, which provide faster and generally better classification than the nearest neighbor approach. However, the shapelet discovery process requires the evaluation of all possible subsequences of all time series in the training set, making it extremely computation intensive. Consequently, shapelet discovery for large time series datasets quickly becomes intractable. A number of improvements have been proposed to reduce the training time. These techniques use approximation or discretization and often lead to reduced classification accuracy compared to the exact method. We are proposing the use of ensembles of shapelet-based classifiers obtained using random sampling of the shapelet candidates. Using random sampling reduces the number of evaluated candidates and consequently the required computational cost, while the classification accuracy of the resulting models is also not s...

Visualet: Visualizing Shapelets for Time Series Classification

2020

Time series classification (TSC) has attracted considerable attention from both academia and industry. TSC methods that are based on shapelets (intuitively, small highly-discriminative subsequences have been found effective and are particularly known for their interpretability, as shapelets themselves are subsequences. A recent work has significantly improved the efficiency of shapelet discovery. For instance, the shapelets of more than 65% of the datasets in the UCR Archive (containing data from different application domains) can be computed within an hour, whereas those of 12 datasets can be computed within a minute. Such efficiency has made it possible for demo attendees to interact with shapelet discovery and explore high-quality shapelets. In this demo, we present Visualet -- a tool for visualizing shapelets, and exploring effective and interpretable ones.

Discrete wavelet transform-based time series analysis and mining

ACM Computing Surveys, 2011

Time series are recorded values of an interesting phenomenon such as stock prices, household incomes, or patient heart rates over a period of time. Time series data mining focuses on discovering interesting patterns in such data. This article introduces a wavelet-based time series data analysis to interested readers. It provides a systematic survey of various analysis techniques that use discrete wavelet transformation (DWT) in time series data mining, and outlines the benefits of this approach demonstrated by previous studies performed on diverse application domains, including image classification, multimedia retrieval, and computer network anomaly detection.