Boosting interval based literals
Related papers
Time Series Classification by Boosting Interval Based Literals
A supervised classification method for temporal series, possibly multivariate, is presented. It is based on boosting very simple classifiers: clauses with one literal in the body. The background predicates are based on temporal intervals. Two types of predicates are used: i) relative predicates, such as "increases" and "stays", and ii) region predicates, such as "always" and "sometime", which operate over regions in the domain of the variable. Experiments on different data sets, several of them obtained from the UCI repositories, show that the proposed method is highly competitive with previous approaches.
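The two predicate families can be sketched as boolean tests over a time interval; the function names and signatures below are illustrative, not the paper's notation:

```python
import numpy as np

def increases(x, start, end):
    """Relative predicate: True if the series strictly increases on [start, end)."""
    seg = np.asarray(x[start:end], dtype=float)
    return bool(np.all(np.diff(seg) > 0))

def stays(x, start, end, tol=1e-9):
    """Relative predicate: True if the series is (numerically) constant on [start, end)."""
    seg = np.asarray(x[start:end], dtype=float)
    return bool(np.all(np.abs(np.diff(seg)) <= tol))

def always(x, start, end, low, high):
    """Region predicate: True if every value on [start, end) lies in [low, high]."""
    seg = np.asarray(x[start:end], dtype=float)
    return bool(np.all((seg >= low) & (seg <= high)))

def sometime(x, start, end, low, high):
    """Region predicate: True if at least one value on [start, end) lies in [low, high]."""
    seg = np.asarray(x[start:end], dtype=float)
    return bool(np.any((seg >= low) & (seg <= high)))

x = [1.0, 1.5, 2.2, 2.1, 0.9]
print(increases(x, 0, 3))           # True
print(always(x, 0, 5, 0.0, 3.0))    # True
print(sometime(x, 3, 5, 0.0, 1.0))  # True (0.9 falls in the region)
```

A boosted ensemble then combines many such one-literal clauses, each weighted by its training performance.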
Fast and Accurate Time Series Classification Through Supervised Interval Search
2020 IEEE International Conference on Data Mining (ICDM), 2020
Time series classification (TSC) aims to predict the class label of a given time series. Modern applications such as appliance modelling require modelling an abundance of long time series, which makes many state-of-the-art TSC techniques difficult to use due to their high computational cost and lack of interpretable outputs. To address these challenges, we propose a novel TSC method: the Supervised Time Series Forest (STSF). STSF improves classification efficiency by examining only a set of sub-series of the original time series, and its tree-based structure allows for interpretable outcomes. STSF adopts a top-down approach to search for relevant sub-series in three different time series representations prior to training any tree classifier, where the relevance of a sub-series is measured by feature ranking metrics (i.e., supervision signals). Experiments on extensive real datasets show that STSF achieves accuracy comparable to state-of-the-art TSC methods while being significantly more efficient, enabling TSC for long time series.
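The top-down search can be sketched as recursively halving a candidate interval and scoring each sub-interval with a supervision signal — here a Fisher-style ratio on the interval mean. All names and the toy data are illustrative, not STSF's actual implementation:

```python
import numpy as np

def fisher_score(feature, y):
    """Supervision signal: between-class over within-class variance of a 1-D feature."""
    classes = np.unique(y)
    overall = feature.mean()
    between = sum((feature[y == c].mean() - overall) ** 2 for c in classes)
    within = sum(feature[y == c].var() + 1e-12 for c in classes)
    return between / within

def search_intervals(X, y, start, end, agg=np.mean, found=None):
    """Top-down search: score [start, end), then recurse into its two halves,
    collecting candidate sub-intervals with their discriminative power."""
    if found is None:
        found = []
    if end - start < 2:
        return found
    feat = agg(X[:, start:end], axis=1)
    found.append((start, end, fisher_score(feat, y)))
    mid = (start + end) // 2
    search_intervals(X, y, start, mid, agg, found)
    search_intervals(X, y, mid, end, agg, found)
    return found

# Toy data: class 1 is shifted upward only in the second half of the series.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 16))
y = np.array([0] * 20 + [1] * 20)
X[y == 1, 8:] += 5.0
candidates = search_intervals(X, y, 0, 16)
best = max(candidates, key=lambda c: c[2])
print(best[:2])  # the best-scoring interval falls inside the shifted half
```

STSF repeats this search over several representations (e.g., the raw series and derived ones) and different aggregations before building each tree.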
Time series classification: Decision forests and SVM on interval and DTW features
2007
This paper describes the methods used for our submission to the KDD 2007 Challenge on Time Series Classification. For each dataset we selected from a pool of methods (individual classifiers or classifier ensembles) using cross validation (CV). Three types of classifiers were considered: nearest neighbour (using DTW), support vector machines (linear and perceptron kernel) and decision forests (boosting, random forest, rotation forest and random oracles). SVM and decision forests used extracted features of two types: similarity-based and interval-based. Two feature selection approaches were applied: FCBF or SVM-RFE. Where the minimum CV errors of several classifiers tied, the labels were assigned through majority vote. We report results with both the practice and the contest data.
Interval and dynamic time warping-based decision trees
2004
This work presents decision trees suited to the classification of series data. Several methods exist for this task, but most focus on accuracy alone, whereas one of the requirements of data mining is to produce comprehensible models, and decision trees are among the most comprehensible classifiers. Applying such methods directly to this kind of data is generally inadequate, because the resulting classifiers are complex and inaccurate. Hence, instead of using the raw features, new ones are constructed.
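Feature construction of this kind can be sketched as summarising each interval by a few statistics, which then become the inputs to an ordinary decision tree. The choice of statistics below (mean, standard deviation, slope) is illustrative:

```python
import numpy as np

def interval_features(x, n_intervals=4):
    """Turn a raw series into per-interval summaries (mean, std, slope) --
    features a decision tree can split on comprehensibly."""
    feats = []
    for seg in np.array_split(np.asarray(x, dtype=float), n_intervals):
        t = np.arange(len(seg))
        slope = np.polyfit(t, seg, 1)[0] if len(seg) > 1 else 0.0
        feats.extend([seg.mean(), seg.std(), slope])
    return np.array(feats)

x = np.arange(8.0)      # a linearly increasing series
f = interval_features(x)
print(f.shape)          # (12,) = 4 intervals x 3 statistics
print(round(f[2], 6))   # slope of the first interval: 1.0
```

A split such as "slope of interval 2 > 0.5" is far easier to read than a threshold on a single raw sample.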
Classification of multivariate time series via temporal abstraction and time intervals mining
Knowledge and Information Systems, 2014
Classification of multivariate time series data, often including both time points and intervals at variable frequencies, is a challenging task. We introduce the KarmaLegoSification (KLS) framework for classification of multivariate time series, which implements three phases: (1) application of a temporal abstraction process that transforms a series of raw time-stamped data points into a series of symbolic time intervals; (2) mining these symbolic time intervals to discover frequent time-intervals-related patterns (TIRPs), using Allen's temporal relations; and (3) using the TIRPs as features to induce a classifier. To efficiently detect multiple TIRPs (features) in a single entity to be classified, we introduce a new algorithm, SingleKarmaLego (SKL), which can be shown to be superior for that purpose to a Sequential TIRPs Detection (STD) algorithm. We evaluated the KLS framework on datasets in the domains of diabetes, intensive care, and infectious hepatitis, assessing the effects of the various settings of the KLS framework. Discretization using SAX led to better performance than Equal-Width Discretization (EWD); Knowledge-Based (KB) cut-off definitions, when available, were superior to both. Using three abstract temporal relations was superior to using the seven core temporal relations. Using an epsilon value larger than zero tended to result in slightly better accuracy with the SAX discretization method, but reduced accuracy with EWD, and overall does not seem beneficial. No feature selection method we tried proved useful. Regarding feature (TIRP) representation, Mean Duration performed better than Horizontal Support, which in turn performed better than the default Binary (existence) representation.
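Allen's relations between two symbolic intervals reduce to endpoint comparisons. A sketch, assuming the pair is normalised so the earlier-starting interval comes first (an eighth case, "started-by", covers equal starts with a longer first interval; this is not KLS's code, just the standard definitions):

```python
def allen_relation(a, b):
    """Allen relation between intervals a=(s1,e1), b=(s2,e2),
    normalised so that a starts no later than b."""
    (s1, e1), (s2, e2) = a, b
    assert s1 <= s2, "normalise: pass the earlier-starting interval first"
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 == s2:
        if e1 == e2:
            return "equals"
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finished-by"
    return "overlaps" if e1 < e2 else "contains"

print(allen_relation((0, 2), (3, 5)))  # before
print(allen_relation((0, 3), (3, 5)))  # meets
print(allen_relation((0, 5), (1, 3)))  # contains
```

KLS mines conjunctions of such pairwise relations over many symbolic intervals (the TIRPs) and uses their presence — or, per the findings above, their mean duration — as classifier features.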
Classification of multiple time-series via boosting
Much of modern machine learning and statistics research consists of extracting information from high-dimensional patterns. Often, the many features that comprise such a pattern are themselves vector valued, corresponding to sampled values in a time series. Here, we present a classification methodology that accommodates multiple time series using boosting. The method constructs an additive model by adaptively selecting basis functions consisting of a discriminating feature's full time series. We present the modifications to Fisher Linear Discriminant Analysis and Least-Squares, as base learners, needed to accommodate the weighted data in the proposed boosting procedure. We conclude by presenting the performance of the proposed method on a synthetic stochastic differential equation data set and a real-world data set involving prediction of cancer patients' susceptibility to a particular chemoradiotherapy.
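The sample weighting that the base learners must accommodate comes from the boosting loop itself. A minimal AdaBoost-style round with ±1 labels (a generic sketch, not this paper's exact procedure):

```python
import numpy as np

def boost_round(pred, y, w):
    """One AdaBoost-style round: given base-learner predictions and labels in
    {-1, +1} and a sample distribution w, return the learner weight alpha and
    the re-normalised distribution that up-weights the mistakes."""
    pred, y, w = map(np.asarray, (pred, y, w))
    err = np.sum(w * (pred != y)) / np.sum(w)
    alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
    w_new = w * np.exp(-alpha * y * pred)
    return alpha, w_new / w_new.sum()

y    = np.array([1, 1, -1, -1])
pred = np.array([1, 1, -1,  1])   # one mistake, on the last sample
alpha, w = boost_round(pred, y, np.full(4, 0.25))
print(round(alpha, 3))  # 0.549 (= 0.5 * ln 3)
print(w)                # the misclassified sample now carries the largest weight
```

Each subsequent base learner (here, a weighted FLDA or least-squares fit over a candidate feature's full series) is then trained against this updated distribution.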
WINkNN: Windowed Intervals’ Number kNN Classifier for Efficient Time-Series Applications
Mathematics
Our interest is in time series classification for cyber–physical systems (CPSs), with emphasis on human–robot interaction. We propose an extension of the k nearest neighbor (kNN) classifier to time-series classification using intervals’ numbers (INs). More specifically, we partition a time series into windows of equal length, and from each window's data we induce a distribution represented by an IN. This preserves the time dimension in the representation. All-order data statistics, represented by an IN, are employed implicitly as features; moreover, parametric non-linearities are introduced in order to tune the geometrical relationship (i.e., the distance) between signals and consequently tune classification performance. In conclusion, we introduce the windowed IN kNN (WINkNN) classifier, whose application is demonstrated comparatively on two benchmark datasets involving, first, electroencephalography (EEG) signals and, second, audio signals. The results by WINkNN are supe...
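The windowing idea can be sketched with empirical quantiles as a crude stand-in for an intervals' number (a real IN encodes the window's full distribution, and WINkNN additionally applies tunable non-linearities; both are omitted here):

```python
import numpy as np

def window_summary(x, n_windows=4, qs=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Summarise each equal-length window of a series by its quantiles,
    preserving the time dimension as the window order."""
    windows = np.array_split(np.asarray(x, dtype=float), n_windows)
    return np.concatenate([np.quantile(w, qs) for w in windows])

def knn_predict(train, labels, x, k=1):
    """Plain kNN on the windowed distribution summaries."""
    d = [np.linalg.norm(window_summary(t) - window_summary(x)) for t in train]
    idx = np.argsort(d)[:k]
    vals, counts = np.unique([labels[i] for i in idx], return_counts=True)
    return vals[np.argmax(counts)]

train = [np.zeros(16), np.full(16, 10.0)]
labels = [0, 1]
print(knn_predict(train, labels, np.full(16, 0.5)))  # 0: closer to the flat-zero series
```

Because each window is summarised separately, two signals are compared window-by-window rather than sample-by-sample, which is what keeps the representation both compact and time-aware.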
Interval-valued feature selection
2011
In this paper we introduce the use of interval variables in classification problems over time series signals. By introducing the concept of an interval kernel as a similarity measure between intervals, we adapt some well-known feature selection methods so that they can select the most relevant interval variables. A comparison against standard point-attribute feature selection methods (Relief and FSDD) is made for validation purposes.
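One plausible form of such an interval kernel (the abstract does not fix a particular one) is a Gaussian on the interval endpoints, so identical intervals score 1 and similarity decays with endpoint distance:

```python
import numpy as np

def interval_kernel(a, b, gamma=1.0):
    """Similarity of intervals a=(lo,hi) and b=(lo,hi): illustrative only,
    not necessarily the kernel used in the paper."""
    return float(np.exp(-gamma * ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)))

print(interval_kernel((0, 1), (0, 1)))          # 1.0
print(interval_kernel((0, 1), (0.5, 1.5)) >
      interval_kernel((0, 1), (2, 3)))          # True: nearer interval scores higher
```

A Relief-style relevance score for an interval variable can then replace the point-wise distance between attribute values with 1 − k(A, B).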