Dynamic Time Warping Research Papers
This paper describes the development and validation of an Embedded Isolated Word Recognition System (IWR) for the Argentinian Spanish language, implemented on the STM32F4-Discovery platform. Its front-end extracts Mel Frequency Cepstral Coefficients (MFCC), while its classification step is based on the Dynamic Time Warping (DTW) algorithm. Since the system was conceived as a base platform for the research and development of speech-based command and control applications, it was designed to be modular and to meet real-time performance requirements. The system includes a Real Time Operating System (RTOS) to manage various processing and control tasks, which can be easily reconfigured with different acquisition, processing and recognition parameters using a single file. The validation was done using a robotic control scenario, achieving performance rates which demonstrate the practical usefulness of the system.
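The classification step named in this abstract compares sequences of feature frames with DTW. As a rough illustration only, and not the paper's embedded implementation, the classic DTW recurrence over sequences of feature vectors can be sketched as follows. The function name `dtw_distance`, the Euclidean frame distance, and the unconstrained step pattern are assumptions made for this sketch; real MFCC frames would replace the toy 2-D vectors.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of equal-dimension frames.

    Hypothetical helper for illustration: each frame is a tuple of floats
    (for example one MFCC vector), and frames are compared with the
    Euclidean distance.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the best accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Allowed moves: advance in a, advance in b, or advance in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    template = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    # Same content, but the first frame is held for twice as long.
    slower = [(0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    print(dtw_distance(template, slower))
```

Because the warping path may repeat a frame, the slower rendition of the same frame sequence aligns to the template at zero cost, which is the property that makes DTW attractive for words spoken at different rates.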
Aims of the work: 1) to examine the influence of various factors on the accuracy and speed of recognition of isolated Polish words; 2) to develop an effective method for recognizing isolated Polish words. The influence of the speech signal sampling frequency and the number of quantization levels on the effectiveness of speaker-independent and speaker-dependent recognition of isolated Polish utterances from the CORPORA database was examined using dynamic time warping (DTW) and hidden Markov models (HMM). An optimum choice of sampling frequency and number of quantization levels was shown to exist. The dependence of isolated-utterance recognition on speaker gender, training set size, and selected hidden Markov model parameters was also partially examined. The recognition accuracy for isolated Polish utterances using dynamic time warping was improved by applying a preliminary classification based on: 1) phonetic markers; 2) hidden Markov models.
The purpose of this study was to assess the performance of a real-time ("open-end") version of the dynamic time warping (DTW) algorithm for the recognition of motor exercises. Given a possibly incomplete input stream of data and a reference time series, the open-end DTW algorithm computes both the size of the prefix of reference which is best matched by the input, and the dissimilarity between the matched portions. The algorithm was used to provide real-time feedback to neurological patients undergoing motor rehabilitation. We acquired a dataset of multivariate time series from a sensorized long-sleeve shirt which contains 29 strain sensors distributed on the upper limb. Seven typical rehabilitation exercises were recorded in several variations, both correctly and incorrectly executed, and at various speeds, totaling a data set of 840 time series. Nearest-neighbour classifiers were built according to the outputs of open-end DTW alignments and their global counterparts on exercise pairs. The classifiers were also tested on well-known public datasets from heterogeneous domains. Nonparametric tests show that (1) on full time series the two algorithms achieve the same classification accuracy (p-value = 0.32); (2) on partial time series, classifiers based on open-end DTW have a far higher accuracy (kappa = 0.898 versus kappa = 0.447; p < 10^-5); and (3) the prediction of the matched fraction follows closely the ground truth (root mean square < 10%). The results hold for the motor rehabilitation and the other datasets tested, as well. The open-end variant of the DTW algorithm is suitable for the classification of truncated quantitative time series, even in the presence of noise. Early recognition and accurate class prediction can be achieved, provided that enough variance is available over the time span of the reference. Therefore, the proposed technique expands the use of DTW to a wider range of applications, such as real-time biofeedback systems.
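The open-end idea described in this abstract (align the whole, possibly incomplete input against the best-matching prefix of the reference) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `open_end_dtw` is hypothetical, the series are univariate, the local distance is the absolute difference, and the accumulated cost is left unnormalized, whereas a practical version would usually normalize it by path length before comparing prefixes.

```python
def open_end_dtw(query, reference):
    """Align all of `query` with the best-matching prefix of `reference`.

    Returns (prefix_length, cost): how many reference samples the query is
    matched to, and the accumulated dissimilarity of that alignment.
    Hypothetical helper for illustration; cost is not length-normalized.
    """
    n, m = len(query), len(reference)
    inf = float("inf")
    # cost[i][j]: best accumulated cost of aligning query[:i] with reference[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Global DTW would read cost[n][m]; the open-end variant frees the end
    # point on the reference and takes the cheapest cell in the last row.
    best_j = min(range(1, m + 1), key=lambda j: cost[n][j])
    return best_j, cost[n][best_j]


if __name__ == "__main__":
    reference = [0, 1, 2, 3, 4, 5, 6, 7]
    partial = [0, 1, 2, 3]  # the movement is only half executed so far
    prefix_length, cost = open_end_dtw(partial, reference)
    print(prefix_length / len(reference), cost)
```

In this toy case the half-finished input matches the first four reference samples exactly, so the estimated matched fraction is 0.5, while a global alignment would be forced to stretch the last input sample over the remaining reference samples and accumulate cost.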
The recognition of manual actions, i.e., hand movements, hand postures and gestures, plays an important role in human-computer interaction, while belonging to a category of particularly difficult tasks. Using a Vicon system to capture 3D spatial data, we ...
In recent years, dynamic time warping (DTW) has become one of the most widely used techniques for comparing time series data where extensive a priori knowledge is not available. However, a multivariate comparison method is often expected to consider the correlation between the variables, as this correlation carries the real information in many cases. Thus, principal component analysis (PCA) based similarity measures, such as the PCA similarity factor (SPCA), are used in many industrial applications. In this paper, we present a novel algorithm called correlation based dynamic time warping (CBDTW), which combines DTW and PCA based similarity measures. To preserve correlation, multivariate time series are segmented and the local dissimilarity function of DTW is derived from SPCA. The segments are obtained by bottom-up segmentation using special, PCA related costs. Our technique was validated on two databases: the database of the Signature Verification Competition 2004 and the commonly used AUSLAN dataset. We show that CBDTW outperforms the standard SPCA and the most commonly used Euclidean distance based multivariate DTW on datasets with a complex correlation structure.
This study investigates the contribution of the temporal patterning of speech to the reduced intelligibility of foreign-accented utterances. Short English phrases spoken by a native Chinese speaker were instrumentally modified, using LPC resynthesis and dynamic time warping, so as to align the duration of acoustic segments with tokens of the same phrases spoken by a native English speaker, while retaining the spectral and source characteristics of the Chinese speaker. Similarly, the native speaker’s productions were distorted to match the durational patterns of the non-native speaker. Intelligibility of these stimuli was measured, based on native English listeners’ performance in a forced-choice identification test with four alternatives: the correct phrase plus three phonetically similar distractor phrases suggested by listening to the Chinese productions. Intelligibility of the unmodified Chinese-accented phrases was poor (39% correct), but improved significantly (to 58%) after temporal correction. Performance on the native productions was high (94%), but declined significantly (to 83%) after temporal distortion according to the Chinese speaker’s timing. These results suggest that intelligibility of foreign-language speakers may be enhanced if explicit training is provided on temporal properties of their speech.
This paper presents a system for speaker independent keyword spotting (KWS) in continuous speech using a spoken example template. The approach, based on Dynamic Time Warping (DTW) for matching the template to a test utterance, does not require any modelling or training as required in alternative techniques such as the Hidden Markov Model (HMM). This is of particular relevance to applications such as detection of words that have not been adequately represented in a training database (e.g. searching for topical words that are emerging in society). The paper introduces the use of the DTW distance histogram for automatic estimation of similarity thresholds for every keyword-utterance pair. Experiments conducted on a wide range of speech sentences and keywords show that when only a few examples of the keyword are available, the proposed system has a higher recall ratio than an HMM-based approach.
Dynamic Time Warping (DTW), a pattern matching technique traditionally used for restricted vocabulary speech recognition, is based on a temporal alignment of the input signal with the template models. The principal drawback of DTW is its high computational cost as the lengths of the signals increase. This paper shows extended results over our previously published conference paper, which introduces an
This paper describes preliminary research results on speech recognition for speech-impaired children. Several Polish phonemes that are most confusing for speech-impaired children were investigated. The recordings included utterances that are examples of pathological speech. Part of the recorded material was artificially corrupted with generated white noise. The two most promising types of cepstral coefficients, standard (MFCC) and human factor (HFCC), were used to track speech content in the frequency domain. To recognize mispronounced phonemes embedded in words, a classical dynamic time warping (DTW) algorithm and an HMM method were used. A phoneme-based approach to the DTW method is proposed. Optimal HFCC parameters for the stated recognition task were found. HFCC performed better in the recognition experiments, especially in strongly noisy conditions. The results of the research can be useful for modern speech therapy.
A shape-motion prototype-based approach is introduced for action recognition. The approach represents an action as a sequence of prototypes for efficient and flexible action matching in long video sequences. During training, an action prototype tree is learned in a joint shape and motion space via hierarchical K-means clustering and each training sequence is represented as a labeled prototype sequence; then a look-up table of prototype-to-prototype distances is generated. During testing, based on a joint probability model of the actor location and action prototype, the actor is tracked while a frame-to-prototype correspondence is established by maximizing the joint probability, which is efficiently performed by searching the learned prototype tree; then actions are recognized using dynamic prototype sequence matching. Distance measures used for sequence matching are rapidly obtained by look-up table indexing, which is an order of magnitude faster than brute-force computation of frame-to-frame distances. Our approach enables robust action matching in challenging situations (such as moving cameras, dynamic backgrounds) and allows automatic alignment of action sequences. Experimental results demonstrate that our approach achieves recognition rates of 92.86 percent on a large gesture data set (with dynamic backgrounds), 100 percent on the Weizmann action data set, 95.77 percent on the KTH action data set, 88 percent on the UCF sports data set, and 87.27 percent on the CMU action data set.
Gesture recognition is a technology often used in human-computer interaction applications. Dynamic time warping (DTW) is one of the techniques used in gesture recognition to find an optimal alignment between two sequences. Pre-processing of the sequences is often required to remove variations due to different camera or body orientations or due to different skeleton sizes between the reference gesture sequences and the test gesture sequences. We discuss a set of pre-processing methods to make the gesture recognition mechanism robust to these variations. DTW computes a dissimilarity measure by time-warping the sequences on a per sample basis, using the distance between the current reference and test sequence samples. However, not all body joints involved in a gesture are equally important in computing the distance between two sequence samples. We propose a weighted DTW method that weights joints by optimizing a discriminant ratio. Finally, we demonstrate the performance of our pre-processing and the weighted DTW method and compare our results with conventional DTW and the state of the art.
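To make the joint-weighting idea concrete, the sketch below plugs a per-joint weighted distance into the standard DTW recurrence. It is an illustration under assumptions, not the authors' method: the names `weighted_frame_distance` and `weighted_dtw` are hypothetical, each frame is a list of 3D joint positions, and the weights are picked by hand here, whereas the abstract describes obtaining them by optimizing a discriminant ratio.

```python
import math


def weighted_frame_distance(frame_a, frame_b, weights):
    """Weighted sum of per-joint Euclidean distances between two skeleton frames."""
    return sum(w * math.dist(p, q) for w, p, q in zip(weights, frame_a, frame_b))


def weighted_dtw(seq_a, seq_b, weights):
    """DTW cost where the local distance weights each joint separately.

    Hypothetical helper for illustration; `weights` has one entry per joint.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = weighted_frame_distance(seq_a[i - 1], seq_b[j - 1], weights)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    # Two joints per frame: joint 0 performs the gesture, joint 1 is irrelevant
    # to it but happens to sit in a different place in the two recordings.
    reference = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)], [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
    test = [[(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)], [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]]
    print(weighted_dtw(reference, test, [0.5, 0.5]))  # irrelevant joint dominates
    print(weighted_dtw(reference, test, [1.0, 0.0]))  # irrelevant joint ignored
```

With equal weights the offset of the irrelevant joint accounts for the entire cost, while down-weighting it to zero lets the two recordings of the same gesture match exactly.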
Electrocardiogram (ECG) segmentation is necessary to help reduce the time-consuming task of manually annotating ECGs. Several algorithms have been developed to segment the ECG automatically. We first review several such methods, and then present a new ...
Proteomics is the study of the abundance, function and dynamics of all proteins present in a living organism, and mass spectrometry (MS) has become its most important tool due to its unmatched sensitivity, resolution and potential for high-throughput experimentation. A frequently used variant of mass spectrometry is coupled with liquid chromatography (LC) and is denoted as "LC/MS". It produces two-dimensional
The assessment of subtle morphological changes in noisy signals is a common challenge in the field of biomedical signal processing. Concerning the electrocardiogram (ECG), it may yield novel risk factors for cardiac mortality. Here, we describe an iterative two-dimensional signal warping algorithm (i2DSW), which enables morphological analyses even at poor signal-to-noise ratios. i2DSW adopts a generalized iterative template adaptation process that yields a more flexible template and allows for better fitting of subtle variations of signal shapes. Moreover, the template segmentation is not dependent on signal morphology. We test its performance by measuring beat-to-beat repolarization variability in simulated and clinical ECG. Simulation studies show higher robustness of i2DSW in the presence of typical ECG artefacts compared to previously proposed methods, including the existing two-dimensional warping technique (26% improvement). Comparison of short-term ECG recorded in normal subjects versus patients with myocardial infarction (MI) confirmed increased repolarization variability in MI patients (p < 0.0001). Results obtained with long-term ECG show improved waveform adaptation of i2DSW (overall 19%, up to 33%). The assessment of subtle morphological changes by i2DSW may yield novel and more robust risk factors for cardiac mortality. By avoiding a fixed template segmentation, the generalized design of i2DSW also has the potential to be powerful when applied to other quasi-periodic signals.
Stephen's mathematical approach is based upon the behaviour of anti-matter and matter collisions at the event horizon. Surprisingly, I discovered that it can be co-opted into my theory of MEeM.
My later papers show how the event horizon allows the escape of Hawking radiation, and another shows the creation of anti-matter at the event horizon (which supports Stephen Hawking's view but involves the warping of space).
The inclusion of Kerr Event Horizons shows how important Einstein's Relativity is to black hole theory.
Digital processing of speech signals and voice recognition algorithms are very important for fast and accurate automatic voice recognition technology. The voice is a signal carrying a very large amount of information. Direct analysis and synthesis of the complex voice signal is difficult because of the amount of information contained in the signal. Therefore digital signal processes such as Feature Extraction and Feature Matching are
This paper presents results from preliminary research on the recognition of Polish bird species. Bird voices were recorded in a very noisy urban environment at a high sampling frequency of 96 kHz. Standard mel-frequency cepstral coefficients (MFCC) and the recently proposed human-factor cepstral coefficients (HFCC) were selected as the feature set. HFCC features performed better than MFCC features. Properly limiting the maximum frequency during HFCC feature extraction increases the accuracy of bird species recognition. The good initial results are promising for practical application of the described methods in monitoring protected bird areas.
- by Robert Wielgat and +1
- Technology, Bioacoustics, Signal Processing, Human Factors
This article describes the results of research on automatic recognition of disordered speech. The study covered several Polish phonemes that cause the greatest difficulty for children with speech impediments. Three types of cepstral coefficients were examined as speech signal features: standard cepstral coefficients (CC), mel-frequency cepstral coefficients (MFCC), and human factor cepstral coefficients (HFCC). The classifiers used were the classical dynamic time warping (DTW) algorithm and the mean feature vector. Using HFCC features significantly improved the recognition results. A wide range of parameter values in the HFCC computation was tested in order to find their optimal values for different recognition tasks.
The decreasing cost per processing power in commercial off-the-shelf components and the decreasing size of the processing units have made Automatic Speech Recognition a trending topic in Embedded Systems, Internet of Things and Smart Home Applications. One of the first and most intuitive algorithms used for recognizing spoken words is Dynamic Time Warping (DTW), and some state-of-the-art speech recognition algorithms still use it during the preprocessing phase in order to increase their accuracy. This paper investigates the performance of a standalone DTW-based system for the task of recognizing isolated words using a small set of known references designed for a simple Smart Home Application. The advantages and disadvantages of this approach are analyzed, and alternative implementations of the proposed algorithm, as well as other algorithms in general, are reviewed. Finally, the results of using a straightforward DTW approach in a speech recognition system are evaluated.
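Recognizing isolated words against a small set of known references amounts to picking the reference with the lowest DTW cost. The sketch below illustrates that decision rule only and is not the system evaluated in the paper: the names `dtw` and `recognize` are hypothetical, the "utterances" are toy one-dimensional contours rather than real speech features, and no rejection threshold for unknown words is included.

```python
def dtw(a, b):
    """Accumulated DTW cost between two 1-D sequences (absolute difference)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, references):
    """Return (word, cost) for the reference template closest to `utterance`.

    `references` maps each word to a list of template sequences.
    Hypothetical helper for illustration.
    """
    best_word, best_cost = None, float("inf")
    for word, templates in references.items():
        for template in templates:
            c = dtw(utterance, template)
            if c < best_cost:
                best_word, best_cost = word, c
    return best_word, best_cost


if __name__ == "__main__":
    # Toy command set for a smart-home style application.
    references = {
        "on": [[0, 2, 4, 2, 0]],
        "off": [[4, 2, 0, 2, 4]],
    }
    spoken = [0, 0, 2, 4, 4, 2, 0]  # "on", uttered more slowly
    print(recognize(spoken, references))
```

Every incoming utterance is compared with every stored template, so the run time grows with the number and length of the references; this is one reason the approach is usually limited to small vocabularies.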
We propose a human body action classifier based on a 3D representation of the body in terms of volumetric coordinates. Features representing body postures are extracted directly from 3D data, making the system inherently insensitive to viewpoint dependence, motion ambiguities and self-occlusions. An Invariant Shape Descriptor of the human body is obtained in order to capture only posture-dependent characteristics, despite possible differences in translation, orientation, scale and body size. Frame-by-frame descriptions, ...
The widespread diffusion of hand-held devices with video recording capabilities requires the adoption of reliable Digital Stabilization methods to enjoy the acquired sequences without disturbing jerkiness. In order to effectively get rid of the unwanted camera movements, an estimate of the global motion between adjacent frames is necessary. This paper presents a novel approach for estimating the global motion between frames using a Curve Warping technique known as Dynamic Time Warping. The proposed algorithm guarantees robustness also in the presence of sharp illumination changes and moving objects. Index Terms: Video Stabilization, Dynamic Time Warping, global motion estimation.
- by Gabriel Levi and +3
- Engineering, Pattern Recognition, Evaluation, Speech Recognition