Todor Ganchev - Academia.edu

Papers by Todor Ganchev

The present paper describes the construction of a multimodal database, referred to as the PROMETHEUS database, which contains recordings from heterogeneous sensors. The main purpose of this database is the development of a framework for monitoring and interpreting human behavior in unrestricted indoor and outdoor environments. It contains single-person and multi-person scenarios, and also covers scenarios with interactions between groups of people. It is devoted to the detection of typical and atypical events, and care has been taken for the recordings to be as close to real-world conditions as possible. The uniqueness of the PROMETHEUS database comes not only from its sensor sets but primarily from its generic design, which allows for embracing a wide range of real-world applications (including smart-home and human-robot interaction interfaces, indoor/outdoor public-area surveillance, etc.).

Frontiers in Neuroengineering, 2009

Reviews and a few non-controlled studies have shown the effectiveness of several specifically designed computer video games as an additional form of treatment in several areas. However, the literature lacks specially designed serious games for treating mental disorders. Playmancer (an ICT European initiative) aims to develop and assess a serious video game that may help to treat underlying processes (e.g. lack of self-control strategies) in eating and impulse control disorders. Preliminary data will be shown.

Alternative ways to represent a speaker's voice individuality are studied for the task of speaker verification. We exploit a set of orthonormal bases provided by wavelet packets that allow an effective manipulation of the frequency subbands according to the critical bands concept. Novel wavelet-packet-based sets of speech features are contrasted with existing wavelet features as well as with the widely accepted Mel-scale cepstral coefficients (MFCC). Our scheme differs from previous wavelet-based works primarily in the wavelet-packet tree design, which follows the concept of critical bandwidth, as well as in the particular wavelet basis functions that have been used. Comparative experimental results confirm the assertion that the proposed speech features outperform MFCC, as well as previously used wavelet features, on the task of speaker verification.

An extension of the well-known Probabilistic Neural Network (PNN) to Generalized Locally Recurrent PNN (GLR PNN) is introduced. The GLR PNN is derived from the original PNN by incorporating a fully connected recurrent layer between the pattern and output layers. This extension renders GLR PNN sensitive to the context in which events occur, and therefore, capable of identifying temporal and spatial correlations. In the present work, this capability is exploited to improve the speaker verification performance. A fast three-step method for training GLR PNNs is proposed. The first two steps are identical to the training of original PNNs, while the third step is based on the Differential Evolution optimization method.
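As a point of reference for the "original PNN" that this abstract builds on, the following is a minimal sketch of a textbook PNN classifier, not the authors' GLR PNN. It assumes Gaussian kernels, a single shared smoothing parameter `sigma`, and equal class priors; the recurrent layer is omitted because the abstract does not specify it. All function names and the toy data are hypothetical.

```python
import math


def pnn_scores(x, patterns_by_class, sigma):
    """Average Gaussian kernel response of x against each class's stored patterns."""
    scores = {}
    for label, patterns in patterns_by_class.items():
        total = 0.0
        for p in patterns:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq_dist / (2.0 * sigma ** 2))
        # Summation unit: mean response over the class's pattern units.
        scores[label] = total / len(patterns)
    return scores


def pnn_classify(x, patterns_by_class, sigma):
    """Pick the class with the largest score (equal priors assumed)."""
    scores = pnn_scores(x, patterns_by_class, sigma)
    return max(scores, key=scores.get)


# Hypothetical toy data: the "pattern layer" is just the stored training vectors.
training = {
    "target": [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.3)],
    "impostor": [(2.0, 2.0), (2.2, 1.9), (1.8, 2.1)],
}

print(pnn_classify((0.1, 0.1), training, sigma=0.5))
```

In this plain form, "training" reduces to storing the patterns and choosing `sigma`; how the GLR PNN's recurrent-layer weights are then fitted is described in the paper itself, not here.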

... Mihalis Siafarikas, Todor Ganchev, Nikos Fakotakis ... j=3, calculated with Battle-Lemarié wavelet of order m=5. In fact, the maximum depth of the wavelet packet tree used in our design is j=7, in which case the large number of wavelet packet functions (66) prohibits their proper ...


This paper introduces Locally Recurrent Probabilistic Neural Networks (LRPNN) as an extension of the well-known Probabilistic Neural Networks (PNN). An LRPNN, in contrast to a PNN, is sensitive to the context in which events occur, and therefore can identify temporal or spatial correlations. Besides the definition of the LRPNN architecture, a fast three-step training method is proposed. The first two steps are identical to the training of traditional PNNs, while the third step is based on the Differential Evolution optimization method. Finally, the superiority of LRPNNs over PNNs on the task of text-independent speaker verification is demonstrated.
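The third training step is described only as being based on Differential Evolution. The sketch below illustrates the general technique in its common DE/rand/1/bin form, which is an assumption: the abstract does not state which variant, objective, or parameter values were used. A toy sphere function stands in for the real objective, and all names and settings are hypothetical.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f_weight=0.5,
                           crossover=0.9, generations=100, seed=0):
    """Minimize `objective` over box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    costs = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct population members, none of them the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one mutated coordinate
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < crossover:
                    value = pop[a][d] + f_weight * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_cost = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_cost <= costs[i]:
                pop[i], costs[i] = trial, trial_cost
    best = min(range(pop_size), key=costs.__getitem__)
    return pop[best], costs[best]


def sphere(v):
    """Toy objective standing in for the (unspecified) training criterion."""
    return sum(x * x for x in v)


best_weights, best_cost = differential_evolution(sphere, [(-5.0, 5.0)] * 2)
print(best_weights, best_cost)
```

In an LRPNN setting the vector being optimized would presumably hold the recurrent-layer weights, with the objective measuring verification error on training data; that mapping is an interpretation, not something the abstract spells out.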

... are widely used in the various implementations of the MFCC. The formulae (3) and (4), when compared to (2), provide a closer approximation of the Mel scale for frequencies below 1000 Hz, at the price of higher inaccuracy for frequencies higher than 1000 Hz. ...
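The excerpt does not reproduce formulae (2)-(4), so they cannot be restated here. For orientation only, the sketch below uses one widely cited Mel-scale approximation, mel = 2595 log10(1 + f/700); whether it corresponds to any of the paper's numbered formulae is not known from this excerpt. Function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Common Mel-scale approximation: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# The mapping is close to linear at low frequencies and compressive above ~1 kHz.
for f in (250.0, 500.0, 1000.0, 4000.0):
    print(f, round(hz_to_mel(f), 1))
```

This approximation maps 1000 Hz to very nearly 1000 mel, which makes it a convenient baseline when comparing alternative Mel-scale formulae of the kind the excerpt discusses.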

... Mihalis Siafarikas, Todor Ganchev, Nikos Fakotakis ... In order to determine the maximum frequency resolution necessary to capture the speaker identity, we took into consideration the concept of critical bandwidth introduced by Fletcher in [5]. Zwicker in [6] estimated that the ...


In the present work, we propose a hybrid architecture for automatic alignment of speech waveforms and their corresponding phone sequence. The proposed architecture does not exploit any phone boundary information. Our approach combines the efficiency of embedded training techniques and the high performance of isolated-unit training. Evaluating on the TIMIT database, which is well established for the task of phone segmentation, we achieved an accuracy of 83.56%, which corresponds to improving the baseline system's accuracy by 6.09%.
