Gérard Chollet | Institut Mines-Telecom

Papers by Gérard Chollet

First European Conference on Speech Communication and Technology (Eurospeech 1989)

Classical speech synthesis systems either concatenate diphone-like tabulated patterns or reconstruct speech parameters according to pre-defined rules. Both techniques have drawbacks: the former lacks flexibility, while the latter is highly time-consuming to build.

1998 IEEE Second Workshop on Multimedia Signal Processing (Cat. No.98EX175)

... IN AN IDENTITY VERIFICATION SYSTEM USING K-NN BASED CLASSIFIERS. Patrick Verlinde (Signal and Image Center, Royal Military Academy, Brussels, Belgium), Gérard Chollet (CNRS URA-820, ENST/TSI Department, Paris, France) ... Springer Verlag, 1997. [5] C. W. Therrien. ...

8th European Conference on Speech Communication and Technology (Eurospeech 2003)

Speech coding by indexation has been shown to lower the bit rate of compressed speech drastically. Building on the Automatic Language Independent Speech Processing (ALISP) approach, which segments the speech signal automatically [1], we studied the possibility of optimising this rate, as well as the quality of the resynthesized signal, by using the text corresponding to the speech signal and by implementing a new segmentation method. This led to aligning the speech with its phonetic transcription and to the use of polyphones, which ultimately increases output speech quality while keeping the bit rate between 400 bits/s and 600 bits/s. Typical uses include storing recorded alpha-numeric books for blind people and compressing recorded courses for e-learning. Cell phone applications could also be considered.

2nd International Conference on Spoken Language Processing (ICSLP 1992)

Speech analysis for high quality speech synthesis or high accuracy speech recognition requires realistic models not only for the vocal tract but also for the voice source. This paper presents a comparison between two analysis methods for the calculation of the voice ...

5th International Conference on Spoken Language Processing (ICSLP 1998)

SPEECH PRE-PROCESSING AGAINST INTENTIONAL IMPOSTURE IN SPEAKER RECOGNITION. Dominique Genoud+, Gérard Chollet*. +IDIAP, CP 592, CH-1920 Martigny, Switzerland, genoud@idiap.ch, ...

4th European Conference on Speech Communication and Technology (Eurospeech 1995)

Keywords: speech. Reference: EPFL-CONF-82322. Record created on 2006-03-10, modified on 2017-05-10.

3rd European Conference on Speech Communication and Technology (Eurospeech 1993)

This paper concerns the problem of speech variability in Automatic Speaker Verification (ASV) systems. The performance of our ASV system was compared with the performance of human listeners on material spoken in four different emotional modes (neutral, happiness, ...

Conference of the International Speech Communication Association, 1997

Acoustic speech signal modeling systems generally consist of two stages. In the first, an analysis module extracts from the speech signal a sequence of feature vectors that describes the speech in a time-frequency space. Mel Frequency Cepstral Coefficients (MFCC) are a popular feature set. In the second stage, stochastic modeling of the feature sequences is performed, generally using Hidden Markov Models (HMM) [8]. To compute the MFCCs, a spectral analysis with a filterbank defined on a mel scale is performed first; the logarithm is then applied to the filterbank energies, followed by a cosine transform. The mel frequency scale, a psycho-acoustic scale, is characterized by a higher resolution in the low frequency bands than in the high frequency bands. Besides the psycho-acoustic characteristics, increasing the frequency resolution in the low
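The front end described in the abstract above (mel-scale filterbank, logarithm of the filterbank energies, cosine transform) can be sketched for a single frame as follows. This is a minimal illustration, not the authors' implementation: the 2595*log10(1 + f/700) mel formula, the triangular filter shape, the number of filters and coefficients, and all function names are assumptions chosen for the example, and a naive DFT is used so that the sketch needs only the standard library.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Hz to mel, using the common 2595*log10(1 + f/700) formula (an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (slow, but dependency-free)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose centres are equally spaced on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    step = (high - low) / (num_filters + 1)
    # Filter edges expressed as (fractional) DFT bin positions.
    edges = [mel_to_hz(low + i * step) * n_fft / sample_rate for i in range(num_filters + 2)]
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if left < k <= centre:
                row.append((k - left) / (centre - left))    # rising slope
            elif centre < k < right:
                row.append((right - k) / (right - centre))  # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, num_filters=20, num_ceps=12):
    """Filterbank energies -> logarithm -> cosine transform (DCT-II)."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    log_energies = []
    for weights in bank:
        energy = sum(w * p for w, p in zip(weights, spectrum))
        log_energies.append(math.log(max(energy, 1e-10)))  # floor avoids log(0)
    return [
        sum(e * math.cos(math.pi * i * (j + 0.5) / num_filters)
            for j, e in enumerate(log_energies))
        for i in range(num_ceps)
    ]


# Usage: one 256-sample frame of a 440 Hz tone sampled at 8 kHz (illustrative values).
frame = [math.sin(2.0 * math.pi * 440.0 * t / 8000.0) for t in range(256)]
coeffs = mfcc(frame, sample_rate=8000)
```

Because the filter centres are equally spaced in mel rather than in Hz, the low-frequency filters in this sketch span only a few DFT bins while the high-frequency ones span many, which is the resolution property the abstract attributes to the mel scale.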

Publication in the conference proceedings of EUSIPCO, Toulouse, France, 2002

This paper summarizes the rationale for proposing the COST-277 “nonlinear speech processing” action, and the work done during these last four years. In addition, future perspectives are described.

Thomas Hueber, Gérard Chollet, Bruce Denby, Gérard Dreyfus and Maureen Stone. ESPCI Telecom Paris, 10 rue Vauquelin, 75005 Paris, France; Telecom Paris Tech, 46 rue Barrault, 75013 Paris, France; Université Paris VI, ESPCI Laboratoire d’Electronique, 10 rue Vauquelin, 75005 Paris, France; Vocal Tract Visualization Lab, Depts of Biomedical Sciences and Orthodontics, University of Maryland Dental School, 650 W. Baltimore St., Baltimore, MD 21201, USA

Voice disguise and reversibility. Patrick Perrot, Joseph Razik (2), Gérard Chollet (2). (1) IRCGN, Institut de recherche criminelle de la gendarmerie nationale, 1 boulevard Théophile Sueur, 93110 Rosny sous Bois; (2) CNRS-Institut TELECOM - Telecom ParisTech, 46 rue Barrault, Paris 75013. patrick.perrot@gendarmerie.defense.gouv.fr, razik@telecom-paristech.fr, chollet@telecom-paristech.fr. Most studies of disguised voices in the forensic science literature examine their impact on speaker recognition. The drop in performance is significant, and the risk of confusing two speakers cannot be ignored. The principle of our work is to establish whether a normal voice (the suspect's voice) can be transformed into a disguised voice (the query voice). Voice conversion is a good way to approach this. We therefore propose a three-step description to counter the problem of disguise and to analyse this possibility of reversibility. We focus on different kinds of disguised...

During eNTERFACE we developed a dialog system design and conversation material for Roberta, an anthropomorphic assistant robot. The focus was on the first stage of what we call LifeLine dialogs, i.e. the conversational creation of users’ life stories. Our goal is to help senior citizens record semiautobiographical narratives while combating the deterioration of memory and speech abilities. We successfully completed modelling dialog scenarios for first-time users. This allows Roberta to personalize future conversations based on each user’s place of origin, work and education history, and hobbies, all of which are gathered during the user’s first conversation with Roberta. We accomplished this through (1) an adaptable dialog system with topic management and multi-modal functionalities (specifically face recognition), by extending a RavenClaw-type dialog management framework, (2) using the Wizard of Oz (WOZ) data collection technique for categorizing introductory conversation ma...

This study focuses on the question of voice disguise and the problem of its detection. Voice disguise is considered a deliberate action by a speaker who wants to falsify or conceal his identity. A speaker has many ways to change his voice and to deceive a human listener or an automatic system. He can transform his voice by electronic scrambling or, more simply, by exploiting intra-speaker variability: modifying his own pitch, or modifying the position of articulators such as the lips or tongue, which affects the formant frequencies. The proposed work is divided into three parts: the first is a classification of the different ways available to change one's voice, the second presents a review of the different techniques used in the literature, and the third describes the main cues proposed in the literature to distinguish a disguised voice from an original voice, before proposing some directions of research based on disordered and...

ICASSP '81. IEEE International Conference on Acoustics, Speech, and Signal Processing

At some stage of the recognition process, a choice has to be made between lexical items. This choice is most difficult when the items form a minimal series (they differ by only one phoneme). A selection of such minimal series has been used to test a number of commercially available word recognition systems (CNET 'Dynamo', INTERSTATE 'VRM', LIMSI-VECSYS 'Primo', THRESHOLD 'T 600') and several speech recognizers developed in France. Because recognition scores are highly dependent on the quality of the acoustic samples used for training and testing, the performance of a system is expressed as the noise level necessary to obtain identical human recognition scores on the same test material. Confusion matrices can be used by the experimenter to correct deficiencies of the algorithms and techniques used, or by the user to select an application vocabulary.

Odyssey 2016, 2016

Many factors affect the variability of an i-vector extracted from a speech segment, such as the acoustic content, segment duration, handset type and background noise. The language being spoken is one source of variation that has received limited attention because of the lack of multilingual resources. Consequently, discrimination performance is much lower under multilingual trial conditions. Standard session-compensation techniques such as Within-Class Covariance Normalization (WCCN), Linear Discriminant Analysis (LDA) and Probabilistic LDA (PLDA) cannot robustly compensate for language as a source of variation, because the amount of data available to represent such variability is limited. The source normalization technique, which was developed to compensate for speech-source variation, offered superior performance in cross-language trials by providing a better estimate of the within-speaker scatter matrix in the WCCN and LDA techniques. However, neither language normalization nor the state-of-the-art PLDA algorithm is capable of modeling language variability on a dataset with insufficient multilingual utterances for each speaker, resulting in poor performance in the cross-language trial condition. This study extends our initial development of a language-independent PLDA training algorithm, which aimed at reducing the effect of language as a source of variability on the performance of speaker recognition. We provide a thorough analysis of how the proposed approach can use multilingual training data from bilingual speakers to robustly compensate for the effect of language. Evaluated on the multilingual trial condition, the proposed solution demonstrated over 10% EER and 13% minimum DCF relative improvement on the NIST 2008 speaker recognition evaluation, as well as 12.4% EER and 23% minimum DCF on the PRISM evaluation set, over the baseline system, while also providing improvement in other trial conditions.

2016 2nd International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), 2016

With an aging population and the financial difficulty of providing a full-time caregiver for every dependent person living at home, assistant robots appear to be a solution for advanced countries. However, most of what can be done with a robot can be done without it, so it is difficult to quantify what real value an assistant robot can add. Such a robot should be a real assistant capable of helping a person, whether indoors or outdoors. Additionally, the robot should be a companion for dialoging, as well as a system capable of detecting health problems. The Roberta Ironside project is a robotic evolution, embodying the expertise gained during the development of purely vocal personal assistants for dependent persons in the vAssist project (Sansen et al. 2014). The project proposes a relatively affordable and simplified design of a human-sized humanoid robot that fits the requirements of this analysis. After an overall description of the robot and a justification of the novel choice of a handicapped robot in an electric wheelchair, this paper emphasizes the technology used for the head and the face and the resulting verbal and non-verbal communication capabilities of the robot, in turn highlighting the characteristics of Embodied Conversational Agents.

International Conference on Pattern Recognition Applications and Methods, Mar 6, 2014

Principal component analysis is used to implement a semi-automatic recognition system to identify recaptured northern leopard frogs (Lithobates pipiens). Results of both open set and closed set experiments are given. The presented algorithm is shown to provide accurate identification of 209 individual leopard frogs from a total set of 1386 images.

Workpackage contributing to the Deliverable: WP-A2.2 – Development and test of open