Guillaume Gibert | ECAM - Academia.edu

Papers by Guillaume Gibert

Research paper thumbnail of Evaluating a virtual speech cuer

This paper presents the virtual speech cuer built in the context of the ARTUS project, which aims at watermarking hand and face gestures of a virtual animated agent in a broadcast audiovisual sequence. For deaf televiewers who master cued speech, the animated agent can then be superimposed, on demand and at the reception end, on the original broadcast as an alternative to subtitling. The paper presents the multimodal text-to-speech synthesis system and the first evaluation performed by deaf users.

Research paper thumbnail of Theoretical Perspectives, The Humanoid Robot

Adolescence, 2015

The teleoperating platform SWoOZ sets up a humanoid robot as a mediator between two humans. The first studies emphasize the subjective and affective dimensions of the experience of people interacting with the robot, especially in terms of presence, and point out the untapped potential for the clinical use of the robot. It appears as a technological tool, toy, playmate and partner in the therapeutic process.

Research paper thumbnail of Evaluation of a virtual speech cuer

ExLing Conferences, Nov 1, 2019

This paper presents the virtual speech cuer built in the context of the ARTUS project, which aims at watermarking hand and face gestures of a virtual animated agent in a broadcast audiovisual sequence. For deaf televiewers who master cued speech, the animated agent can then be embedded, on demand and at the reception end, in the original broadcast as an alternative to subtitling. The paper presents the multimodal text-to-speech synthesis system and the first evaluation performed by deaf users.

Research paper thumbnail of Embodiment into a robot increases its acceptability

Scientific Reports, 2019

Recent studies have shown how embodiment induced by multisensory bodily interactions between individuals can positively change social attitudes (closeness, empathy, racial biases). Here we use a simple neuroscience-inspired procedure to beam our human subjects into one of two distinct robots and demonstrate how this can readily increase acceptability of, and social closeness to, that robot. Participants wore a head-mounted display that tracked their head movements and displayed the 3D visual scene taken from the eyes of a robot positioned in front of a mirror and piloted by the subjects' head movements. As a result, participants saw themselves as a robot. When the participants' and the robot's head movements were correlated, participants felt that they were incorporated into the robot with a sense of agency. Critically, the robot they embodied was judged more likeable and socially closer. Remarkably, we found that the beaming experience with correlated head movements and corresponding sens...

Research paper thumbnail of OpenViBE: An Open-Source Software Platform to Design, Test and Use Brain-Computer Interfaces in Real and Virtual Environments

This paper describes the OpenViBE software platform, which enables users to design, test and use Brain-Computer Interfaces. Brain-Computer Interfaces (BCI) are communication systems that enable users to send commands to computers by means of brain activity alone. BCI are gaining interest among the Virtual Reality (VR) community since they have emerged as promising interaction devices for Virtual Environments (VE). The key features of the platform are 1) high modularity, 2) embedded tools for visualization and feedback based on VR and 3D displays, 3) BCI design made available to non-programmers thanks to visual programming and 4) various tools offered to the different types of users. The platform features are illustrated in this paper with two entertaining VR applications based on a BCI. In the first one, users can move a virtual ball by imagining hand movements, while in the second one, they can control a virtual spaceship using real or imagined foot movements. Online experiments with th...
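
OpenViBE pipelines are assembled in a graphical, box-based designer rather than in code, but the modular principle can be sketched with a small, generic pipeline abstraction; the box names and the trivial filter/classifier below are illustrative placeholders, not OpenViBE's API:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Box:
    """One processing box, in the spirit of a box-based BCI designer."""
    name: str
    process: Callable[[object], object]

@dataclass
class Pipeline:
    boxes: List[Box] = field(default_factory=list)

    def run(self, sample):
        # Each box consumes the previous box's output, like connected slots.
        for box in self.boxes:
            sample = box.process(sample)
        return sample

# Hypothetical motor-imagery chain: EEG features -> class label -> feedback.
pipeline = Pipeline([
    Box("bandpass 8-30 Hz", lambda eeg: eeg),  # placeholder pass-through filter
    Box("classifier", lambda feats: "left" if sum(feats) < 0 else "right"),
    Box("move virtual ball", lambda cmd: f"ball moves {cmd}"),
])
print(pipeline.run([-0.2, 0.1, -0.4]))  # -> "ball moves left"
```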

Research paper thumbnail of Cognitive Systems Lab

In this paper we describe a way to enhance human-computer interaction using facial electromyographic (EMG) sensors. Knowing the emotional state of the user enables interaction adapted to the user's mood, so Human Computer Interaction (HCI) gains in ergonomics and ecological validity. While expression recognition systems based on video need exaggerated facial expressions to reach high recognition rates, the technique we developed using electrophysiological data enables faster detection of facial expressions, even in the presence of subtle movements. Features from 8 EMG sensors located around the face were extracted. Gaussian models for six basic facial expressions - anger, surprise, disgust, happiness, sadness and neutral - were learnt from these features and provide a mean recognition rate of 92%. Finally, a prototype of one possible application of this system was developed wherein the output of the recognizer was sent to the expressions module o...
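
A minimal sketch of the recognition scheme as described: one multivariate Gaussian fitted per expression to EMG feature vectors, with maximum-likelihood decoding. The feature dimensionality and the random stand-in data are assumptions for illustration, not the authors' exact features:

```python
import numpy as np
from scipy.stats import multivariate_normal

EXPRESSIONS = ["anger", "surprise", "disgust", "happiness", "sadness", "neutral"]

def fit_gaussians(features_by_class):
    """Fit one multivariate Gaussian per expression.

    features_by_class maps an expression name to an
    (n_samples, n_features) array of EMG feature vectors.
    """
    models = {}
    for name, feats in features_by_class.items():
        models[name] = multivariate_normal(
            mean=feats.mean(axis=0),
            cov=np.cov(feats, rowvar=False),
            allow_singular=True,  # guard against rank-deficient covariances
        )
    return models

def classify(models, feature_vector):
    """Return the expression whose Gaussian gives the highest log-likelihood."""
    return max(models, key=lambda name: models[name].logpdf(feature_vector))

# Hypothetical usage with random stand-in features
# (e.g. a few statistics per sensor from the 8 EMG channels -> 16 dims):
rng = np.random.default_rng(0)
train = {name: rng.normal(i, 1.0, (50, 16)) for i, name in enumerate(EXPRESSIONS)}
models = fit_gaussians(train)
print(classify(models, train["happiness"][0]))  # expected: "happiness"
```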

Research paper thumbnail of Effects of self-myofascial release interventions with or without sliding pressures on skin temperature, range of motion and perceived well-being: a randomized control pilot trial

BMC Sports Science, Medicine and Rehabilitation, 2021

Background: Self-myofascial release is an emerging technique in strength and conditioning, yet there is no consensus regarding optimal practice guidelines. Here, we investigated the acute effects of various foam rolling interventions targeting the quadriceps muscles, with or without sliding pressures. Methods: We conducted a blinded randomized control pilot trial in 42 healthy weightlifting athletes over 4 weeks. Participants were randomly allocated to one of four intervention (120 s massage routine) groups: foam rolling, roller massager, foam rolling with axial sliding pressures, or foam rolling with transverse sliding pressures. Knee range of motion, skin temperature and subjective scores of perceived heat, range of motion, muscle pain and relaxation were the dependent variables. Measurements were carried out before, after and up to 15 min (follow-up) after the massage intervention. Results: The range of motion increased immediately after the various foam rolling interventions (+ 10...

Research paper thumbnail of Transforming an embodied conversational agent into an efficient talking head: from keyframe-based animation to multimodal concatenation synthesis

Research paper thumbnail of Prosody for the eyes: quantifying visual prosody using guided principal component analysis

Although typically studied as an auditory phenomenon, prosody can also be conveyed by the visual speech signal, through increased movements of the articulators during speech production, or through eyebrow and rigid head movements. This paper aimed to quantify such visual correlates of prosody. Specifically, the study was concerned with measuring the visual correlates of prosodic focus and prosodic phrasing. In the experiment, four participants' speech and face movements were recorded while they completed a dialog exchange task with an interlocutor. Acoustic analysis showed that the prosodic contrasts differed on duration, pitch and intensity parameters, which is consistent with previous findings in the literature. The visual data were processed using guided principal component analysis. The results showed that, compared to the broad-focused statement condition, speakers produced greater movement on both articulatory and non-articulatory parameters for prosodically focused and intonated words...
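
Guided PCA of this kind can be pictured as projecting out the variance explained by known control parameters before running an ordinary PCA on the residual; the sketch below illustrates that general idea only, not the paper's exact iterative procedure:

```python
import numpy as np

def guided_pca(motion, guides, n_components=3):
    """Toy guided PCA: regress known guide parameters (e.g. jaw opening,
    rigid head rotation) out of the marker data, then run ordinary PCA
    on the residual motion.

    motion: (n_frames, n_markers) marker coordinates
    guides: (n_frames, n_guides) measured guide parameters
    """
    motion = motion - motion.mean(axis=0)
    guides = guides - guides.mean(axis=0)
    coeffs, *_ = np.linalg.lstsq(guides, motion, rcond=None)
    residual = motion - guides @ coeffs  # variance not explained by the guides
    _, _, vt = np.linalg.svd(residual, full_matrices=False)  # PCA via SVD
    components = vt[:n_components]
    scores = residual @ components.T
    return coeffs, components, scores
```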

Research paper thumbnail of ARTUS: Synthesis and Audiovisual Watermarking of the Movements of a Virtual Agent Interpreting Subtitling using Cued Speech for Deaf Televiewers

The ARTUS project provides deaf televiewers with an alternative to subtitles using Cued Speech: an animated agent can be superimposed, on demand and at the reception end, on the original broadcast. The hand and face gestures of the agent are generated automatically by a text-to-cued-speech synthesizer and watermarked in the broadcast audiovisual signals. We describe here the technological blocks of our demonstrator. A first evaluation of the complete system by end-users is presented.

Research paper thumbnail of Capturing data and realistic 3D models for cued speech analysis and audiovisual synthesis

We have implemented a complete text-to-speech synthesis system by concatenation that addresses French Manual Cued Speech (FMCS). It uses two separate dictionaries, one for multimodal diphones with audio and facial articulation, and the other with the gestures between two consecutive FMCS keys (“dikeys”). Both dictionaries were built from real data. This paper presents our methodology and the final results, illustrated by accompanying videos. We recorded and analyzed the 3D trajectories of 50 hand and 63 facial fleshpoints during the production of 238 utterances carefully designed to cover all possible diphones of French. Linear and non-linear statistical models of hand and face deformations and postures were developed using both separate and joint corpora. Additional data allowed us to capture the shape of the hand and face with a higher spatial density (2,600 points for the hand and forearm and 2,000 for the face), as well as their appearance. We succeeded in building new high-density a...
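
The two-dictionary organisation can be pictured as a pair of lookup tables keyed by consecutive phones and by consecutive FMCS hand keys. Below is a minimal, hypothetical sketch of that structure and of selecting units for concatenation; the unit contents, key names and sequencing are placeholders, not the actual system:

```python
# Two stores: multimodal diphones (audio + facial trajectories) and hand
# "dikeys" (the gesture between two consecutive cued-speech keys).
diphone_dict = {
    ("b", "o"): "diphone unit #0421",
    ("o", "~"): "diphone unit #0087",
}
dikey_dict = {
    ("key3_cheek", "key1_chin"): "hand transition unit #112",
}

def synthesize(phones, keys):
    """Concatenate diphone units for audio/face and dikey units for the hand."""
    face_track = [diphone_dict[pair] for pair in zip(phones, phones[1:])]
    hand_track = [dikey_dict[pair] for pair in zip(keys, keys[1:])]
    return face_track, hand_track

print(synthesize(["b", "o", "~"], ["key3_cheek", "key1_chin"]))
```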

Research paper thumbnail of Evaluation of a speech cuer: from motion capture to a concatenative text-to-cued speech system

We present here our efforts to characterize the 3D movements of the right hand and the face of a French female speaker during the production of manual cued speech. We analyzed the 3D trajectories of 50 hand and 63 facial fleshpoints during the production of 238 utterances carefully designed to cover all possible diphones of the French language. Linear and non-linear statistical models of the hand and face deformations and postures have been developed using both separate and joint corpora. We implemented a concatenative audiovisual text-to-cued-speech synthesis system.

Research paper thumbnail of Super Wizard of Oz technique to study human-robot interaction

Humanoid robots are more and more realistic, but these systems still fail to be as friendly and natural as humans in interaction. The behavioural models of interaction controlling these robots cannot capture and replicate the extreme complexity of human communication. To determine the real limitations and key factors one must impose on a behavioural model to keep a human-robot interaction as natural and efficient as a human-human interaction, we propose to build and use a super Wizard of Oz setup. This platform consists of a FaceLab sensor able to track a confederate's rigid head motion and gaze in real time, and of an iCub robot able to replicate the confederate's movements. By manipulating certain key parameters, we will be able to determine the necessary limits to impose on a behavioural model.
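
The relay at the heart of such a setup can be sketched as a loop that forwards tracked head motion and gaze to the robot while letting the experimenter degrade the signal; the facelab and icub objects below are hypothetical wrappers, not the actual FaceLab or iCub APIs, and gain and latency stand in for the manipulable key parameters:

```python
import time

def relay_loop(facelab, icub, gain=1.0, latency_s=0.0):
    """Forward the confederate's head motion and gaze to the robot.

    gain scales the replicated motion; latency_s replays stale samples.
    Both stand in for the "key parameters" one might degrade.
    """
    history = []  # (timestamp, head_pose, gaze) samples, oldest first
    while True:
        now = time.time()
        history.append((now, facelab.read_head(), facelab.read_gaze()))
        # keep history[0] as the newest sample at least latency_s old
        while len(history) > 1 and now - history[1][0] >= latency_s:
            history.pop(0)
        _, head, gaze = history[0]
        icub.set_head(tuple(gain * angle for angle in head))
        icub.set_gaze(gaze)
        time.sleep(1 / 60)  # assumed 60 Hz control rate
```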

Research paper thumbnail of Size enhancement coupled with intensification of symbols improves P300 Speller accuracy

The P300 Speller was proposed in 1988 by Farwell and Donchin [1]. In this Brain Computer Interface (BCI), a matrix of symbols is presented whose rows and columns are sequentially intensified. In this study, we investigated the influence of three stimulation parameters: the enhancement of the symbols while intensified, the inter-stimulus interval (ISI) and the reduction of flash duration. Results indicate that symbol enhancement increases the P300 amplitude and the ensuing classification accuracy of a Fisher LDA. P300 amplitude and classification accuracy decrease with faster ISI. Finally, reducing the flash duration does not increase the P300 amplitude but yields better classification accuracy.
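
The decoding stage common to P300 spellers of this kind can be sketched as scoring each post-flash epoch with a Fisher-style LDA and averaging scores per row and column; the 6x6 matrix size is assumed, and feature extraction and the stimulation parameters studied above live outside this snippet:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def train_p300_classifier(X, y):
    """X: (n_flashes, n_features) post-flash epochs; y: 1 if the flash
    intensified the attended symbol's row/column, else 0."""
    return LinearDiscriminantAnalysis().fit(X, y)  # Fisher-style LDA

def predict_symbol(clf, epochs, flashed, n_rows=6, n_cols=6):
    """Pick the (row, col) whose flashes score most target-like on average.

    flashed: one ("row", i) or ("col", j) tag per epoch; every row and
    column is assumed to have been intensified at least once.
    """
    scores = clf.decision_function(epochs)
    row_sum, row_n = np.zeros(n_rows), np.zeros(n_rows)
    col_sum, col_n = np.zeros(n_cols), np.zeros(n_cols)
    for s, (kind, idx) in zip(scores, flashed):
        if kind == "row":
            row_sum[idx] += s
            row_n[idx] += 1
        else:
            col_sum[idx] += s
            col_n[idx] += 1
    return int(np.argmax(row_sum / row_n)), int(np.argmax(col_sum / col_n))
```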

Research paper thumbnail of Sensor selection for P300 speller brain computer interface

Brain-computer interfaces (BCI) are communication systems that use brain activity to control a device. The BCI studied here is based on the P300 speller [1]. A new algorithm to select relevant sensors is proposed: it is based on a previously proposed algorithm [2] used to enhance P300 potentials by spatial filters. Data recorded on three subjects were used to evaluate the proposed selection method: it is shown to be efficient and to compare favourably with a reference method [3].
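
The proposed criterion builds on spatial filtering [2]; as a generic illustration of the sensor-selection problem itself, a simple backward elimination driven by cross-validated accuracy can be sketched as follows (this is not the authors' algorithm):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def backward_sensor_selection(X, y, n_keep=8):
    """Drop, one at a time, the sensor whose removal hurts cross-validated
    accuracy the least, until n_keep sensors remain.

    X: (n_trials, n_sensors, n_times) epochs; y: (n_trials,) labels.
    """
    sensors = list(range(X.shape[1]))

    def score(subset):
        feats = X[:, subset, :].reshape(len(X), -1)
        clf = LinearDiscriminantAnalysis()
        return cross_val_score(clf, feats, y, cv=3).mean()

    while len(sensors) > n_keep:
        candidates = [
            (score([s for s in sensors if s != cand]), cand) for cand in sensors
        ]
        _, worst = max(candidates)  # removing `worst` costs the least accuracy
        sensors.remove(worst)
    return sensors
```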

Research paper thumbnail of Production of Mandarin lexical tones: auditory and visual components

This paper presents a study of the audio-visual production of the four Mandarin lexical tones on words in citation form and in sentences. OPTOTRAK motion capture data of the head and face of a Mandarin speaker were modelled using both PCA and guided PCA. For each tone, correlations between F0 values and the different face and head components were calculated. Results show that there are visual parameters related to the different F0 patterns of each tone. Moreover, differences were found in both duration and correlational patterns between words produced in citation and in sentential forms. The results show that there are identifiable visual correlates of lexical tone, but the difference between citation and sentential forms has implications for materials used in production and perception studies of Mandarin lexical tones, and possibly those of other languages.
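
The correlation analysis can be sketched as computing one Pearson coefficient per face or head component against the F0 contour of a token; the sketch assumes trajectories are already time-aligned and masks unvoiced frames, without reproducing the paper's exact preprocessing:

```python
import numpy as np

def tone_visual_correlates(f0, components):
    """Correlate one token's F0 contour with each face/head component.

    f0:         (n_frames,) fundamental frequency, NaN on unvoiced frames
    components: (n_frames, n_components) guided-PCA component scores
    Returns one Pearson r per component.
    """
    voiced = ~np.isnan(f0)  # keep only frames where F0 is defined
    return np.array([
        np.corrcoef(f0[voiced], components[voiced, k])[0, 1]
        for k in range(components.shape[1])
    ])
```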

Research paper thumbnail of Audiovisual text-to-cued speech synthesis

2004 12th European Signal Processing Conference, 2004

We present here our efforts to characterize the 3D movements of the right hand and the face of a French female speaker during the production of manual cued speech. We analyzed the 3D trajectories of 50 hand and 63 facial fleshpoints during the production of 238 utterances carefully designed to cover all possible diphones of the French language. Linear and non-linear statistical models of the hand and face deformations and postures have been developed using both separate and joint corpora. We implemented a concatenative audiovisual text-to-cued-speech synthesis system.
