Alexis Moinet | University of Mons
Papers by Alexis Moinet
Lecture Notes in Computer Science, 2013
VideoCycle is a candidate application for this second Video Browser Showdown challenge. VideoCycle allows interactive intra-video and inter-shot navigation with dedicated gestural controllers. MediaCycle, the framework it is built upon, provides media organization by similarity, with a modular architecture enabling most of its workflow to be performed by plugins: feature extraction, clustering, segmentation, summarization, intra-media and inter-segment visualization. MediaCycle focuses on user experience with user interfaces that can be tailored to specific use cases.
The purpose of the LoopJam installation is to gather people to collaboratively build a live musical atmosphere. A two-dimensional "sound map" is created by our software which analyzes sounds, as well as musical loops, and groups them by similarity. Musical loops are chunks of music content (typically a few musical measures long) that create rhythmic music when played repeatedly. Wandering through the installation, each participant explores this sound map and activates sounds by simple gestures such as raising the hand, clapping or stepping. The playback of each sound is synchronized by our software, allowing a collaborative audio composition shared between all participants. Networking technologies are used for the communication between the 3D cameras tracking the participant motions and the music and visual rendering components. Network communication can hence potentially be used to create interactions between distant installations.
The purpose of this project is to provide a method and software for browsing a dance video database. This work has been done in collaboration with the artistic project "DANCERS!" [6]. A set of features describing dance is proposed, to quantify the gesture of the dancer, the usage of their personal space, the occupation of the stage and the temporal structure of the choreography. These quantities are extracted using the EyesWeb XMI platform [5], to which new features were added. The description of the video is used to compute the similarity between the dancers and then to help the exploration of the database. The management of the video and feature collection is done using the MediaCycle software [7], which has been extended from sound and image databases to video databases.
This paper presents the LoopJam installation which allows participants to interact with a sound map using a 3D computer vision tracking system. The sound map results from similarity-based clustering of sounds. The playback of these sounds is controlled by the positions or gestures of participants tracked with a Kinect depth-sensing camera. The beat-inclined bodily movements of participants in the installation are mapped to the tempo of played sounds, while the playback speed is synchronized by default among ...
ABSTRACT This paper presents the LoopJam installation, which allows visitors to interact with a musical map through a gesture tracking system based on computer vision. The sound map results from partitioning the sounds into groups by similarity based on their signal. The sound rendering is controlled by the positions or gestures of participants captured by a Kinect camera that senses the depth of the 3D scene. The movements of participants expressing a beat or a tempo are correlated with the playback speed ...
Statistical parametric speech synthesis has recently shown its ability to produce natural-sounding speech while keeping a certain flexibility for voice transformation without requiring a huge amount of data. This abstract presents how machine learning techniques such as Hidden Markov Models in generation mode or context-oriented clustering with decision trees are applied in speech synthesis. Fields that are investigated in our laboratory to improve this method are also discussed.
This paper reports on the LaughterCycle project, held during a three-month period between April and June 2009, within the numediart research programme. In this project, we have been developing technological building blocks for an application that allows recording and retrieving laughs according to their similarity. After an introduction regarding the structure of laughter signals, a range of methodologies is proposed to analyze the laughter timbre, to segment and classify individual laughter syllables, and to characterize the laughter rhythmic structure. The laughter timbre analysis methodologies come from the speech analysis as well as sound description and information retrieval literature, and the segmentation and rhythm characterization have been developed relying on previous studies on laughter signal structure.
Language Resources and Evaluation, 2010
2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07, 2007
Abstract The subject of this paper is the conversion of a given speaker's voice (the source speaker) into another identified voice (the target one). We assume we have at our disposal a large amount of speech samples from source and target voice with at least a part of them being parallel. The proposed system is built on a mapping function between source and target spectral envelopes followed by a frame selection algorithm to produce final spectral envelopes. Converted speech is produced by a basic LP analysis of the source and LP ...
Journal on Multimodal User Interfaces, 2010
Abstract The AVLaughterCycle project aims at developing an audiovisual laughing machine, able to detect and respond to user's laughs. Laughter is an important cue to reinforce the engagement in human-computer interactions. As a first step toward this goal, we have implemented a system capable of recording the laugh of a user and responding to it with a similar laugh. The output laugh is automatically selected from an audiovisual laughter database by analyzing acoustic similarities with the input laugh. It is displayed by an ...
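The selection of an output laugh by acoustic similarity can be pictured as a nearest-neighbor search. The sketch below is only an illustration, assuming each laugh has already been summarized as a fixed-length feature vector; the feature values, the choice of cosine distance, and the names `cosine_distance` and `most_similar_laugh` are hypothetical and are not taken from the paper.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def most_similar_laugh(query, database):
    """Return the id of the database entry closest to the query features."""
    if not database:
        raise ValueError("database is empty")
    return min(database, key=lambda laugh_id: cosine_distance(query, database[laugh_id]))


# Hypothetical three-dimensional feature vectors, for illustration only.
database = {
    "laugh_a": [0.9, 0.1, 0.3],
    "laugh_b": [0.2, 0.8, 0.5],
    "laugh_c": [0.4, 0.4, 0.9],
}
print(most_similar_laugh([0.85, 0.15, 0.25], database))
```

A linear scan like this is adequate for a small database; a larger collection would probably call for an index structure, which is outside the scope of this sketch.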
This report gives the results of project #10 of the eNTERFACE 2008 summer workshop on multimodal interfaces. This research highlights the problems of content-oriented instrumental sound synthesis, and an example for the violin is developed. The project has been planned as an extensive sound analysis process, where multiple aspects of a given database have been studied. Then different resynthesis strategies have been implemented and some relevant choices in this field are discussed.
Journal on Multimodal User Interfaces, 2008
... ORIGINAL PAPER RAMCESS 2.X framework: expressive voice analysis for realtime and accurate synthesis of singing Nicolas d'Alessandro · Onur Babacan · Baris Bozkurt · Thomas Dubuisson · Andre Holzapfel · Loic Kessous · Alexis Moinet · Maxime Vlieghe ...
2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 2009
This paper proposes a method to improve the quality delivered by statistical parametric speech synthesizers. For this, we use a codebook of pitch-synchronous residual frames, so as to construct a more realistic source signal. First, a limited codebook of typical excitations is built from a training database. During the synthesis part, HMMs are used to generate filter and source coefficients. The latter coefficients contain both the pitch and a compact representation of target residual frames. The source signal is obtained by concatenating excitation frames picked from the codebook, based on a selection criterion and taking target residual coefficients as input. Subjective results show a relevant improvement compared to the basic technique.
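The codebook lookup and concatenation described in this abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: the abstract does not state the selection criterion, so a squared Euclidean distance between coefficient vectors is assumed here, frames are plain lists of samples joined end to end (no pitch-synchronous resampling or overlap), and the names `select_frame` and `build_source_signal` as well as the toy codebook are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_frame(target_coeffs, codebook):
    """Pick the codebook entry whose coefficients are nearest to the target."""
    return min(codebook, key=lambda entry: squared_distance(entry["coeffs"], target_coeffs))


def build_source_signal(target_sequence, codebook):
    """Concatenate the selected excitation frames into one source signal."""
    signal = []
    for target_coeffs in target_sequence:
        signal.extend(select_frame(target_coeffs, codebook)["frame"])
    return signal


# Toy codebook: each entry pairs a compact coefficient vector with a short frame.
codebook = [
    {"coeffs": [0.0, 0.0], "frame": [0.0, 1.0, 0.0]},
    {"coeffs": [1.0, 1.0], "frame": [0.5, -0.5, 0.5]},
]
targets = [[0.1, -0.1], [0.9, 1.2]]
source = build_source_signal(targets, codebook)
print(source)
```

In this toy run the first target is closest to the first entry and the second target to the second entry, so the two frames are simply placed one after the other.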
Proceedings of the …
In this paper we present the investigations realized in the context of the eNTERFACE 3rd summer workshop on multimodal interfaces. It concerns the development of a new release of the RAMCESS framework (2.x), preserving the accurate and expressive ...
QPSR of the numediart research program. Ed. by Thierry Dutoit and Benoît Macq, Mar 1, 2008
Within the numediart/HyForge research program, this project aims at studying techniques and developing software prototypes for interactive skimming of audio content. After surveying the state-of-the-art techniques for time-scale modification of audio signals, we adopt a phase vocoding method. Some limitations of this method are identified and solutions are proposed. In particular, we derive an algorithm that allows the skimming parameters to be adapted continuously (i.e., the position in the sound excerpt and the ...
Proc. Benelearn, 2008
Statistical parametric speech synthesis has recently shown its ability to produce natural-sounding speech while keeping a certain flexibility for voice transformation without requiring a huge amount of data. This abstract presents how machine learning techniques such as Hidden Markov Models in generation mode or context-oriented clustering with decision trees are applied in speech synthesis. Fields that are investigated in our laboratory to improve this method are also discussed.