Juan M Montero - Academia.edu

Uploads

Papers by Juan M Montero

Cross-Cultural Perception of Spanish Synthetic Expressive Voices Among Asians

Applied Sciences, Mar 12, 2018

Project CAVIAR: CApturing VIewers’ Affective Response

Procesamiento del Lenguaje Natural, Sep 1, 2019

Fine-Tuning BERT Models for Intent Recognition Using a Frequency Cut-Off Strategy for Domain-Specific Vocabulary Extension

Applied Sciences, 2022

Intent recognition is a key component of any task-oriented conversational system. The intent recognizer can be used first to classify the user’s utterance into one of several predefined classes (intents) that help to understand the user’s current goal. Then, the most adequate response can be provided accordingly. Intent recognizers also often appear as a form of joint models for performing the natural language understanding and dialog management tasks together as a single process, thus simplifying the set of problems that a conversational system must solve. This happens to be especially true for frequently asked question (FAQ) conversational systems. In this work, we first present an exploratory analysis in which different deep learning (DL) models for intent detection and classification were evaluated. In particular, we experimentally compare and analyze conventional recurrent neural networks (RNN) and state-of-the-art transformer models. Our experiments confirmed that best perform…

A Proposal for Multimodal Emotion Recognition Using Aural Transformers and Action Units on RAVDESS Dataset

Applied Sciences, 2021

Emotion recognition is attracting the attention of the research community due to its multiple applications in different fields, such as medicine or autonomous driving. In this paper, we proposed an automatic emotion recognizer system that consisted of a speech emotion recognizer (SER) and a facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 transformer using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the whole model by appending a multilayer perceptron on top of it, confirming that the training was more robust when it did not start from scratch and the previous knowledge of the network was similar to the task to adapt. Regarding the facial emotion recognizer, we extracted the Action Units of the videos and compared the performance between employing static models against sequential models. Results showed that sequential models beat static models by a narrow di…

Continuous Expressive Speaking Styles Synthesis based on CVSM and MR-HMM

This paper introduces a continuous system capable of automatically producing the most adequate speaking style to synthesize a desired target text. This is done thanks to a joint modeling of the acoustic and lexical parameters of the speaker models by adapting the CVSM projection of the training texts using MR-HMM techniques. As such, we consider that as long as sufficient variety in the training data is available, we should be able to model a continuous lexical space into a continuous acoustic space. The proposed continuous automatic text to speech system was evaluated by means of a perceptual evaluation in order to compare it with traditional approaches to the task. The system proved to be capable of conveying the correct expressiveness (average adequacy of 3.6) with an expressive strength comparable to oracle traditional expressive speech synthesis (average of 3.6), although with a drop in speech quality mainly due to the semi-continuous nature of the data (average quality of 2.9…

Towards an unsupervised speaking style voice building framework: multi-style speaker diarization

A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis

Predicting Group-Level Skin Attention to Short Movies from Audio-Based LSTM-Mixture of Experts Models

A Saliency-Based Attention LSTM Model for Cognitive Load Classification from Speech

Influence of transition cost in the segmentation stage of speaker diarization

Development of a Genre-Dependent TTS System with Cross-Speaker Speaking-Style Transplantation

Evaluation of a transplantation algorithm for expressive speech synthesis

I Feel You: Towards Affect-Sensitive Domotic Spoken Conversational Agents

Lecture Notes in Computer Science, 2012

Towards Cross-Lingual Emotion Transplantation

Advances in Speech and Language Technologies for Iberian Languages, 2014

Urbano, an Interactive Mobile Tour-Guide Robot

Advances in Service Robotics, 2008

I Feel You: The Design and Evaluation of a Domotic Affect-Sensitive Spoken Conversational Agent

Design, development and field evaluation of a Spanish into sign language translation system

Pattern Analysis and Applications, 2011

Automatic Understanding of ATC Speech: Study of Prospectives and Field Experiments for Several Controller Positions

IEEE Transactions on Aerospace and Electronic Systems, 2011

Automatic Understanding of ATC Speech

IEEE Aerospace and Electronic Systems Magazine, 2006

Knowledge-Combining Methodology for Dialogue Design in Spoken Language Systems

Genetic Resources and Crop Evolution, 2005
