Juan M Montero - Academia.edu (original) (raw)
Uploads
Papers by Juan M Montero
Applied sciences, Mar 12, 2018
Procesamiento Del Lenguaje Natural, Sep 1, 2019
Applied Sciences, 2022
Intent recognition is a key component of any task-oriented conversational system. The intent reco... more Intent recognition is a key component of any task-oriented conversational system. The intent recognizer can be used first to classify the user’s utterance into one of several predefined classes (intents) that help to understand the user’s current goal. Then, the most adequate response can be provided accordingly. Intent recognizers also often appear as a form of joint models for performing the natural language understanding and dialog management tasks together as a single process, thus simplifying the set of problems that a conversational system must solve. This happens to be especially true for frequently asked question (FAQ) conversational systems. In this work, we first present an exploratory analysis in which different deep learning (DL) models for intent detection and classification were evaluated. In particular, we experimentally compare and analyze conventional recurrent neural networks (RNN) and state-of-the-art transformer models. Our experiments confirmed that best perform...
Applied Sciences, 2021
Emotion recognition is attracting the attention of the research community due to its multiple app... more Emotion recognition is attracting the attention of the research community due to its multiple applications in different fields, such as medicine or autonomous driving. In this paper, we proposed an automatic emotion recognizer system that consisted of a speech emotion recognizer (SER) and a facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 transformer using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the whole model by appending a multilayer perceptron on top of it, confirming that the training was more robust when it did not start from scratch and the previous knowledge of the network was similar to the task to adapt. Regarding the facial emotion recognizer, we extracted the Action Units of the videos and compared the performance between employing static models against sequential models. Results showed that sequential models beat static models by a narrow di...
This paper introduces a continuous system capable of automatically producing the most adequate sp... more This paper introduces a continuous system capable of automatically producing the most adequate speaking style to synthesize a desired target text. This is done thanks to a joint modeling of the acoustic and lexical parameters of the speaker models by adapting the CVSM projection of the training texts using MR-HMM techniques. As such, we consider that as long as sufficient variety in the training data is available, we should be able to model a continuous lexical space into a continuous acoustic space. The proposed continuous automatic text to speech system was evaluated by means of a perceptual evaluation in order to compare them with traditional approaches to the task. The system proved to be capable of conveying the correct expressiveness (average adequacy of 3.6) with an expressive strength comparable to oracle traditional expressive speech synthesis (average of 3.6) although with a drop in speech quality mainly due to the semi-continuous nature of the data (average quality of 2.9...
Lecture Notes in Computer Science, 2012
Advances in Speech and Language Technologies for Iberian Languages, 2014
Advances in Service Robotics, 2008
Pattern Analysis and Applications, 2011
IEEE Transactions on Aerospace and Electronic Systems, 2011
IEEE Aerospace and Electronic Systems Magazine, 2006
Genetic Resources and Crop Evolution, 2005
Applied sciences, Mar 12, 2018
Procesamiento Del Lenguaje Natural, Sep 1, 2019
Applied Sciences, 2022
Intent recognition is a key component of any task-oriented conversational system. The intent reco... more Intent recognition is a key component of any task-oriented conversational system. The intent recognizer can be used first to classify the user’s utterance into one of several predefined classes (intents) that help to understand the user’s current goal. Then, the most adequate response can be provided accordingly. Intent recognizers also often appear as a form of joint models for performing the natural language understanding and dialog management tasks together as a single process, thus simplifying the set of problems that a conversational system must solve. This happens to be especially true for frequently asked question (FAQ) conversational systems. In this work, we first present an exploratory analysis in which different deep learning (DL) models for intent detection and classification were evaluated. In particular, we experimentally compare and analyze conventional recurrent neural networks (RNN) and state-of-the-art transformer models. Our experiments confirmed that best perform...
Applied Sciences, 2021
Emotion recognition is attracting the attention of the research community due to its multiple app... more Emotion recognition is attracting the attention of the research community due to its multiple applications in different fields, such as medicine or autonomous driving. In this paper, we proposed an automatic emotion recognizer system that consisted of a speech emotion recognizer (SER) and a facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 transformer using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the whole model by appending a multilayer perceptron on top of it, confirming that the training was more robust when it did not start from scratch and the previous knowledge of the network was similar to the task to adapt. Regarding the facial emotion recognizer, we extracted the Action Units of the videos and compared the performance between employing static models against sequential models. Results showed that sequential models beat static models by a narrow di...
This paper introduces a continuous system capable of automatically producing the most adequate sp... more This paper introduces a continuous system capable of automatically producing the most adequate speaking style to synthesize a desired target text. This is done thanks to a joint modeling of the acoustic and lexical parameters of the speaker models by adapting the CVSM projection of the training texts using MR-HMM techniques. As such, we consider that as long as sufficient variety in the training data is available, we should be able to model a continuous lexical space into a continuous acoustic space. The proposed continuous automatic text to speech system was evaluated by means of a perceptual evaluation in order to compare them with traditional approaches to the task. The system proved to be capable of conveying the correct expressiveness (average adequacy of 3.6) with an expressive strength comparable to oracle traditional expressive speech synthesis (average of 3.6) although with a drop in speech quality mainly due to the semi-continuous nature of the data (average quality of 2.9...
Lecture Notes in Computer Science, 2012
Advances in Speech and Language Technologies for Iberian Languages, 2014
Advances in Service Robotics, 2008
Pattern Analysis and Applications, 2011
IEEE Transactions on Aerospace and Electronic Systems, 2011
IEEE Aerospace and Electronic Systems Magazine, 2006
Genetic Resources and Crop Evolution, 2005