Rui Pedro Paiva - Academia.edu
Papers by Rui Pedro Paiva
We present a set of novel emotionally relevant audio features to help improve the classification of emotions in audio music. First, a review of the state of the art regarding emotion and music was conducted to understand how the various musical concepts may influence human emotions. Next, well-known audio frameworks were analyzed, assessing how their extractors relate to the studied musical concepts. The intersection of this data showed an unbalanced representation of the eight musical concepts: most extractors are low-level and related to tone color, while musical form, musical texture and expressive techniques are lacking. Based on this, we developed a set of new algorithms to capture information related to musical texture and expressive techniques, the two most lacking concepts. To validate our work, a public dataset containing 900 30-second clips, annotated in terms of Russell's emotion quadrants, was created. The inclusion of our features improved the F1-score...
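The abstract does not detail the texture or expressive-technique extractors, so the following is only a minimal sketch of one plausible expressive-technique feature: estimating vibrato rate and extent from a per-frame f0 contour. The function name, the thresholds and the 4-8 Hz band are assumptions, not the authors' implementation.

```python
import numpy as np

def vibrato_features(f0_hz, frame_rate, fmin=4.0, fmax=8.0):
    """Estimate vibrato rate (Hz) and extent (cents) from an f0 contour.

    f0_hz: 1-D array of fundamental-frequency estimates (one per frame,
    zero for unvoiced frames). frame_rate: frames per second.
    The 4-8 Hz band is the range usually cited for musical vibrato.
    """
    voiced = f0_hz > 0
    if voiced.sum() < 8:
        return 0.0, 0.0                      # too short to analyse
    cents = 1200.0 * np.log2(f0_hz[voiced] / np.mean(f0_hz[voiced]))
    cents -= cents.mean()                    # remove DC so the FFT peak is the modulation
    spectrum = np.abs(np.fft.rfft(cents * np.hanning(len(cents))))
    freqs = np.fft.rfftfreq(len(cents), d=1.0 / frame_rate)
    band = (freqs >= fmin) & (freqs <= fmax)
    if not band.any():
        return 0.0, 0.0
    k = np.argmax(spectrum * band)           # strongest component inside the vibrato band
    rate = freqs[k]
    extent = 2.0 * spectrum[k] / len(cents)  # rough amplitude of the modulation, in cents
    return rate, extent
```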
Proceedings of the 8th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management
This research addresses the role of lyrics in the context of music emotion variation detection. To accomplish this task, we created a system to detect the predominant emotion expressed by each sentence (verse) of the lyrics. The system employs Russell's emotion model and contains four sets of emotions, one associated with each quadrant. To detect the predominant emotion in each verse, we propose a novel keyword-based approach, which receives a sentence (verse) and classifies it into the appropriate quadrant. To tune the system parameters, we created a 129-sentence training dataset from 68 songs. To validate the system, we created a separate ground truth containing 239 sentences (verses) from 44 songs, annotated manually with an average of 7 annotations per sentence. The system attains a 67.4% F-measure.
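As a rough illustration of the keyword-based idea described above, the sketch below scores a verse against per-quadrant keyword sets and returns the quadrant with the most hits. The lexicon is a hypothetical stand-in; the paper's actual keyword sets and tuned parameters are not reproduced here.

```python
# Hypothetical lexicon; the paper's actual keyword sets are not reproduced here.
QUADRANT_KEYWORDS = {
    "Q1": {"happy", "joy", "love", "fun"},        # positive valence, high arousal
    "Q2": {"angry", "hate", "rage", "fear"},      # negative valence, high arousal
    "Q3": {"sad", "lonely", "cry", "gloom"},      # negative valence, low arousal
    "Q4": {"calm", "peace", "tender", "serene"},  # positive valence, low arousal
}

def classify_verse(verse):
    """Assign a verse to the Russell quadrant with the most keyword hits."""
    tokens = set(verse.lower().split())
    scores = {q: len(tokens & kws) for q, kws in QUADRANT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None   # None: no emotional keyword found

print(classify_verse("I cry alone in this lonely night"))  # -> "Q3"
```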
Sensors
Lung sounds acquired by stethoscopes are extensively used in diagnosing and differentiating respiratory diseases. Although extensive know-how has been built up to interpret these sounds and identify diseases associated with certain patterns, its effective use is limited by the individual experience of practitioners. This user dependency is a factor impeding the digital transformation of this valuable diagnostic tool, which could improve patient outcomes through continuous long-term respiratory monitoring under real-life conditions. Patients suffering from respiratory diseases of a progressive nature, such as chronic obstructive pulmonary disease, are particularly expected to benefit from long-term monitoring. Recently, the COVID-19 pandemic has also exposed the lack of respiratory monitoring systems that are ready to deploy in operational conditions while requiring minimal patient education. To address particularly the latter subject, in this article, we present a sound acquis...
IEEE Transactions on Affective Computing
Physiological Measurement
IFAC Proceedings Volumes
QUALITY PREDICTION IN INDUSTRIAL PROCESSES: APPLICATION OF A NEURO-FUZZY SYSTEM. Rui Pedro Paiva, António Dourado, Belmiro Duarte; CISUC, Centro de Informática e Sistemas da Universidade ...
IEEE Transactions on Affective Computing, 2016
This research addresses the role of lyrics in the music emotion recognition process. Our approach is based on several state-of-the-art features, complemented by novel stylistic, structural and semantic features. To evaluate our approach, we created a ground-truth dataset containing 180 song lyrics annotated according to Russell's emotion model. We conducted four types of experiments: regression, and classification by quadrant, arousal and valence categories. Compared to the state-of-the-art features (n-grams baseline), adding other features, including the novel ones, improved the F-measure from 69.9%, 82.7% and 85.6% to 80.1%, 88.3% and 90%, respectively, for the three classification experiments. To study the relation between features and emotions (quadrants), we performed experiments to identify the best features for describing and discriminating each quadrant. To further validate these experiments, we built a validation set comprising 771 lyrics extracted from the AllMusic platform, achieving a 73.6% F-measure in classification by quadrants. We also conducted experiments to identify interpretable rules that show the relation between features and emotions, and the relations among features. Regarding regression, the results show that, compared to similar studies for audio, we achieve similar performance for arousal and much better performance for valence.
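A hedged sketch of how an n-gram baseline might be combined with extra hand-crafted lyric features, using a scikit-learn FeatureUnion. The StylisticFeatures transformer computes toy stand-ins (word counts, punctuation, line statistics), not the paper's actual stylistic, structural or semantic features.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

class StylisticFeatures(BaseEstimator, TransformerMixin):
    """Toy stand-ins for hand-crafted lyric features."""
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        rows = []
        for lyric in X:
            lines = lyric.splitlines() or [lyric]
            rows.append([
                len(lyric.split()),                        # word count
                sum(ch == "!" for ch in lyric),            # exclamation marks
                len(lines),                                # number of lines
                np.mean([len(l.split()) for l in lines]),  # words per line
            ])
        return np.array(rows)

model = Pipeline([
    ("features", FeatureUnion([
        ("ngrams", TfidfVectorizer(ngram_range=(1, 3))),  # baseline n-gram features
        ("style", StylisticFeatures()),                   # extra dense features
    ])),
    ("clf", SVC(kernel="linear")),
])
# usage: model.fit(train_lyrics, train_quadrants); model.predict(test_lyrics)
```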
This paper provides an overview of current state-of-the-art approaches for melody extraction from polyphonic audio recordings, and it proposes a methodology for the quantitative evaluation of melody extraction algorithms. We first define a general architecture for melody extraction systems and discuss the difficulties of the problem at hand; then, we review different approaches to melody extraction that represent the current state of the art in this area. We propose and discuss a methodology for evaluating the different approaches, and we finally present some results and conclusions of the comparison.
This paper describes an algorithm for melody detection in polyphonic recordings. Our approach starts by obtaining a set of pitch candidates for each time frame with recourse to an auditory model. Trajectories of the most salient pitches are then constructed. Next, note candidates are obtained by trajectory segmentation (in terms of frequency and pitch-salience variations). Notes that are too short, of low salience, or harmonically related to stronger ones are then eliminated. Finally, we extract the notes comprising the melody by selecting the most salient ones at each moment, exploiting melodic smoothness and removing spurious notes that correspond to abrupt drops in note salience or duration.
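The pruning stage described above (discarding notes that are too short, of low salience, or harmonically related to stronger overlapping notes) could look roughly like the sketch below. All thresholds are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Note:
    onset: float      # seconds
    duration: float   # seconds
    pitch_hz: float
    salience: float

def prune_notes(notes, min_dur=0.06, min_sal=0.1, cents_tol=30):
    """Drop note candidates that are too short, too weak, or harmonically
    related to (and weaker than) an overlapping note. Illustrative thresholds."""
    kept = [n for n in notes if n.duration >= min_dur and n.salience >= min_sal]
    out = []
    for n in kept:
        shadowed = False
        for m in kept:
            if m is n or m.salience <= n.salience:
                continue   # only stronger notes can shadow n
            overlap = (n.onset < m.onset + m.duration) and (m.onset < n.onset + n.duration)
            ratio = n.pitch_hz / m.pitch_hz
            # harmonic if the ratio is close to k or 1/k for small integer k
            harmonic = any(abs(1200 * np.log2(ratio / k)) < cents_tol or
                           abs(1200 * np.log2(ratio * k)) < cents_tol
                           for k in (2, 3, 4))
            if overlap and harmonic:
                shadowed = True
                break
        if not shadowed:
            out.append(n)
    return out
```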
Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Jul 1, 2013
Neurally Mediated Syncope (NMS) is often cited as the most common cause of syncope. It can lead to severe consequences such as injuries, high rates of hospitalization and reduced quality of life, especially in elderly populations. Therefore, information about syncope triggers and reflex mechanisms would be of great value in the development of a cost-effective p-health system for the prediction of syncope episodes, enhancing patients' quality of life and reducing the incidence of syncope-related disorders and conditions. In the present paper, we study the characterization of syncope reflex mechanisms and blood pressure changes from the analysis of several non-invasive modalities (ECG, ICG and PPG). Several parameters were extracted in order to characterize the chronotropic, inotropic and vascular-tone changes. Thus, we evaluate the ability of parameters such as Heart Rate (HR), Pre-Ejection Period (PEP) and Left Ventricular Ejection Time (LVET) to characterize the physiological mechanisms behind the development of reflex syncope, and their potential for syncope prediction. The significant parameter changes found in the present work (e.g. HR from 12.9% to -12.4%, PEP from 14.9% to -3.8% and LVET from -14.4% to 12.3%) suggest the feasibility of these surrogates to characterize blood pressure regulation mechanisms during impending syncope.
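The HR, PEP and LVET excursions above are reported as percent changes relative to a resting baseline. A trivial helper, with illustrative numbers only:

```python
def relative_change(baseline, value):
    """Percent change of a parameter relative to its resting baseline,
    the form in which the HR/PEP/LVET excursions above are reported."""
    return 100.0 * (value - baseline) / baseline

# illustrative: heart rate rising from 70 bpm at rest to 79 bpm, then falling to 61 bpm
print(relative_change(70, 79))  # ~ +12.9 %
print(relative_change(70, 61))  # ~ -12.9 %
```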
2005 13th European Signal Processing Conference, Sep 1, 2005
We propose a method for segmentation of pitch tracks for melody detection in polyphonic musical signals. This is an important issue for melody-based music information retrieval, as well as for melody transcription. Past work in the field addressed mainly the extraction of melodic pitch lines, without an explicit definition of notes. Thus, in this work, we propose a two-stage segmentation of pitch tracks for the determination of musical notes. In the first stage, frequency-based segmentation is conducted with recourse to frequency variations in pitch tracks. In the second stage, salience-based segmentation is performed in order to split consecutive notes of equal value, using pitch-salience minima and note onsets.
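A minimal sketch of the two-stage idea, assuming a pitch track given as per-frame f0 values (in cents) with a parallel salience curve; the jump threshold and the minima criterion are assumptions, not the paper's tuned values.

```python
import numpy as np

def segment_track(f0_cents, salience, cents_jump=50, frame_hop=0.01):
    """Two-stage split of one pitch track into note candidates:
    (1) cut where the frequency jumps, (2) cut repeated notes at salience minima.
    Returns (start_time, end_time) pairs. Thresholds are illustrative."""
    # Stage 1: frequency-based segmentation at large frame-to-frame jumps
    cuts = [0]
    for i in range(1, len(f0_cents)):
        if abs(f0_cents[i] - f0_cents[i - 1]) > cents_jump:
            cuts.append(i)
    cuts.append(len(f0_cents))
    segments = [(a, b) for a, b in zip(cuts[:-1], cuts[1:]) if b > a]

    # Stage 2: salience-based segmentation (split same-pitch repetitions
    # at clear local minima of the salience curve)
    notes = []
    for a, b in segments:
        sal = np.asarray(salience[a:b])
        minima = [a + i for i in range(1, len(sal) - 1)
                  if sal[i] < sal[i - 1] and sal[i] < sal[i + 1]
                  and sal[i] < 0.5 * sal.max()]
        bounds = [a] + minima + [b]
        notes.extend((s, e) for s, e in zip(bounds[:-1], bounds[1:]) if e > s)
    return [(s * frame_hop, e * frame_hop) for s, e in notes]
```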
The present study addresses the problem of defining musical notes from pitch tracks, in the context of a system for melody detection in polyphonic musical signals. This is an important issue for melody transcription, as well as for melody-based music information retrieval. Previous work in the area tackled mainly the extraction of melodic pitch lines, without explicit determination of musical notes. Therefore, in this paper we propose an approach for the creation of musical notes based on a two-stage segmentation of pitch tracks. In the first step, frequency-based segmentation is carried out through the detection of frequency variations in pitch tracks. In the second stage, salience-based segmentation is performed so as to split consecutive notes of equal value, by making use of salience minima and note onsets.
International Computer Music Conference Proceedings, Oct 3, 2005
This paper describes a method for melody detection in polyphonic musical signals. Our approach starts by obtaining a set of pitch candidates for each time frame with recourse to an auditory model. Trajectories of the most salient pitches are then constructed. Next, note candidates are obtained by trajectory segmentation (in terms of frequency and pitch-salience variations). Notes that are too short, of low salience, or harmonically related to stronger ones are then eliminated. Finally, the notes comprising the melody are extracted. Compared to our previous work, we extend the method by making use of melodic smoothness in the definition of the final melody notes. We tested our method on excerpts from 21 songs encompassing several genres and obtained an average detection accuracy of 82%. Melody smoothing was responsible for an improvement of 11.8% in the overall accuracy.
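The melodic-smoothness step could be approximated greedily, trading candidate salience against the size of the pitch jump from the previous melody note. This is a toy version of the idea, not the paper's exact selection rule:

```python
def pick_melody(candidates_per_frame, smooth_weight=0.5):
    """Greedy melody selection: at each step take the candidate that maximises
    salience minus a penalty on the pitch jump from the previous melody note."""
    melody, prev = [], None
    for cands in candidates_per_frame:      # cands: list of (pitch_cents, salience)
        if not cands:
            melody.append(None)             # no candidate: silent / unvoiced frame
            continue
        def score(c):
            jump = 0.0 if prev is None else abs(c[0] - prev) / 1200.0  # jump in octaves
            return c[1] - smooth_weight * jump
        best = max(cands, key=score)
        melody.append(best)
        prev = best[0]
    return melody
```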
Most likely, sufficiently robust, general, accurate and efficient algorithms will only become available after several years of intensive research. Description: PhD thesis in Informatics Engineering presented to the Faculty of Sciences and Technology of Coimbra.
In this paper, we propose a solution for automatic mood tracking in audio music, based on supervised learning and classification. To this end, various music clips with a duration of 25 seconds, previously annotated with arousal and valence (AV) values, were used to train several models. These models were used to predict the quadrants of Thayer's taxonomy and the AV values of small segments from full songs, revealing the mood changes over time. The system's accuracy was measured by calculating the matching ratio between predicted results and full-song annotations performed by volunteers. Different combinations of audio features, frameworks and other parameters were tested, resulting in an accuracy of 56.3% and showing there is still much room for improvement.
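The matching-ratio measure used above is simply the fraction of song segments whose predicted label agrees with the full-song annotation:

```python
def matching_ratio(predicted, annotated):
    """Fraction of segments whose predicted quadrant matches the annotation,
    the accuracy measure described above for mood tracking over a full song."""
    assert len(predicted) == len(annotated)
    hits = sum(p == a for p, a in zip(predicted, annotated))
    return hits / len(annotated)

print(matching_ratio(["Q1", "Q1", "Q3", "Q2"], ["Q1", "Q2", "Q3", "Q2"]))  # 0.75
```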
We propose an approach for the automatic creation of mood playlists in the Thayer plane (TP). Music emotion recognition is tackled as a regression and classification problem, aiming to predict the arousal and valence (AV) values of each song in the TP, based on Yang's dataset. To this end, a large number of audio features are extracted using three frameworks: PsySound, MIR Toolbox and Marsyas. The extracted features and Yang's annotated AV values are used to train several Support Vector Regressors, each employing different feature sets. The best performance, in terms of the R² statistic, was attained after feature selection, reaching 63% for arousal and 35.6% for valence. Based on the predicted location of each song in the TP, mood playlists can be created by specifying a point in the plane, from which the closest songs are retrieved. Using one seed song, the accuracy of the created playlists was 62.3% for 20-song playlists, 24.8% for 5-song playlists and 6.2% for the top song. (Even though mood and emotion can be defined differently, the two terms are used interchangeably in the literature and in this paper; for further details, see [4].)
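The playlist-retrieval step described above reduces to a nearest-neighbour query in the arousal-valence plane. A minimal sketch, assuming each song already has predicted AV coordinates:

```python
import numpy as np

def mood_playlist(seed_av, library_av, size=20):
    """Return indices of the `size` songs closest (Euclidean) to the seed
    point in the arousal-valence plane; a sketch of the retrieval step only."""
    seed = np.asarray(seed_av, dtype=float)
    pts = np.asarray(library_av, dtype=float)     # shape (n_songs, 2)
    dists = np.linalg.norm(pts - seed, axis=1)
    return np.argsort(dists)[:size]

# usage: mood_playlist((0.6, 0.4), predicted_av_values, size=20)
```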