Lakshmi Saheer - Academia.edu
Papers by Lakshmi Saheer
Frontiers in artificial intelligence, Jan 29, 2024
Recent research has demonstrated the effectiveness of vocal tract length normalization (VTLN) as a rapid adaptation technique for statistical parametric speech synthesis. VTLN produces speech with naturalness preferable to that of MLLR-based adaptation techniques, being much closer in quality to that generated by the original average voice model. However, with only a single parameter, VTLN captures very few speaker-specific characteristics when compared to linear-transform-based adaptation techniques. This paper proposes that the merits of VTLN can be combined with those of linear-transform-based adaptation in a hierarchical Bayesian framework, where VTLN is used as the prior information. A novel technique for propagating the gender information from the VTLN prior through constrained structural maximum a posteriori linear regression (CSMAPLR) adaptation is presented. Experiments show that the resulting transformation has improved speech quality with better naturalness, intelligibility and improved speaker similarity.
Technical University of Denmark, DTU Orbit (Technical University of Denmark, DTU), Nov 2, 2020
Lecture Notes in Computer Science, 2022
Vocal tract length normalization (VTLN) is an important feature normalization technique that can be used to perform speaker adaptation when very little adaptation data is available. Earlier work showed that VTLN can be applied to statistical speech synthesis and gives additive improvements to CMLLR. This paper presents an EM optimization for estimating more accurate warping factors. The EM formulation embeds the feature normalization in the HMM training. This makes the estimation of the warping factors more efficient and enables the use of multiple (appropriate) warping factors for different state clusters of the same speaker.
International Journal of Environmental Research and Public Health
Speech emotion recognition is an important research topic that can help to maintain and improve public health and contribute towards the ongoing progress of healthcare technology. There have been several advancements in the field of speech emotion recognition systems, including the use of deep learning models and new acoustic and temporal features. This paper proposes a self-attention-based deep learning model that was created by combining a two-dimensional Convolutional Neural Network (CNN) and a long short-term memory (LSTM) network. This research builds on the existing literature to identify the best-performing features for this task with extensive experiments on different combinations of spectral and rhythmic information. Mel Frequency Cepstral Coefficients (MFCCs) emerged as the best-performing features for this task. The experiments were performed on a customised dataset that was developed as a combination of RAVDESS, SAVEE, and TESS datasets. Eight states of emotions (happy, s...
Lecture notes in networks and systems, 2022
Lecture notes in networks and systems, 2022
I would like to thank Idiap management, especially Prof. Hervé Bourlard (my supervisor and director of Idiap), for providing me this great opportunity to work at Idiap and ensuring all the resources needed for my research. Prof. Bourlard was a great source of personal inspiration. I thank the secretaries, Mrs. Nadine Rousseau and Mrs. Sylvie Millius, for all the administrative support. It was not easy to find my way in Switzerland from the first day of my arrival to date. Special thanks to the deputy director of Idiap, Dr. Francois Foglia, for his support, especially during the International Create Challenge (ICC 2012), and for his confidence in my project. I thank Dr. Milos Cernak for all the help and support, especially for the ICC project. We are a great team and plan to continue this collaboration as far as we can. The ICC group was a good source of happiness. I also thank the other support staff at Idiap, Frank Formaz, Norbert Crettol, Vincent Spano, Alexandre Nanchen, Ed Gregg, Christophe Ecoeur and several others. Special thanks to my colleagues Flavio Tarsetti and Laurent El-Shafey for the help with French translations. I am lucky to have known Particia Emonet, who helped me with French and with my personal life. Special thanks to my friend and colleague, Afsaneh Asaei, for being there to share my happiness and to comfort me in times of distress. We helped each other in our ordeals. Similarly, I thank my friend and colleague, Ramya Rasipuram, for being a great source of comfort. We both delivered our babies around the same time and could easily share our happiness and troubles. We had some great hikes organised by Marco Fornoni, Laurent El-Shafey, and Deepu Vijayasenan.
Thanks to the Indian community in Martigny (Samuel, Jamie, Deepu, Ramya, Murali, Venkatesh, Abhilasha, Jagan, Gokul, Sriram, Harsha, Mathew, Dinesh and several others) for keeping my social life alive with great activities like Indian dinner night, barbecues and other get-togethers from time to time. There are a lot of other Idiap colleagues who made my PhD life enjoyable including Oya,
2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, 2013
ArXiv, 2021
Urban trees help regulate temperature, reduce energy consumption, improve urban air quality, reduce wind speeds, and mitigate the urban heat island effect. Urban trees also play a key role in mitigating climate change and global warming by capturing and storing atmospheric carbon dioxide, which is the largest contributor to greenhouse gases. Automated tree detection and species classification using aerial imagery can be a powerful tool for sustainable forest and urban tree management. Hence, this study first offers a pipeline for generating a labelled dataset of urban trees using Google Maps aerial images and then investigates how state-of-the-art deep Convolutional Neural Network models such as VGG and ResNet handle the classification problem of urban tree aerial images under different parameters. Experimental results show our best model achieves an average accuracy of 60% over 6 tree species.
@Book{Demos:2010, editor = {Sandra K\"{u}bler}, title = {Proceedings of the ACL 2010 System Demonstrations}, month = {July}, year = {2010}, address = {Uppsala, Sweden}, publisher = {Association for Computational Linguistics}, url = {http://www.aclweb.org/anthology/P10-4 ...
Frontiers in Big Data
Monitoring, predicting, and controlling the air quality in urban areas is one of the effective solutions for tackling the climate change problem. Leveraging the availability of big data in different domains like pollutant concentration, urban traffic, aerial imagery of terrains and vegetation, and weather conditions can aid in understanding the interactions between these factors and building a reliable air quality prediction model. This research proposes a novel, cost-effective and efficient air quality modeling framework that includes all these factors and employs state-of-the-art artificial intelligence techniques. The framework also includes a novel deep learning-based vegetation detection system using aerial images. The pilot study, conducted in the UK city of Cambridge using the proposed framework, investigates various predictive models ranging from statistical to machine learning and deep recurrent neural network models. This framework opens up possibilities of broadening air quality m...
Electronics, 2021
It is becoming increasingly apparent that a significant proportion of the population suffers from mental health problems, such as stress, depression, and anxiety. These issues are a result of a vast range of factors, such as genetic conditions, social circumstances, and lifestyle influences. A key cause, or contributor, for many people is their work; poor mental state can be exacerbated by jobs and a person’s working environment. Additionally, as the information age continues to burgeon, people are increasingly sedentary in their working lives, spending more of their days seated and less time moving around. It is well known that a decrease in physical activity is detrimental to mental well-being. Therefore, innovative research and development is needed to combat negativity early. Implementing solutions using Artificial Intelligence has great potential in this field of research. This work proposes a solution to this problem domain, utilising two concepts of Artific...