Emotion Recognition in Multi-speaker Scenarios for Social Robots
References
Nocentini O, Fiorini L, Acerbi G, Sorrentino A, Mancioppi G, Cavallo F (2019) A survey of behavioral models for social robots. Robotics 3:54
Martinez-Martin E, Del Pobil AP (2018) Personal robot assistants for elderly care: an overview. Pers Assistants: Emerging Comput Technol, pp 77–91
Asgharian P, Panchea AM, Ferland F (2022) A review on the use of mobile service robots in elderly care. Robotics 11(6):127
Toh LPE, Causo A, Tzuo P-W, Chen I-M, Yeo SH (2016) A review on the use of robots in education and young children. J Educ Technol Soc 19(2):148–163
Chen H, Park HW, Breazeal C (2020) Teaching and learning with children: impact of reciprocal peer learning with a social robot on children’s learning and emotive engagement. Comput Educ 150:103836
Sasaki Y, Nitta J (2017) Long-term demonstration experiment of autonomous mobile robot in a science museum. In: International Symposium on Robotics and Intelligent Sensors. IEEE, pp 304–310
Hellou M, Lim J, Gasteiger N, Jang M, Ahn HS (2022) Technical methods for social robots in museum settings: an overview of the literature. Int J Soc Robot 14(8):1767–1786
Clark HH, Fischer K (2023) Social robots as depictions of social agents. Behav Brain Sci 46:e21
Doğan S, Colak A (2024) Social robots in the instruction of social skills in autism: a comprehensive descriptive analysis of single-case experimental designs. Disabil Rehabil Assist Technol 19(2):325–344
Laban G, Morrison V, Cross ES (2024) Social robots for health psychology: a new frontier for improving human health and well-being. Eur Health Psychol 23(1):1095–1102
Khan A, Anwar Y (2019) Robots in healthcare: a survey. In: Science and Information Conference. Springer, Berlin, Germany, pp 280–292
Peerzade GN, Deshmukh RR, Waghmare SD (2018) A review: speech emotion recognition. Int J Comput Sci Eng 6(3):400–402
Wang D, Chen J (2018) Supervised speech separation based on deep learning: an overview. IEEE/ACM Trans Audio Speech Lang Process 26(10):1702–1726
Soni S, Yadav RN, Gupta L (2023) State-of-the-art analysis of deep learning-based monaural speech source separation techniques. IEEE Access 11:4242–4269
Ravenscroft W, Goetze S, Hain T (2022) Deformable temporal convolutional networks for monaural noisy reverberant speech separation. arXiv preprint arXiv:2210.15305
Nachmani E, Adi Y, Wolf L (2020) Voice separation with an unknown number of multiple speakers. In: International Conference on Machine Learning. PMLR, pp 7164–7175
Sakthi S (2022) Dnn-based speech quality enhancement and multi-speaker separation for automatic speech recognition system. Mach Learn Algorithms for Signal and Image Processing, pp 231–246
Jin R, Ablimit M, Hamdulla A (2022) Speech separation and emotion recognition for multi-speaker scenarios. In: 3rd International Conference on Pattern Recognition and Machine Learning (PRML). IEEE, pp 280–284
Zhang Z, Yoshioka T, Kanda N, Chen Z, Wang X, Wang D, Eskimez SE (2022) All-neural beamformer for continuous speech separation. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 6032–6036
Chen S, Wu Y, Chen Z, Yoshioka T, Liu S, Li J, Yu X (2021) Don’t shoot butterfly with rifles: multi-channel continuous speech separation with early exit transformer. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 6139–6143
Wang X, Wang D, Kanda N, Eskimez SE, Yoshioka T (2022) Leveraging real conversational data for multi-channel continuous speech separation. arXiv preprint arXiv:2204.03232
Subakan C, Ravanelli M, Cornell S, Bronzi M, Zhong J (2021) Attention is all you need in speech separation. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 21–25
Rieger SA, Muraleedharan R, Ramachandran RP (2014) Speech based emotion recognition using spectral feature extraction and an ensemble of knn classifiers. In: The 9th International Symposium on Chinese Spoken Language Processing. IEEE, pp 589–593
Ram CS, Ponnusamy R (2014) An effective automatic speech emotion recognition for tamil language using support vector machine. In: International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT). IEEE, pp 19–23
de Lope J, Graña M (2023) An ongoing review of speech emotion recognition. Neurocomputing 528:1–11
Arango-Sánchez JA, Arias-Londoño JD (2022) An enhanced conv-tasnet model for speech separation using a speaker distance-based loss function. arXiv preprint arXiv:2205.13657
Basir S, Hossain MN, Hosen MS, Ali MS, Riaz Z, Islam MS (2024) U-net: a supervised approach for monaural source separation. Arabian J Sci Eng 1–13
Opochinsky R, Moradi M, Gannot S (2024) Single-microphone speaker separation and voice activity detection in noisy and reverberant environments. arXiv preprint arXiv:2401.03448
Zhang X, Tang J, Cao H, Wang C, Shen C, Liu J (2024) Cascaded speech separation denoising and dereverberation using attention and tcn-wpe networks for speech devices. IEEE Internet Things J 11(10):18047–18058
Wichern G, Antognini J, Flynn M, Zhu LR, McQuinn E, Crow D, Manilow E, Roux JL (2019) Wham!: extending speech separation to noisy environments. arXiv preprint arXiv:1907.01160
Lutati S, Nachmani E, Wolf L (2022) Sepit: approaching a single channel speech separation bound. arXiv preprint arXiv:2205.11801
Subakan C, Ravanelli M, Cornell S, Grondin F (2022) Real-m: towards speech separation on real mixtures. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 6862–6866
Dubey H, Gopal V, Cutler R, Aazami A, Matusevych S, Braun S, Eskimez SE, Thakker M, Yoshioka T, Gamper H et al. (2022) ICASSP 2022 deep noise suppression challenge. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 9271–9275
Veaux C, Yamagishi J, MacDonald K et al. (2017) Cstr vctk corpus: english multi-speaker corpus for cstr voice cloning toolkit. The Centre for Speech Technology Research (CSTR), University of Edinburgh
Gokilavani M, Katakam H, Basheer SA, Srinivas PVVS (2022) Ravdess, crema-d, tess based algorithm for emotion recognition using speech. In: 4th International Conference on Smart Systems and Inventive Technology (ICSSIT), pp 1625–1631
Cai Y, Xingguang L, Jinsong L (2023) Emotion recognition using different sensors, emotion models, methods and datasets: a comprehensive review. Sensors 23(5):2455
Nam Y, Lee C (2021) Cascaded convolutional neural network architecture for speech emotion recognition in noisy conditions. Sensors 21(13):4399
Laugs C, Koops HV, Odijk D, Kaya H, Volk A (2020) The influence of blind source separation on mixed audio speech and music emotion recognition. In: Companion Publication of the International Conference on Multimodal Interaction, pp 67–71
Xu J, Li X, Hao Y, Yang G (2014) Source separation improves music emotion recognition. In: International Conference on Multimedia Retrieval, pp 423–426
Cámbara G, Luque J, Farrús M (2020) Convolutional speech recognition with pitch and voice quality features. arXiv preprint arXiv:2009.01309
Zhao J, Mao X, Chen L (2019) Speech emotion recognition using deep 1d & 2d cnn lstm networks. Biomed Signal Process Control 47:312–323
Sultana S, Iqbal MZ, Selim MR, Rashid MM, Rahman MS (2021) Bangla speech emotion recognition and cross-lingual study using deep cnn and blstm networks. IEEE Access 10:564–578
Stolar MN, Lech M, Bolia RS, Skinner M (2017) Real time speech emotion recognition using rgb image classification and transfer learning. In: 11th International Conference on Signal Processing and Communication Systems (ICSPCS). IEEE, pp 1–8
Badshah AM, Ahmad J, Rahim N, Baik SW (2017) Speech emotion recognition from spectrograms with deep convolutional neural network. In: International Conference on Platform Technology and Service (PlatCon). IEEE, pp 1–5
Huang A, Bao P (2019) Human vocal sentiment analysis. arXiv preprint arXiv:1905.08632
Chernykh V, Prikhodko P (2017) Emotion recognition from speech with recurrent neural networks. arXiv preprint arXiv:1701.08071
Ng H-W, Nguyen VD, Vonikakis V, Winkler S (2015) Deep learning for emotion recognition on small datasets using transfer learning. In: ACM International Conference on Multimodal Interaction, pp 443–449
Nagase R, Fukumori T, Yamashita Y (2022) Speech emotion recognition using label smoothing based on neutral and anger characteristics. In: 4th Global Conference on Life Sciences and Technologies (LifeTech). IEEE, pp 626–627
Nassif AB, Shahin I, Elnagar A, Velayudhan D, Alhudhaif A, Polat K (2022) Emotional speaker identification using a novel capsule nets model. Expert Syst Appl 193:116469
Tripathi S, Kumar A, Ramesh A, Singh C, Yenigalla P (2019) Focal loss based residual convolutional neural network for speech emotion recognition. arXiv preprint arXiv:1906.05682
Xiaoke L, Zhang Z, Gan C, Xiang Y (2022) Multi-label speech emotion recognition via inter-class difference loss under response residual network. IEEE Trans Multimed
Livingstone SR, Russo FA (2018) The ryerson audio-visual database of emotional speech and song (ravdess): a dynamic, multimodal set of facial and vocal expressions in North American English. PLoS One 13(5):e0196391
Dupuis K, Pichora-Fuller MK (2010) Toronto emotional speech set (TESS)
Jackson P, Haq S (2014) Surrey audio-visual expressed emotion (savee) database. University of Surrey, Guildford, UK
Cao H, Cooper DG, Keutmann MK, Gur RC, Nenkova A, Verma R (2014) Crema-d: crowd-sourced emotional multimodal actors dataset. IEEE Trans Affect Comput 5(4):377–390
Ancilin J, Milton A (2021) Improved speech emotion recognition with mel frequency magnitude coefficient. Appl Acoust 179:108046
Martin O, Kotsia I, Macq B, Pitas I (2006) The enterface’05 audio-visual emotion database. In: 22nd International Conference on Data Engineering Workshops (ICDEW). IEEE, pp 8–8
Sun L, Sheng F, Wang F (2019) Decision tree svm model with fisher feature selection for speech emotion recognition. EURASIP J Audio Speech Music Process 2019(1):1–14
Kerkeni L, Serrestou Y, Raoof K, Mbarki M, Mahjoub MA, Cleder C (2019) Automatic speech emotion recognition using an optimal combination of features based on emd-tkeo. Speech Commun 114:22–35
Luo Y, Mesgarani N (2018) Tasnet: time-domain audio separation network for real-time, single-channel speech separation. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 696–700
Cosentino J, Pariente M, Cornell S, Deleforge A, Vincent E (2020) Librimix: an open-source dataset for generalizable speech separation. arXiv preprint arXiv:2005.11262
Lech M, Stolar M, Best C, Bolia R (2020) Real-time speech emotion recognition using a pre-trained image classification network: effects of bandwidth reduction and companding. Front Comput Sci 2:14
Ravanelli M, Parcollet T, Plantinga P, Rouhe A, Cornell S, Lugosch L, Subakan C, Dawalatabad N, Heba A, Zhong J et al. (2021) Speechbrain: a general-purpose speech toolkit. arXiv preprint arXiv:2106.04624
Pariente M, Cornell S, Cosentino J, Sivasankaran S, Tzinis E, Heitkaemper J, Olvera M, Stöter F-R, Mathieu H, Martín-Doñas JM et al. (2020) Asteroid: the pytorch-based audio source separation toolkit for researchers. arXiv preprint arXiv:2005.04132
Yuan S, Cheng P, Zhang R, Hao W, Gan Z, Carin L (2021) Improving zero-shot voice style transfer via disentangled representation learning. arXiv preprint arXiv:2103.09420
Basak S, Agarwal S, Ganapathy S, Takahashi N (2021) End-to-end lyrics recognition with voice to singing style transfer. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 266–270
Subakan C, Ravanelli M, Cornell S, Grondin F, Bronzi M (2023) Exploring self-attention mechanisms for speech separation. IEEE/ACM Trans Audio Speech Lang Process
Bengio Y, Louradour J, Collobert R, Weston J (2009) Curriculum learning. In: 26th Annual International Conference on Machine Learning, pp 41–48
Guevara-Rukoz A, Demirsahin I, Fei H, Chu S-HC, Sarin S, Pipatsrisawat K, Gutkin A, Butryna A, Kjartansson O (2020) Crowdsourcing Latin American Spanish for low-resource text-to-speech. In: Twelfth Language Resources and Evaluation Conference, pp 6504–6513
Cárdenas P, García J, Begazo R, Aguilera A, Dongo I, Cardinale Y (2024) Evaluation of robot emotion expressions for human–robot interaction. Int J Soc Robot 16(9):2019–2041
Radford A, Kim JW, Xu T, Brockman G, McLeavey C, Sutskever I (2023) Robust speech recognition via large-scale weak supervision. In: International Conference on Machine Learning. PMLR, pp 28492–28518