Emotion Recognition in Multi-speaker Scenarios for Social Robots

References

  1. Nocentini O, Fiorini L, Acerbi G, Sorrentino A, Mancioppi G, Cavallo F (2019) A survey of behavioral models for social robots. Robotics 8(3):54
  2. Martinez-Martin E, Del Pobil AP (2018) Personal robot assistants for elderly care: an overview. In: Personal Assistants: Emerging Computational Technologies, pp 77–91
  3. Asgharian P, Panchea AM, Ferland F (2022) A review on the use of mobile service robots in elderly care. Robotics 11(6):127
  4. Toh LPE, Causo A, Tzuo P-W, Chen I-M, Yeo SH (2016) A review on the use of robots in education and young children. J Educ Technol Soc 19(2):148–163
  5. Chen H, Park HW, Breazeal C (2020) Teaching and learning with children: impact of reciprocal peer learning with a social robot on children’s learning and emotive engagement. Comput Educ 150:103836
  6. Sasaki Y, Nitta J (2017) Long-term demonstration experiment of autonomous mobile robot in a science museum. In International Symposium on Robotics and Intelligent Sensors, IEEE, pp 304–310
  7. Hellou M, Lim J, Gasteiger N, Jang M, Ahn HS (2022) Technical methods for social robots in museum settings: an overview of the literature. Int J Soc Robot 14(8):1767–1786
  8. Clark HH, Fischer K (2023) Social robots as depictions of social agents. Behav Brain Sci 46:e21
  9. Doğan S, Colak A (2024) Social robots in the instruction of social skills in autism: a comprehensive descriptive analysis of single-case experimental designs. Disabil Rehabil Assist Technol 19(2):325–344
  10. Laban G, Morrison V, Cross ES (2024) Social robots for health psychology: a new frontier for improving human health and well-being. Eur Health Psychol 23(1):1095–1102
  11. Khan A, Anwar Y (2019) Robots in healthcare: a survey. In: Science and Information Conference. Springer, pp 280–292
  12. Peerzade GN, Deshmukh RD, Waghmare SD (2018) A review: speech emotion recognition. Int J Comput Sci Eng 6(3):400–402
  13. Wang D, Chen J (2018) Supervised speech separation based on deep learning: an overview. IEEE/ACM Trans Audio Speech Lang Process 26(10):1702–1726
  14. Soni S, Yadav RN, Gupta L (2023) State-of-the-art analysis of deep learning-based monaural speech source separation techniques. IEEE Access 11:4242–4269
  15. Ravenscroft W, Goetze S, Hain T (2022) Deformable temporal convolutional networks for monaural noisy reverberant speech separation. arXiv preprint arXiv:2210.15305
  16. Nachmani E, Adi Y, Wolf L (2020) Voice separation with an unknown number of multiple speakers. In: International Conference on Machine Learning. PMLR, pp 7164–7175
  17. Sakthi S (2022) Dnn based speech quality enhancement and multi-speaker separation for automatic speech recognition system. In: Machine Learning Algorithms for Signal and Image Processing, pp 231–246
  18. Jin R, Ablimit M, Hamdulla A (2022) Speech separation and emotion recognition for multi-speaker scenarios. In: 3rd International Conference on Pattern Recognition and Machine Learning (PRML). IEEE, pp 280–284
  19. Zhang Z, Yoshioka T, Kanda N, Chen Z, Wang X, Wang D, Eskimez SE (2022) All-neural beamformer for continuous speech separation. In International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp 6032–6036
  20. Chen S, Wu Y, Chen Z, Yoshioka T, Liu S, Li J, Yu X (2021) Don’t shoot butterfly with rifles: multi-channel continuous speech separation with early exit transformer. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 6139–6143
  21. Wang X, Wang D, Kanda N, Eskimez SE, Yoshioka T (2022) Leveraging real conversational data for multi-channel continuous speech separation. arXiv preprint arXiv:2204.03232
  22. Subakan C, Ravanelli M, Cornell S, Bronzi M, Zhong J (2021) Attention is all you need in speech separation. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 21–25
  23. Rieger SA, Muraleedharan R, Ramachandran RP (2014) Speech based emotion recognition using spectral feature extraction and an ensemble of knn classifiers. In: The 9th International Symposium on Chinese Spoken Language Processing. IEEE, pp 589–593
  24. Ram CS, Ponnusamy R (2014) An effective automatic speech emotion recognition for tamil language using support vector machine. In: International Conference on Issues and Challenges in Intelligent Computing Techniques (ICICT). IEEE, pp 19–23
  25. de Lope J, Graña M (2023) An ongoing review of speech emotion recognition. Neurocomputing 528:1–11
  26. Arango-Sánchez JA, Arias-Londoño JD (2022) An enhanced conv-tasnet model for speech separation using a speaker distance-based loss function. arXiv preprint arXiv:2205.13657
  27. Basir S, Hossain MN, Hosen MS, Ali MS, Riaz Z, Islam MS (2024) U-net: a supervised approach for monaural source separation. Arab J Sci Eng 1–13
  28. Opochinsky R, Moradi M, Gannot S (2024) Single-microphone speaker separation and voice activity detection in noisy and reverberant environments. arXiv preprint arXiv:2401.03448
  29. Zhang X, Tang J, Cao H, Wang C, Shen C, Liu J (2024) Cascaded speech separation denoising and dereverberation using attention and tcn-wpe networks for speech devices. IEEE Internet Things J 11(10):18047–18058
  30. Wichern G, Antognini J, Flynn M, Zhu LR, McQuinn E, Crow D, Manilow E, Roux JL (2019) Wham!: extending speech separation to noisy environments. arXiv preprint arXiv:1907.01160
  31. Lutati S, Nachmani E, Wolf L (2022) Sepit approaching a single channel speech separation bound. arXiv preprint arXiv:2205.11801
  32. Subakan C, Ravanelli M, Cornell S, Grondin F (2022) Real-m: towards speech separation on real mixtures. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 6862–6866
  33. Dubey H, Gopal V, Cutler R, Aazami A, Matusevych S, Braun S, Eskimez SE, Thakker M, Yoshioka T, Gamper H et al. (2022) Icassp 2022 deep noise suppression challenge. In International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp 9271–9275
  34. Veaux C, Yamagishi J, MacDonald K et al (2017) Cstr vctk corpus: english multi-speaker corpus for cstr voice cloning toolkit. The Centre for Speech Technology Research (CSTR), University of Edinburgh
  35. Gokilavani M, Katakam H, Basheer SA, Srinivas PVVS (2022) Ravdness, crema-d, tess based algorithm for emotion recognition using speech. In: 4th International Conference on Smart Systems and Inventive Technology (ICSSIT), pp 1625–1631
  36. Zhu-Zhou F, Gil-Pita R, García-Gómez J, Rosa-Zurera M (2022) Robust multi-scenario speech-based emotion recognition system. Sensors 22(6)
  37. Cai Y, Li X, Li J (2023) Emotion recognition using different sensors, emotion models, methods and datasets: a comprehensive review. Sensors 23(5):2455
  38. Nam Y, Lee C (2021) Cascaded convolutional neural network architecture for speech emotion recognition in noisy conditions. Sensors 21(13):4399
  39. Laugs C, Koops HV, Odijk D, Kaya H, Volk A (2020) The influence of blind source separation on mixed audio speech and music emotion recognition. In: Companion Publication of the International Conference on Multimodal Interaction, pp 67–71
  40. Xu J, Li X, Hao Y, Yang G (2014) Source separation improves music emotion recognition. In: International Conference on Multimedia Retrieval, pp 423–426
  41. Cámbara G, Luque J, Farrús M (2020) Convolutional speech recognition with pitch and voice quality features. arXiv preprint arXiv:2009.01309
  42. Zhao J, Mao X, Chen L (2019) Speech emotion recognition using deep 1d & 2d cnn lstm networks. Biomed Signal Process Control 47:312–323
  43. Sultana S, Iqbal MZ, Selim MR, Rashid MM, Rahman MS (2021) Bangla speech emotion recognition and cross-lingual study using deep cnn and blstm networks. IEEE Access 10:564–578
  44. Stolar MN, Lech M, Bolia RS, Skinner M (2017) Real time speech emotion recognition using rgb image classification and transfer learning. In: 11th International Conference on Signal Processing and Communication Systems (ICSPCS). IEEE, pp 1–8
  45. Badshah AM, Ahmad J, Rahim N, Baik SW (2017) Speech emotion recognition from spectrograms with deep convolutional neural network. In: International Conference on Platform Technology and Service (PlatCon). IEEE, pp 1–5
  46. Huang A, Bao P (2019) Human vocal sentiment analysis. arXiv preprint arXiv:1905.08632
  47. Chernykh V, Prikhodko P (2017) Emotion recognition from speech with recurrent neural networks. arXiv preprint arXiv:1701.08071
  48. Ng H-W, Nguyen VD, Vonikakis V, Winkler S (2015) Deep learning for emotion recognition on small datasets using transfer learning. In: ACM International Conference on Multimodal Interaction, pp 443–449
  49. Nagase R, Fukumori T, Yamashita Y (2022) Speech emotion recognition using label smoothing based on neutral and anger characteristics. In: 4th Global Conference on Life Sciences and Technologies (LifeTech). IEEE, pp 626–627
  50. Nassif AB, Shahin I, Elnagar A, Velayudhan D, Alhudhaif A, Polat K (2022) Emotional speaker identification using a novel capsule nets model. Expert Syst Appl 193:116469
  51. Tripathi S, Kumar A, Ramesh A, Singh C, Yenigalla P (2019) Focal loss based residual convolutional neural network for speech emotion recognition. arXiv preprint arXiv:1906.05682
  52. Li X, Zhang Z, Gan C, Xiang Y (2022) Multi-label speech emotion recognition via inter-class difference loss under response residual network. IEEE Trans Multimed
  53. Livingstone SR, Russo FA (2018) The ryerson audio-visual database of emotional speech and song (ravdess): a dynamic, multimodal set of facial and vocal expressions in North American English. PLoS One 13(5):e0196391
  54. Dupuis K, Pichora-Fuller MK (2010) Toronto emotional speech set (tess). University of Toronto
  55. Jackson P, Haq S (2014) Surrey audio-visual expressed emotion (savee) database. University of Surrey, Guildford, UK
  56. Cao H, Cooper DG, Keutmann MK, Gur RC, Nenkova A, Verma R (2014) Crema-d: crowd-sourced emotional multimodal actors dataset. IEEE Trans Affect Comput 5(4):377–390
  57. Ancilin J, Milton A (2021) Improved speech emotion recognition with mel frequency magnitude coefficient. Appl Acoust 179:108046
  58. Martin O, Kotsia I, Macq B, Pitas I (2006) The enterface’05 audio-visual emotion database. In: 22nd international conference on data engineering workshops (ICDEW). IEEE, pp 8–8
  59. Sun L, Sheng F, Wang F (2019) Decision tree svm model with fisher feature selection for speech emotion recognition. EURASIP J Audio Speech Music Process 2019(1):1–14
  60. Kerkeni L, Serrestou Y, Raoof K, Mbarki M, Mahjoub MA, Cleder C (2019) Automatic speech emotion recognition using an optimal combination of features based on emd-tkeo. Speech Commun 114:22–35
  61. Luo Y, Mesgarani N (2018) Tasnet: time-domain audio separation network for real-time, single-channel speech separation. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 696–700
  62. Cosentino J, Pariente M, Cornell S, Deleforge A, Vincent E (2020) Librimix: an open-source dataset for generalizable speech separation. arXiv preprint arXiv:2005.11262
  63. Lech M, Stolar M, Best C, Bolia R (2020) Real-time speech emotion recognition using a pre-trained image classification network: effects of bandwidth reduction and companding. Front Comput Sci 2:14
  64. Ravanelli M, Parcollet T, Plantinga P, Rouhe A, Cornell S, Lugosch L, Subakan C, Dawalatabad N, Heba A, Zhong J et al. (2021) Speechbrain: a general-purpose speech toolkit. arXiv preprint arXiv:2106.04624
  65. Pariente M, Cornell S, Cosentino J, Sivasankaran S, Tzinis E, Heitkaemper J, Olvera M, Stöter F-R, Mathieu H, Martín-Doñas JM et al. (2020) Asteroid: the pytorch-based audio source separation toolkit for researchers. arXiv preprint arXiv:2005.04132
  66. Yuan S, Cheng P, Zhang R, Hao W, Gan Z, Carin L (2021) Improving zero-shot voice style transfer via disentangled representation learning. arXiv preprint arXiv:2103.09420
  67. Basak S, Agarwal S, Ganapathy S, Takahashi N (2021) End-to-end lyrics recognition with voice to singing style transfer. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp 266–270
  68. Subakan C, Ravanelli M, Cornell S, Grondin F, Bronzi M (2023) Exploring self-attention mechanisms for speech separation. IEEE/ACM Trans Audio Speech Lang Process
  69. Bengio Y, Louradour J, Collobert R, Weston J (2009) Curriculum learning. In: 26th Annual International Conference on Machine Learning, pp 41–48
  70. Guevara-Rukoz A, Demirsahin I, Fei H, Chu S-HC, Sarin S, Pipatsrisawat K, Gutkin A, Butryna A, Kjartansson O (2020) Crowdsourcing Latin American Spanish for low-resource text-to-speech. In: Twelfth Language Resources and Evaluation Conference, pp 6504–6513
  71. Cárdenas P, García J, Begazo R, Aguilera A, Dongo I, Cardinale Y (2024) Evaluation of robot emotion expressions for human–robot interaction. Int J Soc Robot 16(9):2019–2041
  72. Radford A, Kim JW, Xu T, Brockman G, McLeavey C, Sutskever I (2023) Robust speech recognition via large-scale weak supervision. In: International Conference on Machine Learning. PMLR, pp 28492–28518
