Speech Recognition Using RNN, DNN and Web Services
Abstract
Human-Computer Interaction (HCI) is extremely important in today’s fast-developing AI environment. Speech recognition involves extracting the most important information from the incoming speech signal and using it to determine accurately which text the speech corresponds to. Because assistants such as Alexa and Siri mainly support foreign languages, our goal is to convert speech in eight Indian languages into text. In the speech recognition system, the user speaks through a microphone. To discriminate between different words and sounds, features are extracted from the spectrogram; Mel Frequency Cepstral Coefficients (MFCC) are used for this feature extraction. The Acoustic Model then takes the extracted features as input and generates a probability distribution over alphabetic characters. In this step, DNN and RNN algorithms map the extracted features to words and sounds. These algorithms are trained on a large corpus of speech data so that they learn the connections between the features and the sounds they represent. Language models play a key role in speech recognition systems by supplying information about word relationships within a language and the statistical likelihood of word sequences. Here, the likelihood of a word sequence is estimated with probabilistic models built from transcribed audio. Language models can be developed using statistical methods such as n-gram language models and recurrent neural network language models (RNN-LMs). A suitable class label is assigned to each pattern based on an abstraction created from a collection of training patterns or from domain expertise, and the pattern is then converted. We provide results on the recognition of non-native speech using multilingual Hidden Markov Models. The system’s effectiveness and optimisation are measured by how long it takes to translate into the target language. Finally, the services are available on the web so that anyone can access them and perform speech-to-text conversion.
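The abstract names n-gram language models as one way to estimate the likelihood of a word sequence. The following is a minimal illustrative sketch in Python, not the authors' implementation: a bigram model with add-one (Laplace) smoothing. The class name `BigramLM`, the `<s>`/`</s>`/`<unk>` tokens, the smoothing choice and the toy training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter

# Sentence-boundary and unknown-word tokens (illustrative assumptions).
BOS, EOS, UNK = "<s>", "</s>", "<unk>"


class BigramLM:
    """Hypothetical bigram language model with add-one (Laplace) smoothing.

    Out-of-vocabulary words are mapped to a single UNK token so that every
    conditional distribution P(. | history) sums to 1.
    """

    def __init__(self, sentences):
        self.history_counts = Counter()  # C(h): how often h precedes another token
        self.bigram_counts = Counter()   # C(h, w): how often w directly follows h
        self.vocab = {UNK}               # possible next tokens (BOS never follows)
        for sentence in sentences:
            tokens = [BOS] + sentence.lower().split() + [EOS]
            self.vocab.update(tokens[1:])
            self.history_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))

    def _normalise(self, word):
        # Map unseen words to UNK; the start marker passes through unchanged.
        return word if word in self.vocab or word == BOS else UNK

    def prob(self, history, word):
        """Smoothed P(word | history) = (C(h, w) + 1) / (C(h) + V)."""
        h, w = self._normalise(history), self._normalise(word)
        return (self.bigram_counts[(h, w)] + 1) / (self.history_counts[h] + len(self.vocab))

    def log_prob(self, sentence):
        """Log-probability of a whole word sequence, including boundary tokens."""
        tokens = [BOS] + sentence.lower().split() + [EOS]
        return sum(math.log(self.prob(h, w)) for h, w in zip(tokens[:-1], tokens[1:]))


# Toy corpus (hypothetical): compare two candidate transcriptions.
lm = BigramLM(["recognise speech", "recognise speech today", "wreck a nice beach"])
print(lm.log_prob("recognise speech") > lm.log_prob("wreck a nice speech"))  # True
```

A decoder could use such scores to prefer the more plausible of several candidate transcriptions produced by the acoustic model. Practical systems usually rely on higher-order n-grams with stronger smoothing (for example Kneser-Ney); add-one smoothing is used here only because it is short and keeps each conditional distribution normalised.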
Author information
Authors and Affiliations
- Department of Computer Science and Engineering, St. Joseph’s Institute of Technology, Chennai, India
  Adlin Sheeba, G. S. Charulatha, S. Harini & C. A. Subasini
- Department of Information Technology, Dr. M.G.R. Educational and Research Institute, Chennai, India
  Dahlia Sam
Corresponding author
Correspondence to Adlin Sheeba.
Editor information
Editors and Affiliations
- SRM Institute of Science and Technology, Chennai, Tamil Nadu, India
  Annie Uthra R.
- Department of Computer Technology, Anna University, Chennai, Tamil Nadu, India
  Kottilingam Kottursamy
- Department of Computer Technology, Anna University, Chennai, Tamil Nadu, India
  Gunasekaran Raja
- Manchester Metropolitan University, Manchester, UK
  Ali Kashif Bashir
- Department of Computer Engineering, Süleyman Demirel University, Isparta, Türkiye
  Utku Kose
- SRM Institute of Science and Technology, Chennai, Tamil Nadu, India
  Revathi Appavoo
- SRM Institute of Science and Technology, Chennai, Tamil Nadu, India
  Vimaladevi Madhivanan
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sheeba, A., Charulatha, G.S., Harini, S., Subasini, C.A., Sam, D. (2024). Speech Recognition Using RNN, DNN and Web Services. In: R., A.U., et al. Deep Sciences for Computing and Communications. IconDeepCom 2023. Communications in Computer and Information Science, vol 2176. Springer, Cham. https://doi.org/10.1007/978-3-031-68905-5_33
- DOI: https://doi.org/10.1007/978-3-031-68905-5_33
- Published: 29 September 2024
- Publisher Name: Springer, Cham
- Print ISBN: 978-3-031-68904-8
- Online ISBN: 978-3-031-68905-5
- eBook Packages: Computer Science, Computer Science (R0), Springer Nature Proceedings excluding Computer Science