Speech Recognition Using RNN, DNN and Web Services

Abstract

Human-Computer Interaction (HCI) is extremely important in today's rapidly developing AI environment. Speech recognition extracts the most informative data from an incoming speech signal and uses it to determine the corresponding text accurately. Since assistants such as Alexa and Siri handle only foreign languages, our goal is to convert speech signals in eight Indian languages into text. In the speech recognition system, the user speaks through a microphone, and features that discriminate between different words and sounds are extracted from the spectrogram. Mel Frequency Cepstral Coefficients (MFCC) is the procedure used for feature extraction. The extracted features are then fed as input to an acoustic model, which generates a probability distribution over alphabetic characters; in this step, DNN and RNN algorithms map the features to sounds and words. A large corpus of speech data is used for training so that the algorithms learn the relationships between the features and the sounds they represent. Language models play a key role in speech recognition systems by supplying data on word relationships within a language and the statistical likelihood of word sequences. Here, the likelihood of a word sequence is estimated using probabilistic models trained on the transcribed audio. Language models can be developed using statistical methods such as n-gram language models and recurrent neural network language models (RNN-LMs). A suitable class label is assigned to each pattern based on an abstraction created from a collection of training patterns or domain expertise, and the pattern is then converted to text. We provide results on the recognition of non-native speech using multilingual Hidden Markov Models. The system's effectiveness and optimisation are measured by how long it takes to translate into the target language. Finally, the services are made available on the web so that anyone can access them and perform speech-to-text conversion.
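To make the language-model step concrete, the sketch below shows a minimal bigram language model of the kind mentioned in the abstract: it counts word pairs in a transcribed corpus and scores candidate word sequences by their smoothed log-probability. This is an illustrative example only, not the paper's implementation; the function names, the toy corpus, and the choice of add-one smoothing are assumptions introduced for demonstration.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigram contexts and bigrams over tokenized sentences,
    padding each sentence with <s> and </s> boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        unigrams.update(padded[:-1])          # contexts: every token but </s>
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams, len(vocab)

def sequence_log_prob(tokens, unigrams, bigrams, vocab_size):
    """Log P(w1..wn) under a bigram model with add-one smoothing:
    P(w_i | w_{i-1}) = (c(w_{i-1}, w_i) + 1) / (c(w_{i-1}) + V)."""
    padded = ["<s>"] + tokens + ["</s>"]
    logp = 0.0
    for prev, cur in zip(padded, padded[1:]):
        num = bigrams[(prev, cur)] + 1
        den = unigrams[prev] + vocab_size
        logp += math.log(num / den)
    return logp

# Toy usage: a decoder would prefer the transcription with the higher score.
corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "dog", "sat"]]
uni, bi, vocab_size = train_bigram_lm(corpus)
good = sequence_log_prob(["the", "cat", "sat"], uni, bi, vocab_size)
bad = sequence_log_prob(["sat", "the", "cat"], uni, bi, vocab_size)
```

In a full recognizer, such scores would be combined with the acoustic model's character probabilities during decoding; an RNN-LM plays the same role but estimates the conditional probabilities with a recurrent network instead of counts.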



Author information

Authors and Affiliations

  1. Department of Computer Science and Engineering, St. Joseph’s Institute of Technology, Chennai, India
    Adlin Sheeba, G. S. Charulatha, S. Harini & C. A. Subasini
  2. Department of Information Technology, Dr. M.G.R. Educational and Research Institute, Chennai, India
    Dahlia Sam

Authors

  1. Adlin Sheeba
  2. G. S. Charulatha
  3. S. Harini
  4. C. A. Subasini
  5. Dahlia Sam

Corresponding author

Correspondence to Adlin Sheeba.

Editor information

Editors and Affiliations

  1. SRM Institute of Science and Technology, Chennai, Tamil Nadu, India
    Annie Uthra R.
  2. Department of Computer Technology, Anna University, Chennai, Tamil Nadu, India
    Kottilingam Kottursamy
  3. Department of Computer Technology, Anna University, Chennai, Tamil Nadu, India
    Gunasekaran Raja
  4. Manchester Metropolitan University, Manchester, UK
    Ali Kashif Bashir
  5. Department of Computer Engineering, Süleyman Demirel University, Isparta, Türkiye
    Utku Kose
  6. SRM Institute of Science and Technology, Chennai, Tamil Nadu, India
    Revathi Appavoo
  7. SRM Institute of Science and Technology, Chennai, Tamil Nadu, India
    Vimaladevi Madhivanan

Rights and permissions

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sheeba, A., Charulatha, G.S., Harini, S., Subasini, C.A., Sam, D. (2024). Speech Recognition Using RNN, DNN and Web Services. In: R., A.U., et al. Deep Sciences for Computing and Communications. IconDeepCom 2023. Communications in Computer and Information Science, vol 2176. Springer, Cham. https://doi.org/10.1007/978-3-031-68905-5\_33
