Speech Recognition Using RNN, DNN and Web Services

Abstract

Human-Computer Interaction (HCI) is extremely important in today's rapidly developing AI environment. Speech recognition extracts the most informative data from an incoming speech signal and uses it to determine the corresponding text accurately. Since assistants such as Alexa and Siri handle only foreign languages, our goal is to convert speech signals in eight Indian languages into text. In the speech recognition system, the user speaks through a microphone, and features that discriminate between different words and sounds are extracted from the spectrogram. Mel Frequency Cepstral Coefficients (MFCC) is the procedure used for feature extraction. The extracted features are then fed as input to an acoustic model, which generates a probability distribution over alphabetic characters; in this step, DNN and RNN algorithms map the features to sounds and words. A large corpus of speech data is used for training so that the algorithms learn the relationships between the features and the sounds they represent. Language models play a key role in speech recognition systems by supplying data on word relationships within a language and the statistical likelihood of word sequences. Here, the likelihood of a word sequence is estimated using probabilistic models trained on the transcribed audio. Language models can be developed using statistical methods such as n-gram language models and recurrent neural network language models (RNN-LMs). A suitable class label is assigned to each pattern based on an abstraction created from a collection of training patterns or domain expertise, and the pattern is then converted to text. We provide results on the recognition of non-native speech using multilingual Hidden Markov Models. The system's effectiveness and optimisation are measured by how long it takes to translate into the target language. Finally, the services are made available on the web so that anyone can access them and perform speech-to-text conversion.
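To make the language-model step concrete, the sketch below shows a minimal bigram language model of the kind mentioned in the abstract: it counts word pairs in a transcribed corpus and scores candidate word sequences by their smoothed log-probability. This is an illustrative example only, not the paper's implementation; the function names, the toy corpus, and the choice of add-one smoothing are assumptions introduced for demonstration.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigram contexts and bigrams over tokenized sentences,
    padding each sentence with <s> and </s> boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        unigrams.update(padded[:-1])          # contexts: every token but </s>
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams, len(vocab)

def sequence_log_prob(tokens, unigrams, bigrams, vocab_size):
    """Log P(w1..wn) under a bigram model with add-one smoothing:
    P(w_i | w_{i-1}) = (c(w_{i-1}, w_i) + 1) / (c(w_{i-1}) + V)."""
    padded = ["<s>"] + tokens + ["</s>"]
    logp = 0.0
    for prev, cur in zip(padded, padded[1:]):
        num = bigrams[(prev, cur)] + 1
        den = unigrams[prev] + vocab_size
        logp += math.log(num / den)
    return logp

# Toy usage: a decoder would prefer the transcription with the higher score.
corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "dog", "sat"]]
uni, bi, vocab_size = train_bigram_lm(corpus)
good = sequence_log_prob(["the", "cat", "sat"], uni, bi, vocab_size)
bad = sequence_log_prob(["sat", "the", "cat"], uni, bi, vocab_size)
```

In a full recognizer, such scores would be combined with the acoustic model's character probabilities during decoding; an RNN-LM plays the same role but estimates the conditional probabilities with a recurrent network instead of counts.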



Author information

Authors and Affiliations

  1. Department of Computer Science and Engineering, St. Joseph’s Institute of Technology, Chennai, India
    Adlin Sheeba, G. S. Charulatha, S. Harini & C. A. Subasini
  2. Department of Information Technology, Dr. M.G.R. Educational and Research Institute, Chennai, India
    Dahlia Sam

Authors

  1. Adlin Sheeba
  2. G. S. Charulatha
  3. S. Harini
  4. C. A. Subasini
  5. Dahlia Sam

Corresponding author

Correspondence to Adlin Sheeba.

Editor information

Editors and Affiliations

  1. SRM Institute of Science and Technology, Chennai, Tamil Nadu, India
    Annie Uthra R.
  2. Department of Computer Technology, Anna University, Chennai, Tamil Nadu, India
    Kottilingam Kottursamy
  3. Department of Computer Technology, Anna University, Chennai, Tamil Nadu, India
    Gunasekaran Raja
  4. Manchester Metropolitan University, Manchester, UK
    Ali Kashif Bashir
  5. Department of Computer Engineering, Süleyman Demirel University, Isparta, Türkiye
    Utku Kose
  6. SRM Institute of Science and Technology, Chennai, Tamil Nadu, India
    Revathi Appavoo
  7. SRM Institute of Science and Technology, Chennai, Tamil Nadu, India
    Vimaladevi Madhivanan

Rights and permissions

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sheeba, A., Charulatha, G.S., Harini, S., Subasini, C.A., Sam, D. (2024). Speech Recognition Using RNN, DNN and Web Services. In: R., A.U., et al. Deep Sciences for Computing and Communications. IconDeepCom 2023. Communications in Computer and Information Science, vol 2176. Springer, Cham. https://doi.org/10.1007/978-3-031-68905-5\_33
