Data Augmentation for Internet of Things Dialog System
Abstract
With the rapid development of voice control technology, making speech recognition more accurate across the various IoT domains remains a difficult problem. Because conversation scenes vary widely, understanding the context of a dialog scene is a key issue for voice control systems. In practice, however, the training data available for dialog systems is almost always insufficient. In this paper, we address this lack of data in dialog systems with a data augmentation technique. We propose a Generative Adversarial Network (GAN)-based model that augments the data effectively. The model generates text from text, enhances the original data through text retelling, and uses the samples generated by the GAN to make parameter estimation more robust to unseen data. A new N-gram language model is used to evaluate the multiple recognition candidates produced by speech recognition, and the candidate sentence with the highest evaluation score is selected as the final recognition result. Experiments verify our generative-model-based data augmentation algorithm. In the model comparison test, the error rates on the THCHS30 and AISHELL data sets are 3.3% and 5.1%, respectively, both lower than those of the baseline system.
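The candidate-selection step described in the abstract can be illustrated with a minimal sketch. It is not the paper's "new N-gram language model", whose details the abstract does not give. It assumes a plain bigram model with add-one smoothing, and the toy corpus, the candidate list, and the function names (`train_bigram_counts`, `log_prob`, `pick_best`) are all hypothetical.

```python
import math
from collections import Counter


def train_bigram_counts(corpus):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in corpus:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of one candidate sentence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, word in zip(padded, padded[1:]):
        # Unseen words and bigrams have count 0 and still get a small probability.
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


def pick_best(candidates, unigrams, bigrams):
    """Return the candidate with the highest language-model score."""
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))


# Hypothetical training text; in the paper's setting this would be the
# original dialog data plus the GAN-generated samples.
corpus = [
    ["turn", "on", "the", "light"],
    ["turn", "off", "the", "light"],
    ["turn", "on", "the", "fan"],
]
unigrams, bigrams = train_bigram_counts(corpus)

# Hypothetical recognition candidates for one utterance.
candidates = [
    ["turn", "on", "the", "light"],
    ["turn", "on", "the", "lie"],
    ["tern", "on", "the", "light"],
]
best = pick_best(candidates, unigrams, bigrams)
print(" ".join(best))
```

The candidates here have equal length, so raw log-probabilities are directly comparable. With candidates of different lengths, a per-token normalization or a combination with the acoustic score would usually be needed.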
Acknowledgements
This work was partially supported by the National Funding from the FCT - Fundação para a Ciência e a Tecnologia through the UID/EEA/50008/2019 Project; and by Brazilian National Council for Scientific and Technological Development (CNPq) via Grant No. 309335/2017-5.
Author information
Authors and Affiliations
- Harbin Institute of Technology, Shenzhen, Harbin, China
  Eric Ke Wang & Juntao Yu
- College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, 266590, Shandong, China
  Chien-Ming Chen
- Department of Mathematics, Chaudhary Charan Singh University, Meerut, India
  Saru Kumari
- Federal University of Piauí, Teresina, PI, 64049-550, Brazil
  Joel J. P. C. Rodrigues
- Instituto de Telecomunicações, Aveiro, Portugal
  Joel J. P. C. Rodrigues
Authors
- Eric Ke Wang
- Juntao Yu
- Chien-Ming Chen
- Saru Kumari
- Joel J. P. C. Rodrigues
Corresponding author
Correspondence to Saru Kumari.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Wang, E.K., Yu, J., Chen, CM. et al. Data Augmentation for Internet of Things Dialog System. Mobile Netw Appl 27, 158–171 (2022). https://doi.org/10.1007/s11036-020-01638-9
- Published: 04 September 2020
- Version of record: 04 September 2020
- Issue date: February 2022
- DOI: https://doi.org/10.1007/s11036-020-01638-9