A simplified Subspace Gaussian Mixture to compact acoustic models for speech recognition

The subspace Gaussian mixture model - A structured model for speech recognition

Computer Speech & Language, 2011

We describe a new approach to speech recognition, in which all Hidden Markov Model (HMM) states share the same Gaussian Mixture Model (GMM) structure, with the same number of Gaussians in each state. The model is defined by a vector associated with each state, with a dimension of, say, 50, together with a global mapping from this vector space to the space of GMM parameters. This model appears to give better results than a conventional model, and the extra structure offers many new opportunities for modeling innovations while maintaining compatibility with most standard techniques.

Subspace Gaussian Mixture Models for speech recognition

2010

This technical report contains the details of an acoustic modeling approach based on subspace adaptation of a shared Gaussian Mixture Model. This refers to adaptation to a particular speech state; it is not a speaker adaptation technique, although we do later introduce a speaker adaptation technique that is tied to this particular framework. Our model is a large shared GMM whose parameters vary in a subspace of relatively low dimension (e.g. 50); each state is thus described by a low-dimensional vector which controls the GMM's means and mixture weights in a manner determined by globally shared parameters. In addition, we generalize to having each speech state be a mixture of substates, each with a different vector. Only the mathematical details are provided here; experimental results are being published separately.
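The mapping the abstract describes can be sketched in a few lines: a low-dimensional state vector is projected by globally shared matrices into GMM means, and through a log-linear (softmax) map into mixture weights. The sizes and random parameter values below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

I, S, D = 4, 50, 39   # shared Gaussians, subspace dim, feature dim (illustrative)

# Globally shared parameters (hypothetical random values for illustration):
M = rng.standard_normal((I, D, S))   # mean-projection matrix per shared Gaussian
w = rng.standard_normal((I, S))      # weight-projection vectors

def state_gmm(v):
    """Derive one state's GMM means and mixture weights from its vector v."""
    means = np.einsum('ids,s->id', M, v)   # mean of Gaussian i: M_i @ v
    logits = w @ v                         # log-linear weight model
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()               # softmax over the I shared Gaussians
    return means, weights

v_state = rng.standard_normal(S)           # the per-state vector of dimension ~50
means, weights = state_gmm(v_state)
```

Note how few state-specific parameters are needed (here 50 numbers per state) compared with storing full per-state means and weights.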

Acoustic modeling for under-resourced languages based on vectorial HMM-states representation using Subspace Gaussian Mixture Models

2012 IEEE Spoken Language Technology Workshop (SLT), 2012

This paper explores a novel method for building context-dependent models in automatic speech recognition (ASR), in the context of under-resourced languages. We present a simple way to realize a state-tying approach, based on a new vectorial representation of the HMM states. This representation is a low-dimensional parameter vector obtained via the Subspace Gaussian Mixture Model (SGMM) paradigm. The proposed method requires neither phonetic knowledge nor a large amount of data, which are the major obstacles for acoustic modeling of under-resourced languages. This paper shows how this representation can be obtained and used for tying states. Our experiments on Vietnamese show that this approach achieves a consistent gain over the classical approach based on decision trees. Furthermore, the method appears to be portable to other languages, as shown in a preliminary study conducted on Berber.
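Since each HMM state is summarized by a low-dimensional SGMM vector, states can be tied by simply clustering those vectors instead of growing a phonetic decision tree. A minimal k-means sketch (the data and cluster count are illustrative, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)

def tie_states(vectors, n_tied, iters=20):
    """Cluster low-dimensional state vectors with k-means; states falling in
    the same cluster share one tied acoustic model (no decision tree needed)."""
    centroids = vectors[rng.choice(len(vectors), n_tied, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(vectors[:, None, :] - centroids[None], axis=2)
        assign = dists.argmin(axis=1)
        for k in range(n_tied):
            if np.any(assign == k):
                centroids[k] = vectors[assign == k].mean(axis=0)
    return assign

# 200 hypothetical context-dependent states in a 50-dim SGMM vector space
states = rng.standard_normal((200, 50))
tying = tie_states(states, n_tied=40)   # tying[j] = tied-model id for state j
```

The appeal for under-resourced languages is that this requires no phonetic question set, only the state vectors already estimated by SGMM training.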

Multilingual acoustic modeling for speech recognition based on subspace Gaussian Mixture Models

2010

Although research has previously been done on multilingual speech recognition, it has been found very difficult to improve over separately trained systems. The usual approach has been to use some kind of "universal phone set" that covers multiple languages. We report experiments on a different approach to multilingual speech recognition, in which the phone sets are entirely distinct but the parameters of the model that are not tied to specific states are shared across languages. We use a model called a "Subspace Gaussian Mixture Model", where states' distributions are Gaussian Mixture Models with a common structure, constrained to lie in a subspace of the total parameter space. The parameters that define this subspace can be shared across languages. We obtain substantial WER improvements with this approach, especially with very small amounts of in-language training data.

Subspace Gaussian mixture model with state-dependent subspace dimensions

2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014

In recent years, under the hidden Markov modeling (HMM) framework, the use of subspace Gaussian mixture models (SGMMs) has demonstrated better recognition performance than traditional Gaussian mixture models (GMMs) in automatic speech recognition. In the state-of-the-art SGMM formulation, a fixed subspace dimension is assigned to every phone state. While a constant subspace dimension is easier to implement, it may lead to overfitting or underfitting of some state models, as the data is usually distributed unevenly among the states. In a later extension of SGMM, states are split into sub-states with an appropriate objective function, easing the problem by increasing the state-specific parameters of underfitted states. In this paper, we propose another solution: we allow each sub-state to have a different subspace dimension depending on its number of training frames, so that the state-specific parameters can be robustly estimated. Experimental evaluation on the Switchboard recognition task shows that our proposed method improves on the existing SGMM training procedure.
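The core idea, a sub-state's subspace dimension growing with its amount of training data, can be sketched with a simple clamped rule. The thresholds below are invented for illustration; the paper's actual dimension-selection criterion differs.

```python
def subspace_dim(n_frames, d_min=10, d_max=50, frames_per_dim=200):
    """Pick a sub-state's subspace dimension from its training-frame count.

    Illustrative rule: one dimension per `frames_per_dim` frames, clamped to
    [d_min, d_max] so sparse sub-states get a small, robustly estimable
    vector and data-rich sub-states get the full subspace.
    """
    return max(d_min, min(d_max, n_frames // frames_per_dim))
```

A sub-state seen in only 100 frames would get the 10-dimensional floor, while one with 100,000 frames would use the full 50 dimensions.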

Comparing computation in Gaussian mixture and neural network based large-vocabulary speech recognition

In this paper we look at real-time computing issues in large-vocabulary speech recognition. We use the French broadcast audio transcription task from ETAPE 2011 for this evaluation. We compare word error rate (WER) versus overall computing time for hidden Markov models with Gaussian mixtures (GMM-HMM) and with deep neural networks (DNN-HMM). We show that for similar computational cost during recognition, the DNN-HMM combination is superior to the GMM-HMM. For a real-time computing scenario, the error rate on the ETAPE dev set is 23.5% for the DNN-HMM versus 27.9% for the GMM-HMM: a significant difference in accuracy for comparable computation. Rescoring lattices (generated by the DNN-HMM acoustic model) with a quadgram language model (LM), and then with a neural net LM, reduces the WER to 22.0% while still providing real-time computing.

Discriminative Estimation of Subspace Constrained Gaussian Mixture Models for Speech Recognition

IEEE Transactions on Audio, Speech and Language Processing, 2000

In this paper we study discriminative training of acoustic models for speech recognition under two criteria: maximum mutual information (MMI) and a novel "error weighted" training technique. We present a proof that the standard MMI training technique is valid for a very general class of acoustic models with any kind of parameter tying. We report experimental results for subspace constrained Gaussian mixture models (SCGMMs), where the exponential model weights of all Gaussians are required to belong to a common "tied" subspace, as well as for Subspace Precision and Mean (SPAM) models, which impose separate subspace constraints on the precision matrices (i.e. inverse covariance matrices) and means. It has been shown previously that SCGMMs and SPAM models generalize, and yield significant error rate improvements over, previously considered model classes such as diagonal models, models with semi-tied covariances, and EMLLT (extended maximum likelihood linear transformation) models. We show here that MMI and error weighted training each individually result in over 20% relative reduction in word error rate on a digit task compared with maximum likelihood (ML) training. We also show that a gain of as much as 28% relative can be achieved by combining these two discriminative estimation techniques.
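For readers unfamiliar with the MMI criterion the abstract refers to, a toy per-utterance form can be written directly: the log-likelihood of the reference transcription minus the log-sum over competing hypotheses. This is a simplified sketch; real systems sum over a lattice and weight paths by the language model.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def mmi_utterance(ref_loglik, hyp_logliks):
    """Toy MMI objective for one utterance: numerator (reference) minus the
    denominator summed over all competing hypotheses. Maximizing this pushes
    probability mass toward the correct transcription."""
    return ref_loglik - logsumexp(hyp_logliks)
```

When the reference dominates the hypothesis set, the objective approaches zero from below; strong competitors make it more negative, which is what training then reduces.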

Gaussian mixture selection using context-independent HMM

2001

We present a method to efficiently select Gaussian mixtures for fast acoustic likelihood computation. It makes use of context-independent models for selection and back-off of the corresponding triphone models. Specifically, for the k-best phone models found by a preliminary evaluation, triphone models of higher resolution are applied, and the others are assigned likelihoods from the monophone models. This selection scheme assigns more reliable back-off likelihoods to the unselected states than conventional Gaussian selection based on a VQ codebook. It can also incorporate efficient Gaussian pruning at the preliminary evaluation, which offsets the increased size of the pre-selection model. Experimental results show that the proposed method achieves performance comparable to standard Gaussian selection, and performs much better under aggressive pruning conditions. Together with phonetic tied-mixture (PTM) modeling, the acoustic matching cost is reduced to almost 14% with little loss of accuracy.
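The two-pass scheme the abstract describes can be sketched as: score the cheap monophone models first, fully score triphones only for the k best phones, and back off every other triphone to its monophone likelihood. The function and model names here are illustrative placeholders.

```python
def select_and_score(frame, mono_score, tri_score, tri_of_phone, k=8):
    """Two-pass likelihood computation with monophone back-off.

    mono_score(p, frame)  -> monophone log-likelihood (cheap, preliminary pass)
    tri_score(t, frame)   -> triphone log-likelihood (expensive, second pass)
    tri_of_phone          -> mapping phone -> list of its triphone models
    """
    phones = list(tri_of_phone)
    mono = {p: mono_score(p, frame) for p in phones}
    kbest = set(sorted(phones, key=lambda p: mono[p], reverse=True)[:k])
    scores = {}
    for p in phones:
        for t in tri_of_phone[p]:
            # Full evaluation for promising phones, monophone back-off otherwise
            scores[t] = tri_score(t, frame) if p in kbest else mono[p]
    return scores

# Toy usage with hypothetical scoring functions:
mono = lambda p, frame: {'a': -1.0, 'b': -5.0}[p]
tri = lambda t, frame: -0.5
scores = select_and_score(None, mono, tri, {'a': ['a-x+y'], 'b': ['b-x+y']}, k=1)
```

The back-off value is more trustworthy than an arbitrary floor because it is a genuine (if coarse) likelihood of the same phone.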

Special Section on Statistical Modeling for Speech Processing

IEICE Transactions on Information and Systems, 2006

In recent years, the number of studies investigating new directions in speech modeling that go beyond the conventional HMM has increased considerably. One promising approach is to use Bayesian Networks (BN) as speech models. Full recognition systems based on Dynamic BNs, as well as acoustic models using BNs, have been proposed lately. Our group at ATR has been developing a hybrid HMM/BN model, which is an HMM where the state probability distribution is modeled by a BN instead of the commonly used mixtures of Gaussian functions. In this paper, we describe how to use hybrid HMM/BN acoustic models, emphasizing some design and implementation issues. The most essential part of HMM/BN model building is the choice of the state BN topology. As it is chosen manually, several factors should be considered in this process, including, but not limited to, the type of data, the task, and the available additional information. When context-dependent models are used, the state-level structure can be obtained by traditional methods. HMM/BN parameter learning is based on the Viterbi training paradigm and consists of two alternating steps: BN training and HMM transition updates. For recognition, in some cases, BN inference is computationally equivalent to a mixture of Gaussians, which allows the HMM/BN model to be used in existing decoders without any modification. We present two examples of HMM/BN model applications in speech recognition systems. Evaluations under various conditions and for different tasks showed that the HMM/BN model gives consistently better performance than the conventional HMM.

Gaussian mixture optimization for HMM based on efficient cross-validation

Interspeech 2007, 2007

A Gaussian mixture optimization method is explored, using cross-validation likelihood as an objective function instead of the conventional training-set likelihood. The optimization reduces the number of mixture components by selecting and merging a pair of Gaussians step by step, based on the objective function, so as to remove redundant components and improve the generality of the model. Cross-validation likelihood is more appropriate for avoiding over-fitting than the conventional likelihood, and can be efficiently computed using sufficient statistics. It results in better Gaussian pair selection and provides a termination criterion that does not rely on empirical thresholds. Large-vocabulary speech recognition experiments on oral presentations show that the cross-validation method gives a smaller word error rate, with an automatically determined model size, than a baseline training procedure that does not perform the optimization.
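The elementary operation in this procedure, merging a pair of Gaussians into one by moment matching, can be sketched for diagonal-covariance components as below. The pair-selection and stopping criteria in the paper use cross-validation likelihood computed from sufficient statistics; this sketch shows only the merge itself.

```python
import numpy as np

def merge(g1, g2):
    """Merge two diagonal Gaussians (weight, mean, var) by moment matching,
    preserving the total weight and the first two moments of the pair."""
    w1, m1, v1 = g1
    w2, m2, v2 = g2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    # Second moment of the mixture of the pair, minus the new mean squared
    v = (w1 * (v1 + m1**2) + w2 * (v2 + m2**2)) / w - m**2
    return w, m, v

# Merging two unit-variance Gaussians at 0 and 2 yields mean 1, variance 2:
g_merged = merge((0.5, np.array([0.0]), np.array([1.0])),
                 (0.5, np.array([2.0]), np.array([1.0])))
```

At each step, the pair whose merge costs the least cross-validation likelihood is merged; merging stops when every candidate merge would decrease that likelihood.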