Understanding HMM training for video gesture recognition

While developing a video gesture recognition system that recognises letters of the alphabet using hidden Markov model (HMM) pattern recognition, we observed that carefully selecting the model structure greatly improved recognition performance. This led us to two questions: Why do some HMMs work so well for pattern recognition? Which factors affect the HMM training process? To answer these fundamental questions of learning, we used simple triangle and square video gestures, for which a good HMM structure can be deduced analytically from knowledge of the physical process. We then compared these analytic models with models estimated by Baum-Welch training on the video gestures. This paper shows that, with appropriate constraints on model structure, Baum-Welch reestimation leads to good HMMs that are very similar to those obtained analytically. These results corroborate our earlier work, which showed that the left-right (LR) banded HMM structure is remarkably effective in recognising video gestures when compared with fully-connected (ergodic) or LR HMM structures.
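As a rough illustration of the three model structures compared here, the following Python sketch builds 0/1 transition masks for an ergodic, an LR, and an LR banded HMM. It is not the authors' code. The function names, the `width` parameter, and the reading of "banded" as "self-transition plus a transition to the next state only" are assumptions made for illustration. Because the Baum-Welch update for a transition probability is proportional to its current value, a transition initialised to zero stays zero, so a mask like these acts as a structural constraint throughout training.

```python
def ergodic_mask(n):
    """Fully-connected structure: every state may move to every state."""
    return [[1] * n for _ in range(n)]


def lr_mask(n):
    """Left-right structure: a state may stay put or move to any later state."""
    return [[1 if j >= i else 0 for j in range(n)] for i in range(n)]


def lr_banded_mask(n, width=1):
    """LR banded structure (assumed definition): a state may stay put or
    move at most `width` states forward; width=1 allows only the next state."""
    return [[1 if 0 <= j - i <= width else 0 for j in range(n)] for i in range(n)]


def uniform_transitions(mask):
    """Turn a 0/1 mask into an initial transition matrix by spreading each
    row's probability evenly over its allowed transitions."""
    return [[m / sum(row) for m in row] for row in mask]


if __name__ == "__main__":
    # Hypothetical 4-state model, chosen only to keep the printout small.
    for name, build in [("ergodic", ergodic_mask),
                        ("LR", lr_mask),
                        ("LR banded", lr_banded_mask)]:
        print(name)
        for row in uniform_transitions(build(4)):
            print("  ", [round(p, 2) for p in row])
```

The matrix returned by `uniform_transitions` is only an initial guess. Training would adjust the nonzero entries while leaving the zeros in place.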