Sandeep A - Academia.edu


Papers by Sandeep A

Speaker-Independent Speech Recognition using Visual Features

International Journal of Advanced Computer Science and Applications (IJACSA), 2020

Visual Speech Recognition aims at transcribing lip movements into readable text. There have been many strides in automatic speech recognition systems that can recognize words from audio and visual speech features, even under noisy conditions. This paper focuses only on the visual features, whereas a robust system uses visual features to support acoustic features. We propose the concatenation of visemes (lip movements) for text classification rather than a classic individual viseme mapping. The results show that this approach achieves a significant improvement over state-of-the-art models. The system has two modules: the first extracts lip features from the input video, while the second is a neural network trained to process the viseme sequence and classify it as text.
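
The abstract describes a two-module pipeline: a lip feature extractor operating on video frames and a neural network that classifies the whole viseme sequence as text. The sketch below illustrates that structure only; it is not the authors' code. The use of OpenCV and PyTorch, the LSTM classifier, the flattened 8x8 grayscale "feature", the layer sizes, and the file name `sample.mp4` are all illustrative assumptions.

```python
# Minimal sketch of the two-module pipeline described in the abstract:
# (1) extract per-frame lip features from a video, (2) classify the
# concatenated viseme sequence as a word with a neural network.
# All libraries, layer sizes, and file names here are assumptions.
import cv2          # assumed choice for reading video frames
import torch
import torch.nn as nn


def extract_lip_features(video_path, feature_dim=64):
    """Module 1: return a (num_frames, feature_dim) tensor of frame features.

    A resized, flattened grayscale frame stands in for a real lip-region
    feature extractor (e.g. a mouth-ROI crop followed by a CNN).
    """
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        small = cv2.resize(gray, (8, 8)).flatten()  # 64-dim placeholder feature
        frames.append(torch.tensor(small, dtype=torch.float32))
    cap.release()
    return torch.stack(frames)  # (num_frames, feature_dim)


class VisemeSequenceClassifier(nn.Module):
    """Module 2: map an entire viseme sequence to a single word label.

    The sequence is classified as a whole (sequence -> word) rather than
    mapping each viseme individually.
    """

    def __init__(self, feature_dim=64, hidden_dim=128, num_words=10):
        super().__init__()
        self.rnn = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_words)

    def forward(self, features):             # (batch, frames, feature_dim)
        _, (h_n, _) = self.rnn(features)     # final hidden state summarizes the sequence
        return self.classifier(h_n[-1])      # (batch, num_words) logits


if __name__ == "__main__":
    model = VisemeSequenceClassifier()
    feats = extract_lip_features("sample.mp4")   # hypothetical input video
    logits = model(feats.unsqueeze(0))           # add batch dimension
    print("predicted word index:", logits.argmax(dim=-1).item())
```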
