Optical Character Recognition for Hindi Language Using a Neural-network Approach
2013, Journal of Information Processing Systems
Hindi is the most widely spoken language in India, with more than 300 million speakers. Because the characters in Hindi text are not separated from one another as they are in English text, Optical Character Recognition (OCR) systems developed for Hindi have very poor recognition rates. In this paper we propose an OCR system for printed Hindi text in Devanagari script that uses an Artificial Neural Network (ANN) to improve recognition efficiency. One of the major reasons for the poor recognition rate is error in character segmentation. The presence of touching characters in scanned documents further complicates segmentation, which makes designing an effective character segmentation technique a major problem. A general OCR system follows four major steps: preprocessing, character segmentation, feature extraction, and finally classification and recognition. The preprocessing tasks considered in the paper are conversion of grayscale images to binary images, image rectification, and segmentation of the document's textual contents into paragraphs, lines, words, and finally basic symbols. The basic symbols, obtained as the fundamental units of the segmentation process, are recognized by a neural classifier. In this work, three feature extraction techniques are used to improve the recognition rate: histogram of projection based on mean distance, histogram of projection based on pixel value, and vertical zero crossing. These feature extraction techniques are powerful enough to extract features of even distorted characters and symbols. The neural classifier is a back-propagation neural network with two hidden layers. The classifier is trained and tested on printed Hindi text. A correct recognition rate of approximately 90% is achieved.
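To make the preprocessing and feature extraction steps more concrete, the following is a minimal Python sketch, not the paper's implementation. The abstract does not define its features precisely, so everything here is an assumption: the fixed threshold of 128, the function names, a plain ink-pixel count standing in for the "pixel value" projection histogram, and the reading of "vertical zero crossing" as the number of ink/background transitions down each column. Splitting a projection profile at empty gaps is likewise only one common way to separate lines or words, and it would not separate the touching characters the abstract mentions.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = background.

    The fixed threshold is an assumption; the abstract names no method.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def projection_histogram(binary, axis):
    """Count ink pixels per row (axis="rows") or per column (axis="cols")."""
    if axis == "rows":
        return [sum(row) for row in binary]
    return [sum(col) for col in zip(*binary)]


def vertical_crossings(binary):
    """Count ink/background transitions from top to bottom in each column.

    This is one plausible reading of "vertical zero crossing".
    """
    counts = []
    for col in zip(*binary):
        counts.append(sum(1 for a, b in zip(col, col[1:]) if a != b))
    return counts


def split_on_gaps(profile):
    """Return (start, end) ranges of consecutive non-zero profile entries."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


# Tiny hypothetical 3x5 image: a full vertical stroke and a broken one.
gray = [
    [255, 0, 255, 255, 0],
    [255, 0, 255, 255, 255],
    [255, 0, 255, 255, 0],
]
binary = binarize(gray)
column_profile = projection_histogram(binary, "cols")
row_profile = projection_histogram(binary, "rows")
crossings = vertical_crossings(binary)
symbols = split_on_gaps(column_profile)
```

In a pipeline like the one described, vectors such as `column_profile`, `row_profile`, and `crossings` would be concatenated for each segmented symbol and passed to the neural classifier as its input.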