31 System for OCR of Printed Telugu Text in Complicated Layouts and Backgrounds

Most work reported in the literature on optical character recognition (OCR) assumes a clean or white background and handles only one or two fonts. Real documents range from simple plain backgrounds to complex, unevenly illuminated ones. OCR of such documents is further complicated by text set in a variety of fonts and sizes. This paper proposes an OCR system for printed Telugu text in complicated layouts and backgrounds. The system has been tested on a variety of images taken from different newspapers and old books, as well as synthetic images on textured backgrounds. It works on several unknown fonts, even in the presence of complicated backgrounds, although its database consists of only four fonts. The recognition accuracy obtained is 98% on average. The proposed approach to Telugu OCR thus performs well on a wider variety of images than previous attempts, which were more restrictive and domain specific.