The comparative analysis of Marathi OCR softwares
Related papers
Optical Character Recognition (OCR) is the electronic conversion of images of handwritten, typewritten or printed text into machine-encoded text, and it is an important task in pattern recognition. Character recognition for foreign languages, especially English, has been studied extensively by many researchers, but because of the complexity of Indian languages such as Hindi, Punjabi, Telugu and Malayalam, research on them remains limited and constrained. This paper presents research work related to all Indian languages and discusses various approaches to character recognition along with some of its applications. The aim of this paper is to provide an overview of the research going on in Indian script OCR systems. Such a survey was felt necessary because OCR for Indian scripts is still a challenging task. Hence, a brief introduction to the general OCR and typical steps in the development of an OCR are give...
Different Approaches in OCR of Indian Languages
Much research has been done in the field of OCR and many articles have been published in the last few years, but comparatively little of this work addresses Indian languages. This paper gives elementary knowledge of the different OCR approaches for different Indian languages, which should benefit anyone who wants to choose OCR as a field of research. Its main focus is to highlight the techniques and accuracy of existing OCR systems.
OCR in Indian scripts: A survey
2005
India is a multilingual country. A significantly large number of scripts are used to represent these languages. A desire of vision researchers is to develop an integrated optical character recognition (OCR) system that will be able to process all such scripts. Such a development, if realized, will not only enable faster flow of information across the country, but also have a profound effect on its scientific and economic development. Courageous endeavours have been successfully made towards the development of systems capable of recognizing machine-printed or handwritten characters and/or numerals. However, most Indian scripts do not have an integrated OCR system. Further, the development of a unified system that is capable of processing all Indian scripts is still a dream. This article presents a survey of the current literature on the development of OCRs for Indian scripts. After reviewing the basis of and the motivation for developing OCR systems, the article analyzes the various methodologies employed in general-purpose pattern recognition systems. A critical analysis of the work towards OCR systems in Indian languages, with pointers towards possible future work, is also presented.
An Overview of OCR Research in Indian Scripts
This paper gives an overview of the ongoing research in optical character recognition (OCR) systems for Indian language scripts. This survey was felt necessary because the work on developing OCRs for Indian scripts is very promising yet still at an emerging stage. The aim of this paper is to provide a starting point for researchers entering this field. Peculiarities of Indian scripts, the present status of OCRs for Indian scripts, the techniques used in them, recognition accuracies, and the resources available are discussed in detail. Examples given in this paper are based on the authors' work on developing a character recognition system for Telugu, a South Indian language.
2014 9th International Conference on Industrial and Information Systems (ICIIS), 2014
Optical Character Recognition (OCR) deals with the automated recognition of characters in digital images. OCR refers to the process by which scanned images are electronically processed and converted to an editable document. Handwritten and printed texts are the primary research areas of OCR. Many OCR systems are commercially available for English and Arabic characters, but there is still no recognition system that yields a high recognition rate, even when the scanned images are of high quality. The general framework of a Tamil OCR in the literature involves preprocessing, line segmentation, word segmentation, character segmentation, feature extraction and recognition of characters. OCR for printed Tamil documents is challenging because a single line may contain different font styles and because of the presence of pictures, multiple columns, touching adjacent characters, broken characters, low print quality and complex layouts. Furthermore, whereas English has 26 letters, Tamil has 247, which makes recognition more difficult. A few freely available OCRs for Tamil offer a moderate recognition rate, but performance comparisons of these OCRs on a benchmark dataset are not available. In this paper we compare OCRs for printed Tamil texts on four different types of documents: books, magazines, newspapers and pamphlets. Furthermore, we propose a post-processing error correction technique for the tested OCRs which reduces the overall mean error rate by nearly 10% on those four categories.
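The line segmentation stage named in the framework above is often illustrated with a horizontal projection profile. The abstract does not say which method the compared OCRs use, so the following is only a minimal sketch under that assumption; the function name `segment_lines` and the toy `page` array are hypothetical, and the image is assumed to be already binarized (1 = ink, 0 = background).

```python
def segment_lines(binary_image):
    """Split a binarized page into text lines using a horizontal projection profile.

    binary_image: list of rows, each a list of 0/1 pixels (1 = ink).
    Returns a list of (start_row, end_row) pairs, with end_row exclusive.
    """
    # Projection profile: amount of ink in each pixel row.
    row_ink = [sum(row) for row in binary_image]

    lines = []
    start = None
    for y, ink in enumerate(row_ink):
        if ink > 0 and start is None:
            start = y                    # first inked row of a new line
        elif ink == 0 and start is not None:
            lines.append((start, y))     # blank row closes the current line
            start = None
    if start is not None:                # page ends while still inside a line
        lines.append((start, len(row_ink)))
    return lines


# Toy page: two text lines separated by one blank row.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(segment_lines(page))  # [(1, 3), (4, 5)]
```

A plain profile like this assumes lines are separated by fully blank rows. Skewed scans, multi-column layouts and touching characters, all listed above as difficulties, would need additional handling.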
Recognition of Hindi Character Using OCR-Technology: A Review
International Journal of Advanced Trends in Computer Science and Engineering, 2023
Character recognition is a technique that enables the transformation of various kinds of scanned papers into an editable, readable, and searchable format. In the last two decades, several researchers and technologists have been working continuously in this field to improve accuracy. Character recognition is classified into recognition of printed characters, handwritten characters, and characters appearing in images. It is a major area of research in the field of pattern recognition. This paper presents an overview of Hindi character recognition using the optical character recognition (OCR) technique. We surveyed some major research breakthroughs in character recognition, especially for Hindi characters. This research article aims to provide deeper insight for researchers and technologists working in the field of Hindi character recognition.
A Survey on Malayalam OCR modules
2016
People start learning to read and write during the early stages of education. As the years pass, they acquire good reading and writing skills, and it is usually not difficult for them to read any kind of printed or handwritten characters. Computers, however, may have difficulty deciphering printed characters in different fonts and styles, as well as handwritten characters. Malayalam OCR is a complex task owing to the various character scripts available and, more importantly, the differences in the ways the characters are written. The dimensions are never the same and, unlike English characters, may never map onto a square grid. This survey paper describes different Malayalam OCR modules and their techniques for identifying and recognizing the old Malayalam script and converting it to the new Malayalam script.
Nepali OCR Project Research Report 2.0
Scanned documents are not always noise free, so pre-processing is required to clean them up. Detection of text areas is also important for proper segmentation and extraction of text information. Before training the classifier, the training dataset must be prepared. This process involves automated dataset generation and manual labeling of character images. This report presents the work done up to the second reporting period of the Nepali OCR Project. This includes improvements in the pre-processing steps, dataset preparation, and the design of a basic recognition prototype.
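The report summary does not name the clean-up method used, so the sketch below only illustrates one common pre-processing step, a 3x3 median filter that suppresses isolated speckle noise. The function name `median_filter_3x3` and the sample image are hypothetical, and the image is assumed to be a grayscale grid of integer intensities.

```python
from statistics import median


def median_filter_3x3(image):
    """Apply a 3x3 median filter to a grayscale image given as a list of rows.

    Border pixels are copied unchanged; each interior pixel is replaced by
    the median of its 3x3 neighbourhood, which removes isolated specks.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [row[:] for row in image]      # copy so the input is not modified
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [
                image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = median(window)   # 9 values, so the median is a pixel value
    return out


# A single bright speck on a dark background is removed.
noisy = [
    [0, 0, 0],
    [0, 255, 0],
    [0, 0, 0],
]
print(median_filter_3x3(noisy))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

A median filter is chosen here because it removes salt-and-pepper specks while preserving stroke edges better than averaging does. Whether it suits a particular scan depends on stroke width and resolution.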
Review on OCR for Handwritten Indian Scripts Character Recognition
Natural language processing and pattern recognition have been successfully applied to Optical Character Recognition (OCR). Character recognition is an important area in pattern recognition. The characters to be recognized can be printed or handwritten, and handwritten character recognition can be offline or online. Many researchers have worked on handwritten character recognition over the last few years. Compared with non-Indian scripts, research on OCR of handwritten Indian scripts is less mature. A large number of systems are available for handwritten character recognition in non-Indian scripts, but in general no complete OCR system is available for recognizing handwritten text in any Indian script. A few attempts have been made at recognizing handwritten Devanagari, Bangla, Tamil, Oriya and Gurmukhi scripts. In this paper, we present a survey on OCR of these most popular Indian scripts.
Optical Character Recognition (OCR) is a technique used to extract text from document images and convert it into text format. This kind of information retrieval is called recognition-based retrieval; it allows the text to be edited, searched and stored more efficiently. OCR is used in many applications, such as libraries, organizations, bank cheques, number plate recognition, historical book analysis and many others. Various OCR tools are available for converting document images in different languages. The primary objective of this work is to compare the performance of three different OCR tools in extracting text from Tamil and Hindi document images.
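The abstract does not state which metric is used to compare the tools. One widely used option is the character error rate (CER): the edit distance between the OCR output and the ground-truth text, divided by the length of the ground truth. The sketch below assumes that metric; `edit_distance` and `character_error_rate` are illustrative names, not functions from any of the tools compared.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    # prev[j] holds the distance between the reference prefix processed so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        cur = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            cur.append(min(
                prev[j] + 1,         # delete ref_char
                cur[j - 1] + 1,      # insert hyp_char
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = cur
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the ground-truth length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One wrong character out of four gives a CER of 0.25.
print(character_error_rate("abcd", "abcf"))  # 0.25
```

Python strings are sequences of Unicode code points, so for Tamil or Hindi text this counts each vowel sign or virama as a separate character. A comparison based on grapheme clusters would require segmenting the text first, which this sketch does not do.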