A Detailed Study and Recent Research on OCR

Steps Involved in Text Recognition and Recent Research in OCR; A Study

Optical Character Recognition (OCR) is one of the automatic identification techniques that fulfill automation needs in various applications. With OCR, a machine can read information present in natural scenes or in other materials in any form. Recognizing typed and printed characters is comparatively straightforward because their size and shape are well defined. Handwriting differs between individuals in these respects, so a handwritten OCR system must learn these differences in order to recognize a character. In this paper, we discuss the various stages of text recognition, the classification of handwritten OCR systems according to text type, studies on Chinese and Arabic text recognition, and recent application-oriented research in OCR.

A DETAILED STUDY AND ANALYSIS OF OCR USING MATLAB

This paper presents a detailed review of the field of Optical Character Recognition. It surveys the various techniques that have been proposed to realize the core of character recognition in an optical character recognition system. Many studies and papers describe techniques for converting textual content from a paper document into machine-readable form. Optical character recognition is a process in which the computer automatically interprets the image of a handwritten script and converts it into classified characters. This material serves as a guide and update for readers working in the character recognition area. Selection of a relevant feature extraction method is probably the single most important factor in achieving high and consistent recognition accuracy. Character recognition techniques associate a symbolic identity with the image of a character. In a typical OCR system, input characters are digitized by an optical scanner. Each character is then located and segmented, and the resulting character image is fed into a pre-processor for noise reduction and normalization. Certain characteristics are then extracted from the character for classification. Feature extraction is critical, and many different techniques exist, each with its strengths and weaknesses. After classification, the identified characters are grouped to reconstruct the original symbol strings, and context may then be applied to detect and correct errors.
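The normalization, feature extraction, and classification stages described in this abstract can be sketched as follows. This is a minimal illustration and not the reviewed paper's method: the glyph encoding (`#` for ink, `.` for background), the function names, the 2x2 zoning grid, and the nearest-prototype rule are all assumptions chosen for brevity.

```python
from typing import Dict, List

Image = List[str]  # rows of '#' (ink) and '.' (background); assumed encoding


def crop_to_ink(img: Image) -> Image:
    """Normalization: trim empty rows and columns around the glyph."""
    rows = [r for r, line in enumerate(img) if "#" in line]
    cols = [c for line in img for c, ch in enumerate(line) if ch == "#"]
    if not rows:
        return img  # blank image, nothing to crop
    return [line[min(cols):max(cols) + 1] for line in img[rows[0]:rows[-1] + 1]]


def zoning_features(img: Image, grid: int = 2) -> List[float]:
    """Feature extraction: ink density in each cell of a grid x grid partition."""
    h, w = len(img), len(img[0])
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            cells = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            feats.append(cells.count("#") / len(cells) if cells else 0.0)
    return feats


def classify(img: Image, prototypes: Dict[str, List[float]]) -> str:
    """Classification: label of the prototype nearest in squared distance."""
    feats = zoning_features(crop_to_ink(img))

    def distance(label: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(feats, prototypes[label]))

    return min(prototypes, key=distance)


# Hypothetical reference glyphs used to build the prototype feature vectors.
GLYPH_L = ["#..", "#..", "###"]
GLYPH_T = ["###", ".#.", ".#."]
PROTOTYPES = {"L": zoning_features(GLYPH_L), "T": zoning_features(GLYPH_T)}

# An "L" surrounded by blank margin; cropping removes the margin first.
PADDED_L = [".....", ".#...", ".#...", ".###.", "....."]
print(classify(PADDED_L, PROTOTYPES))
```

Zoning is only one of the many feature extraction techniques the abstract alludes to; it is used here because it fits in a few lines.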

Optical Character Recognition (OCR) System

Today there is growing demand for software systems that recognize characters when information is scanned from paper documents, since many newspapers and books on different subjects exist only in printed form. There is great demand for storing the information in these paper documents on computer storage and later reusing it through search. One simple way to store the information is to scan the documents and store them as IMAGES. Reusing this information is difficult, however, because the contents of such images cannot readily be read and searched line by line and word by word. The reason is that the font characteristics of the characters in paper documents differ from the fonts of the characters in the computer system, so the computer is unable to recognize the characters while reading them. This concept of storing the contents of paper documents in computer storage and then reading and searching the content is called DOCUMENT PROCESSING. Sometimes document processing involves information in languages other than English. Document processing requires a software system called a CHARACTER RECOGNITION SYSTEM. This process is also called DOCUMENT IMAGE ANALYSIS (DIA).

Handwritten Optical Character Recognition (OCR): A Comprehensive Systematic Literature Review (SLR)

IEEE Access

Given the ubiquity of handwritten documents in human transactions, Optical Character Recognition (OCR) of documents has invaluable practical worth. Optical character recognition is a science that enables the translation of various types of documents or images into analyzable, editable and searchable data. During the last decade, researchers have used artificial intelligence and machine learning tools to automatically analyze handwritten and printed documents in order to convert them into electronic format. The objective of this review paper is to summarize research that has been conducted on character recognition of handwritten documents and to provide research directions. In this Systematic Literature Review (SLR) we collected, synthesized and analyzed research articles on the topic of handwritten OCR (and closely related topics) that were published between 2000 and 2019. We searched widely used electronic databases following a pre-defined review protocol. Articles were searched using keywords, forward reference searching and backward reference searching in order to find all the articles related to the topic. After carefully following the study selection process, 176 articles were selected for this SLR. This review article presents state-of-the-art results and techniques in OCR and also provides research directions by highlighting research gaps.

An Overview and Applications of Optical Character Recognition

International Journal of Advance Research In Science And Engineering (IJARSE), India, ISSN 2319-8346 (P), ISSN-2319-8354(E), Vol.3, Issue 7, Pages 261- 274, 2014

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic conversion of scanned or photographed images of typewritten or printed text into machine-encoded, computer-readable text. It is widely used as a form of data entry from an original paper data source, whether passport documents, invoices, bank statements, receipts, business cards, mail, or any number of printed records. It is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as machine translation, text-to-speech, key data extraction and text mining. OCR is a field of research in pattern recognition, artificial intelligence and computer vision. Optical Character Recognition is the electronic translation of images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to recognize and search text from electronic documents or to publish the text on a website [1]. A large number of research papers and reports have already been published on this topic. This paper presents an introduction to, the major research work on, and the applications of Optical Character Recognition in various fields. First, an introduction to OCR is given; then the major research works that have had a great impact on character recognition are highlighted. Finally, the most important applications of OCR are covered, followed by the conclusion.

Offline optical character recognition (OCR) method: An effective method for scanned documents

22nd International Conference on Computer and Information Technology (ICCIT) (Publisher: IEEE), 2019

Optical Character Recognition (OCR) is a major computer vision task in which the characters in an image are detected and recognized by comparison with training set images. Detecting characters is one of the more perplexing tasks in computer vision, because the input image is often misaligned or noisy. This paper presents a complete Optical Character Recognition (OCR) system that works for English characters, mostly in the Calibri font. The system first corrects the skew of the input image if it is not correctly aligned, and then reduces noise in the image. The image then passes through line and character segmentation, and the segmented characters are passed to the recognition module, which recognizes them. In experiments with a set of 50 images, the average recognition rate is 92%, and 98% for the Calibri font. Moreover, the developed technique is computationally efficient and requires less time than other optical character recognition systems.
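The noise reduction and line and character segmentation steps named in this abstract can be illustrated with a small sketch. The abstract does not say which algorithms the system uses, so everything here is an assumption: isolated-pixel removal stands in for noise reduction, projection profiles stand in for segmentation, and skew correction is omitted. All names and the sample page are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # 1 = ink, 0 = background; assumed binary input


def remove_isolated_pixels(img: Image) -> Image:
    """Noise reduction: clear ink pixels that have no ink among their 8 neighbours."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                window = sum(
                    img[ny][nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))
                )
                if window - 1 == 0:  # subtract the pixel itself
                    out[y][x] = 0
    return out


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Return [start, end) index ranges where a projection profile is non-zero."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(img: Image) -> List[List[Tuple[int, int, int, int]]]:
    """Split into text lines by row projection, then characters by column projection.

    Each character box is (top, bottom, left, right) with exclusive upper bounds.
    """
    result = []
    for top, bottom in runs([sum(row) for row in img]):
        cols = [sum(img[y][x] for y in range(top, bottom)) for x in range(len(img[0]))]
        result.append([(top, bottom, left, right) for left, right in runs(cols)])
    return result


# Hypothetical page: two blobs on the first line, one on the second,
# plus a single speck of noise between the lines.
PAGE = [[1 if ch == "#" else 0 for ch in row] for row in (
    "##.##..",
    "##.##..",
    ".......",
    "......#",
    ".###...",
    ".###...",
)]

for line in segment(remove_isolated_pixels(PAGE)):
    print(line)
```

Running segmentation without the cleaning step would merge the speck into the second text line, which is why noise reduction precedes segmentation in the pipeline described above.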

OCR Related Technology Methods

International Journal of Advanced Trends in Computer Science and Engineering, 2020

Character recognition has emerged as a vital technology in the era of the fourth industrial revolution and is developing into a core technology needed in various fields. Character recognition is performed by extracting characters from an image and recognizing the extracted characters. The technology has been developed continuously, and with the advent of the fourth industrial revolution it has recently come into use as a core technology in many areas. This paper introduces the technology associated with character recognition and the programs used for character recognition.

A Survey on Optical Character Recognition System

2017

Optical Character Recognition (OCR) has been a topic of interest for many years. It is defined as the process of digitizing a document image into its constituent characters. Despite decades of intense research, developing OCR with capabilities comparable to those of humans remains an open challenge. Because of this challenge, researchers from industry and academia have directed their attention towards Optical Character Recognition. Over the last few years, the number of academic laboratories and companies involved in research on character recognition has increased dramatically. This paper aims to summarize the research done so far in the field of OCR. It provides an overview of different aspects of OCR and discusses corresponding proposals aimed at resolving issues in OCR.

A Study of Optical Character Patterns identified by the different OCR Algorithms

Optical Character Recognition (OCR) is a technology that provides full alphanumeric recognition of printed or handwritten characters. It is one of the most interesting and challenging research areas in the field of image processing. The stages of OCR are image acquisition, pre-processing, segmentation, feature extraction and classification. This paper presents how character patterns are identified in the classification stage by different algorithms: template matching, statistical, structural, neural network and support vector machine algorithms.
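Of the classification algorithms listed in this abstract, template matching is the simplest to illustrate. The sketch below is one possible form of it, assuming binary glyphs already normalized to the same size as the templates and a pixel-agreement score; the paper's own formulation is not given in the abstract, and the templates and names are hypothetical.

```python
from typing import Dict, List, Tuple

Glyph = List[str]  # rows of '#' (ink) and '.' (background); assumed encoding


def match_score(glyph: Glyph, template: Glyph) -> float:
    """Fraction of pixel positions at which glyph and template agree."""
    same_shape = len(glyph) == len(template) and all(
        len(g) == len(t) for g, t in zip(glyph, template)
    )
    if not same_shape:
        raise ValueError("glyph and template must have the same size")
    total = sum(len(row) for row in glyph)
    agree = sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )
    return agree / total


def recognize(glyph: Glyph, templates: Dict[str, Glyph]) -> Tuple[str, float]:
    """Return the label of the best-matching template and its score."""
    scores = {label: match_score(glyph, t) for label, t in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical 3x3 templates for three characters.
TEMPLATES = {
    "I": [".#.", ".#.", ".#."],
    "L": ["#..", "#..", "###"],
    "O": ["###", "#.#", "###"],
}

# An "L" with one missing pixel in the bottom-right corner.
NOISY_L = ["#..", "#..", "##."]
label, score = recognize(NOISY_L, TEMPLATES)
print(label, round(score, 3))
```

Template matching of this kind is sensitive to size, position and font changes, which is one reason the statistical, structural, neural network and support vector machine approaches listed above are also studied.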