A novel statistical feature extraction method for textual images: Optical font recognition

A Statistical Global Feature Extraction Method for Optical Font Recognition

2011

The study of optical font recognition has become increasingly popular. Accordingly, the global analysis approach is widely used to identify font types and to classify writer identity. The objective of this paper is to propose an enhanced global analysis method. Based on statistical analysis of edge pixel relationships, a novel feature extraction method for binary images is proposed. We test the proposed method on Arabic calligraphy script images for an optical font recognition application. We classify those images using Multilayer Network, Bayes network, and Decision Tree classifiers to identify the Arabic calligraphy type. The experimental results show that the proposed method improves the overall performance of optical font recognition.

SEVERAL METHODS OF FEATURE EXTRACTION TO HELP IN OPTICAL CHARACTER RECOGNITION

International Journal of Students’ Research in Technology & Management, 2017

An Optical Character Recognition (OCR) system consists of three main steps: preprocessing, feature extraction, and classification. Feature extraction methods yield the feature vectors on which the classification of a test pattern is based. The paper proposes several feature extraction methods that may help recognize a Bengali numeral or character. The Pixel Ex-OR method uses a digital gating (Ex-OR) technique to extract the information in an image: two successive elements of a row in the image matrix are Ex-ORed, and the output is in turn Ex-ORed with the next element. Alphabetical coding encodes a binary character image using letters of the English alphabet. Directional features obtain gradient information using Sobel masks to make the position of strokes in an image clear. The features are derived in eight standard directions, and these eight feature vectors are then merged into four sets of features to reduce system complexity and thereby save considerable processing time. These features will help develop a Bengali numeral recognition system.
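The chained Ex-OR described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and collecting the final Ex-OR output of each row into a feature vector is an assumption, since the abstract does not say how the outputs are assembled.

```python
from typing import List

def row_xor_chain(row: List[int]) -> List[int]:
    """Ex-OR the first two pixels of a row, then Ex-OR each output with the next pixel.

    Returns every intermediate output, in order.
    """
    if len(row) < 2:
        return []
    outputs = []
    acc = row[0]
    for pixel in row[1:]:
        acc ^= pixel          # output of the previous gate Ex-ORed with the next element
        outputs.append(acc)
    return outputs

def xor_feature_vector(image: List[List[int]]) -> List[int]:
    """Assumed feature layout: the last chained Ex-OR output of each row."""
    features = []
    for row in image:
        chain = row_xor_chain(row)
        features.append(chain[-1] if chain else (row[0] if row else 0))
    return features

# Hypothetical 3x4 binary image used only for illustration.
image = [
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
features = xor_feature_vector(image)
```

Under this assumed layout, each feature is the parity of the number of foreground pixels in its row, so the vector is compact but discards the position of the pixels within the row.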

Optical Character Recognition

This paper describes two implementations of optical character recognition: a template matching method, and a feature extraction method followed by support vector machine classification. With proper image preprocessing, the text is segmented into isolated characters, and the correlations between a single character and a given set of templates are computed to find the similarities and then identify the input character. In the second method, features extracted from the segmented characters are used to train the SVM classifiers, which are later tested on a test set of handwritten digits.
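The template matching step can be illustrated with a small sketch. It assumes the character and every template are binary images of the same size, and it uses the Pearson correlation coefficient as the similarity measure; the abstract does not specify which correlation is used, and the templates and function names below are hypothetical.

```python
import math
from typing import Dict, List

Image = List[List[int]]

def correlation(a: Image, b: Image) -> float:
    """Pearson correlation between two equally sized images, in [-1, 1]."""
    x = [float(p) for row in a for p in row]
    y = [float(p) for row in b for p in row]
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = math.sqrt(sum(v * v for v in dx) * sum(v * v for v in dy))
    if denom == 0.0:
        return 0.0            # a constant image carries no shape information
    return sum(p * q for p, q in zip(dx, dy)) / denom

def match_template(char: Image, templates: Dict[str, Image]) -> str:
    """Return the label of the template most correlated with the character."""
    return max(templates, key=lambda label: correlation(char, templates[label]))

# Hypothetical 3x3 templates, for illustration only.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}
noisy_l = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]   # an "L" with one pixel missing
best = match_template(noisy_l, TEMPLATES)
```

In practice the segmented characters would first be normalized to the template size, which is one reason the preprocessing step matters for this method.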

An Effectual Optical Character Recognition Using Efficient Learning System

Elsevier SSRN digital library, 2019

There is growing interest in automatic classification frameworks that recognize characters in software when information is scanned from paper records, given the volume of printed identification documents and records that businesses handle daily. This is commonly known as document image analysis. To use Optical Character Recognition efficiently for character classification in image analysis, the data is used in grid assemblies. For high-volume document processing, large industries require a framework product known as character recognition classification. There is also a need for character recognition software that performs document analysis through image processing, converting the material into an electronically readable form. This paper therefore presents an efficient learning approach to automatic classification that uses image processing and machine learning for the optical character recognition process. In this research, feature extraction uses Independent Component Analysis together with a swarm intelligence approach, the Firefly algorithm, because it helps reduce error probabilities, lowers the false positive and false negative rates, and increases the learning rate. Learning is performed with a neural network. The proposed approach performs well in terms of specificity, sensitivity, and recognition rate, which indicates that it achieves high true positive and true negative rates.

Optical Recognition of Digital Characters Using Machine Learning

2018

Recent improvements in pattern recognition have been in demand by many applications, such as OCR, document classification, and data mining. OCR plays a vital role in document scanners, character recognition, language recognition, security, and bank authentication. OCR is classified into two types: online and offline character recognition systems. Online OCR outperforms offline OCR because characters are processed as they are written, which avoids the initial stage of identifying the character. Offline OCR is further subdivided into printed and handwritten OCR. In offline OCR, typewritten or handwritten characters are typically scanned into a binary or grayscale image that is passed to the recognition algorithm.

Implementing a System for Recognizing Optical Characters

In this paper we present a character recognition system that takes a photo of a character and identifies the symbol. In the proposed system, the input characters are optically scanned in order to be digitized. Each character is then located and segmented, and the resulting image is processed for normalization and noise reduction. It is then classified. From the extraction obtained, the strengths and weaknesses of various techniques can be identified. The next step is to group the identified characters in order to reconstruct the original string of symbols, and context can be applied to detect and correct errors. The results show that the system works well and that recognition is good. The proposed system is implemented as a program developed in the Matlab environment, which provides the ability to insert a character in an image. It is generally agreed that making a machine do what a human can do is a dream; reading, for example, is one of the most important human functions. However, this dream is coming closer to reality day by day, and researchers are working on it in many ways. Artificial intelligence today focuses on pattern recognition, and within that field on applications of character recognition. Many organizations and companies are designing character recognition systems for many applications, although making machines read like humans, with the same capabilities, still faces challenges. Character recognition poses several problems for optical characters. It can be performed offline, after the printing or writing is complete, or online, recognizing characters as they are drawn or written.
Both printed and handwritten characters can be recognized, but performance is what matters, and it depends especially on the quality of the input documents. The next challenge, examined by many researchers, is online and offline cursive writing. New ideas in pattern recognition can be tested on character classification, but where experiments are conducted on isolated characters, the results are not necessarily immediately relevant to optical character recognition. Perhaps more striking than the improvement in the accuracy and scope of classification methods has been the decrease in cost. Early optical character recognition equipment consisted of optical hardware, such as the IBM optical page reader used to read typed earnings reports at the Social Security Administration, which cost more than two million dollars, along with electronics and very expensive scanners. Nowadays, optical character recognition software is often an add-on to an inexpensive desktop scanner. The main goal is to examine in detail some examples of the errors made by the proposed system.

2. PROPOSED SYSTEM

The general technique is simple to describe. The proposed optical character recognition system consists of several components, presented in figure 1. The first step in the system is to digitize the analog document with an optical scanner. The regions containing characters are then located, and every symbol is extracted by segmentation. The extracted symbols are then preprocessed to reduce and eliminate noise, which makes feature extraction easier in the next step.
The extracted features are then compared with the descriptions of the symbol classes obtained in a previous learning phase in order to find the identity of the symbol. Finally, contextual information is used to reconstruct the numbers and words of the original text.
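The segmentation stage of such a system can be sketched in a few lines. The paper describes a Matlab program and does not state how symbols are separated, so the following Python sketch is only an assumption: it splits a binary text-line image into symbols wherever a column contains no foreground pixels. All names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]

def segment_columns(image: Image) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges, end exclusive, that contain foreground pixels."""
    if not image or not image[0]:
        return []
    width = len(image[0])
    has_ink = [any(row[c] for row in image) for c in range(width)]
    spans = []
    start = None
    for c, ink in enumerate(has_ink):
        if ink and start is None:
            start = c                      # a symbol begins
        elif not ink and start is not None:
            spans.append((start, c))       # a blank column ends the symbol
            start = None
    if start is not None:
        spans.append((start, width))       # symbol touches the right edge
    return spans

def crop(image: Image, span: Tuple[int, int]) -> Image:
    """Extract one symbol from the line image."""
    start, end = span
    return [row[start:end] for row in image]

# Hypothetical line image containing two symbols separated by blank columns.
line = [
    [1, 0, 0, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
]
spans = segment_columns(line)
symbols = [crop(line, s) for s in spans]
```

This simple rule fails on touching or cursive characters, which is consistent with the abstract's remark that cursive writing remains a harder problem than isolated characters.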