A Method of Recognition of Arabic Cursive Handwriting
Related papers
Arabic handwriting recognition using structural and syntactic pattern attributes
Pattern Recognition, 2013
In this paper, we present research results on off-line Arabic handwriting recognition using structural techniques. Statistical methods have been more common in the reported research on Arabic handwriting recognition, while structural methods have remained largely unexplored. However, both statistical and structural techniques can be effectively integrated in multi-classifier based systems. This paper presents, to our knowledge, the first integrated offline Arabic handwritten text recognition system based on structural techniques. In implementing the system, several novel algorithms and techniques for structural recognition of Arabic handwriting are introduced. An Arabic text line is segmented into words/sub-words and dots are extracted. An adaptive slant correction algorithm that is able to correct the different slant angles of the different components of a text line is presented. A novel segmentation algorithm, which is integrated into the recognition phase, is designed based on the nature of Arabic writing and utilizes a polygonal approximation algorithm. Arabic characters are then modeled by 'fuzzy' polygons and recognized using a novel fuzzy polygon matching algorithm. Dynamic programming is used to select the best hypotheses of a sequence of recognized characters for each word/sub-word. In addition, several other key ideas, namely prototype selection using set-medians, lexicon reduction using dot-descriptors, etc., are utilized to design a robust handwriting recognition system. Results are reported on the benchmark IfN/ENIT database of Tunisian city names and indicate the robustness and effectiveness of our system. The recognition rates are comparable to multi-classifier implementations and better than single classifier systems.
2007
The last two decades witnessed some advances in the development of Arabic character recognition (CR) systems. Arabic CR faces technical problems not encountered in any other language; these cause Arabic CR systems to achieve relatively low accuracy and delay their establishment as market products. We propose the basic stages towards a system that attacks the problem of recognizing online Arabic cursive handwriting. Rule-based methods are used to perform simultaneous segmentation and recognition of word portions in an unconstrained cursively handwritten document using dynamic programming. The output of these stages is in the form of a ranked list of the possible decisions. A new technique for text line separation is also used.
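The abstract above does not give its rules or cost function, so the following is only a minimal sketch of how dynamic programming can segment and recognize at the same time. It assumes a hypothetical callback `score_span(i, j)` that returns a `(label, cost)` hypothesis for stroke pieces `i` to `j-1`, or `None` when no character fits; the names `best_segmentation`, `score_span`, and `max_span` are illustrative and do not come from the paper. It returns only the single cheapest labeling rather than a ranked list.

```python
def best_segmentation(n_pieces, score_span, max_span=3):
    """Pick the cheapest way to group n_pieces stroke pieces into characters.

    score_span(i, j) is a hypothetical recognizer: it returns (label, cost)
    for the group of pieces i..j-1, or None if the group matches nothing.
    """
    inf = float("inf")
    best = [inf] * (n_pieces + 1)   # best[j]: lowest cost covering pieces 0..j-1
    back = [None] * (n_pieces + 1)  # back[j]: (start of last group, its label)
    best[0] = 0.0

    for j in range(1, n_pieces + 1):
        for i in range(max(0, j - max_span), j):
            if best[i] == inf:
                continue  # pieces 0..i-1 cannot be covered at all
            hypothesis = score_span(i, j)
            if hypothesis is None:
                continue
            label, cost = hypothesis
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                back[j] = (i, label)

    if best[n_pieces] == inf:
        return None  # no complete segmentation exists

    # Walk the back-pointers from the end to recover the label sequence.
    labels = []
    j = n_pieces
    while j > 0:
        i, label = back[j]
        labels.append(label)
        j = i
    labels.reverse()
    return labels, best[n_pieces]
```

Because the segmentation points are chosen by the same recurrence that accumulates recognition costs, no separate segmentation pass is needed. Producing a ranked list would require keeping several hypotheses per position instead of one.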
On-line recognition of handwritten Arabic characters
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1990
Arabic characters are always in cursive script. Handwritten words were entered into an IBM PC via a graphics tablet and a segmentation process was applied to the points; the length and the slope of each segment were then found, and the slope was categorized into one of four directions. In the learning process, specifications of the strokes of each character are fed to the computer. In the recognition process, the parameters of each stroke are found and special rules are applied to select the collection of strokes which best matches the features of one of the stored characters. The results are promising, and suggestions for improvements leading to 100% recognition are proposed.
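The length-and-slope step described above can be illustrated with a short sketch. The abstract does not say which four directions are used, so this example assumes 90-degree sectors centered on the axes, labeled right, up, left, and down, with the y axis pointing upward; the function names are illustrative only.

```python
import math

# Assumed direction labels; the abstract only says "four directions".
DIRECTIONS = ("right", "up", "left", "down")


def quantize_direction(dx, dy):
    """Map a displacement to one of four 90-degree sectors."""
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    # Shift by 45 degrees so each sector is centered on an axis.
    index = int(((angle + 45.0) % 360.0) // 90.0)
    return DIRECTIONS[index]


def describe_segments(points):
    """Return (length, direction) for each pair of consecutive pen points."""
    described = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0:
            continue  # repeated point, no direction to report
        described.append((length, quantize_direction(dx, dy)))
    return described
```

For instance, tracing a rectangle counter-clockwise from the origin yields the sequence right, up, left, down. On a tablet whose y axis points downward, the "up" and "down" labels would need to be swapped.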
Arabic character recognition system: A statistical approach for recognizing cursive typewritten text
Pattern Recognition, 1990
Character recognition systems can contribute tremendously to the advancement of the automation process, and can improve the interface between man and machine (computers) in many applications, including office automation and data entry. In this report we present a recognition system for typed Arabic text, which involves a statistical approach for character recognition. This approach uses "Accumulative Invariant Moments" as an identifier, which helped in the segmentation of connected and overlapping Arabic characters. However, Invariant Moments proved to be very sensitive to slight changes in a character's shape. These changes are normally due to the typing and scanning processes, and cannot be avoided. The recognition zone was defined based on the mean and standard deviation of the moments of a large sample of each character. However, this zone was enlarged, using an empirical multiplier, to improve the recognition rate. The system was implemented on a mainframe in the APL programming language for ease of experimentation, and then ported to a PC environment in BASIC for better portability. The recognition rate achieved was 94%, with a recognition speed of 10.6 characters/minute, running on a PC/AT with a math co-processor.
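The recognition zone described above can be sketched as a per-feature interval of mean plus or minus a multiple of the standard deviation. This is an assumption about the zone's shape; the abstract gives neither the exact formula nor the multiplier value. The feature vectors here are placeholders standing in for moment values, and `build_zone` and `in_zone` are illustrative names.

```python
from statistics import mean, stdev


def build_zone(samples, multiplier=1.0):
    """Build one (low, high) interval per feature from training samples.

    samples is a list of equal-length feature vectors for one character.
    multiplier widens the zone, playing the role of the empirical factor.
    """
    zone = []
    for column in zip(*samples):
        center = mean(column)
        spread = multiplier * stdev(column)  # sample standard deviation
        zone.append((center - spread, center + spread))
    return zone


def in_zone(features, zone):
    """True when every feature falls inside its interval."""
    return all(low <= value <= high for value, (low, high) in zip(features, zone))
```

A larger multiplier accepts more distorted samples of the same character, at the risk of overlapping the zones of other characters, which is presumably why it has to be tuned empirically.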
Automatic segmentation for Arabic characters in handwriting documents
Automatic off-line Arabic handwriting recognition still faces big challenges. Due to the cursive nature of Arabic script, most published works are based on recognition of a whole word without segmentation. This paper presents a new framework for the recognition of handwritten Arabic words based on segmentation. This framework involves two phases (a training phase and a testing phase). In the training phase, the system is trained to recognize Arabic handwritten characters, while in the testing phase, words are segmented into characters for recognition. Classification is achieved in two steps (classification of the segmented characters and classification of the word). A dictionary is constructed and used to correct any errors occurring during the previous stages of the recognition process. This work has been tested with the IFN/ENIT database and compared against some existing methods, and promising results have been obtained.
Recognition of handwritten Arabic characters: challenges and prospective
IIUM Press eBooks, 2011
Optical Character Recognition (OCR) is the process of recognizing printed or handwritten text on paper documents. This paper proposes an OCR system for Arabic characters. In addition to the preprocessing phase, the proposed recognition system consists mainly of three phases. In the first phase, we employ word segmentation to extract characters. In the second phase, Histograms of Oriented Gradients (HOG) are used for feature extraction. The final phase employs a Support Vector Machine (SVM) for classifying characters. We have applied the proposed method to the recognition of Jordanian city, town, and village names as a case study, in addition to many other words that offer character shapes not covered by the Jordanian city names. The set has been carefully selected to include every Arabic character in all four of its forms. To this end, we have built our own dataset consisting of more than 43,000 handwritten Arabic words (30,000 used in the training stage and 13,000 used in the testing stage). Experimental results showed the great success of our recognition method compared to state-of-the-art techniques, achieving very high recognition rates exceeding 99%.
Recognition Of Hand Written Arabic Characters
Applications of Digital Image Processing XI, 1988
Online handwriting recognition for the Arabic letter set
Proceedings of the 5th WSEAS …, 2011
Automated methods for the recognition of Arabic script are at an early stage compared to their equivalents for Latin and Chinese, especially for online handwriting recognition. In this paper we describe the stages of the recognition process unique to Arabic handwritten text. We also give an account of Arabic online handwriting recognition methods in the literature, with a rich list of references for interested readers. We shed some light on the characteristics of Arabic writing and present an overview of the common stages normally followed by handwriting recognition systems, which are preprocessing, segmentation, feature extraction, classification, and post-processing, along with the most commonly used techniques.
Automatic recognition of handwritten Arabic characters: a comprehensive review
This paper is a comprehensive review of current research trends in the area of the Arabic language, especially state-of-the-art approaches. It highlights the current status of diverse research aspects of the area in order to facilitate the adaptation and extension of previous systems into new applications and systems. Despite the tremendous research effort already made, the Arabic language still offers a deep, wide, and largely unexplored scope for research. Given the rapid pace of hardware and technology development, modern state-of-the-art methods with fewer errors are required. This article focuses on offline Arabic handwritten text recognition, as it is one of the most important topics in the field. Its main objective is to critically analyze current research in order to identify the problem areas and challenges faced by previous researchers, and from this to provide recommendations for future advances in the area. It also compares and contrasts the technical challenges, methods, and performance of previous work on handwritten text recognition. It summarizes the critical problems and enumerates issues that should be considered when addressing these tasks. It also presents some of the Arabic datasets that can be used as inputs and benchmarks for training, testing, and comparison. Finally, it provides a fundamental comparison and discussion of some of the remaining open problems and trends in the field.
Automatic Arabic Hand Written Text Recognition System
2007
Despite the considerable development of pattern recognition applications in the last decade of the twentieth century and in this century, text recognition remains one of the most important problems in pattern recognition. To the best of our knowledge, little work has been done in the area of Arabic text recognition compared with Latin, Chinese, and Japanese text. The main difficulty encountered when dealing with Arabic text is the cursive nature of Arabic writing in both printed and handwritten forms. An Automatic Arabic Hand-Written Text Recognition (AHTR) system is proposed. An efficient segmentation stage is required in order to divide a cursive word or sub-word into its constituent characters. After a word has been extracted from the scanned image, it is thinned and its baseline is calculated by analysis of the horizontal density histogram. The pattern is then followed along the baseline and the segmentation points are detected. Thus, after the segmentation stage, the cursive word is represented by a sequence of isolated characters. The recognition problem thus reduces to that of classifying each character. A set of features is extracted from each individual character, and a minimum distance classifier is used. Several approaches are used for processing the characters, and post-processing is added to enhance the results. Recognized characters are appended directly to a word file, which is in editable form.
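Two of the steps named above, baseline estimation from the horizontal density histogram and minimum distance classification, can be sketched in a few lines. The abstract does not specify how the histogram is analyzed or which distance is used, so this example assumes the baseline is the row with the most ink pixels and that the classifier uses Euclidean distance to one prototype vector per character. The image format (rows of 0/1 values) and all names are illustrative.

```python
import math


def estimate_baseline(image):
    """Return the index of the row containing the most ink pixels.

    image is a list of rows, each a list of 0/1 values (1 = ink).
    Ties are resolved in favor of the topmost row.
    """
    row_density = [sum(row) for row in image]  # horizontal density histogram
    return max(range(len(row_density)), key=row_density.__getitem__)


def nearest_class(features, prototypes):
    """Return the label whose prototype vector is closest to features.

    prototypes maps a character label to its reference feature vector.
    """
    return min(prototypes, key=lambda label: math.dist(features, prototypes[label]))
```

As a usage example, a three-row image whose middle row is fully inked gives a baseline index of 1, and a feature vector of `[0.9, 0.2]` is assigned to whichever prototype lies nearest in Euclidean terms.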