Automatic segmentation for Arabic characters in handwriting documents
Related papers
A Framework for Arabic Handwritten Recognition Based on Segmentation
Automatic off-line Arabic handwriting recognition still faces big challenges. Due to the cursive nature of the Arabic language, most published works recognize a whole word without segmentation. This paper presents a new framework for the recognition of handwritten Arabic words based on segmentation. The framework involves two phases: training and testing. In the training phase, the system was trained to recognize handwritten Arabic characters, while in the testing phase, words were segmented into characters for recognition. Classification is achieved in two steps: classification of the segmented characters, then classification of the word. A dictionary is constructed and used to correct any errors occurring during the previous stages of the recognition process. This work has been tested on the IFN/ENIT database and compared against some existing methods, with promising results.
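The abstract does not say how the dictionary corrects errors. A common approach, assumed here purely for illustration, is to replace the recognized character string with the lexicon entry at the smallest edit (Levenshtein) distance. The function names `levenshtein` and `correct_word` and the three-word lexicon are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def correct_word(recognized: str, lexicon: list[str]) -> str:
    """Return the lexicon word closest to the recognized character string."""
    return min(lexicon, key=lambda word: levenshtein(recognized, word))


# Hypothetical case: the final letter was misrecognized.
print(correct_word("حسر", ["حسن", "نجد", "سعد"]))  # prints حسن
```

In a real system the lexicon would be far larger, and the character classifier's confidence scores could weight the substitution costs instead of using a flat cost of 1.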
Off-Line Arabic Handwriting Character Recognition Using Word Segmentation
The ultimate aim of handwriting recognition is to enable computers to read and/or authenticate human-written texts with a performance comparable to, or even better than, that of humans. Reading means that the computer is given a piece of handwriting and produces its electronic transcription (e.g. in ASCII format). Handwriting recognition is of two types: on-line and off-line. The most important applications of off-line handwriting recognition are in protection systems and authentication. Arabic handwritten script is much more complicated than Latin script. This paper introduces a simple and novel methodology to authenticate handwritten Arabic characters. To this end, we built our own character database. The research methodology consists of two stages. The first is character extraction, in which the word is preprocessed and then segmented to obtain its characters. The second is character recognition, performed by matching the characters comprising the word with the lette...
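The abstract is cut off before describing its matching step, so the following is only an assumed illustration of database matching. It takes size-normalized binary glyphs and picks the stored template that differs from the input in the fewest pixels. The names `pixel_distance` and `match_character` and the tiny 3x3 templates are hypothetical.

```python
Glyph = list[list[int]]  # binary image, 1 = ink, 0 = background


def pixel_distance(a: Glyph, b: Glyph) -> int:
    """Count the pixels that differ between two equally sized binary glyphs."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("glyphs must be size-normalized before matching")
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def match_character(glyph: Glyph, templates: dict[str, Glyph]) -> str:
    """Return the label of the template closest to the extracted glyph."""
    return min(templates, key=lambda label: pixel_distance(glyph, templates[label]))


# Hypothetical database with two toy templates.
templates = {
    "alif": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],  # vertical stroke
    "dash": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],  # horizontal stroke
}
print(match_character([[0, 1, 0], [0, 1, 0], [0, 0, 0]], templates))  # prints alif
```

Raw pixel matching is sensitive to stroke thickness and small shifts, which is why practical systems usually match on extracted features rather than on pixels.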
An Enhanced Technique for Offline Arabic Handwritten Words Segmentation
The accuracy of handwritten word segmentation is essential for the recognition results; however, segmentation is an extremely complex task. In this work, an enhanced technique for Arabic handwriting segmentation is proposed. It builds on a recent technique, referred to in this work as the base technique, which has two main stages: over-segmentation and neural validation. Although the base technique gives promising results, it still suffers from several drawbacks, such as missed and bad segmentation points (SPs). To alleviate these problems, two enhancements have been integrated into the first stage: word-to-sub-word segmentation and thinned-word restoration. Additionally, an enhanced area-concatenation technique is used in the neural-validation stage to handle the segmentation of complex characters such as س. Both techniques were evaluated on the IFN/ENIT database. The results show that bad and missed SPs were significantly reduced and the overall performance of the system increased.
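The abstract does not detail its word-to-sub-word step. One standard way to split a binarized word into sub-words is 8-connected component labeling, assumed in the sketch below, with components ordered right to left to follow Arabic reading order. The function name is hypothetical. Isolated dots also become separate components, so a real system would still need to attach them to their main bodies.

```python
from collections import deque


def connected_components(image):
    """Label 8-connected ink regions of a binary image (1 = ink).

    Returns a list of components, each a list of (row, col) pixels,
    sorted by rightmost column, descending (Arabic reading order).
    """
    rows = len(image)
    cols = len(image[0]) if image else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    components.sort(key=lambda px: max(x for _, x in px), reverse=True)
    return components
```

For example, in a 3x5 image whose ink forms one blob in the top-left corner and one diagonal stroke on the right, the function returns two components, the right-hand stroke first.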
Recognition-Based Segmentation Algorithm for On-Line Arabic Handwriting
2009 10th International Conference on Document Analysis and Recognition, 2009
In this paper, we introduce an on-line Arabic handwriting recognition system based on a new stroke segmentation algorithm. The proposed algorithm uses an over-segmentation method, which has the advantage of producing at least all of the correct segments. It consists of arbitrary segmentation, followed by segmentation enhancement, connection of consecutive joints, and finally segmentation-point location. The proposed system achieves an excellent recognition rate of up to 97% for words and 92% for letters.
Designing an Arabic Handwritten Segmentation System
2016
The greatest difficulty facing the recognition of Arabic handwritten words is segmentation, because Arabic handwriting is cursive, with complex, multi-form styles. Hence, intensive research efforts are needed to reach an effective Arabic handwriting segmentation system. This paper presents a system that uses morphological features of Arabic characters for segmentation. The proposed system segments non-overlapped (horizontally connected, e.g. "حسن") as well as overlapped (vertically connected, e.g. "نجد") characters. The results are not yet very good; however, they point to useful directions for further research. Because the text was written freely, without any restrictions, both over-segmentation and under-segmentation problems affect the system.
Offline Automatic Segmentation based Recognition of Handwritten Arabic Words
The world heritage of handwritten Arabic documents is huge; however, only manual techniques are available for indexing and retrieving the content of these documents. To facilitate automatic retrieval of such handwritten Arabic documents, a number of automatic recognition systems for handwritten Arabic words have been proposed. Nevertheless, these systems suffer from low recognition accuracy due to the peculiarities of handwritten Arabic. In this paper, we therefore propose a segmentation-based recognition system for handwritten Arabic words. We divide a handwritten word into smaller word pieces, which are then segmented into candidate letters. These candidate letters are converted into their corresponding chain-code representations. We then extract discrete, statistical and structural features for classification. Additionally, we introduce a novel active-contour-based feature to increase the recognition accuracy of strongly deformed Arabic letters....
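To make the chain-code step concrete, here is a minimal sketch of the standard 8-direction Freeman chain code, assuming the candidate letter's contour has already been traced into an ordered list of 8-adjacent (row, col) pixels. The direction histogram is one simple statistical feature that can be read off the code; it is an illustrative assumption, not the paper's feature set, and the function names are hypothetical.

```python
# (d_row, d_col) -> Freeman code in image coordinates (row grows downward).
# 0 = east, then counter-clockwise: 1 = north-east, 2 = north, ..., 7 = south-east.
FREEMAN = {
    (0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
    (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7,
}


def chain_code(path):
    """Encode an ordered list of 8-adjacent (row, col) points as Freeman codes."""
    codes = []
    for (r0, c0), (r1, c1) in zip(path, path[1:]):
        step = (r1 - r0, c1 - c0)
        if step not in FREEMAN:
            raise ValueError(f"{(r0, c0)} and {(r1, c1)} are not 8-neighbours")
        codes.append(FREEMAN[step])
    return codes


def direction_histogram(codes):
    """Fraction of steps taken in each of the 8 directions."""
    if not codes:
        return [0.0] * 8
    return [codes.count(d) / len(codes) for d in range(8)]


# A closed 2x2 square traced east, south, west, north.
square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
print(chain_code(square))  # prints [0, 6, 4, 2]
```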
Segmenting handwritten Arabic text
2002
The segmentation and recognition of Arabic handwritten text has been an area of great interest in the past few years. However, only a small number of research papers and reports have been published in this area. Arabic handwritten text processing poses several major problems: Arabic is written cursively, and many external objects are used, such as dots, 'Hamza', 'Madda', and diacritic objects. In addition, Arabic characters have more than one shape depending on their position inside a word. More than one character can also share the same horizontal space, creating vertically overlapping blocks of characters that may be connected or disconnected. This makes segmenting Arabic text into characters, and classifying them, even more difficult. In this work, a technique is presented that segments difficult handwritten Arabic text. A conventional algorithm is used for the initial segmentation of the text into connected blocks of characters. The algorithm then generates pre-segmentation points for these blocks. A neural network is subsequently used to verify the accuracy of these segmentation points. Another conventional algorithm uses the verified segmentation points to segment the connected blocks of characters. These characters can then be used as input to another neural network for classification.
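The abstract does not specify how pre-segmentation points are generated. One common conventional heuristic, assumed here, uses the vertical projection of a connected block: letters in a connected block are typically joined by thin strokes, so runs of columns with little ink are candidate cut points. The function names and the `max_ink` threshold are hypothetical. The sketch deliberately over-generates candidates; the neural-network verification that would prune them is not shown.

```python
def vertical_projection(block):
    """Ink count per column of a binary block (1 = ink)."""
    return [sum(column) for column in zip(*block)]


def candidate_points(block, max_ink=1):
    """Return the centre column of each run of 'thin' columns.

    A column is thin when it holds between 1 and max_ink ink pixels;
    empty columns are margins, not joints, and are skipped.
    """
    proj = vertical_projection(block)
    points, start = [], None
    for x, ink in enumerate(proj + [0]):  # trailing sentinel closes an open run
        thin = 0 < ink <= max_ink
        if thin and start is None:
            start = x
        elif not thin and start is not None:
            points.append((start + x - 1) // 2)
            start = None
    return points


block = [
    [1, 0, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1, 0, 0],
]
print(vertical_projection(block))  # prints [3, 1, 1, 1, 3, 2, 1]
print(candidate_points(block))     # prints [2, 6]
```

The candidate at column 6 lies on a trailing stroke rather than a joint, which illustrates why a verification stage is needed after over-segmentation.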
2007
The last two decades have witnessed some advances in the development of Arabic character recognition (CR) systems. Arabic CR faces technical problems not encountered in other languages, which cause Arabic CR systems to achieve relatively low accuracy and hinder their establishment as market products. We propose the basic stages of a system that tackles the problem of recognizing online Arabic cursive handwriting. Rule-based methods are used to perform simultaneous segmentation and recognition of word portions in an unconstrained, cursively handwritten document using dynamic programming. The output of these stages is a ranked list of possible decisions. A new technique for text-line separation is also used.
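A minimal sketch of how dynamic programming can perform segmentation and recognition simultaneously: candidate cut points along a word portion are nodes, a recognizer scores each stretch between two points, and the cheapest chain of stretches yields both the segmentation and the character labels. The `segment_cost` callback, the `max_span` limit and the toy costs are hypothetical stand-ins for the paper's rule-based methods. The sketch also keeps only the single best path, not the ranked list the abstract describes.

```python
import math


def best_segmentation(n_points, segment_cost, max_span=3):
    """Cheapest path from point 0 (word start) to point n_points - 1 (word end).

    segment_cost(i, j) -> (label, cost): best character hypothesis for the
    stretch between candidate points i < j (lower cost is better).
    max_span: a character may cover at most this many primitive segments.
    Returns (total_cost, labels), or (inf, []) if no valid path exists.
    """
    best = [math.inf] * n_points
    back = [None] * n_points          # back[j] = (previous point, label)
    best[0] = 0.0
    for j in range(1, n_points):
        for i in range(max(0, j - max_span), j):
            if best[i] == math.inf:
                continue
            label, cost = segment_cost(i, j)
            if best[i] + cost < best[j]:
                best[j] = best[i] + cost
                back[j] = (i, label)
    if best[-1] == math.inf:
        return math.inf, []
    labels, j = [], n_points - 1
    while j > 0:                      # walk the back-pointers to the start
        i, label = back[j]
        labels.append(label)
        j = i
    return best[-1], labels[::-1]


# Toy costs for 4 candidate points (3 primitive segments).
costs = {
    (0, 1): ("a", 1.0), (1, 2): ("b", 1.0), (2, 3): ("c", 1.0),
    (0, 2): ("x", 0.5), (1, 3): ("y", 3.0), (0, 3): ("z", 5.0),
}
print(best_segmentation(4, lambda i, j: costs[(i, j)]))  # prints (1.5, ['x', 'c'])
```

Keeping the k cheapest hypotheses at each point instead of one would turn this into a ranked-list decoder.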
Segmentation of Arabic Handwriting Based on both Contour and Skeleton Segmentation
2009 10th International Conference on Document Analysis and Recognition, 2009
We propose a new algorithm for the segmentation of off-line handwritten Arabic words. The algorithm segments connected letters into smaller segments, each containing no more than three letters. Each letter may be segmented into at most five pieces. Besides improving the recognition of Arabic words, another potential application of the proposed segmentation method is building a small lexicon consisting of combinations of no more than three letters. In general, it is very hard to generate a lexicon for recognizing unconstrained handwritten Arabic documents because of the large number of words in the Arabic language.
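As a rough illustration of why such a lexicon stays small, assume (an assumption for this estimate, not a detail from the paper) that segments are sequences drawn from the 28 basic letters of the Arabic alphabet, ignoring positional forms. The number of possible sequences of one to three letters is then bounded by:

$$\sum_{k=1}^{3} 28^{k} = 28 + 784 + 21952 = 22764$$

This is only an upper bound, but it is fixed and independent of the size of the Arabic vocabulary, unlike a full-word lexicon.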
Segmentation techniques for Arabic handwritten: a review
International Journal of Electrical and Computer Engineering (IJECE), 2024
Image segmentation refers to the process of partitioning a page into distinct sections. The technique aims to transform the image's representation into a more coherent and more usable form. It is commonly used to identify objects and boundaries (such as lines and curves) within images. This paper, however, focuses on segmentation methods tailored specifically to Arabic handwritten content. Segmenting Arabic handwritten material poses a significant challenge due to the diversity of handwriting styles and the interconnection between Arabic letters. The paper also touches on the classification of segmentation algorithms originally designed for modern documents, illustrating their adaptation to document processing. Furthermore, it addresses the difficulties of segmenting Arabic handwritten content, including variations in writing style, the connected nature of Arabic characters, the complexity of Arabic cursive writing, and the challenges posed by diacritics. Lastly, it provides a concise overview of segmentation techniques widely used in previous research.
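The review's abstract does not name specific algorithms. As one illustration of a classic page-level method, the sketch below splits a binarized page into text lines using its horizontal projection profile: rows with too little ink separate consecutive lines. The function name and the `min_ink` threshold are assumptions. The method presumes roughly horizontal, non-touching lines; skewed or touching handwritten lines usually require more robust techniques.

```python
def segment_lines(page, min_ink=1):
    """Return (first_row, last_row) for each text line of a binary page.

    A row belongs to a line when it contains at least min_ink ink pixels;
    runs of such rows are reported as lines, top to bottom.
    """
    lines, start = [], None
    for y, row in enumerate(page):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y                      # a new line begins
        elif not has_ink and start is not None:
            lines.append((start, y - 1))   # the current line ended on the row above
            start = None
    if start is not None:                  # the page ends inside a line
        lines.append((start, len(page) - 1))
    return lines


page = [
    [0, 0, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
print(segment_lines(page))  # prints [(1, 2), (4, 4)]
```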