Randa Elanwar | Electronics Research Institute

Papers by Randa Elanwar

Generative adversarial networks for handwriting image generation: a review

The Visual Computer, 2024

Handwriting synthesis, the task of automatically generating realistic images of handwritten text, has gained increasing attention in recent years, both as a challenge in itself and as a task that supports handwriting recognition research. The latter task is to synthesize large image datasets that can then be used to train deep learning models to recognize handwritten text without the need for human-provided annotations. While early attempts at developing handwriting generators yielded limited results [1], more recent works involving generative models of deep neural network architectures have been shown to produce realistic imitations of human handwriting [2-19]. In this review, we focus on one of the most prevalent and successful architectures in the field of handwriting synthesis, the generative adversarial network (GAN). We describe the capabilities, architecture specifics, and performance of the GAN-based models that have been introduced to the literature since 2019 [2-14]. These models can generate random handwriting styles, imitate reference styles, and produce realistic images of arbitrary text that was not in the training lexicon. The generated images have been shown to contribute to improving handwriting recognition results when augmenting the training samples of recognition models with synthetic images. The synthetic images were often hard to expose as non-real, even by human examiners, but could also be implausible or style-limited. The review includes a discussion of the characteristics of the GAN architecture in comparison with other paradigms in the image-generation domain and highlights the remaining challenges for handwriting synthesis.

Extracting text from scanned Arabic books: a large-scale benchmark dataset and a fine-tuned Faster-R-CNN model

International Journal on Document Analysis and Recognition (IJDAR), 2021

Datasets of documents in Arabic are urgently needed to promote computer vision and natural language processing research that addresses the specifics of the language. Unfortunately, publicly available Arabic datasets are limited in size and restricted to certain document domains. This paper presents the release of BE-Arabic-9K, a dataset of more than 9,000 high-quality scanned images from over 700 Arabic books. Among these, 1,500 images have been manually segmented into regions and labeled by their functionality. BE-Arabic-9K includes book pages with a wide variety of complex layouts and page contents, making it suitable for various document layout analysis and text recognition research tasks. The paper also presents a page layout segmentation and text extraction baseline model based on a fine-tuned Faster R-CNN structure (FFRA). This baseline model yields cross-validation results with an average accuracy of 99.4% and an F1 score of 99.1% for text versus non-text block classification on the 1,500 annotated images of BE-Arabic-9K. These results are remarkably better than those of the state-of-the-art Arabic book page segmentation system ECDP. FFRA also outperforms three other prior systems when tested on a competition benchmark dataset, making it an outstanding baseline model to challenge.

BCE-Arabic-v1 dataset

Proceedings of the 9th ACM International Conference on PErvasive Technologies Related to Assistive Environments - PETRA '16, 2016

The state of the art in handwriting synthesis

Cursive handwriting is a complex graphic realization of natural human communication. Its production and recognition involve a large number of highly cognitive functions including vision, motor control, and natural language understanding. Handwriting synthesis has many important applications that facilitate the user's work and personalize communication on pen-based devices. The problem of handwriting synthesis is not new and a number of studies have been published in the literature. Some approaches, movement-simulation techniques, make a real attempt at modeling the process of handwriting production. Other approaches, shape-simulation techniques, usually record the glyphs directly and reuse or sample them during synthesis. Different challenges are holding back the progress of this type of research. In this paper we present a literature review of the recent trends in handwriting synthesis, highlighting the different generation processes and pointing out the challenges facing researchers. Finally, we draw conclusions about the research collections presented and summarize our opinions to help move future work toward maturity.

Multi-Label and Multilingual News Framing Analysis

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Text and metadata extraction from scanned Arabic documents using support vector machines

Journal of Information Science (JIS), SAGE, 2020

Text information in scanned documents becomes accessible only when extracted and interpreted by a text recognizer. For a recognizer to work successfully, it must have detailed location information about the regions of the document images that it is asked to analyse. It needs to focus on page regions with text, skipping non-text regions that include illustrations or photographs. However, text recognizers do not work as logical analyzers. Logical layout analysis automatically determines the function of a document text region, that is, it labels each region as a title, paragraph, caption, and so on, and thus is an essential part of a document understanding system. In the past, rule-based algorithms have been used to conduct logical layout analysis, using data sets of limited size. We here instead focus on supervised learning methods for logical layout analysis. We describe LABA, a system based on multiple support vector machines to perform logical Layout Analysis of scanned Book pages in Arabic. The system detects the function of a text region based on the analysis of various image features and a voting mechanism. For a baseline comparison, we implemented an older but state-of-the-art neural network method. We evaluated LABA using a data set of scanned pages from illustrated Arabic books and obtained high recall and precision values. We also found that the F-measure of LABA is higher for five of the six tested classes compared to the state-of-the-art method.

Making scanned Arabic documents machine accessible using an ensemble of SVM classifiers

International Journal on Document Analysis and Recognition (IJDAR), 2018

Raster-image PDF files originating from scanning or photographing paper documents are inaccessible to both text search engines and screen readers that people with visual impairments use. We here focus on the relatively less-researched problem of converting raster-image files with Arabic script into machine-accessible documents. Our method, called ECDP for “Ensemble-based classification of document patches,” segments the physical layout of the document, classifies image patches as containing text or graphics, assembles homogeneous document regions, and passes the text to an optical character recognition engine to convert into natural language. Classification is based on the majority voting of an ensemble of support vector machines. When tested on the dataset BCE-Arabic [Saad et al. in: ACM 9th annual international conference on pervasive technologies related to assistive environments (PETRA’16), Corfu, 2016], ECDP yielded an average patch classification accuracy of 97.3% and an average F1 score of 95.26% for text patches and efficiently extracted text zones in both paragraphs and text-embedded graphics, even if the text is rotated by 90° or is in English. ECDP outperforms a classical layout analysis method (RLSA) and a state-of-the-art commercial product (RDI-CleverPage) on this dataset and maintains a relatively high level of performance on document images drawn from two other datasets (Hesham et al. in Pattern Anal Appl 20:1275–1287, 2017; Proprietary Dataset of 109 Arabic Documents. http://www.rdi-eg.com). The results suggest that the proposed method has the potential to generalize well to the analysis of documents with a broad range of content.
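
The majority-voting step mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `majority_vote` and `classify_patches`, the label strings, and the tie-breaking rule (the first label seen wins) are all assumptions made for illustration, and the per-classifier predictions are taken as given rather than produced by trained support vector machines.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label chosen by most ensemble members.

    Ties are broken in favour of the label that appears first in `votes`
    (an illustrative assumption, not taken from the paper).
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:
        if counts[label] == best:
            return label


def classify_patches(per_classifier_predictions):
    """Combine predictions from several classifiers, patch by patch.

    `per_classifier_predictions` holds one list per classifier, each with
    one predicted label per image patch (all lists have the same length).
    """
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_predictions)]


# Hypothetical predictions from three classifiers for two patches.
predictions = [
    ["text", "graphics"],
    ["text", "text"],
    ["graphics", "graphics"],
]
print(classify_patches(predictions))
```

Using an odd number of ensemble members for a two-class problem avoids ties altogether, which makes the tie-breaking rule irrelevant in that setting.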

The ASAR 2018 Competition on Physical Layout Analysis of Scanned Arabic Books (PLA-SAB 2018)

Successful physical layout analysis (PLA) is a key factor in the performance of text recognizers and many other applications. PLA solutions for scanned Arabic documents are few and difficult to compare due to differences in methods, data, and evaluation metrics. To help evaluate the performance of recent Arabic PLA solutions, the ASAR 2018 Competition on Physical Layout Analysis (PLA) was organized. This paper presents the results of this competition. The competition focused on analyzing layouts for scanned Arabic book pages (SAB). PLA-SAB required solutions to two tasks: page-to-block segmentation and block text/non-text classification. In this paper we briefly describe the methods provided by participating teams, present their results for both tasks using the BCE-Arabic benchmarking dataset [1], and make an open call for continuous participation outside the context of ASAR 2018.

ASAR 2018 Layout Analysis Challenge: Using Random Forests to Analyze Scanned Arabic Books

Physical Layout Analysis (PLA) is a necessary step to recognize the contents of a digital document. PLA includes segmenting the document image and identifying the content type of the segments. PLA for digitized Arabic documents is challenging due to the nature of the Arabic script. In this paper, we introduce a PLA system for Arabic documents that were digitized by scanning. Our system RFAAD, short for “Random Forests for Analyzing Arabic Documents,” starts with morphological preprocessing of the digitized hard copy and then extracts geometrical, shape, and context features to identify the connected components (CC) of the digital image as containing text or non-text. Random forests are trained using the first dataset release of a large data collection project, BCE-Arabic-v1 [22]. Our system shows strong performance on BCE data in terms of CC classification accuracy and F1-score (97.5% and 97.7% respectively). When evaluated on datasets by other researchers [2], [11], RFAAD also performs well. Moreover, RFAAD shows moderately strong performance when applied to the most challenging layouts of the benchmarking dataset of the ASAR 2018 competition PLA-SAB. The performance of RFAAD suggests that our work, with some modifications, has the potential to solve other open problems in the document analysis area and attain a relatively high degree of generalization.

LABA: Logical Layout Analysis of Book Page Images in Arabic Using Multiple Support Vector Machines

Logical layout analysis, which determines the function of a document region, for example, whether it is a title, paragraph, or caption, is an indispensable part of a document understanding system. Rule-based algorithms have long been used for such systems. The datasets available have been small, and so the generalization of the performance of these systems is difficult to assess. In this paper, we present LABA, a supervised machine learning system based on multiple support vector machines for conducting a logical Layout Analysis of scanned pages of Books in Arabic. Our system labels the function (class) of a region of a document (a scanned book page), based on its position on the page and other features. We evaluated LABA with the benchmark “BCE-Arabic-v1” dataset, which contains scanned pages of illustrated Arabic books. We obtained high recall and precision values, and found that the F-measure of LABA is higher for all classes except the “noise” class compared to a neural network method that was based on prior work.

Rule-Based Algorithms for Handwritten Character Recognition

BCE-Arabic-v1 dataset: Towards interpreting Arabic document images for people with visual impairments

Millions of individuals in the Arab world have significant visual impairments that make it difficult for them to access printed text. Assistive technologies such as scanners and screen readers often fail to turn text into speech because optical character recognition (OCR) software has difficulty interpreting the textual content of Arabic documents. In this paper, we show that the inaccessibility of scanned PDF documents is in large part due to the failure of the OCR engine to understand the layout of an Arabic document. Arabic document layout analysis (DLA) is therefore an urgent research topic, motivated by the goal to provide assistive technology that serves people with visual impairments. We announce the launching of a large annotated dataset of Arabic document images, called BCE-Arabic-v1, to be used as a benchmark for DLA, OCR and text-to-speech research. Our dataset contains 1,833 images of pages scanned from 180 books and represents a variety of page content and layout, in particular, Arabic text in various fonts and sizes, photographs, tables, diagrams, and charts in single or multiple columns. We report the results of a formative study that investigated the performance of state-of-the-art document annotation tools. We found significant differences and limitations in the functionality and labeling speed of these tools, and selected the best-performing tool for annotating our benchmark BCE-Arabic-v1.

On-Line Arabic Handwriting Text Line Detection Using Dynamic Programming

Text line detection in unconstrained handwritten documents remains an open document analysis problem. The typical method, based on horizontal projection analysis and regrouping of connected components, is not usually successful because of baseline undulations and shifts, baseline-skew variability, and inter-line distance variability. This work deals with the segmentation of on-line Arabic handwritten documents such as pages of handwritten notes. We propose an automatic text line detection method based on dynamic programming. We try to find the paths with the minimum cost between collections of text line segments. Most steps of the proposed algorithm are based on off-line information. Hence it can also be applied to off-line documents after a few minor changes. The proposed methodology is tested on OHASD, the first Arabic online sentence dataset. More than a hundred handwritten Arabic texts written by different writers are examined. Our experiments show better results than common on-line segmentation procedures.

Unconstrained Arabic Online Handwritten Words Segmentation using New HMM State Design

In this paper we propose a segmentation system for unconstrained Arabic online handwriting, an essential problem addressed by analytical-based word recognition systems. The system is composed of two stages: the first is a newly and specially designed hidden Markov model (HMM) and the second is a rule-based stage. In our system, handwritten words are broken up into characters by simultaneous segmentation-recognition using HMMs of unique design, trained using online features, most of which are novel. The character boundaries output by the HMM represent the proposed segmentation points (PSP), which are then validated by a rule-based post stage, without any contextual information, to resolve different segmentation errors. The HMM has been designed and tested using a self-collected dataset (OHASD) [1]. Most error cases are corrected and a remarkable segmentation enhancement is achieved. Very promising word and character segmentation rates are obtained, considering the difficulty of unconstrained Arabic handwriting and the absence of contextual help.

OHASD: The First On-Line Arabic Sentence Database Handwritten on Tablet PC

In this paper we present the first Arabic sentence dataset for on-line handwriting recognition written on a tablet PC. The dataset is natural, simple and clear. Texts are sampled from daily newspapers. To collect naturally written handwriting, forms are dictated to writers. The current version of our dataset includes 154 paragraphs written by 48 writers. It contains more than 3,800 words and more than 19,400 characters. Handwritten texts are mainly written by researchers from different research centers. In order to use this dataset in a recognition system, word extraction is needed. In this paper a new word extraction technique based on the cursive nature of Arabic handwriting is also presented. The technique is applied to this dataset and good results are obtained. The results can be considered a benchmark against which future research can be compared.

Arabic Handwritten Script Recognition Towards Generalization: A Survey

Machine simulation of human reading has been the subject of intensive research for the last three decades. The interest devoted to this field is explained not only by the exciting challenges involved, but also by the huge benefits that a system, designed in the context of a commercial application, could bring [1]. Creating a single general-purpose cursive handwriting recognition device is a profoundly difficult challenge. The difficulties faced by researchers in this field should give us added appreciation of the ability of humans to recognize rapidly and effectively the most complex and confusing handwriting. In this paper we study the cursive handwriting recognition problem, especially for Arabic script. We focus on nine sub-problems that represent the main challenges researchers face in developing a reliable, accurate, simple, general-purpose Arabic handwritten script recognizer, and we review the contributions published over the last 10 years, in which we found these challenges clearly apparent.

Arabic online word extraction from handwritten text using SVM-RBF classifiers decision fusion

In this paper, we propose a system for Arabic online word extraction from handwritten text lines, a problem addressed for the first time for the Arabic language, as there is no public dataset of Arabic online handwritten texts available so far. We collected a dataset of unconstrained online handwritten sentences and used it to design and evaluate our system. First, our system classifies the white gaps between the connected components of words as either intra-word or inter-word gaps, according to local and global online features extracted from each gap together with the groups of strokes encompassing the gap. The classifier is a polynomial kernel support vector machine (SVM) whose decisions are used for initial word extraction. A post stage is added to the system to test the extracted words for under-segmentation and to resolve this under-segmentation by reconsidering the gap type decisions for the stuck word. Classifier decision fusion takes place by consulting five different classifiers (four SVMs and a radial basis function neural network, 'RBF NN') and feeding their decisions to a separate pre-trained SVM to make the final decision. Most stuck words are correctly detected and many of them are correctly resolved. The post stage leads to a remarkable error reduction compared to the performance of single classifiers. Promising results are achieved, considering that the unconstrained nature of Arabic handwriting adds more difficulties to the problem.

A Semi-Automatic Annotation Tool for Arabic Online Handwritten Text

Large amounts of ground truth data are vital for building, testing, analyzing and improving the performance of character recognizers, especially those using segmentation-based routines. Ground truth information, the annotation, can be associated with the document images at the paragraph level, the sentence level, the word level, and down to the character or stroke level. Providing huge annotated datasets for this purpose manually is a very taxing and error-prone procedure. Therefore, it is important to complement the automatic tools for metadata extraction with tools that provide an efficient human-computer interface to experts for validation and correction, to simplify the creation of recognizers. In this paper we present the first semi-automatic tool for annotating Arabic online handwritten documents. The tool automates and simplifies the visualization, manipulation and annotation of documents at the character level, generating transcription files ready for use by any handwriting recognizer. The tool is a set of interactive user interfaces that guide the user along the whole process and reduce human effort and time by activating smart segmentation utilities that offer satisfying performance and allow intervention for validation.

Simultaneous Segmentation and Recognition of Arabic Characters in an Unconstrained On-Line Cursive Handwritten Document

The last two decades witnessed some advances in the development of Arabic character recognition (CR) systems. Arabic CR faces technical problems not encountered in any other language, which cause Arabic CR systems to achieve relatively low accuracy and delay their establishment as market products. We propose the basic stages towards a system that attacks the problem of recognizing on-line Arabic cursive handwriting. Rule-based methods are used to perform simultaneous segmentation and recognition of word portions in an unconstrained cursively handwritten document using dynamic programming. The output of these stages is in the form of a ranked list of the possible decisions. A new technique for text line separation is also used.

Research paper thumbnail of Generative adversarial networks for handwriting image generation: a review

The Visual Computer, 2024

Handwriting synthesis, the task of automatically generating realistic images of handwritten text,... more Handwriting synthesis, the task of automatically generating realistic images of handwritten text, has gained increasing attention in recent years, both as a challenge in itself, as well as a task that supports handwriting recognition research. The latter task is to synthesize large image datasets that can then be used to train deep learning models to recognize handwritten text without the need for human-provided annotations. While early attempts at developing handwriting generators yielded limited results [1], more recent works involving generative models of deep neural network architectures have been shown able to produce realistic imitations of human handwriting [2-19]. In this review, we focus on one of the most prevalent and successful architectures in the field of handwriting synthesis, the generative adversarial network (GAN). We describe the capabilities, architecture specifics, and performance of the GAN-based models that have been introduced to the literature since 2019 [2-14]. These models can generate random handwriting styles, imitate reference styles, and produce realistic images of arbitrary text that was not in the training lexicon. The generated images have been shown to contribute to improving handwriting recognition results when augmenting the training samples of recognition models with synthetic images. The synthetic images were often hard to expose as non-real, even by human examiners, but also could be implausible or style-limited. The review includes a discussion of the characteristics of the GAN architecture in comparison with other paradigms in the image-generation domain and highlights the remaining challenges for handwriting synthesis.

Research paper thumbnail of Extracting text from scanned Arabic books: A a large-scale benchmark dataset and a fine-tuned Faster-R-CNN model ArticleCategory : Original Paper

International Journal of Document analysis and recognition, 2021

Datasets of documents in Arabic are urgently needed to promote computer vision and natural langua... more Datasets of documents in Arabic are urgently needed to promote computer vision and natural language processing research that addresses the specifics of the language. Unfortunately, publicly-available Arabic datasets are limited in size and restricted to certain document domains. This paper presents the release of BE-Arabic-9K, a dataset of more than 9,000 9000 high-quality scanned images from over 700 Arabic books. Among these, 1,500 1500 images have been manually segmented into regions and labeled by their functionality. BE-Arabic-9K includes book pages with a wide variety of complex layouts and page contents, making it suitable for various document layout analysis and text recognition research tasks. The paper also presents a page layout segmentation and text extraction baseline model based on fine-tuned Faster R-CNN structure (FFRA). This baseline model yields to cross-validation results with an average accuracy of 99.4% and F1 score of 99.1% for text versus non-text block classification on 1,500 1500 annotated images of BE-Arabic-9K. These results are remarkably better than those of the state-of-the-art Arabic book page segmentation system ECDP. FFRA also outperforms three other prior systems when tested on a competition benchmark dataset, making it an outstanding baseline model to challenge.

Research paper thumbnail of BCE-Arabic-v1 dataset

Proceedings of the 9th ACM International Conference on PErvasive Technologies Related to Assistive Environments - PETRA '16, 2016

Research paper thumbnail of The state of the art in handwriting synthesis

Cursive handwriting is a complex graphic realization of natural human communication. Its producti... more Cursive handwriting is a complex graphic realization of natural human communication. Its production and recognition involve a large number of highly cognitive functions including vision, motor control, and natural language understanding. Handwriting synthesis has many important applications to facilitate user's work and personalize the communication on pen-based devices. The problem of handwriting synthesis is not new and a number of studies have been published in the literature. Some approaches, Movement-simulation techniques, make a real attempt at modeling the process of handwriting production. Other approaches, Shape-simulation techniques, which usually record the glyphs directly and reuse or sample the glyphs when synthesis. Different challenges are holding back the progress of such type of research. In this paper we present a literature review about the recent trends in handwriting synthesis, highlighting the different generation processes and pointing out the challenges facing the researchers. Finally we are giving a conclusion about the scientific research collections presented, and summarizing our opinions to help move future work up to maturity.

Research paper thumbnail of Multi-Label and Multilingual News Framing Analysis

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Research paper thumbnail of Text and metadata extraction from scanned Arabic documents using support vector machines

Journal of information science (JIS), SAGE pub, 2020

Text information in scanned documents becomes accessible only when extracted and interpreted by a... more Text information in scanned documents becomes accessible only when extracted and interpreted by a text recognizer. For a recognizer to work successfully, it must have detailed location information about the regions of the document images that it is asked to analyse. It will need focus on page regions with text skipping non-text regions that include illustrations or photographs. However, text recogni-zers do not work as logical analyzers. Logical layout analysis automatically determines the function of a document text region, that is, it labels each region as a title, paragraph, or caption, and so on, and thus is an essential part of a document understanding system. In the past, rule-based algorithms have been used to conduct logical layout analysis, using limited size data sets. We here instead focus on supervised learning methods for logical layout analysis. We describe LABA, a system based on multiple support vector machines to perform logical Layout Analysis of scanned Books pages in Arabic. The system detects the function of a text region based on the analysis of various images features and a voting mechanism. For a baseline comparison, we implemented an older but state-of-the-art neural network method. We evaluated LABA using a data set of scanned pages from illustrated Arabic books and obtained high recall and precision values. We also found that the F-measure of LABA is higher for five of the tested six classes compared to the state-of-the-art method.

Making scanned Arabic documents machine accessible using an ensemble of SVM classifiers

International Journal on Document Analysis and Recognition (IJDAR), 2018

Raster-image PDF files originating from scanning or photographing paper documents are inaccessible to both text search engines and screen readers that people with visual impairments use. We here focus on the relatively less-researched problem of converting raster-image files with Arabic script into machine-accessible documents. Our method, called ECDP for “Ensemble-based classification of document patches,” segments the physical layout of the document, classifies image patches as containing text or graphics, assembles homogeneous document regions, and passes the text to an optical character recognition engine to convert into natural language. Classification is based on the majority voting of an ensemble of support vector machines. When tested on the dataset BCE-Arabic [Saad et al. in: ACM 9th annual international conference on pervasive technologies related to assistive environments (PETRA’16), Corfu, 2016], ECDP yielded an average patch classification accuracy of 97.3% and average F1 score of 95.26% for text patches and efficiently extracted text zones in both paragraphs and text-embedded graphics, even if the text is rotated by 90° or is in English. ECDP outperforms a classical layout analysis method (RLSA) and a state-of-the-art commercial product (RDI-CleverPage) on this dataset and maintains a relatively high level of performance on document images drawn from two other datasets (Hesham et al. in Pattern Anal Appl 20:1275–1287, 2017; Proprietary Dataset of 109 Arabic Documents. http://www.rdi-eg.com). The results suggest that the proposed method has the potential to generalize well to the analysis of documents with a broad range of content.
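
The majority-voting step mentioned in the abstract can be illustrated with a minimal sketch. It is not the ECDP implementation: the function name `majority_vote`, the label strings, and the tie-breaking rule (the first label to reach the top count wins) are assumptions made for illustration, and the per-classifier predictions are taken as given rather than produced by trained support vector machines.

```python
from collections import Counter


def majority_vote(votes_per_classifier):
    """Fuse label predictions from several classifiers, one label per patch.

    votes_per_classifier: list of equal-length label lists, one list per
    classifier (hypothetical input format, not taken from the paper).
    """
    fused = []
    # zip(*...) groups the votes that all classifiers cast for the same patch.
    for patch_votes in zip(*votes_per_classifier):
        # most_common(1) returns the label with the highest count; on a tie,
        # the label that was seen first among the tied ones is returned.
        label, _count = Counter(patch_votes).most_common(1)[0]
        fused.append(label)
    return fused


if __name__ == "__main__":
    predictions = [
        ["text", "graphics", "text"],      # classifier 1
        ["text", "text", "graphics"],      # classifier 2
        ["graphics", "text", "graphics"],  # classifier 3
    ]
    print(majority_vote(predictions))
```

With an odd number of classifiers and two classes, as in the usage example, ties cannot occur, so the tie-breaking assumption only matters for other configurations.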

The ASAR 2018 Competition on Physical Layout Analysis of Scanned Arabic Books (PLA-SAB 2018)

Successful physical layout analysis (PLA) is a key factor in the performance of text recognizers and many other applications. PLA solutions for scanned Arabic documents are few and difficult to compare due to differences in methods, data, and evaluation metrics. To help evaluate the performance of recent Arabic PLA solutions, the ASAR 2018 Competition on Physical Layout Analysis (PLA) was organized. This paper presents the results of this competition. The competition focused on analyzing layouts for scanned Arabic book pages (SAB). PLA-SAB required solutions to two tasks: page-to-block segmentation and block text/non-text classification. In this paper we briefly describe the methods provided by participating teams, present their results for both tasks using the BCE-Arabic benchmarking dataset [1], and make an open call for continuous participation outside the context of ASAR 2018.

ASAR 2018 Layout Analysis Challenge: Using Random Forests to Analyze Scanned Arabic Books

Physical Layout Analysis (PLA) is a necessary step to recognize the contents of a digital document. PLA includes segmenting the document image and identifying the content type of the segments. PLA for digitized Arabic documents is challenging due to the nature of the Arabic script. In this paper, we introduce a PLA system for Arabic documents that were digitized by scanning. Our system RFAAD, short for “Random Forests for Analyzing Arabic Documents,” starts with morphological preprocessing of the digitized hard copy and then extracts geometrical, shape, and context features to identify the connected components (CC) of the digital image as containing text or non-text. Random forests are trained using the first dataset release of a large data collection project, BCE-Arabic-v1 [22]. Our system shows strong performance on BCE data in terms of CC classification accuracy and F1-score (97.5% and 97.7% respectively). When evaluated on datasets by other researchers [2], [11], RFAAD also performs well. Moreover, RFAAD shows moderately strong performance when applied to the most challenging layouts of the benchmarking dataset of the ASAR 2018 competition PLA-SAB. The performance of RFAAD suggests that our work, with some modifications, has the potential to solve other open problems in the document analysis area and attain a relatively high degree of generalization.

LABA: Logical Layout Analysis of Book Page Images in Arabic Using Multiple Support Vector Machines

Logical layout analysis, which determines the function of a document region, for example, whether it is a title, paragraph, or caption, is an indispensable part of a document understanding system. Rule-based algorithms have long been used for such systems. The datasets available have been small, and so the generalization of the performance of these systems is difficult to assess. In this paper, we present LABA, a supervised machine learning system based on multiple support vector machines for conducting a logical Layout Analysis of scanned pages of Books in Arabic. Our system labels the function (class) of a region of a document (a scanned book page), based on its position on the page and other features. We evaluated LABA with the benchmark “BCE-Arabic-v1” dataset, which contains scanned pages of illustrated Arabic books. We obtained high recall and precision values, and found that the F-measure of LABA is higher for all classes except the “noise” class compared to a neural network method that was based on prior work.

Rule-Based Algorithms for Handwritten Character Recognition

BCE-Arabic-v1 dataset: Towards interpreting Arabic document images for people with visual impairments

Millions of individuals in the Arab world have significant visual impairments that make it difficult for them to access printed text. Assistive technologies such as scanners and screen readers often fail to turn text into speech because optical character recognition (OCR) software has difficulty interpreting the textual content of Arabic documents. In this paper, we show that the inaccessibility of scanned PDF documents is in large part due to the failure of the OCR engine to understand the layout of an Arabic document. Arabic document layout analysis (DLA) is therefore an urgent research topic, motivated by the goal of providing assistive technology that serves people with visual impairments. We announce the launch of a large annotated dataset of Arabic document images, called BCE-Arabic-v1, to be used as a benchmark for DLA, OCR and text-to-speech research. Our dataset contains 1,833 images of pages scanned from 180 books and represents a variety of page content and layout, in particular, Arabic text in various fonts and sizes, photographs, tables, diagrams, and charts in single or multiple columns. We report the results of a formative study that investigated the performance of state-of-the-art document annotation tools. We found significant differences and limitations in the functionality and labeling speed of these tools, and selected the best-performing tool for annotating our benchmark BCE-Arabic-v1.

The state of the art in handwriting synthesis

Cursive handwriting is a complex graphic realization of natural human communication. Its production and recognition involve a large number of highly cognitive functions, including vision, motor control, and natural language understanding. Handwriting synthesis has many important applications that facilitate the user's work and personalize communication on pen-based devices. The problem of handwriting synthesis is not new, and a number of studies have been published in the literature. Some approaches, movement-simulation techniques, attempt to model the process of handwriting production. Other approaches, shape-simulation techniques, usually record glyphs directly and reuse or sample them during synthesis. Various challenges are holding back progress in this type of research. In this paper we present a literature review of recent trends in handwriting synthesis, highlighting the different generation processes and pointing out the challenges facing researchers. Finally, we draw conclusions about the research presented and summarize our opinions to help move future work toward maturity.

On-Line Arabic Handwriting Text Line Detection Using Dynamic Programming

Text line detection in unconstrained handwritten documents remains an open document analysis problem. The typical method, based on horizontal projection analysis and regrouping of connected components, is often unsuccessful because of baseline undulations and shifts, baseline-skew variability and inter-line distance variability. This work deals with the segmentation of on-line Arabic handwritten documents, such as pages of handwritten notes. We propose an automatic text line detection method based on dynamic programming. We try to find the paths with the minimum cost between collections of text line segments. Most steps of the proposed algorithm are based on off-line information. Hence it can also be applied to off-line documents after a few minor changes. The proposed methodology is tested on OHASD, the first Arabic online sentence dataset. More than a hundred handwritten Arabic texts written by different writers are examined. Our experiments show better results than common on-line segmentation procedures.
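
The idea of finding a minimum-cost path between collections of text line segments can be sketched with a generic dynamic program. This is only an illustration under stated assumptions, not the algorithm from the paper: segments are reduced to a single hypothetical vertical position, they are grouped into left-to-right stages, and the linking cost is assumed to be the absolute vertical offset between consecutive segments. The names `min_cost_path`, `stages`, and `cost` are invented for this sketch.

```python
def min_cost_path(stages, cost):
    """Return (total_cost, path) of the cheapest chain with one item per stage.

    stages: list of non-empty lists of candidate segments (hypothetical format).
    cost:   function(prev_segment, next_segment) -> non-negative number.
    """
    # best[j] holds the cheapest (cost, path) that ends at the j-th candidate
    # of the stage processed so far.
    best = [(0.0, [seg]) for seg in stages[0]]
    for stage in stages[1:]:
        new_best = []
        for seg in stage:
            # Extend every partial path by this segment and keep the cheapest.
            total, path = min(
                ((c + cost(p[-1], seg), p) for c, p in best),
                key=lambda item: item[0],
            )
            new_best.append((total, path + [seg]))
        best = new_best
    return min(best, key=lambda item: item[0])


if __name__ == "__main__":
    # Each number stands for the vertical position of one candidate segment.
    stages = [[10, 50], [12, 48], [11, 52]]
    total, path = min_cost_path(stages, lambda a, b: abs(a - b))
    print(total, path)
```

Because each stage only needs the best partial path per candidate from the previous stage, the work grows with the number of stages times the square of the candidates per stage, rather than with the number of all possible chains.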

Unconstrained Arabic Online Handwritten Words Segmentation using New HMM State Design

In this paper we propose a segmentation system for unconstrained Arabic online handwriting, an essential problem addressed by analytical word recognition systems. The system is composed of two stages: the first is a specially designed hidden Markov model (HMM) and the second is a rule-based stage. In our system, handwritten words are broken up into characters by simultaneous segmentation-recognition using HMMs of unique design, trained using online features, most of which are novel. The character boundaries output by the HMM represent the proposed segmentation points (PSP), which are then validated by a rule-based post stage, without the help of any contextual information, to resolve different segmentation errors. The HMM has been designed and tested using a self-collected dataset (OHASD) [1]. Most error cases are resolved and a remarkable segmentation enhancement is achieved. Very promising word and character segmentation rates are obtained, considering the difficulty of unconstrained Arabic handwriting and the absence of contextual help.

OHASD: The First On-Line Arabic Sentence Database Handwritten on Tablet PC

In this paper we present the first Arabic sentence dataset for on-line handwriting recognition written on a tablet PC. The dataset is natural, simple and clear. Texts are sampled from daily newspapers. To collect naturally written handwriting, forms are dictated to writers. The current version of our dataset includes 154 paragraphs written by 48 writers. It contains more than 3800 words and more than 19,400 characters. Handwritten texts are mainly written by researchers from different research centers. In order to use this dataset in a recognition system, word extraction is needed. In this paper a new word extraction technique based on the cursive nature of Arabic handwriting is also presented. The technique is applied to this dataset and good results are obtained. The results can be considered a benchmark against which future research can be compared.

Arabic Handwritten Script Recognition Towards Generalization: A Survey

Machine simulation of human reading has been the subject of intensive research for the last three decades. The interest devoted to this field is explained not only by the exciting challenges involved, but also by the huge benefits that a system designed in the context of a commercial application could bring [1]. Creating a single general-purpose cursive handwriting recognition device is a profoundly difficult challenge. The difficulties faced by researchers in this field should give us added appreciation of the ability of humans to recognize rapidly and effectively the most complex and confusing handwriting. In this paper we study the cursive handwriting recognition problem (especially for Arabic script). We focus on nine sub-problems that represent the main challenges researchers face in developing a reliable, accurate, simple, general-purpose Arabic handwritten script recognizer, and we review the contributions published over the last 10 years in which these challenges clearly appear.

Arabic online word extraction from handwritten text using SVM-RBF classifiers decision fusion

In this paper, we propose a system for Arabic online word extraction from handwritten text lines, a problem addressed for the first time for the Arabic language, as there is no public dataset of Arabic online handwritten texts available so far. We collected a dataset of unconstrained online handwritten sentences and used it to design and evaluate our system. First, our system classifies the white gaps between the connected components of words as either intra-word or inter-word gaps, according to local and global online features extracted from each gap together with the groups of strokes encompassing the gap. The classifier is a polynomial kernel support vector machine (SVM) whose decisions are used for initial word extraction. A post stage is added to the system to test the extracted words for under-segmentation and to resolve this under-segmentation by reconsidering the gap type decisions for the stuck word. Classifier decision fusion takes place by consulting five different classifiers (four SVMs and a radial basis function neural network, 'RBF NN') and feeding their decisions to a separate pre-trained SVM to make the final decision. Most stuck words are correctly detected and many of them are correctly resolved. The post stage leads to a remarkable error reduction compared to the performance of single classifiers. Promising results are achieved, considering that the unconstrained nature of Arabic handwriting adds further difficulty to the problem.

A Semi-Automatic Annotation Tool for Arabic Online Handwritten Text

Large amounts of ground truth data are vital for building, testing, analyzing and improving the performance of character recognizers, especially those using segmentation-based routines. Ground truth information, the annotation, can be associated with the document images at the paragraph level, the sentence level, the word level, and down to the character or stroke level. Providing huge annotated datasets for this purpose manually is a very taxing and error-prone procedure. Therefore, it is important to complement the automatic tools for metadata extraction with tools that provide an efficient human-computer interface to experts for validation and correction, to simplify the creation of recognizers. In this paper we present the first semi-automatic tool for annotating Arabic online handwritten documents. The tool automates and simplifies the visualization, manipulation and annotation of documents at the character level, generating transcription files ready for use by any handwriting recognizer. The tool is a set of interactive user interfaces that guide the user through the whole process and reduce human effort and time by activating smart segmentation utilities that offer satisfactory performance and allow intervention for validation.

Simultaneous Segmentation and Recognition of Arabic Characters in an Unconstrained On-Line Cursive Handwritten Document

The last two decades witnessed some advances in the development of Arabic character recognition (CR) systems. Arabic CR faces technical problems not encountered in any other language, which cause Arabic CR systems to achieve relatively low accuracy and hinder their establishment as market products. We propose the basic stages towards a system that attacks the problem of recognizing on-line Arabic cursive handwriting. Rule-based methods are used to perform simultaneous segmentation and recognition of word portions in an unconstrained cursively handwritten document using dynamic programming. The output of these stages is in the form of a ranked list of the possible decisions. A new technique for text line separation is also used.