Pioneer dataset and automatic recognition of Urdu handwritten characters using a deep autoencoder and convolutional neural network

Recognition of Urdu Handwritten Characters Using Convolutional Neural Network

Applied Sciences

In pattern recognition and pattern matching, methods based on deep learning models have recently attracted many researchers because of their strong performance. In this paper, we propose using a convolutional neural network to recognize multifont offline Urdu handwritten characters in an unconstrained environment. We also propose a novel dataset of Urdu handwritten characters, since no publicly available dataset of this kind exists. A series of experiments is performed on the proposed dataset. The character recognition accuracy achieved is among the best reported in the literature for the same task.

Recognition and Classification of Handwritten Urdu Numerals Using Deep Learning Techniques

Applied Sciences

Urdu is a complex language as it is an amalgam of many South Asian and East Asian languages; hence, its character recognition is a large and difficult task. It is a bidirectional language: its numerals are written from left to right while its script is written in the opposite direction, which complicates the recognition process. This paper presents the recognition and classification of a novel Urdu numeral dataset using a convolutional neural network (CNN) and its variants. We propose a custom CNN model to extract features, which are then classified with a Softmax activation function and with a support vector machine (SVM) classifier. We compare it with GoogLeNet and the residual network (ResNet) in terms of performance. Our proposed CNN gives an accuracy of 98.41% with the Softmax classifier and 99.0% with the SVM classifier. We achieve an accuracy of 95.61% with GoogLeNet and 96.4% with ResNet. Moreover, we develop datasets for handwritten Urdu numbers and numbers of Pakistani currency to incorporate real-l...
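As a minimal illustration of the Softmax step mentioned in the abstract above, the sketch below turns a vector of class scores into probabilities. The scores are made-up values, not outputs of the authors' CNN, and the function name `softmax` is our own choice.

```python
import math

def softmax(scores):
    """Convert raw class scores (logits) into probabilities that sum to 1."""
    # Subtract the largest score first so exp() cannot overflow.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three numeral classes (illustrative values only).
probs = softmax([2.0, 1.0, 0.1])

# The predicted class is the index with the highest probability.
predicted_class = probs.index(max(probs))
```

In the SVM variant described in the abstract, the same extracted features would instead be passed to a separately trained SVM rather than through this function.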

Handwritten Urdu character recognition via images using different machine learning and deep learning techniques

Indian Journal of Science and Technology, 2020

Objectives: This research presents a model for Urdu handwritten character recognition from images using various machine learning and deep learning techniques. The main objective is to provide a comparative study of these techniques on an image dataset of Urdu handwritten characters. Methods/Statistical analysis: Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and Random Forest (RF) algorithms have been implemented in order to determine the most suitable technique for recognizing Urdu handwritten characters from images. Findings: An ample amount of research has been carried out on the English language, but the literature review clearly shows that far less work has been done on recognizing Urdu handwritten characters from images. Furthermore, this research finds that CNN models are most efficient compared to ...

Recognition of Pashto Handwritten Characters Based on Deep Learning

Sensors, 2020

Handwritten character recognition is increasingly important in a variety of automation fields, for example, authentication of bank signatures, identification of ZIP codes on letter addresses, and forensic evidence. Despite improved object recognition technologies, Pashto handwritten character recognition (PHCR) remains largely unsolved due to the presence of many ambiguous handwritten characters, the highly cursive nature of Pashto characters, and a lack of research attention. We propose, for the first time, a convolutional neural network (CNN) model for recognition of Pashto handwritten characters in an unrestricted environment. Firstly, a novel Pashto handwritten character data set, “Poha”, for 44 characters is constructed. For preprocessing, deep fusion image processing techniques and noise reduction for text optimization are applied. A CNN model optimized in the number of convolutional layers and their parameters outperformed common deep models in terms of accuracy. Moreover, a set of be...

A Novel Deep Convolutional Neural Network Architecture Based on Transfer Learning for Handwritten Urdu Character Recognition

Tehnicki vjesnik - Technical Gazette, 2020

Deep convolutional neural networks (CNNs) have made a huge impact on computer vision and set the state of the art by providing highly precise classification results. For character recognition, where training images are usually scarce, transfer learning from pre-trained CNNs is often used. In this paper, we propose a novel deep convolutional neural network for handwritten Urdu character recognition that applies transfer learning to three pre-trained CNN models. We fine-tuned the layers of these pre-trained CNNs so as to extract features that capture both global and local details of the Urdu character structure. The features extracted from the three CNN models are concatenated and used to train two fully connected layers for classification. The experiment is conducted on the UNHD, EMILLE, DBAHCL, and CDB/Farsi datasets, and we achieve 97.18% average recognition accuracy, which outperforms the individual CNNs and numerous conventional classification methods.
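The feature fusion described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: the three feature vectors, the weights, and the helper names `concatenate_features` and `dense` are hypothetical, and a single toy layer stands in for the two fully connected layers the authors train.

```python
def concatenate_features(*feature_vectors):
    """Join per-model feature vectors into one long fused vector."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused

def dense(inputs, weights, biases):
    """One fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Hypothetical features from three fine-tuned CNNs (kept tiny for readability).
feats_a = [0.2, 0.7]
feats_b = [0.1]
feats_c = [0.9, 0.4, 0.3]

fused = concatenate_features(feats_a, feats_b, feats_c)

# Toy classifier layer: 2 output scores computed from the 6 fused features.
weights = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]]
biases = [0.0, 0.5]
scores = dense(fused, weights, biases)
```

The point of the concatenation is that the classifier sees every model's features at once, so its input width is the sum of the individual feature lengths.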

Evaluation of deep learning approaches for optical character recognition in Urdu language

Mehran University Research Journal of Engineering and Technology

With the evolving technological era, optical character recognition systems have seen substantial use, given how widespread handwritten transactions remain in daily life. Optical Character Recognition (OCR) is an application of computer vision that digitizes handwritten documents for further analysis and formatting. OCR can be achieved in various ways, such as discriminative analysis and deep learning. This paper focuses on evaluating deep learning models on a compiled dataset of handwritten Urdu digits. The evaluation covers four deep convolutional models: VGGNet16, InceptionV3, ResNet50, and DenseNet121. The convolutional models are pre-trained on ImageNet. The model weights of fully connected layers have been evaluated, reducing the training time of the convolutional layers. The testing accuracy on the compiled dataset is 96% for ResNet50, 95% for InceptionV3, 95% for VGGNet16, and 94% for DenseNet121.

Handwritten Urdu Characters and Digits Recognition Using Transfer Learning and Augmentation With AlexNet

IEEE Access

Automated recognition of handwritten characters and digits is a challenging task. Although a significant amount of literature exists on automatic recognition of handwritten characters in English and other major languages of the world, a wide research gap remains for the Urdu language. Variations in writing style, in the shape and size of individual characters, and similarities with other characters add to the complexity of accurately classifying handwritten characters. Deep neural networks have emerged as a powerful technology for automated classification of character patterns and object images. Although deep networks are known to provide remarkable results on large-scale datasets with millions of images, their use on small image datasets is still challenging. The purpose of this research is to present a classification framework for automatic recognition of handwritten Urdu characters and digits with higher recognition accuracy by using transfer learning and pretrained Convolutional Neural Networks (CNNs). The performance of transfer learning is evaluated in two ways: by using a pre-trained AlexNet CNN model with a Support Vector Machine (SVM) classifier, and by using a fine-tuned AlexNet for both feature extraction and classification. We fine-tuned the AlexNet hyper-parameters to achieve higher accuracy, and data augmentation is performed to avoid over-fitting. Experimental results and quantitative comparisons demonstrate the effectiveness of the proposed research for recognition of handwritten characters and digits using fine-tuned AlexNet. The proposed research based on fine-tuned AlexNet outperforms the related state-of-the-art research, achieving classification accuracies of 97.08%, 98.21%, and 94.92% for the Urdu character, digit, and hybrid datasets, respectively.
The presented methods can be applied to research on Urdu characters and in diverse domains such as handwritten text image retrieval, reading postal addresses, bank cheque processing, and the preservation and digitization of old manuscripts.

INDEX TERMS Automated recognition, Urdu HCR systems, CNN, transfer learning, AlexNet, SVM, optical character recognition.

The associate editor coordinating the review of this manuscript and approving it for publication was Jeon Gwanggil.

I. INTRODUCTION Digital image processing plays a vital role in different computer vision based applications such as image retrieval, medical image analysis, face recognition, decision support systems with industrial applications, object recognition and image annotation [1], [2], [3], [4], [5], [6]. The recent massive growth in the application of mobile and computing devices has increased the implications of Character Recognition (CR) [7]. Recognition of handwritten text is problematic because writing styles differ from individual to individual.
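The abstract above mentions data augmentation as a guard against over-fitting but does not say which transformations were used. As one common possibility, the sketch below generates shifted copies of a small image; the choice of translation, the function name `shift_image`, and the toy 3x3 image are all assumptions made for illustration.

```python
def shift_image(image, dx, dy, fill=0):
    """Return a copy of a 2-D image shifted right by dx and down by dy pixels.

    Pixels that move in from outside the frame are set to `fill`.
    """
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            # Keep only pixels that land inside the frame.
            if 0 <= ny < height and 0 <= nx < width:
                shifted[ny][nx] = image[y][x]
    return shifted

# Toy 3x3 "character" image with a single stroke pixel in the center.
image = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 0]]

# Nine variants: every combination of a -1, 0 or +1 pixel shift on each axis.
augmented = [shift_image(image, dx, dy)
             for dx in (-1, 0, 1) for dy in (-1, 0, 1)]
```

Each variant keeps the original class label, which is how a small handwritten dataset can be enlarged before fine-tuning a large network.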

Urdu Handwritten Characters Data Visualization and Recognition Using Distributed Stochastic Neighborhood Embedding and Deep Network

Complexity

In this paper, we make use of the 2-dimensional data obtained by applying t-distributed Stochastic Neighbor Embedding (t-SNE) to high-dimensional data of Urdu handwritten characters and numerals. The instances of the dataset used for experimental work are classified into multiple classes depending on shape similarity. We performed three tasks in order; namely, (i) we generated a state-of-the-art dataset of both Urdu handwritten characters and numerals by inviting a number of native Urdu participants from different social and academic groups, since no publicly available dataset of this type exists to date, then (ii) applied classical approaches to dimensionality reduction and data visualization, such as Principal Component Analysis (PCA) and Autoencoders (AE), in comparison with t-SNE, and (iii) used the reduced dimensions obtained through PCA, AE, and t-SNE for recognition of Urdu handwritten characters and numerals using a deep...
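To make the dimensionality reduction step above concrete, the sketch below reduces 2-D points to a single coordinate with PCA. It covers only PCA, not the autoencoder or t-SNE, and it finds the top principal direction by power iteration, which is our choice for a dependency-free example rather than anything stated in the abstract. The sample points and the function name are hypothetical.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (direction, projections) for the top principal component."""
    n, d = len(rows), len(rows[0])
    # Center the data so each feature has zero mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by the covariance and renormalize.
    direction = [1.0] * d
    for _ in range(iterations):
        product = [sum(cov[i][j] * direction[j] for j in range(d))
                   for i in range(d)]
        norm = math.sqrt(sum(p * p for p in product))
        direction = [p / norm for p in product]
    # Project each centered sample onto the principal direction.
    projections = [sum(c[j] * direction[j] for j in range(d)) for c in centered]
    return direction, projections

# Hypothetical 2-D samples lying roughly along a diagonal line.
samples = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8]]
direction, reduced = first_principal_component(samples)
```

Real character images have hundreds of pixel dimensions, so a library implementation would normally be used; the steps (center, compute covariance, project) are the same.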

Arabic Handwritten Character Recognition Based on Deep Convolutional Neural Networks

The automatic analysis and recognition of offline Arabic handwritten characters from images is an important problem in many applications. Even with the great progress of recent research in optical character recognition, a few problems remain to be solved, especially for Arabic characters. The emergence of Deep Neural Networks promises a strong solution to some of these problems. We present a deep neural network for the handwritten Arabic character recognition problem that uses convolutional neural network (CNN) models with regularization techniques such as batch normalization to prevent overfitting. We applied the Deep CNN to the AIA9k and the AHCD databases, and the classification accuracies for the two datasets were 94.8% and 97.6%, respectively. A study of the network performance on the EMNIST and a form-based AHCD dataset was performed to aid in the analysis.
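Batch normalization, named in the abstract above, standardizes each feature over a mini-batch and then rescales it. The sketch below shows that computation for a single feature at training time; the parameter names `gamma`, `beta`, and `eps` follow common usage, and the activation values are made up.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a batch to zero mean and unit variance,
    then apply a learnable scale (gamma) and shift (beta)."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    # eps keeps the division stable when the variance is close to zero.
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

# Hypothetical activations of one feature for a batch of four images.
activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(activations)
```

At inference time, frameworks typically replace the per-batch mean and variance with running averages collected during training, a detail this sketch omits.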

Bangla Handwritten Digit Recognition Using Autoencoder and Deep Convolutional Neural Network

Handwritten digit recognition is a typical image classification problem. Convolutional neural networks, also known as ConvNets, are powerful classification models for such tasks. As different languages have different styles and shapes for their numeral digits, the accuracy rates of the models vary from language to language. Unsupervised pre-training has been shown to improve accuracy in classification tasks of this kind, though no such work has been found for Bangla digit recognition. This paper presents the use of unsupervised pre-training using an autoencoder with a deep ConvNet in order to recognize handwritten Bangla digits, i.e., 0-9. The datasets used in this paper are CMATERDB 3.1.1 and a dataset published by the Indian Statistical Institute (ISI). This paper studies four different combinations of these two datasets: two experiments are done against their own training and testing images, and the other two are done by cross-validating the datasets. In one of these four experiments, the proposed approach achieves 99.50% accuracy, which is so far the best for recognizing handwritten Bangla digits. The ConvNet model is trained with 19,313 images of the ISI handwritten character dataset and tested with images of the CMATERDB dataset.