Urdu Handwritten Characters Data Visualization and Recognition Using Distributed Stochastic Neighborhood Embedding and Deep Network

Pioneer dataset and automatic recognition of Urdu handwritten characters using a deep autoencoder and convolutional neural network

SN Applied Sciences, 2020

Automatic recognition of Urdu handwritten digits and characters is a challenging task. It has applications in postal address reading, bank cheque processing, and the digitization and preservation of old handwritten manuscripts. While a significant body of work exists on automatic recognition of handwritten characters in English and other major languages of the world, the work done for the Urdu language is extremely limited. This paper has two goals. Firstly, we introduce a pioneer dataset of handwritten Urdu digits and characters, containing samples from more than 900 individuals. Secondly, we report results for automatic recognition of handwritten digits and characters using a deep autoencoder network and a convolutional neural network. More specifically, we use a two-layer and a three-layer deep autoencoder network and a convolutional neural network, and evaluate the two frameworks in terms of recognition accuracy. The proposed deep autoencoder framework recognizes digits and characters with an accuracy of 97% for digits only, 81% for characters only and 82% for both digits and characters simultaneously. In comparison, the convolutional neural network framework has an accuracy of 96.7% for digits only, 86.5% for characters only and 82.7% for both digits and characters simultaneously. These frameworks can serve as baselines for future research on Urdu handwritten text.

Recognition and Classification of Handwritten Urdu Numerals Using Deep Learning Techniques

Applied Sciences

Urdu is a complex language as it is an amalgam of many South Asian and East Asian languages; hence, its character recognition is a large and difficult task. It is a bidirectional language: its numerals are written from left to right while the script is written in the opposite direction, which complicates the recognition process. This paper presents the recognition and classification of a novel Urdu numeral dataset using a convolutional neural network (CNN) and its variants. We propose a custom CNN model to extract features, which are then classified by a Softmax layer and by a support vector machine (SVM) classifier. We compare it with GoogLeNet and the residual network (ResNet) in terms of performance. Our proposed CNN gives an accuracy of 98.41% with the Softmax classifier and 99.0% with the SVM classifier. We achieve an accuracy of 95.61% with GoogLeNet and 96.4% with ResNet. Moreover, we develop datasets for handwritten Urdu numbers and numbers of Pakistani currency to incorporate real-l...
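The Softmax step mentioned in this abstract turns the raw class scores produced by a network into probabilities. The following is a minimal sketch in plain Python, not the authors' implementation; the function name `softmax` and the ten example scores are assumptions made purely for illustration.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtract the maximum score first so exp() cannot overflow.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the ten numeral classes 0-9 (not taken from the paper).
scores = [0.1, 2.3, -1.0, 0.5, 4.2, 0.0, 1.1, -0.7, 0.3, 0.9]
probs = softmax(scores)

# The predicted class is the index with the highest probability.
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, round(sum(probs), 6))
```

An SVM classifier, as also used in the abstract, would instead consume the feature vector from the layer before this step.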

Recognition of Urdu Handwritten Characters Using Convolutional Neural Network

Applied Sciences

In the area of pattern recognition and pattern matching, methods based on deep learning models have recently attracted many researchers by achieving strong performance. In this paper, we propose the use of a convolutional neural network to recognize multifont offline Urdu handwritten characters in an unconstrained environment. We also propose a novel dataset of Urdu handwritten characters, since there is no publicly available dataset of this kind. A series of experiments is performed on our proposed dataset. The accuracy achieved for character recognition is among the best when compared with those reported in the literature for the same task.

Handwritten Urdu character recognition via images using different machine learning and deep learning techniques

Indian Journal of Science and Technology, 2020

Objectives: This research presents a model for Urdu handwritten character recognition from images using various machine learning and deep learning techniques. The main objective of this research is to provide a comparative study of Urdu handwritten character recognition on an image dataset. Methods/Statistical analysis: In this research paper, Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and Random Forest (RF) have been implemented in order to identify the most suitable technique for Urdu handwritten character recognition from images. Findings: An ample amount of research has been carried out on the English language, but the conducted literature review clearly shows that much less work has been done on Urdu handwritten character recognition from images. Furthermore, it has been found in this research that CNN models are most efficient compared to ...
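Of the classifiers compared in this abstract, K-Nearest Neighbor is the simplest to illustrate: a sample receives the majority label of its k closest training samples. The sketch below is a generic illustration, not the paper's code; `knn_predict`, the two-dimensional feature vectors and the labels "alif" and "bay" are all assumed for the example.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # `train` is a list of (feature_vector, label) pairs.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D features standing in for flattened character images (hypothetical data).
train = [
    ((0.0, 0.1), "alif"), ((0.2, 0.0), "alif"), ((0.1, 0.3), "alif"),
    ((1.0, 0.9), "bay"), ((0.9, 1.1), "bay"), ((1.2, 1.0), "bay"),
]

print(knn_predict(train, (0.15, 0.1)))
print(knn_predict(train, (1.0, 1.0)))
```

In practice the feature vectors would be pixel values or learned features rather than two hand-picked numbers.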

Recognition of Pashto Handwritten Characters Based on Deep Learning

Sensors, 2020

Handwritten character recognition is increasingly important in a variety of automation fields, for example, authentication of bank signatures, identification of ZIP codes on letter addresses, and forensic evidence. Despite improved object recognition technologies, Pashto handwritten character recognition (PHCR) remains largely unsolved due to the presence of many enigmatic handwritten characters, highly cursive Pashto characters, and a lack of research attention. We propose a convolutional neural network (CNN) model for recognition of Pashto handwritten characters for the first time in an unrestricted environment. Firstly, a novel Pashto handwritten character data set, “Poha”, for 44 characters is constructed. For preprocessing, deep fusion image processing techniques and noise reduction for text optimization are applied. A CNN model optimized in the number of convolutional layers and their parameters outperformed common deep models in terms of accuracy. Moreover, a set of be...

Evaluation of Handwritten Urdu Text by Integration of MNIST Dataset Learning Experience

IEEE Access

Patterns of a similar nature may enhance learning if the experience gained on them during training is reused to maximize accuracy. This paper presents a novel way to exploit the transfer learning experience of similar patterns for handwritten Urdu text analysis. An MNIST pre-trained network is employed by transferring its learning experience to Urdu Nastaliq Handwritten Dataset (UNHD) samples. The convolutional neural network is used for feature extraction. The experiments were performed using deep multidimensional long short-term memory (MDLSTM) networks. The obtained results show strong performance across a number of experiments, distinguished on the basis of handwriting complexity. The results of the experiments show that the pre-trained network improves the subsequent target networks, enabling them to focus on learning particular features. The conducted experiments showed very good accuracy on the UNHD dataset.

Handwritten Urdu Characters and Digits Recognition Using Transfer Learning and Augmentation With AlexNet

IEEE Access

Automated recognition of handwritten characters and digits is a challenging task. Although a significant amount of literature exists on automatic recognition of handwritten characters in English and other major languages of the world, there is a wide research gap for the Urdu language. Variations in writing style, shape and size of individual characters, and similarities with other characters, add to the complexity of accurately classifying handwritten characters. Deep neural networks have emerged as a powerful technology for automated classification of character patterns and object images. Although deep networks are known to provide remarkable results on large-scale datasets with millions of images, their use on small image datasets is still challenging. The purpose of this research is to present a classification framework for automatic recognition of handwritten Urdu characters and digits with higher recognition accuracy by utilizing the theory of transfer learning and pretrained Convolutional Neural Networks (CNN). The performance of transfer learning is evaluated in two ways: by using the pre-trained AlexNet CNN model with a Support Vector Machine (SVM) classifier, and by using a fine-tuned AlexNet for both feature extraction and classification. We have fine-tuned the AlexNet hyper-parameters to achieve higher accuracy, and data augmentation is performed to avoid over-fitting. Experimental results and quantitative comparisons demonstrate the effectiveness of the proposed research for recognition of handwritten characters and digits using fine-tuned AlexNet. The proposed research based on fine-tuned AlexNet outperforms the related state-of-the-art research, achieving a classification accuracy of 97.08%, 98.21% and 94.92% for Urdu characters, digits and hybrid datasets respectively.
The presented methods can be applied to research on Urdu characters and in diverse domains such as handwritten text image retrieval, postal address reading, bank cheque processing, and the preservation and digitization of old manuscripts.
INDEX TERMS Automated recognition, Urdu HCR systems, CNN, transfer learning, AlexNet, SVM, optical character recognition.
The associate editor coordinating the review of this manuscript and approving it for publication was Jeon Gwanggil.
I. INTRODUCTION
Digital image processing plays a vital role in different computer vision based applications such as image retrieval, medical image analysis, face recognition, decision support systems with industrial applications, object recognition and image annotation [1], [2], [3], [4], [5], [6]. The recent massive growth in the application of mobile and computing devices has increased the implications of Character Recognition (CR) [7]. Recognition of handwritten text is problematic because writing styles differ from individual to individual.

PHND: Pashtu Handwritten Numerals Database and deep learning benchmark

PLOS ONE, 2020

In this paper we introduce a real Pashtu handwritten numerals dataset (PHND) containing 50,000 scanned images and make it publicly available for research and scientific use. Although more than fifty million people in the world use this language for written and oral communication, no significant effort has been devoted to Pashtu Optical Character Recognition (POCR). We present a new approach for Pashtu handwritten numerals recognition (PHNR) based on deep neural networks. We train Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) on high-frequency numerals for feature extraction and classification. We evaluated the performance of the proposed algorithm on the newly introduced Pashtu handwritten numerals database PHND and on the Bangla language number database CMATERDB 3.1.1. We obtained best recognition rates of 98.00% and 98.64% on PHND and CMATERDB 3.1.1, respectively.

Deep learning-based isolated handwritten Sindhi character recognition

Indian Journal of Science and Technology, 2020

Motivation: The problem of handwritten text recognition has been widely studied over the last few decades. Many innovative ideas have been developed, and state-of-the-art accuracy has been achieved for English, Chinese and Indian scripts. Recent developments in handwritten text recognition for cursive scripts such as Arabic and Urdu have achieved remarkable accuracy. However, for the Sindhi script, existing systems have not shown significant results and the problem is still an open challenge. Several challenges associated with handwritten Sindhi text, such as variations in writing styles, joined text and ligature overlapping, make the problem more complex. Objectives: In this study, a deep residual network with shortcut connections and a summation fusion method using a convolutional neural network (CNN) is proposed for automatic feature extraction and classification of handwritten Sindhi characters. Method: To increase the feature representation ability of the network, the features of the convolutional layers in the residual block are fused together and combined with the output of the previous residual block. The proposed network is trained on a custom-developed handwritten Sindhi character dataset. To tackle the problem of small data, data augmentation with rotation, flipping and image enhancement techniques has been used. Findings: The experimental results show that the proposed model outperforms the best results previously published for handwritten Sindhi character recognition. Novelty: This is the first research that proposes a deep residual network with summation fusion for Sindhi handwritten text recognition.
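The data augmentation mentioned in this abstract (rotation and flipping) can be sketched on an image stored as a list of pixel rows. This is an illustrative sketch only: the abstract does not give rotation angles, so the quarter-turn rotation here is a simplifying assumption, and the function names are hypothetical.

```python
def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def rotate_90_clockwise(image):
    """Rotate the image a quarter turn clockwise."""
    # Reversing the row order and then transposing gives a clockwise rotation.
    return [list(col) for col in zip(*image[::-1])]

def augment(image):
    """Return the original image plus two augmented variants."""
    return [image, flip_horizontal(image), rotate_90_clockwise(image)]

# A tiny 2x2 "image" with distinct values so each transform is easy to follow.
sample = [[1, 2],
          [3, 4]]

for variant in augment(sample):
    print(variant)
```

Each training image yields three samples here; a real pipeline would more likely use small random rotation angles so that characters stay recognizable.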

Devanagari Handwritten Character Recognition Using Deep Learning

International Journal for Research in Applied Science & Engineering Technology (IJRASET), 2022

In this paper, we present an implementation of Devanagari handwritten character recognition using deep learning. Handwritten character recognition is gaining importance due to its major contribution to automation systems. Devanagari is one of the scripts used for languages in India. It consists of 12 vowels and 36 consonants. Here we implement a deep learning model to recognize the characters. Character recognition consists of five main steps: pre-processing, segmentation, feature extraction, prediction and post-processing. The model uses a convolutional neural network for training and image processing techniques for character recognition, and reports the recognition accuracy.
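The five steps named in this abstract can be laid out as a small pipeline. The sketch below is a toy illustration under strong assumptions, not the paper's system: every function name is hypothetical, segmentation simply splits at ink-free columns (real Devanagari words are joined by a header line that would need handling first), and the feature and classifier are placeholders standing in for the CNN.

```python
def preprocess(image, threshold=128):
    """Binarize: 1 for ink (dark pixels), 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in image]

def segment(binary):
    """Split the image into characters at columns that contain no ink."""
    spans, start = [], None
    for x, col in enumerate(zip(*binary)):
        if any(col) and start is None:
            start = x                      # a character begins
        elif not any(col) and start is not None:
            spans.append((start, x))       # a character ends
            start = None
    if start is not None:
        spans.append((start, len(binary[0])))
    return [[row[a:b] for row in binary] for a, b in spans]

def extract_features(char):
    """Placeholder feature: fraction of ink pixels (a CNN would go here)."""
    pixels = [px for row in char for px in row]
    return sum(pixels) / len(pixels)

def predict(feature):
    """Placeholder classifier standing in for the trained CNN."""
    return "dense" if feature > 0.5 else "sparse"

def postprocess(labels):
    """Join per-character labels into one output string."""
    return " ".join(labels)

def recognize(image):
    chars = segment(preprocess(image))
    return postprocess(predict(extract_features(c)) for c in chars)

# Hypothetical 3x5 grayscale image: a solid block, a gap, then a single dot.
image = [
    [0, 0, 255, 255, 255],
    [0, 0, 255, 0, 255],
    [0, 0, 255, 255, 255],
]
print(recognize(image))
```

Keeping each stage as its own function makes it straightforward to replace the placeholder feature and classifier with a trained network.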