Writer Identification Using TF-IDF for Cursive Handwritten Word Recognition
Related papers
Writer Identification Using Handwritten Cursive Texts and Single Character Words
Electronics
Writer verification/identification from password handwriting is one of the biometric methods used in authentication systems. The main objective of this paper is to present a robust writer verification system that uses cursive texts as well as block letter words. Two datasets were used to evaluate the system. The first, Secure Password DB 150, is composed of 150 users with 18 samples of single character words per user. The second, the public IAM online handwriting database, is composed of 220 users of cursive text samples. Each sample is described by a set of 67 geometrical, statistical, and temporal features. To obtain more discriminative information, two feature reduction methods were applied: Fisher Score and Info Gain Attribute Evaluation. Finally, the classification system was implemented with hold-out and k-fold cross-validation strategies for three different classifiers, K-NN, ...
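As an illustration of the Fisher Score step mentioned in this abstract, the following is a minimal sketch of one common formulation (between-class scatter divided by within-class scatter, computed per feature). The function name `fisher_scores` and the toy data are assumptions made for this example; the abstract does not state which variant the authors used.

```python
from collections import defaultdict


def fisher_scores(samples, labels):
    """Return one Fisher score per feature (hypothetical helper for illustration).

    score_j = sum_k n_k * (mean_kj - mean_j)^2 / sum_k sum_{i in k} (x_ij - mean_kj)^2
    Higher scores mean the feature separates the classes (writers) better.
    """
    by_class = defaultdict(list)
    for row, label in zip(samples, labels):
        by_class[label].append(row)

    scores = []
    for j in range(len(samples[0])):
        column = [row[j] for row in samples]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for rows in by_class.values():
            values = [row[j] for row in rows]
            class_mean = sum(values) / len(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in values)
        if within > 0:
            scores.append(between / within)
        elif between > 0:
            scores.append(float("inf"))  # perfectly separated, zero spread
        else:
            scores.append(0.0)  # constant feature carries no information
    return scores


# Toy data (assumed): feature 0 separates the two writers, feature 1 does not.
samples = [[1.0, 5.0], [1.2, 1.0], [3.0, 5.2], [3.2, 0.8]]
labels = ["a", "a", "b", "b"]
scores = fisher_scores(samples, labels)
```

Features would then be ranked by score and only the highest-ranked ones kept before classification.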
An interactive Tool for Writer Identification based on Offline Text Dependent Approach
International Journal of Advanced Research in Artificial Intelligence, 2013
Writer identification is the process of identifying the writer of a document based on their handwriting. The growth of computational engineering, artificial intelligence, and pattern recognition owes much to the highly challenging problem of handwriting identification. This paper proposes a computational intelligence technique to develop a discriminative model for writer identification based on handwritten documents. Scanned images of handwritten documents are segmented into words, and these words are further segmented into characters, for word-level and character-level writer identification. A set of features is extracted from the segmented words and characters. A support vector machine trained on the feature vectors obtained 94.27% accuracy at the word level and 90.10% at the character level. An interactive tool has been developed based on the word-level writer identification model.
Writer Identification Using Edge-Based Directional Features
Document Analysis and …, 2003
A system for writer identification based on Arabic handwritten words was built. First, a database of words was gathered and used as a test base. Then, feature vectors were extracted from writers' word images. Prior to feature extraction, normalization operations were applied to a word or text line. In this research, we studied how feature extraction and recognition operations on Arabic text affect the identification rate of writers. Since there is no well-known database of Arabic handwritten words for researchers to test on, we built a new database of off-line Arabic handwriting text for writer identification research. The proposed database is meant to provide training and testing sets for Arabic writer identification research. Arabic handwritten words were collected from 100 writers. We evaluated the performance of edge-based directional probability distributions as features and other features in Arabic writer identification.
Text independent offline hand writer recognition using machine learning
2017
Handwriting is a behavioural biometric that an individual learns and develops over time, and automated writer identification systems can be developed by identifying these behavioural aspects of an individual’s writing style. These writer recognition systems greatly assist forensic experts by providing them with semi-automated tools that segment the text, narrow down the search, help with visualization, and finally assist in the identification of an unknown handwritten sample. Handwriting, as a behavioural characteristic, has been a subject of interest for researchers for many decades, and intensive research in this field has resulted in the development of multiple methods and algorithms. However, automated writer identification is still a challenging problem. Difficulties in segmenting text and the deviation of an individual from his or her unique writing style are the reasons for ongoing research in this field. This thesis aims to investigate the problems faced in aut...
Writer Identification Using Text Line Based Features
In this paper we present a system for writer identification. From handwritten lines of text, twelve features are extracted which are used to recognize persons, based on their handwriting. The features extracted mainly correspond to visible characteristics of the writing, for example, the width, the slant and the height of the three main writing zones. Additionally, features based on the fractal behavior of the writing, which are correlated with the writing's legibility, are used. With these features two classifiers are applied: a k-nearest neighbor and a feed-forward neural network classifier. In the experiments, 100 pages of text written by 20 different writers are used. By classifying individual text lines, an average recognition rate of 87.8% for the k-nearest neighbor and 90.7% for the neural network is measured. By a simple maximum ranking over all lines of a page, all texts are correctly assigned to the corresponding writers. Compared to these results, an average recognition rate of 98% was measured when humans assigned persons to the text lines.
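The page-level decision described in this abstract can be read as a vote over per-line predictions. The sketch below assumes that "maximum ranking" means assigning the page to the writer predicted for the most lines; the function name `assign_page` and the writer labels are hypothetical, and the tie-breaking rule is an arbitrary choice for this example.

```python
from collections import Counter


def assign_page(line_predictions):
    """Return the writer predicted for the largest number of lines on a page.

    Ties go to the writer that appears first in the list (an assumption of
    this sketch, not something the abstract specifies).
    """
    if not line_predictions:
        raise ValueError("a page needs at least one classified line")
    return Counter(line_predictions).most_common(1)[0][0]


# One page with five classified lines; two lines were misclassified.
page_writer = assign_page(["w03", "w01", "w03", "w07", "w03"])
```

This shows why page-level accuracy can exceed line-level accuracy: isolated line errors are outvoted as long as the correct writer wins the majority of lines.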
Writer identification approach based on bag of words with OBI features
Information Processing & Management
Handwriter identification aims to simplify the task of forensic experts by providing them with semi-automated tools that narrow down the search for the writer of an unknown handwritten sample. An identification algorithm aims to produce a list of predicted writers of the unknown sample, ranked by a confidence measure, for use by the forensic expert, who will make the final decision. Most existing handwriter identification systems use either statistical or model-based approaches. To further improve performance, this paper proposes combining both approaches using Oriented Basic Image features and the concept of a grapheme codebook. To reduce the resulting high dimensionality of the feature vector, Kernel Principal Component Analysis has been used. To gauge the effectiveness of the proposed method, a performance analysis has been carried out using the IAM dataset for English handwriting and the ICFHR 2012 dataset for Arabic handwriting. The results obtained achieved an accuracy of 96%, thus demonstrating its superiority when compared against similar techniques.
Novel geometric features for off-line writer identification
Pattern Analysis and Applications, 2014
Writer identification is an important field in forensic document examination. Typically, a writer identification system consists of two main steps, feature extraction and matching, and the performance depends significantly on the feature extraction step. In this paper, we propose a set of novel geometrical features that are able to characterize different writers. These features include direction, curvature, and tortuosity. We also propose an improvement of the edge-based directional and chain code-based features. The proposed methods are applicable to Arabic and English handwriting. We have also studied several methods for computing the distance between feature vectors when comparing two writers. Evaluation of the methods is performed using both the IAM handwriting database and the QUWI database for each individual feature, reaching Top1 identification rates of 82% and 87% in those two datasets, respectively. The accuracies achieved by Kernel Discriminant Analysis (KDA) are significantly higher than those observed before feature-level writer identification was implemented. The results demonstrate the effectiveness of the improved versions of both chain code features and edge-based directional features.
Offline Text-Independent Writer Identification Based on Scale Invariant Feature Transform
IEEE Transactions on Information Forensics and Security, 2014
In this paper, an efficient method for text-independent writer identification using a codebook method is proposed. The method uses the occurrence histogram of the shapes in a codebook to create a feature vector for each specific manuscript. For cursive handwriting, a wide variety of different shapes exist in the connected components obtained from the handwriting. Small fragments of connected components are used to avoid complex patterns. Two efficient methods for extracting codes from contours are introduced. One method uses the actual pixel coordinates of contour fragments, while the other uses a piecewise linear approximation based on segment angles and lengths. To evaluate the methods, writer identification is conducted on two English and three Farsi handwriting databases. Both methods show promising performance, with the second method performing better than the first.
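The occurrence-histogram idea in this abstract can be sketched as follows. The example assumes each contour fragment has already been reduced to a fixed-length numeric descriptor and that fragments are matched to codebook entries by Euclidean distance; both are simplifications for illustration, and the names `nearest_code` and `occurrence_histogram` are hypothetical.

```python
import math


def nearest_code(fragment, codebook):
    """Index of the codebook entry closest to the fragment descriptor."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(fragment, codebook[i]))


def occurrence_histogram(fragments, codebook):
    """Normalized count of how often each codebook shape occurs in a manuscript."""
    counts = [0] * len(codebook)
    for fragment in fragments:
        counts[nearest_code(fragment, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy codebook with two shapes and four fragment descriptors (assumed data).
codebook = [(0.0, 0.0), (1.0, 1.0)]
fragments = [(0.1, 0.0), (0.9, 1.2), (1.0, 0.8), (0.0, 0.2)]
feature_vector = occurrence_histogram(fragments, codebook)
```

Normalizing the counts makes manuscripts of different lengths comparable, since the feature vector then describes how a writer distributes shapes rather than how much text was written.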
A Set of Chain Code Based Features for Writer Recognition
2009 10th International Conference on Document Analysis and Recognition, 2009
This communication presents an effective method for writer recognition in handwritten documents. We have introduced a set of features that are extracted from the contours of handwritten images at different observation levels. At the global level, we extract the histograms of the chain code, the first and second order differential chain codes, and the histogram of the curvature indices at each point of the contour of handwriting. At the local level, the handwritten text is divided into a large number of small adaptive windows, and within each window the contribution of each of the eight directions (and their differentials) is counted in the corresponding histograms. Two writings are then compared by computing the distances between their respective histograms. The system, trained and tested on two different data sets of 650 and 225 writers respectively, exhibited promising results on writer identification and verification.
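The global chain code features in this abstract can be illustrated with a short sketch. It assumes a contour given as a list of 8-connected pixel coordinates and uses the Freeman 8-direction code; the chi-square distance is one common choice for comparing histograms and is an assumption here, since the abstract does not name the distance measure. All function names are hypothetical.

```python
# Freeman codes: 0 = east, increasing counter-clockwise (y grows upward here).
DIRECTIONS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
              (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}


def chain_code(contour):
    """Direction code of each step between consecutive contour points."""
    return [DIRECTIONS[(x2 - x1, y2 - y1)]
            for (x1, y1), (x2, y2) in zip(contour, contour[1:])]


def differential(codes):
    """First-order differential chain code: change of direction, modulo 8."""
    return [(b - a) % 8 for a, b in zip(codes, codes[1:])]


def histogram(codes, bins=8):
    """Normalized histogram of direction codes."""
    counts = [0] * bins
    for code in codes:
        counts[code] += 1
    return [c / len(codes) for c in counts] if codes else [0.0] * bins


def chi_square(h1, h2):
    """Chi-square distance between two histograms (assumed distance measure)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


# A unit square traced counter-clockwise, and a straight horizontal stroke.
square = chain_code([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)])
stroke = chain_code([(0, 0), (1, 0), (2, 0), (3, 0)])
distance = chi_square(histogram(square), histogram(stroke))
```

Applying `differential` twice gives the second-order differential code; the same histograms computed inside small windows would correspond to the local-level features the abstract describes.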