Umapada Pal - Academia.edu

Papers by Umapada Pal

Writer Identification in Indic Scripts: A Stroke Distribution Based Approach

2017 4th IAPR Asian Conference on Pattern Recognition (ACPR), 2017

This paper proposes to represent an offline handwritten document with a distribution of strokes over an alphabet of strokes for writer identification. The stroke alphabet is created with a data-driven approach: strokes are extracted from the image; using a regression method, the extracted strokes are represented as fixed-length vectors in a vector space; and the strokes are clustered into stroke categories to create the stroke alphabet. The paper proposes a clustering method with a new clustering score whereby an optimal number of clusters (categories) is automatically identified. For a given document, a histogram that represents the writer's writing style is created from the frequency of occurrence of elements in the stroke alphabet. A Support Vector Machine is used for classification. Offline handwritten documents written in two Indic languages, Telugu and Kannada, were considered for experimentation. The proposed method obtains results comparable to other methods in the literature.
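The histogram step in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the stroke alphabet is assumed to be given already (the paper's clustering method and clustering score are not reproduced), strokes are assigned to the nearest alphabet entry by Euclidean distance (an assumption), and the names `nearest_codeword` and `stroke_histogram` and the toy vectors are hypothetical. The SVM classification step is omitted.

```python
from collections import Counter
from math import dist


def nearest_codeword(stroke, alphabet):
    """Index of the alphabet entry closest to a stroke vector (Euclidean, assumed)."""
    return min(range(len(alphabet)), key=lambda i: dist(stroke, alphabet[i]))


def stroke_histogram(strokes, alphabet):
    """Normalized frequency of each stroke category within one document."""
    counts = Counter(nearest_codeword(s, alphabet) for s in strokes)
    total = len(strokes)
    return [counts[i] / total if total else 0.0 for i in range(len(alphabet))]


# Toy data: a 3-entry stroke alphabet and 4 fixed-length stroke vectors.
alphabet = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
doc_strokes = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]

hist = stroke_histogram(doc_strokes, alphabet)
print(hist)  # [0.25, 0.5, 0.25]
```

A vector like `hist` would then serve as the per-document feature passed to a classifier.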

HMM-based Indic handwritten word recognition using zone segmentation

Pattern Recognition, Dec 1, 2016

Handwriting Segmentation Contest

HAL (Le Centre pour la Communication Scientifique Directe), Aug 25, 2013

This paper presents the results of the Handwriting Segmentation Contest that was organized in the context of ICDAR2013. The general objective of the contest was to use well-established evaluation practices and procedures to record recent advances in off-line handwriting segmentation. Two benchmarking datasets, one for text line and one for word segmentation, were created in order to test and compare all submitted algorithms, as well as some state-of-the-art methods, for handwritten document image segmentation in realistic circumstances. Handwritten document images were produced by many writers in two Latin based languages (English and Greek) and in one Indian language (Bangla, the second most popular language in India). These images were manually annotated to produce the ground truth, which corresponds to the correct text line and word segmentation results. The datasets of previously organized contests (ICDAR2007, ICDAR2009 and ICFHR2010 Handwriting Segmentation Contests), along with a dataset of Bangla document images, were used as the training dataset. Eleven methods were submitted to this competition. A brief description of the submitted algorithms, the evaluation criteria and the segmentation results obtained from the submitted methods are also provided in this manuscript.

Automatic recognition of printed Oriya script

Sadhana-academy Proceedings in Engineering Sciences, Feb 1, 2002

Machine-printed and hand-written text lines identification

Pattern Recognition Letters, Mar 1, 2001

An improved document skew angle estimation technique

Pattern Recognition Letters, Jul 1, 1996

Multi-Oriented Text Lines Detection, Their Skew Estimation

Indian Conference on Computer Vision, Graphics and Image Processing, 2002

Bag-of-visual-words for signature-based multi-script document retrieval

Neural Computing and Applications, Mar 22, 2018

Document Image Retrieval Based on Visual Saliency Maps

2019 International Conference on Document Analysis and Recognition Workshops (ICDARW)

A System for Recognition of Destination Address in Postal Documents of India

Malaysian Journal of Computer Science

Recognition of the destination address is compulsory for automation of the postal system in India. We observed that such recognition becomes a very challenging task due to the inter-mixing of three languages (Hindi, English and the official language of the particular state which the postal document is supposed to reach). In this paper, we develop a dynamic programming based system for city-name and pin code recognition of the destination address in postal documents of India; it addresses the difficulties related to identification of the scripts and also resolves the problems caused by touching characters in postal documents. For city-name recognition, lexicon information is used. However, no lexicon information is used for pin code recognition since an Indian pin code contains only 6 digits. We obtained 99.55% reliability from the tri-lingual city-name recognition system, where error rates are 0.20% and rejection rates a...

Mining text from natural scene and video images: A survey

WIREs Data Mining and Knowledge Discovery, 2021

In computer terminology, mining is considered as extracting meaningful information or knowledge from a large amount of data/information using computers. The meaningful information can be extracted from normal text and from images obtained from different resources, such as natural scene images, video, and documents, by deriving semantics from the text and content of the images. Although there are many pieces of work on text/data mining and several survey/review papers have been published in the literature, to the best of our knowledge there is no survey paper on mining textual information from natural scene, video, and document images that considers word spotting techniques. In this article, we therefore provide a comprehensive review of both the non‐spotting and spotting based mining techniques. The mining approaches are categorized as feature, learning and hybrid‐based methods to analyze the strengths and limitations of the models of each category. In addition, the article discusses the usefulness of the methods according to different situations and applications. Furthermore, based on the review of different mining approaches, this article identifies the limitations of the existing methods and suggests new applications and future directions to continue the research in multiple directions. We believe such a review article will help researchers quickly become familiar with the state‐of‐the‐art information and the progress made toward mining textual information from natural scene and video images.

Evaluation of Gist Operator for Document Image Retrieval

2018 13th IAPR International Workshop on Document Analysis Systems (DAS), 2018

An Online Learning-Based Adaptive Biometric System

Adaptive Biometric Systems, 2015

In the last decade, adaptive biometrics has become an emerging field of research. Considering that limited work has been undertaken on adaptive biometrics using machine learning techniques, in this chapter we list and discuss a few of the many potential learning techniques that can be applied to build an adaptive biometric system. To illustrate the efficacy of one of the incremental learning techniques from the literature, we built an adaptive biometric system. For experimentation, we used multi-modal ocular (sclera and iris) data. The preliminary results, reported in the results section, are very promising.

Fast local binary pattern: Application to document image retrieval

2017 International Conference on Image and Vision Computing New Zealand (IVCNZ), 2017

A New Method for Detecting Altered Text in Document Images

Pattern Recognition and Artificial Intelligence, 2020

As more and more office documents are captured, stored, and shared in digital format, and as image editing software becomes increasingly powerful, there is a growing concern about document authenticity. To prevent illicit activities, this paper presents a new method for detecting altered text in document images. The proposed method explores the relationship between positive and negative coefficients of the DCT to extract the effect of distortions caused by tampering, by fusing the images reconstructed from the positive and negative coefficients respectively, which results in Positive-Negative DCT coefficients Fusion (PNDF). To take advantage of spatial information, we propose to fuse the R, G, and B color channels of input images, which results in RGBF (RGB Fusion). Next, the same fusion operation is used for fusing PNDF and RGBF, which results in a fused image for the original input. We compute a histogram to extract features from the fused image, which results in a feature vector. The feature vector is then fed to a deep neural network for classifying altered text images. The proposed method is tested on our own dataset and the standard datasets from the ICPR 2018 Fraud Contest, Altered Handwriting (AH), and faked IMEI number images. The results show that the proposed method is effective and outperforms the existing methods irrespective of image type.
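The idea of reconstructing a signal separately from its positive and negative DCT coefficients can be sketched as follows. This is only an illustration under stated assumptions: it uses a 1-D orthonormal DCT-II on a toy list of pixel values rather than 2-D images, the helper names `dct`, `idct` and `split_by_sign` are hypothetical, and the paper's fusion operation is not specified in the abstract, so it is not implemented here.

```python
import math


def _scale(k, n):
    """Orthonormal DCT scaling factor for coefficient index k."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x):
    """1-D orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    return [
        _scale(k, n)
        * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(n))
        for i in range(n)
    ]


def split_by_sign(coeffs):
    """Separate coefficients into positive-only and negative-only lists."""
    pos = [c if c > 0 else 0.0 for c in coeffs]
    neg = [c if c < 0 else 0.0 for c in coeffs]
    return pos, neg


# Toy row of pixel intensities (hypothetical data).
row = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
pos, neg = split_by_sign(dct(row))
recon_pos = idct(pos)  # reconstruction from positive coefficients only
recon_neg = idct(neg)  # reconstruction from negative coefficients only
```

Because the transform is linear, `recon_pos` and `recon_neg` sum back to the original row; a method along the lines of the abstract would fuse the two reconstructions instead of adding them.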

Zone-based keyword spotting in Bangla and Devanagari documents

Multimedia Tools and Applications, 2020

A comparative study of different texture features for document image retrieval

Expert Systems with Applications, 2018

An Efficient Signature Verification Method Based on an Interval Symbolic Representation and a Fuzzy Similarity Measure

IEEE Transactions on Information Forensics and Security, 2017

Document Image Retrieval Based on Texture Features: A Recognition-Free Approach

2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2016

A brief review of document image retrieval methods: Recent advances

2016 International Joint Conference on Neural Networks (IJCNN), 2016
