Peeta Basa Pati - Academia.edu

Papers by Peeta Basa Pati

Script Identification in Printed Bilingual Documents

5th International Workshop, DAS 2002, Proceedings, 2002

Identification of script in multi-lingual documents is essential for many language-dependent applications such as machine translation and optical character recognition. Techniques for script identification generally require large areas for operation so that sufficient information is available. This assumption fails in the Indian context, where words of two different scripts are interspersed in most documents. In this paper, techniques to identify the script of a word are discussed. Two different approaches have been proposed and tested. The first method structures words into three distinct spatial zones and uses the spatial spread of a word in the upper and lower zones, together with the character density, to identify the script. The second technique analyses the directional energy distribution of a word using Gabor filters with suitable frequencies and orientations. Words with various font styles and sizes have been used to test the proposed algorithms, and the results obtained are quite encouraging.
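
The directional energy idea in the second technique can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the filter frequencies, orientations, or normalization, so the kernel size, wavelength, sigma, and the function names gabor_kernel and directional_energy are all assumptions chosen for illustration. The sketch filters a word image with real Gabor kernels at several orientations and reports the mean squared response per orientation.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Real Gabor kernel: isotropic Gaussian envelope times a cosine carrier.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the carrier oscillates along direction theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    kernel = envelope * np.cos(2.0 * np.pi * xr / wavelength)
    # Subtract the mean so that flat image regions produce no response.
    return kernel - kernel.mean()

def directional_energy(image, thetas, size=9, wavelength=4.0, sigma=2.0):
    """Return one energy value (mean squared filter response) per orientation."""
    image = np.asarray(image, dtype=float)
    # Zero-padded FFT size that yields a full linear convolution.
    shape = (image.shape[0] + size - 1, image.shape[1] + size - 1)
    image_f = np.fft.rfft2(image, shape)
    energies = []
    for theta in thetas:
        kernel_f = np.fft.rfft2(gabor_kernel(size, wavelength, theta, sigma), shape)
        response = np.fft.irfft2(image_f * kernel_f, shape)
        energies.append(float(np.mean(response ** 2)))
    return np.array(energies)

# Synthetic stand-in for a word image: vertical stripes with a period of 4 pixels.
columns = np.arange(32)
stripes = np.tile(np.cos(2.0 * np.pi * columns / 4.0), (32, 1))
energy = directional_energy(stripes, [0.0, np.pi / 2.0])
```

For the striped test image, the filter whose carrier runs across the stripes (theta = 0) responds much more strongly than the one at theta = pi/2. A vector of such energies over several orientations and frequencies is the kind of feature a script classifier could be trained on.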

Industry-Academia Collaboration via Internships

2009 22nd Conference on Software Engineering Education and Training, 2009

The IT industry in India has witnessed high growth in the last few years. This rapid growth has created a mismatch between the demand for and the supply of human resources. The IT industry is continuously on the lookout for fresh young talent. While campus recruitment of fresh graduates can provide the required numbers, it has been widely recognized that such recruits lack the skills that are essential for a successful career in the corporate world. Internships are proving to be a valuable way of identifying talent early on, enriching interns' technical skills, nurturing them with the requisite domain knowledge, and subsequently hiring them into the organization.

Text Localization and Extraction from Complex Gray Images

Lecture Notes in Computer Science, 2006

We propose two texture-based approaches, one involving Gabor filters and the other employing log-polar wavelets, for separating text from non-text elements in a document image. Both proposed algorithms compute local energy at information-rich points, which are marked by the Harris corner detector. The advantage of this approach is that the local energy is calculated only at selected points rather than throughout the image, which saves considerable computation time. The algorithms have been tested on a large set of scanned text pages, and the results are better than those of existing algorithms. Of the two proposed schemes, the Gabor filter based scheme marginally outperforms the wavelet based scheme.

HVS Inspired System for Script Identification in Indian Multi-script Documents

Lecture Notes in Computer Science, 2006

... Thus our system, at all its configurations, outperforms the system ... Since we didn't have access to the databases used by Pal/Chaudhuri and Padma/Nagabhushan, the comparison is just numeric. ...

Automatic text block separation in document images

2006 Fourth International Conference on Intelligent Sensing and Information Processing, 2006

Can Biological Motion be a Biometric?

2006 Fourth International Conference on Intelligent Sensing and Information Processing, 2006

Biological motion has successfully been used to analyse a person's mood and other psychological traits. Efforts are being made to use human gait as a non-invasive biometric. In this work, we study the effectiveness of people's biological gait motion as a cue for biometric person recognition. The data is 3D in nature and hence carries more information than the cues obtained from video-based gait patterns. The high person recognition accuracies, obtained using a simple linear model of data representation and simple neighborhood-based classifiers, suggest that the nature of the data is more important than the recognition scheme employed.

Script identification in printed bilingual documents

A blind indic script recognizer for multi-script documents

We report a hierarchical blind script identifier for 11 different Indian scripts. An initial grouping of the 11 scripts is accomplished at the first level of this hierarchy. At the subsequent level, we recognize the script in each group. The various nodes of this tree use different feature-classifier combinations. A database of 20,000 words of different font styles and sizes ...

Word level multi-script identification

Pattern Recognition Letters, Jul 1, 2008

We report an algorithm to identify the script of each word in a document image. We start with a bi-script scenario, which is later extended to tri-script and then to eleven-script scenarios. A database of 20,000 words of different font styles and sizes has been collected and used for each script. The effectiveness of Gabor and discrete cosine transform (DCT) features has been independently evaluated using nearest neighbor, linear discriminant, and support vector machine (SVM) classifiers. The combination of Gabor features with a nearest neighbor or SVM classifier shows promising results: over 98% for the bi-script and tri-script cases and above 89% for the eleven-script scenario.
NB: The data used for this study is also uploaded separately onto Academia.edu
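
The nearest neighbor classification step mentioned in the abstract can be sketched as follows. This is a generic 1-nearest-neighbor rule with Euclidean distance, not the paper's code: the abstract does not state the distance measure or the feature dimensionality, and the function name nearest_neighbor_predict, the toy feature vectors, and the labels script_A and script_B are hypothetical.

```python
import numpy as np

def nearest_neighbor_predict(train_x, train_y, query_x):
    """Label each query vector with the label of its closest training vector.

    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    train_x = np.asarray(train_x, dtype=float)
    query_x = np.asarray(query_x, dtype=float)
    # Pairwise squared distances, shape (n_query, n_train).
    diff = query_x[:, None, :] - train_x[None, :, :]
    squared_dist = (diff ** 2).sum(axis=2)
    return np.asarray(train_y)[squared_dist.argmin(axis=1)]

# Toy feature vectors standing in for per-word Gabor or DCT features.
train_features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_labels = ["script_A", "script_A", "script_B", "script_B"]
predicted = nearest_neighbor_predict(
    train_features, train_labels, [[0.85, 0.15], [0.15, 0.95]]
)
```

Here the first query lies closest to the script_A examples and the second to the script_B examples, so the predictions follow those labels. In practice the feature vectors would come from a feature extractor applied to each segmented word image.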

OCR in Indian scripts: A survey

India is a multilingual country. A significantly large number of scripts are used to represent these languages. Vision researchers aim to develop an integrated optical character recognition (OCR) system that can process all such scripts. Such a development, if realized, will not only enable faster flow of information across the country but also have a profound effect on its scientific and economic development. Considerable efforts have been successfully made towards the development of systems capable of recognizing machine-printed or handwritten characters and numerals. However, most Indian scripts do not have an integrated OCR system. Further, a unified system capable of processing all Indian scripts is still a dream. This article presents a survey of the current literature on the development of OCR systems for Indian scripts. After reviewing the basis of and the motivation for the development of OCR systems, the article analyzes the various methodologies employed in general-purpose pattern recognition systems. A critical analysis of the work towards OCR systems in Indian languages, with pointers towards possible future work, is also presented.
