Script Identification from Trilingual Documents Using Profile Based Features
Related papers
A Novel Framework for Multilingual Script Detection and Pattern Analysis in Mixed Script Queries
International Journal of Experimental Research and Review, 2024
Script detection systems that can handle several languages are increasingly necessary. Machine learning and deep learning have made it substantially easier to identify scripts written in various languages. Machine learning approaches to language detection have used Naive Bayes and Support Vector Machine (SVM) classifiers, and this paper also reviews deep learning approaches built on a range of methods, including LSTM and BERT. The review shows, however, that the accuracy and scalability of multilingual systems still need to improve. The primary focus of the present investigation is therefore the development of a framework that can recognize scripts in a variety of languages and that also performs pattern analysis on mixed-script queries. The resulting approach is designed to be scalable, efficient, and adaptive, with the aim of increasing identification accuracy across a large number of languages. Performance metrics including accuracy, recall, and F1-score are calculated to evaluate the proposed multilingual script identification. The authors conclude that the proposed approach offers an efficient and scalable solution for multilingual script detection.
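The abstract above names accuracy, recall, and F1-score as evaluation metrics. As a minimal sketch of how such metrics are commonly computed for one script class, assuming plain lists of true and predicted labels (the function name `evaluate` and the example labels are hypothetical and not taken from the paper):

```python
def evaluate(y_true, y_pred, positive):
    """Return accuracy, precision, recall and F1 for one target class.

    `positive` is the script label treated as the class of interest;
    every other label counts as negative. Hypothetical helper.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels only; not data from the paper.
y_true = ["Latin", "Devanagari", "Latin", "Latin"]
y_pred = ["Latin", "Latin", "Latin", "Devanagari"]
scores = evaluate(y_true, y_pred, positive="Latin")
```

For a multi-script system, the same function could be called once per script label and the per-class scores averaged.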
Recent Advances in Script Identification
The world has a wide variety of scripts. Almost every country has its own languages and scripts, which differ from one another in various respects. Identifying the different scripts in a multilingual, multi-script document is therefore essential. In recent years, various approaches to script identification have been developed and have achieved promising results. This paper presents an overview of script identification organized into categories: script systems, extracted features, and classification methods. Earlier research and future prospects of the field are discussed. The authors conclude that results in this area are not yet satisfactory and that more research remains to be done.
Automatic Script and Type Identification in Bi-lingual Forms
International Journal of Computing and Information Sciences, 2016
In this paper we have developed a system that can automatically discriminate between machine-printed and handwritten words in structured bilingual (Arabic and French) form documents. Our system has been applied in the context of the Tunisian National Health Insurance Fund, for the refund of medical care costs, with encouraging results. In the forms used, handwritten data usually touch or cross the preprinted form frames and text, creating complex problems for the recognition routines. Each text type should also be processed using different methods in order to optimize recognition accuracy. This work aims to address these issues, and in particular to solve the problem of machine-printed/handwritten and Arabic/French word discrimination. To this end, we computed a co-occurrence matrix of oriented gradients from each word image and used it as input to a k-Nearest Neighbor classifier. Experiments were carried out on 20 forms. An average script identification rate of 98.31% was achieved.
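The abstract does not give the details of its feature, so the following is only a rough sketch of one way a co-occurrence matrix of oriented gradients could feed a k-Nearest Neighbor classifier. The number of orientation bins, the pixel offset, the central-difference gradient, and all function names are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter

N_BINS = 8  # assumed number of orientation bins


def orientation_bins(image):
    """Quantize the gradient orientation of each interior pixel.

    `image` is a list of rows of gray values. Pixels with zero gradient
    (and border pixels) are left as None.
    """
    rows, cols = len(image), len(image[0])
    bins = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = image[r][c + 1] - image[r][c - 1]  # central differences
            gy = image[r + 1][c] - image[r - 1][c]
            if gx == 0 and gy == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            bins[r][c] = int(angle / (2 * math.pi) * N_BINS) % N_BINS
    return bins


def cooccurrence_features(image, offset=(0, 1)):
    """Flattened, normalized N_BINS x N_BINS co-occurrence matrix of
    orientation bins for pixel pairs separated by `offset`."""
    bins = orientation_bins(image)
    rows, cols = len(bins), len(bins[0])
    dr, dc = offset
    counts = [0] * (N_BINS * N_BINS)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = bins[r][c], bins[r2][c2]
                if a is not None and b is not None:
                    counts[a * N_BINS + b] += 1
    total = sum(counts)
    return [x / total for x in counts] if total else counts


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors closest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Two toy "word images": a vertical edge and a horizontal edge.
vertical = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
horizontal = [[0] * 5 for _ in range(3)] + [[255] * 5 for _ in range(3)]
train = [(cooccurrence_features(vertical), "class_a"),
         (cooccurrence_features(horizontal), "class_b")]
predicted = knn_predict(train, cooccurrence_features(vertical), k=1)
```

In a real system the training set would hold features from labeled word images (for example printed versus handwritten), and several offsets could be concatenated into one feature vector.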
Script Identification from Camera Based Tri-Lingual Document
IEEE Xplore, 2017
In this paper, an algorithm is proposed for block-wise trilingual script identification in camera-captured images. Local Binary Pattern (LBP) features are extracted from Kannada, Hindi and English images. To test the performance of the proposed algorithm, a dataset of 6000 clean block images is considered, with 2000 images per script. A segmentation technique divides the document image into blocks; block sizes of 128x128, 256x256, 512x512 and 1024x1024 are considered for each script. The LBP features are computed over 8 neighbors, thereby generating 59 features, which are submitted to KNN and SVM classifiers to classify the underlying image. The identification accuracies for the KNN and SVM classifiers are, respectively, 96.60% and 98.00% for block size 128x128, 98.71% and 98.07% for block size 256x256, 99.70% and 98.00% for block size 512x512, and 94.90% and 99.01% for block size 1024x1024. For the SVM classifier, the best accuracy is 99.01%, at block size 1024x1024. The proposed method is independent of thinning.
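The figure of 59 features for 8 neighbors matches the standard uniform LBP histogram: of the 256 possible 8-bit codes, 58 have at most two circular 0/1 transitions and each gets its own bin, while all remaining codes share one extra bin. The sketch below illustrates that construction in plain Python. The abstract does not state that the authors used exactly this variant, so the radius-1 square neighborhood, the `>=` comparison, and the function names are assumptions.

```python
def transitions(code):
    """Count 0/1 transitions in the circular 8-bit pattern `code`."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))


# 58 uniform codes get their own bin; all other codes share bin 58.
UNIFORM_CODES = [c for c in range(256) if transitions(c) <= 2]
BIN_OF = {code: index for index, code in enumerate(UNIFORM_CODES)}
NON_UNIFORM_BIN = len(UNIFORM_CODES)

# The 8 neighbours as (row, col) offsets, clockwise from the top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_histogram(image):
    """Normalized 59-bin uniform LBP histogram of a grayscale block.

    `image` is a list of rows of gray values; border pixels are skipped.
    """
    rows, cols = len(image), len(image[0])
    hist = [0] * (NON_UNIFORM_BIN + 1)
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[BIN_OF.get(code, NON_UNIFORM_BIN)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy 4x4 block; a real block would be 128x128 or larger.
block = [[10, 20, 30, 40],
         [50, 60, 70, 80],
         [90, 100, 110, 120],
         [130, 140, 150, 160]]
features = lbp_histogram(block)
```

The resulting 59-element vector per block is the kind of input that could then be passed to a KNN or SVM classifier.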