Tirthankar Dasgupta | IIT Kharagpur

Papers by Tirthankar Dasgupta

Does Word2Vec encode human perception of similarity? A study in Bangla

2019 International Conference on Bangla Speech and Language Processing (ICBSLP), 2019

The quest to understand how language and concepts are organized in the human mind is a never-ending pursuit undertaken by researchers in computational psycholinguistics. At the same time, researchers have tried to quantitatively model the semantic space of written corpora and discourse through various computational approaches. The two lines of work interact: computational linguistics is used to understand human processing, and psycholinguistic insights are used to enhance NLP methods. Even so, it has seldom been systematically studied whether the two corroborate each other. In this paper, we explore whether and how standard word-embedding-based semantic representation models represent the human mental lexicon. To this end, we conducted a semantic priming experiment to capture the psycholinguistic aspects and compared the results with a distributional word-embedding model: Bangla word2vec. Analysis of reaction times indicates that corpus-based semantic similarity measures do not reflect the true nature of the mental representation and processing of words. To the best of our knowledge, this is the first study of its kind in any language, and especially in Bangla.
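The comparison described above can be sketched in Python. The abstract does not specify either measure, so two things are assumed here. First, corpus-based similarity is taken to be the cosine between word2vec vectors. Second, agreement with human processing is measured as a Spearman rank correlation between that similarity and the priming effect, defined as RT after an unrelated prime minus RT after a related prime. The helper `embedding_vs_priming` and its inputs are hypothetical and are not the authors' code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(xs):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def embedding_vs_priming(pairs, vectors, priming_effect_ms):
    """Hypothetical helper: correlate model similarity with human priming.

    pairs: list of (prime, target) words
    vectors: dict mapping word -> embedding vector (e.g. from a word2vec model)
    priming_effect_ms: RT(unrelated prime) - RT(related prime), one per pair
    """
    sims = [cosine(vectors[p], vectors[t]) for p, t in pairs]
    return spearman(sims, priming_effect_ms)
```

If the correlation is weak, that would be consistent with the paper's conclusion. The study's actual statistical analysis may differ.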

Ontology Guided Purposive News Retrieval and Presentation

In this paper, we present a purposive news information retrieval and presentation system that curates information from news articles collected from multiple trusted sources for a given domain. A back-end domain ontology provides details about the concepts and relations of interest. We propose an attention-based CNN-BiLSTM model to classify sentence tokens as ontology concepts or entities of interest. These entities are then curated and used to link articles, illustrating the evolution of events over time and across regions. Working systems are initialized with small annotated data sets, which are later augmented with humans in the loop. The system is easily customizable for various domains.

Learning Domain Terms - Empirical Methods to Enhance Enterprise Text Analytics Performance

The performance of standard text analytics algorithms is known to degrade substantially on consumer-generated data, which is often very noisy. These algorithms also perform poorly on enterprise data, whose nature differs greatly from that of news repositories, storybooks or Wikipedia. Text cleaning, a mandatory step, aims at removing and correcting noise to improve performance. However, enterprise data needs special cleaning methods. It contains many domain terms that look like noise against a standard dictionary but in reality are not. In this work we present a detailed analysis of the characteristics of enterprise data. We then suggest unsupervised methods for cleaning these repositories after domain terms have been automatically segregated from true noise terms. Noise terms are thereafter corrected in a contextual fashion. The effectiveness of the method is established through careful manual evaluation of error corrections over several standard data sets, including th...

Resource creation and development of an English-Bangla back transliteration system

International Journal of Knowledge-based and Intelligent Engineering Systems, 2015

Computational Models of the Representation of Bangla Compound Words in the Mental Lexicon

Journal of Psycholinguistic Research, 2015

In this paper we aim to model the organization and processing of Bangla compound words in the mental lexicon. Our objective is to determine how the mental lexicon accesses a Bangla compound word: as a whole, or by decomposing it into its constituent morphemes and then recognizing them accordingly. To address this issue, we adopted two strategies. First, we conducted a cross-modal priming experiment with a number of native speakers. Analysis of reaction times (RT) and error rates indicates that, in general, Bangla compound words are accessed via a partial decomposition process. Some words follow the full-listing mode of representation, while others follow the decomposition route. Next, based on the collected RT data, we developed a computational model that can explain the access and representation of Bangla compound words. To achieve this, we first explored the individual roles of four features:

- head-word position
- morphological complexity
- orthographic transparency
- the semantic compositionality between the constituents and the whole compound word

We then developed a complexity-based model by combining these features. To a large extent, the model explains the possible processing phenomena of most Bangla compound words. Our proposed model shows an accuracy of around 83%.

Forward Transliteration of Dzongkha Text to Braille

In this paper we present an automatic Dzongkha-text-to-Braille forward transliteration system. Dzongkha is the national language of Bhutan. The system aims to provide low-cost, efficient access mechanisms for blind people. It also addresses the scarcity of automatic Braille transliteration systems for languages like Dzongkha. The system can be configured to take a Dzongkha text document as input and, based on a set of transliteration rules, generate the corresponding Braille output. We further extended the system with an Audio QWERTY editor that allows a blind person to read and write Dzongkha text, or its Braille equivalent, on a computer. The editor also provides Dzongkha voice feedback to further ease its use.

Psycholinguistically Motivated Computational Models on the Organization and Processing of Morphologically Complex Words

In this work we present psycholinguistically motivated computational models for the organization and processing of Bangla morphologically complex words in the mental lexicon. Our goal is to identify whether morphologically complex words are stored as a whole or organized along morphological lines. To this end, we conducted a series of psycholinguistic experiments to build hypotheses about the possible organizational structure of the mental lexicon. Next, we developed computational models based on the collected dataset. We observed that derivationally suffixed Bangla words are in general decomposed during processing. Compositionality between the stem and the suffix plays an important role in this decomposition. We observed a similar phenomenon for Bangla verb sequences. Experiments showed that non-compositional verb sequences are in general stored as a whole in the mental lexicon (ML), while only weak traces of compositional verb sequences are found there.

Design and Development of an Online Computational Framework to Facilitate Language Comprehension Research on Indian Languages

In this paper we present an open-source online computational framework that different research groups can use to conduct reading research on Indian-language texts. The framework can be used to build large annotated Indian-language text comprehension datasets from various user-based experiments. Its novelty lies in bringing different empirical data-collection techniques for text comprehension under one roof. The framework has been customized specifically to address the particularities of Indian languages. It will also offer many types of automatic analysis of the data at different levels, such as full text, sentence and word. To address the subjectivity of perceived text difficulty, the framework captures user background along multiple factors. The assimilated data can then be automatically cross-referenced against different strata of readers.

Influence of Target Reader Background and Text Features on Text Readability in Bangla: A Computational Approach

In this paper, we study the effect of two important factors influencing text readability in Bangla: the target reader and text properties. We first built a novel Bangla readability dataset of 135 documents annotated by 50 readers from two different backgrounds. We identified 20 features that can affect the readability of Bangla texts and divided them into two groups, namely "classic" and "non-classic". Preliminary correlation analysis reveals that text features have varying influence on the text hardness reported by the two groups. We employed support vector machine (SVM) and support vector regression (SVR) techniques to model the reading difficulty of Bangla texts. We developed separate models for different types of readers and also tested separate combinations of features to evaluate their comparative contributions. Our study establishes that the perception of text difficulty varies largely with the background of the reader. To the best of our knowledge, no such work on text readability has previously been reported for Bangla.

Modelling the Organization and Processing of Bangla Polymorphemic Words in the Mental Lexicon: A Computational Approach

In this paper we present a psycholinguistically motivated computational model for the access and representation of Bangla polymorphemic words in the mental lexicon. We first conducted a series of masked priming experiments on a set of Bangla polymorphemic words. Our analysis indicates that a significant number of words show morphological decomposition during processing. We then developed a computational model for the processing of Bangla polymorphemic words. Its novelty over existing models is that it considers not only the frequency of the derived word. It also considers the roles of its constituent stem and suffix and the degree of affixation between them. We evaluated the new model against the results of the priming experiment and compared it with the state of the art. The proposed model was found to perform better than the existing models.

Development of an Online Repository of Bangla Literary Texts and its Ontological Representation for Advance Search Options

A Complex Network Analysis of Syllables in Bangla through SyllableNet

Automatic Extraction of Compound Verbs from Bangla Corpora

In this paper we present a rule-based technique for the automatic extraction of Bangla compound verbs (CVs) from raw text corpora. In our work we:

(a) propose rules through which a system can automatically identify Bangla CVs in text, established on the basis of the syntactic interpretation of sentences;
(b) explain the problems of CV identification that arise from the semantics and pragmatics of Bangla;
(c) apply these rules to two different Bangla corpora to extract CVs.

The extracted CVs were manually evaluated by linguistic experts, and our system achieved an accuracy of around 70%.

New Readability Measures for Bangla and Hindi Texts

In this paper we present computational models to compute the readability of Indian-language text documents. We first demonstrate that some popular English readability metrics are inadequate, and consequently inapplicable, for Hindi and Bangla. Next, we present user experiments to identify important structural parameters of Bangla and Hindi that affect the readability of texts in these two languages. Accordingly, we propose two different readability models for each of Bangla and Hindi. The models are tested against a second round of user studies with a completely new data set, and the results validate the proposed models. The few existing works on Hindi and Bangla text readability stop short of this. This paper presents the first definitive readability models for these languages that incorporate their salient structural features.
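The Flesch Reading Ease score is a typical example of the popular English readability metrics mentioned above. The abstract does not name which metrics were tested, so including Flesch here is an assumption. The score depends only on average sentence length and average syllables per word, with constants fitted to English text:

```latex
\text{FRE} = 206.835 - 1.015\,\frac{\text{total words}}{\text{total sentences}} - 84.6\,\frac{\text{total syllables}}{\text{total words}}
```

Because these constants were fitted on English texts and readers, there is no a priori reason for them to carry over to Bangla or Hindi.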

Web browsing interface for people with severe speech and motor impairment in India

Proceedings of the 16th international ACM SIGACCESS conference on Computers & accessibility - ASSETS '14, 2014

We present the design and development of a web browser that gives people with cerebral palsy in India easy access to information through the World Wide Web. Our target user group comprises people with a severe form of spastic cerebral palsy and highly restricted motor skills. Throughout the development process, we interacted with the target users to understand their requirements and obtain design advice. The browser is augmented with an intelligent auto-scanning mechanism through which web content and browser GUI controls can be accessed with less time and effort. We field-tested the browser with the target users. Preliminary evaluation results suggest that the proposed browser is quite effective in terms of task execution time, cognitive effort and overall usability.

Development of accessible toolset to enhance social interaction opportunities for people with cerebral palsy in India

Proceedings of the 16th international ACM SIGACCESS conference on Computers & accessibility - ASSETS '14, 2014

In this paper we present a toolset that allows people with severe spastic cerebral palsy (CP) and highly restricted motor skills to access popular social-networking and communication media such as Facebook and e-mail. To understand the requirements of the intended users, we performed a number of surveys that served as the basis of our system design. The tools use a special access-switch-based scanning technique for easy navigation in different applications. We evaluated the toolset with six target users, and the preliminary results show a positive response.

How Word Order Affects Sentence Comprehension in Bangla: A Computational Approach to Simple Sentence

Lecture Notes in Computer Science, 2013

Sentence comprehension is an integral part of whole-text comprehension. It involves complex cognitive actions, as a reader has to work through lexical, syntactic and semantic aspects in order to understand a sentence. One of the vital features of a sentence is its word order, or surface form. Different languages have evolved different word-order systems, which reflect the cognitive structure of the native users of each language. Word order therefore affects the cognitive load a sentence imposes on the reader. Computational models that quantify the effect of word order on sentence difficulty would greatly benefit the study of text readability and its applications. A handful of works have addressed this issue in English and other languages. Bangla is the fifth most spoken language in the world and a relatively free word-order language. Yet it still has no computational model to quantify the reading difficulty of a sentence. In this paper, we develop models to predict the comprehension difficulty of a simple Bangla sentence according to its different surface forms. In the process, we also establish that difficulty measures for English do not hold in Bangla. Our model has been validated against an extensive user survey.

A Joint Source Channel Model for the English to Bengali Back Transliteration

Lecture Notes in Computer Science, 2013

In this paper we present an English-to-Bengali back-transliteration system that transliterates Bengali text written in Roman script back to its original script. The proposed system uses a bilingual parallel corpus of English-Bengali transliterated word pairs. It applies both orthographic and phonetic information to two computational models, namely the joint source channel model and the trigram model. These models automatically identify, extract and learn transliteration unit (TU) pairs from both source and target language words. Finally, the system predicts the 10 best candidate outputs for the given input text. We further extend our work to make the target-word prediction module more robust through phonological analysis of the generated target sentence. Both models have been evaluated on a set of 2000 Romanized Bengali test words. Initial evaluation results clearly show that the joint source channel model performs much better than the trigram model.
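As background, the transliteration literature commonly formulates the joint source channel model as an n-gram model over aligned TU pairs. The formula below assumes that standard form. The n-gram order and smoothing used in the paper are not given in this abstract. Let a source word S and a target word T be segmented into K aligned pairs, each pair written as the angle-bracketed pair of source unit s and target unit t with index k:

```latex
P(S, T) = P\big(\langle s,t\rangle_1, \ldots, \langle s,t\rangle_K\big)
        \approx \prod_{k=1}^{K} P\big(\langle s,t\rangle_k \mid \langle s,t\rangle_{k-n+1}, \ldots, \langle s,t\rangle_{k-1}\big),
\qquad
T^{*} = \arg\max_{T} P(S, T)
```

Ranking all candidate targets T by P(S, T), rather than keeping only the arg max, yields an n-best list such as the 10 candidate outputs the system reports.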

Design and Development of a Bangla Semantic Lexicon and Semantic Similarity Measure

International Journal of Computer Applications, 2014

In this paper, we propose a hierarchically organized semantic lexicon for Bangla, together with a graph-based edge-weighting approach to measure the semantic similarity between two Bangla words. We have also developed a graphical user interface to represent the lexical organization. The proposed lexical structure contains only relations based on semantic association. We included the frequency of each word across five Bangla corpora in the lexical structure. We also attached additional details to each word, such as:

- whether it is mythological;
- whether it can be used as a verb;
- if it can, which word must be appended to it to use it as a verb.

This lexicon can be used in various applications such as categorization and the semantic web. It also suits natural language processing applications such as document clustering, word sense disambiguation, machine translation, information retrieval, text comprehension and question-answering systems.
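The abstract names a graph-based edge-weighting approach but does not give the weighting scheme. As a minimal sketch under stated assumptions, similarity can be derived from the shortest weighted path between two words in the lexicon graph, found with Dijkstra's algorithm. That distance is then mapped to the range (0, 1] as 1 / (1 + distance). The edge weights, the distance-to-similarity mapping, and the English glosses standing in for Bangla entries are all hypothetical.

```python
import heapq

def add_edge(graph, u, v, weight):
    """Add an undirected weighted edge to a {node: {neighbor: weight}} graph."""
    graph.setdefault(u, {})[v] = weight
    graph.setdefault(v, {})[u] = weight

def shortest_distance(graph, source, target):
    """Dijkstra's algorithm; returns float('inf') if target is unreachable."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return float("inf")

def similarity(graph, a, b):
    """Assumed mapping: 1 / (1 + distance); 1.0 for identical words, 0.0 if unconnected."""
    d = shortest_distance(graph, a, b)
    return 0.0 if d == float("inf") else 1.0 / (1.0 + d)

# Hypothetical fragment of a lexicon hierarchy: English glosses, invented weights.
lexicon = {}
add_edge(lexicon, "food", "fruit", 1.0)
add_edge(lexicon, "fruit", "mango", 0.5)
add_edge(lexicon, "fruit", "banana", 0.5)
print(similarity(lexicon, "mango", "banana"))  # 0.5
```

Giving edges deeper in the hierarchy lighter weights makes siblings under a specific concept score as more similar than words linked only through a general one.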

Computational Modeling of Morphological Effects in Bangla Visual Word Recognition

Journal of Psycholinguistic Research, 2014

In this paper we aim to model the organization and processing of Bangla polymorphemic words in the mental lexicon. Our objective is to determine how the mental lexicon accesses a polymorphemic word: as a whole, or by decomposing it into its constituent morphemes and then recognizing them accordingly. To address this issue, we adopted two strategies. First, we conducted a masked priming experiment with native speakers. Analysis of reaction times (RT) and error rates indicates that, in general, morphologically derived words are accessed via a decomposition process. Next, based on the collected RT data, we developed a computational model that can explain the access and representation of Bangla derivationally suffixed words. To do so, we first explored the individual roles of different linguistic features of Bangla morphologically complex words. We observed that their processing depends on several factors:

- base and surface word frequency
- suffix type/token ratio
- suffix family size
- suffix productivity

Accordingly, we proposed different feature models. Finally, we combined these feature models into a new model that takes advantage of the individual feature models. It successfully explains the processing of most Bangla morphologically derived words. Our proposed model shows an accuracy of around 80%, outperforming the other related frequency models.

Research paper thumbnail of Does Word2Vec encode human perception of similarityƒ A study in Bangla

2019 International Conference on Bangla Speech and Language Processing (ICBSLP), 2019

The quest to understand how language and concepts are organized in human mind is a neverending pu... more The quest to understand how language and concepts are organized in human mind is a neverending pursuit undertaken by researchers in computational psycholinguistics; simultaneously, on the other hand, researchers have tried to quantitatively model the semantic space from written corpora and discourses through different computational approaches - while both of these interacts with each other in-terms of understanding human processing through computational linguistics and enhancing NLP methods from the insights, it has seldom been systematically studied if the two corroborates each other. In this paper, we have explored how and if the standard word embedding based semantic representation models represent the human mental lexicon. Towards that, We have conducted a semantic priming experiment to capture the psycholinguistics aspects and compared the results with a distributional word-embedding model: Bangla word2Vec. Analysis of reaction time indicates that corpus-based semantic similarity measures do not reflect the true nature of mental representation and processing of words. To the best of our knowledge this is first of a kind study in any language especially Bangla.

Research paper thumbnail of Ontology Guided Purposive News Retrieval and Presentation

In this paper, we present a purposive News information retrieval and presentation system that cur... more In this paper, we present a purposive News information retrieval and presentation system that curates information from News articles collected from multiple trusted sources for a given domain. A back-end domain ontology provides details about the concepts and relations of interest. We propose an attention based CNN-BiLSTM model to classify sentence tokens as ontology concepts or entities of interest. These entities are then curated and used to link articles to illustrate evolution of events over time and regions. Working systems are initiated with small annotated data sets which are later augmented with humans in the loop. It is easily customizable for various domains.

Research paper thumbnail of Learning Domain Terms - Empirical Methods to Enhance Enterprise Text Analytics Performance

Performance of standard text analytics algorithms are known to be substantially degraded on consu... more Performance of standard text analytics algorithms are known to be substantially degraded on consumer generated data, which are often very noisy. These algorithms also do not work well on enterprise data which has a very different nature from News repositories, storybooks or Wikipedia data. Text cleaning is a mandatory step which aims at noise removal and correction to improve performance. However, enterprise data need special cleaning methods since it contains many domain terms which appear to be noise against a standard dictionary, but in reality are not so. In this work we present detailed analysis of characteristics of enterprise data and suggest unsupervised methods for cleaning these repositories after domain terms have been automatically segregated from true noise terms. Noise terms are thereafter corrected in a contextual fashion. The effectiveness of the method is established through careful manual evaluation of error corrections over several standard data sets, including th...

Research paper thumbnail of Resource creation and development of an English-Bangla back transliteration system

International Journal of Knowledge-based and Intelligent Engineering Systems, 2015

Research paper thumbnail of Computational Models of the Representation of Bangla Compound Words in the Mental Lexicon

Journal of Psycholinguistic Research, 2015

In this paper we aim to model the organization and processing of Bangla compound words in the men... more In this paper we aim to model the organization and processing of Bangla compound words in the mental lexicon. Our objective is to determine whether the mental lexicon access a Bangla compound word as a whole or decomposes the whole word into its constituent morphemes and then recognize them accordingly. To address this issue, we adopted two different strategies. First, we conduct a cross-modal priming experiment over a number of native speakers. Analysis of reaction time (RT) and error rates indicates that in general, Bangla compound words are accessed via partial decomposition process. That is some word follows full-listing mode of representation and some words follow the decomposition route of representation. Next, based on the collected RT data we have developed a computational model that can explain the processing phenomena of the access and representation of Bangla compound words. In order to achieve this, we first explored the individual roles of head word position, morphological complexity, orthographic transparency and semantic compositionality between the constituents and the whole compound word. Accordingly, we have developed a complexity based model by combining these features together. To a large extent we have successfully explained the possible processing phenomena of most of the Bangla compound words. Our proposed model shows an accuracy of around 83 %.

Research paper thumbnail of Forward Transliteration of Dzongkha Text to Braille

In this paper we present an automatic Dzongkha text to Braille forward transliteration system. Dz... more In this paper we present an automatic Dzongkha text to Braille forward transliteration system. Dzongkha is the national language of Bhutan. The system is aimed at providing low cost efficient access mechanisms for blind people. It also addresses the problem of scarcity of having automatic Braille transliteration systems in language slime Dzongkha. The present system can be configured to take Dzongkha text document as input and based on some transliteration rules it generates the corresponding Braille output. We further extended the system to support an Audio QWERTY editor which allows a blind person to read and write Dzongkha texts or its equivalent Braille through a computer. The editor also contains Dzongkha voice feedbacks to further ease the use.

Research paper thumbnail of Psycholinguistically Motivated Computational Models on the Organization and Processing of Morphologically Complex Words

In this work we present psycholinguistically motivated computational models for the organization ... more In this work we present psycholinguistically motivated computational models for the organization and processing of Bangla morphologically complex words in the mental lexicon. Our goal is to identify whether morphologically complex words are stored as a whole or are they organized along the morphological line. For this, we have conducted a series of psycholinguistic experiments to build up hypothesis on the possible organizational structure of the mental lexicon. Next, we develop computational models based on the collected dataset. We observed that derivationally suffixed Bangla words are in general decomposed during processing and compositionality between the stem and the suffix plays an important role in the decomposition process. We observed the same phenomena for Bangla verb sequences where experiments showed noncompositional verb sequences are in general stored as a whole in the ML and low traces of compositional verbs are found in the mental lexicon.

Research paper thumbnail of Design and Development of an Online Computational Framework to Facilitate Language Comprehension Research on Indian Languages

In this paper we have developed an open-source online computational framework that can be used by... more In this paper we have developed an open-source online computational framework that can be used by different research groups to conduct reading researches on Indian language texts. The framework can be used to develop a large annotated Indian language text comprehension data from different user based experiments. The novelty in this framework lies in the fact that it brings different empirical data-collection techniques for text comprehension under one roof. The framework has been customized specifically to address language particularities for Indian languages. It will also offer many types of automatic analysis on the data at different levels such as full text, sentence and word level. To address the subjectivity of text difficulty perception, the framework allows to capture user background against multiple factors. The assimilated data can be automatically cross referenced against varying strata of readers.

Research paper thumbnail of Influence of Target Reader Background and Text Features on Text Readability in Bangla: A Computational Approach

In this paper, we have studied the effect of two important factors influencing text readability i... more In this paper, we have studied the effect of two important factors influencing text readability in Bangla: the target reader and text properties. Accordingly, at first we have built a novel Bangla readability dataset of 135 documents annotated by 50 readers from two different backgrounds. We have identified 20 different features that can affect the readability of Bangla texts; the features were divided in two groups, namely, "classic" and "non-classic". Preliminary correlation analysis reveals that text features have varying influence on the text hardness stated by the two groups. We have employed support vector machine (SVM) and support vector regression (SVR) techniques to model the reading difficulties of Bangla texts. In addition to developing different models targeted towards different type of readers, separate combinations of features were tested to evaluate their comparative contributions. Our study establishes that the perception of text difficulty varies largely with the background of the reader. To the best of our knowledge, no such work on text readability has been recorded earlier in Bangla.

Research paper thumbnail of Modelling the Organization and Processing of Bangla Polymorphemic Words in the Mental Lexicon: A Computational Approach

In this paper we try to present psycholinguistically motivated computational model for the access... more In this paper we try to present psycholinguistically motivated computational model for the access and representation of Bangla polymorphemic words in the Mental Lexicon. We first conduct a series of masked priming experiment on a set of Bangla polymorphemic words. Our analysis indicates a significant number of words shows morphological decomposition during the processing stage. We further developed a computational model for the processing of Bangla polymorphemic words. The novelty of the new model over the existing ones are, the proposed model not only considers the frequency of the derived word but also considers the role of its constituent stem, suffix and the degree of affixation between the stem and the suffix. We have evaluated the new model with the results obtained from the priming experiment and then compare it with the state of the art. The proposed model has been found to perform better than the existing models.

Development of an Online Repository of Bangla Literary Texts and its Ontological Representation for Advance Search Options

A Complex Network Analysis of Syllables in Bangla through SyllableNet

Automatic Extraction of Compound Verbs from Bangla Corpora

In this paper we present a rule-based technique for the automatic extraction of Bangla compound verbs (CVs) from raw text corpora. In our work we (a) propose rules, established on the basis of the syntactic interpretation of sentences, through which a system can automatically identify Bangla CVs in text; (b) explain problems of CV identification arising from the semantics and pragmatics of the Bangla language; and (c) apply these rules to two different Bangla corpora to extract CVs. The extracted CVs were manually evaluated by linguistic experts, and our system achieved an accuracy of around 70%.
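
The rules themselves are not given in the abstract. As a hypothetical sketch of what a single, purely syntactic rule might look like, the code below flags a token in conjunctive-participle form (crudely approximated as a romanized word ending in "e", e.g. *kheye*) that is immediately followed by a form of a common vector verb (e.g. *phello*). The stem list, the romanization, and the length threshold are assumptions for illustration only.

```python
# Hypothetical list of romanized Bangla vector (light) verb stems that often
# appear as the second verb (V2) of a compound verb; illustrative, not exhaustive.
VECTOR_VERB_STEMS = ("phel", "de", "di", "ne", "ni", "ja", "ge", "uth", "bos", "rakh")

def is_conjunctive_participle(token):
    """Crude check: a romanized V1 in conjunctive participle form ends in 'e'."""
    return len(token) > 2 and token.endswith("e")

def is_vector_verb(token):
    """True if the token starts with one of the assumed vector verb stems."""
    return token.startswith(VECTOR_VERB_STEMS)

def extract_compound_verbs(tokens):
    """Return (V1, V2) pairs where a conjunctive participle precedes a vector verb."""
    pairs = []
    for v1, v2 in zip(tokens, tokens[1:]):
        if is_conjunctive_participle(v1) and is_vector_verb(v2):
            pairs.append((v1, v2))
    return pairs

print(extract_compound_verbs("se bhat kheye phello".split()))
```

Short stems such as "de" or "ja" will also match unrelated words, and many V1 + V2 sequences are not compound verbs at all; this over-generation is why the semantic and pragmatic issues raised in the abstract matter.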

New Readability Measures for Bangla and Hindi Texts

In this paper we present computational models to compute the readability of Indian-language text documents. We first demonstrate the inadequacy, and the consequent inapplicability, of some of the popular English readability metrics to Hindi and Bangla. Next, we present user experiments to identify important structural parameters of Bangla and Hindi that affect the readability of texts in these two languages. Accordingly, we propose two different readability models for each of Bangla and Hindi. The models are tested against a second round of user studies with a completely new data set, and the results validate the proposed models. Compared to the handful of existing works on Hindi and Bangla text readability, this paper presents the first definitive readability models for these languages that incorporate their salient structural features.

Web browsing interface for people with severe speech and motor impairment in India

Proceedings of the 16th international ACM SIGACCESS conference on Computers & accessibility - ASSETS '14, 2014

We present the design and development of a web browser that eases the dissemination of information through the World Wide Web to people with cerebral palsy in India. Our focus user group comprises people with a severe form of spastic cerebral palsy and highly restricted motor skills. Throughout the development process we interacted with the target users to understand their requirements and to obtain design advice. The browser is augmented with an intelligent auto-scanning mechanism through which web content and browser GUI controls can be accessed with less time and effort. We field-tested the browser with the target users, and preliminary evaluation results suggest that the proposed browser is quite effective in terms of task execution time, cognitive effort, and overall usability.
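
The abstract does not describe the "intelligent" scanning algorithm. For context, a common baseline in switch-access interfaces highlights items one at a time at a fixed interval and lets a single switch press select the highlighted item. The sketch below, which assumes a fixed scan interval and ignores reaction time and missed presses, compares the expected time to reach an item under linear scanning and row-column scanning over a hypothetical 6 x 6 grid of controls; it is a generic model, not the browser's mechanism.

```python
def linear_scan_time(index, dwell):
    """Seconds until item `index` (0-based) is highlighted in linear scanning."""
    return (index + 1) * dwell

def row_column_scan_time(index, n_cols, dwell):
    """Seconds to reach item `index` with row-column scanning on an n_cols-wide grid.

    Rows are highlighted first; after a switch press selects a row,
    the items in that row are highlighted one at a time.
    """
    row, col = divmod(index, n_cols)
    return (row + 1 + col + 1) * dwell

def mean_time(times):
    return sum(times) / len(times)

# Hypothetical layout: 36 controls in 6 columns, 1 s scan interval.
n_items, n_cols, dwell = 36, 6, 1.0
lin = mean_time([linear_scan_time(i, dwell) for i in range(n_items)])
rc = mean_time([row_column_scan_time(i, n_cols, dwell) for i in range(n_items)])
print(f"linear: {lin:.1f} s, row-column: {rc:.1f} s")
```

With these assumptions the script prints `linear: 18.5 s, row-column: 7.0 s`, illustrating why grouping controls reduces the number of scan steps a user has to wait through.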

Development of accessible toolset to enhance social interaction opportunities for people with cerebral palsy in India

Proceedings of the 16th international ACM SIGACCESS conference on Computers & accessibility - ASSETS '14, 2014

In this paper we present a toolset that allows people with severe spastic cerebral palsy (CP) and highly restricted motor skills to access popular social-networking and communication media such as Facebook and e-mail. To understand the requirements of the intended users, we performed a number of surveys that formed the basis of our system design. The tools use a special access-switch-based scanning technique for easy navigation in different applications. We evaluated the toolset with six target users, and the preliminary results demonstrate a positive response.

How Word Order Affects Sentence Comprehension in Bangla: A Computational Approach to Simple Sentence

Lecture Notes in Computer Science, 2013

Sentence comprehension is an integral and important part of whole-text comprehension. It involves complex cognitive actions, as a reader has to work through lexical, syntactic and semantic aspects in order to understand a sentence. One of the vital features of a sentence is its word order, or surface form. Different languages have evolved different word-order systems, which reflect the cognitive structure of the native users of that language. Therefore, word order affects the cognitive load that a sentence exerts on the reader. A computational modeling approach that quantifies the effect of word order on the difficulty of understanding a sentence can greatly benefit the study of text readability and its applications. A handful of works have addressed the issue in English and other languages. Bangla, the fifth most widely spoken language in the world and a relatively free-word-order language, still does not have any computational model to quantify the reading difficulty of a sentence. In this paper, we have developed models to predict the comprehension difficulty of a simple sentence according to its different surface forms in Bangla. In the process, we have also established that difficulty measures for English do not hold for Bangla. Our model has been validated against an extensive user survey.

A Joint Source Channel Model for the English to Bengali Back Transliteration

Lecture Notes in Computer Science, 2013

In this paper we present an English-to-Bengali back-transliteration system that can be used to transliterate Bengali text written in Romanized English back to its original script. The proposed system uses a bilingual parallel corpus of English-Bengali transliterated word pairs and applies both orthographic and phonetic information to two different computational models, namely the joint source channel model and the trigram model, to automatically identify, extract, and learn transliteration unit (TU) pairs from both the source and target language words. Finally, the system predicts the 10 best possible outputs for the given input text. We further extend our work to make the target-word prediction module more robust through phonological analysis of the generated target sentence. Both models have been evaluated on a set of 2000 Romanized Bengali test words. Our initial evaluation results clearly show that the joint source channel model performs much better than the trigram model.
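
The abstract names the joint source channel model without giving its equations. In its common bigram formulation, a source word S and its transliteration T are segmented into aligned TU pairs <s,t>_k, and the pair is scored as P(S, T) ≈ ∏_k P(<s,t>_k | <s,t>_{k-1}). The sketch below scores and ranks candidate segmentations that way. The toy probabilities, the candidate list, and the floor value used in place of proper smoothing are all hypothetical; candidate generation and the phonological post-processing described in the abstract are not shown.

```python
import math

START = ("<s>", "<s>")  # sentinel pair marking the start of a word

def jsc_log_prob(tu_pairs, bigram_probs, floor=1e-6):
    """Log P(S, T) under a bigram joint source channel model.

    Each element of `tu_pairs` is a (source TU, target TU) pair, and the
    sequence is scored as prod_k P(<s,t>_k | <s,t>_{k-1}). Unseen bigrams
    get a small floor probability, a crude stand-in for real smoothing.
    """
    logp = 0.0
    prev = START
    for pair in tu_pairs:
        logp += math.log(bigram_probs.get((prev, pair), floor))
        prev = pair
    return logp

def top_n(candidates, bigram_probs, n=10):
    """Rank candidate TU-pair sequences and return the n best target strings."""
    scored = sorted(candidates, key=lambda c: jsc_log_prob(c, bigram_probs), reverse=True)
    return ["".join(target for _, target in c) for c in scored[:n]]

# Hypothetical TU-pair bigram probabilities for the Romanized input "ami".
bigram_probs = {
    (START, ("a", "আ")): 0.6,
    (START, ("a", "অ")): 0.4,
    (("a", "আ"), ("mi", "মি")): 0.5,
    (("a", "অ"), ("mi", "মি")): 0.2,
}
candidates = [
    [("a", "অ"), ("mi", "মি")],
    [("a", "আ"), ("mi", "মি")],
]
ranked = top_n(candidates, bigram_probs)
print(ranked)
```

Working in log space avoids underflow when long words multiply many small probabilities, and asking for the top 10 mirrors the n-best output mentioned in the abstract.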

Design and Development of a Bangla Semantic Lexicon and Semantic Similarity Measure

International Journal of Computer Applications, 2014

In this paper, we propose a hierarchically organized semantic lexicon for Bangla and a graph-based edge-weighting approach to measure semantic similarity between two Bangla words. We have also developed a graphical user interface to represent the lexical organization. Our proposed lexical structure contains only relations based on semantic association. We have included the frequency of each word across five Bangla corpora in the lexical structure and have also associated further details with words, such as whether a word is mythological, whether it can be used as a verb, and, if so, which word must be appended to it to use it as a verb. This lexicon can be used in various applications such as categorization, the semantic web, and natural language processing tasks like document clustering, word sense disambiguation, machine translation, information retrieval, text comprehension and question-answering systems.
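
The abstract does not specify the edge-weighting scheme. As a hypothetical illustration, the sketch below gives each parent-child link in a small hierarchy a weight of 1/depth(child), so links between specific concepts deep in the tree cost less than links near the root, sums the weights along the path through the lowest common ancestor, and converts that distance into a similarity of 1 / (1 + distance). The toy hierarchy (প্রাণী 'animal' above পাখি 'bird' and মাছ 'fish', with কাক 'crow', শালিক 'myna' and রুই 'rohu' as leaves) and the weighting are assumptions, not the authors' lexicon.

```python
def ancestors(node, parent):
    """Return [node, parent(node), ..., root] for a tree given as a child->parent dict."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def depth(node, parent):
    """Number of edges between the node and the root (root has depth 0)."""
    return len(ancestors(node, parent)) - 1

def edge_weight(child, parent):
    """Hypothetical weighting: the link above `child` costs 1 / depth(child)."""
    return 1.0 / depth(child, parent)

def weighted_distance(a, b, parent):
    """Sum of edge weights on the path from a to b through their lowest common ancestor."""
    chain_a = ancestors(a, parent)
    chain_b = ancestors(b, parent)
    in_b = set(chain_b)
    common = next((n for n in chain_a if n in in_b), None)
    if common is None:
        raise ValueError("nodes are not in the same hierarchy")
    dist = 0.0
    for chain in (chain_a, chain_b):
        for node in chain[:chain.index(common)]:
            dist += edge_weight(node, parent)
    return dist

def similarity(a, b, parent):
    return 1.0 / (1.0 + weighted_distance(a, b, parent))

# Hypothetical toy hierarchy: child -> parent.
parent = {"পাখি": "প্রাণী", "মাছ": "প্রাণী", "কাক": "পাখি", "শালিক": "পাখি", "রুই": "মাছ"}
print(similarity("কাক", "শালিক", parent), similarity("কাক", "রুই", parent))
```

Here কাক and শালিক (siblings under পাখি) get a similarity of 0.5, while কাক and রুই, which meet only at the root, get 0.25.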

Computational Modeling of Morphological Effects in Bangla Visual Word Recognition

Journal of Psycholinguistic Research, 2014

In this paper we aim to model the organization and processing of Bangla polymorphemic words in the mental lexicon. Our objective is to determine whether the mental lexicon accesses a polymorphemic word as a whole or decomposes it into its constituent morphemes and then recognizes them accordingly. To address this issue, we adopted two different strategies. First, we conducted a masked priming experiment with native speakers. Analysis of reaction time (RT) and error rates indicates that, in general, morphologically derived words are accessed via a decomposition process. Next, based on the collected RT data, we developed a computational model that can explain the access and representation of Bangla derivationally suffixed words. To do so, we first explored the individual roles of different linguistic features of a Bangla morphologically complex word and observed that its processing depends on several factors, such as base and surface word frequency, suffix type/token ratio, suffix family size and suffix productivity. Accordingly, we proposed different feature models. Finally, we combined these feature models into a new model that takes advantage of the individual feature models and successfully explains the processing of most Bangla morphologically derived words. The proposed model achieves an accuracy of around 80%, outperforming the other related frequency models.
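
Several of the suffix features listed above have standard corpus-linguistic definitions, although the paper may operationalize them differently. The sketch below assumes a corpus in which derived words have already been mapped to their suffix (the toy romanized words and the `suffix_of` mapping are hypothetical) and computes, per suffix, the family size (distinct word types), the type/token ratio, and a Baayen-style productivity estimate P = n1 / N, where n1 is the number of hapax legomena carrying the suffix and N is the number of tokens carrying it.

```python
from collections import Counter

def suffix_measures(tokens, suffix_of):
    """Per-suffix statistics from a tokenized corpus.

    `suffix_of` maps each derived word to its suffix; words missing from it
    are treated as non-derived and ignored. For each suffix this returns:
      family_size  : number of distinct word types carrying the suffix
      type_token   : types / tokens
      productivity : hapax types / tokens (Baayen-style P = n1 / N)
    """
    counts = Counter(t for t in tokens if t in suffix_of)
    freqs_by_suffix = {}
    for word, freq in counts.items():
        freqs_by_suffix.setdefault(suffix_of[word], []).append(freq)
    measures = {}
    for suffix, freqs in freqs_by_suffix.items():
        n_tokens = sum(freqs)
        n_types = len(freqs)
        n_hapax = sum(1 for f in freqs if f == 1)
        measures[suffix] = {
            "family_size": n_types,
            "type_token": n_types / n_tokens,
            "productivity": n_hapax / n_tokens,
        }
    return measures

# Hypothetical analyses and toy corpus (romanized Bangla).
suffix_of = {"sotota": "-ta", "sorolota": "-ta", "dokkhota": "-ta",
             "pagolami": "-mi", "chelemi": "-mi"}
tokens = ["sotota", "sotota", "sorolota", "dokkhota",
          "pagolami", "pagolami", "chelemi", "ami"]
print(suffix_measures(tokens, suffix_of))
```

Note that *ami* ('I') ends in -mi but is absent from the mapping, so it is not counted; relying on a morphological analysis rather than raw string matching avoids such false positives.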