Apurbalal Senapati - Academia.edu

Papers by Apurbalal Senapati

A study on hydrodynamics of rigid and emergent vegetated flows using machine learning approach

Innovations in Systems and Software Engineering

A Fuzzy System for Identifying Partial Reduplication

Proceedings of Intelligent Computing and Technologies Conference

Effects of Cartoon Network of School Going Children: An Empirical Study

Design Science and Innovation, 2021

A Maximum Entropy Based Honorificity Identification for Bengali Pronominal Anaphora Resolution

Lecture Notes in Computer Science, 2014

This paper presents a maximum entropy based method for determining the honorific identities of personal nouns in Bengali. This information is then used in a Bengali pronoun anaphora resolution system, since honorificity plays an important role in pronominal anaphora resolution in Bengali. Experiments were conducted on a publicly available dataset. The results show that when the honorific identification module is added to the existing pronoun resolution system, the average F1-score improves from 0.602 to 0.703, and this improvement is shown to be statistically significant.
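
The paper does not include code; as a rough illustration, a maximum entropy classifier over hand-crafted features can be approximated with multinomial logistic regression, as in the minimal sketch below. The feature names, example words, and labels are illustrative assumptions, not the authors' actual feature set.

```python
# Minimal sketch: maximum-entropy-style honorificity classifier over assumed features.
# scikit-learn's LogisticRegression is a standard stand-in for a maxent classifier.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(noun, context):
    """Illustrative features for a personal noun and its sentence context (assumed)."""
    return {
        "noun": noun,
        "has_honorific_suffix": noun.endswith("বাবু"),          # assumption: '-babu' suffix
        "verb_honorific_agreement": context.get("verb_form", "unknown"),
        "is_proper_noun": context.get("pos", "") == "NNP",
    }

# Toy training data: (noun, context) pairs labeled honorific / non-honorific.
X_train = [features("রামবাবু", {"verb_form": "hon", "pos": "NNP"}),
           features("ছেলেটা", {"verb_form": "non_hon", "pos": "NN"})]
y_train = ["honorific", "non_honorific"]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print(model.predict([features("দাদু", {"verb_form": "hon", "pos": "NN"})]))
```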

Automatic detection of subject/object drops in Bengali

2014 International Conference on Asian Language Processing (IALP), 2014

This paper presents a pioneering attempt at automatic detection of drops in Bengali. The dominant drops in Bengali are subject, object and verb drops. Bengali is a pro-drop language, and pro-drops fall under the subject/object drops on which this research concentrates. The detection algorithm uses off-the-shelf Bengali NLP tools such as a POS tagger, a chunker and a dependency parser. Simple linguistic rules are first applied to quickly annotate a dataset of 8,455 sentences, which are then manually checked. The corrected dataset is used to train two classifiers that label a sentence as containing a drop or not. Features previously used by other researchers have been considered, and both classifiers show comparable overall performance. As a by-product, the study generates another useful NLP resource (apart from the drop-annotated dataset): a classification of Bengali verbs (all morphological variants of 881 root verbs) by transitivity, which is in turn used as a feature by the classifiers.
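
A minimal sketch of the kind of pre-annotation rule described above: flag a sentence as a candidate subject drop when a verb has no dependent in a subject relation. The dependency labels ("nsubj", "k1"), the POS convention, and the toy parse below are assumptions, not the paper's exact rule set.

```python
# Sketch of one rule for quick pre-annotation of subject drops (assumed labels).
def has_subject_drop(parsed_sentence):
    """parsed_sentence: list of dicts with 'pos', 'head', 'deprel' from a dependency parser."""
    for i, tok in enumerate(parsed_sentence):
        if tok["pos"].startswith("V"):                         # verb token (simplified)
            has_subj = any(d["head"] == i and d["deprel"] in ("nsubj", "k1")
                           for d in parsed_sentence)
            if not has_subj:
                return True
    return False

# Toy example: "খেয়েছি" ("(I) have eaten") -- a single verb with no overt subject.
sentence = [{"pos": "VM", "head": -1, "deprel": "root"}]
print(has_subject_drop(sentence))   # True -> candidate subject drop
```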

One-expression classification in Bengali and its role in Bengali-English machine translation

2014 International Conference on Asian Language Processing (IALP), 2014

This paper attempts to analyze one-expressions in Bengali and shows their effectiveness for machine translation. The characteristics of one-expressions are studied in a 177-million-word corpus. A classification scheme is proposed for grouping the one-expressions. The features contributing to the classification are identified, and a CRF-based classifier is trained on an author-generated annotated dataset containing 2,006 instances of one-expressions. The classifier's performance is tested on a test set (containing 300 instances of Bengali one-expressions) that is disjoint from the training data. Evaluation shows that the classifier correctly classifies the one-expressions in 75% of cases. Finally, the utility of this classification task is investigated for Bengali-English machine translation. Translation accuracy improves from 39% (Google translator) to 60% (the proposed approach), and this improvement is found to be statistically significant. All the annotated datasets (there were none before) are made freely available to facilitate further research on this topic.
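
As a sketch of what a CRF-based classifier of this kind can look like, the snippet below uses the third-party sklearn-crfsuite package. The feature template, the class labels, and the toy sentence are illustrative assumptions and not the authors' actual setup.

```python
# Minimal CRF sequence-classifier sketch (assumed feature template and labels).
import sklearn_crfsuite

def token_features(sent, i):
    return {
        "word": sent[i]["word"],
        "pos": sent[i]["pos"],
        "prev_word": sent[i - 1]["word"] if i > 0 else "<BOS>",
        "next_word": sent[i + 1]["word"] if i < len(sent) - 1 else "<EOS>",
    }

# Toy data: each sentence is a token sequence; labels mark the assumed class of a
# one-expression token (e.g. "ONE_NUM") and "O" elsewhere.
train_sents = [[{"word": "এক", "pos": "QC"}, {"word": "জন", "pos": "CL"}]]
train_labels = [["ONE_NUM", "O"]]

X_train = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X_train, train_labels)
print(crf.predict(X_train))
```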

A unified deep neuro-fuzzy approach for COVID-19 twitter sentiment classification

Journal of Intelligent & Fuzzy Systems

COVID-19 has triggered a serious mental health crisis across the world. Since a vast majority of the population use social media platforms such as Twitter to exchange information, rapidly collecting and analyzing social media data to understand personal well-being, and subsequently adopting adequate measures, could avoid severe socio-economic damage. Sentiment analysis on Twitter data is very useful for understanding and identifying mental health issues. In this research, we propose a unified deep neuro-fuzzy approach for COVID-19 Twitter sentiment classification. Fuzzy logic is a powerful tool for Twitter data analysis, where approximate semantic and syntactic analysis is more appropriate because correcting the spelling and grammar of tweets is tedious and impractical. We conducted experiments on three challenging COVID-19 Twitter sentiment datasets. Experimental results demonstrate that fuzzy Sugeno integral based ensembles of classifiers outperform the individual base classifiers.
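
A minimal sketch of Sugeno-integral fusion of per-class scores from several base classifiers is shown below. It uses a deliberately simplified capped-sum fuzzy measure and assumed classifier reliabilities, not necessarily the λ-measure or weights used in the paper.

```python
# Sketch: fuse classifier confidences for one class with a Sugeno integral.
import numpy as np

def sugeno_integral(scores, densities):
    """scores: each classifier's confidence for one class; densities: per-classifier worth."""
    order = np.argsort(scores)[::-1]                       # sort classifiers by descending score
    h = np.asarray(scores)[order]
    g = np.minimum(1.0, np.cumsum(np.asarray(densities)[order]))  # simplified measure g(A_i)
    return float(np.max(np.minimum(h, g)))

# Toy fusion: three classifiers' softmax scores for classes [negative, neutral, positive].
clf_scores = np.array([[0.7, 0.2, 0.1],
                       [0.6, 0.3, 0.1],
                       [0.2, 0.5, 0.3]])
densities = [0.4, 0.35, 0.25]                              # assumed classifier reliabilities
fused = [sugeno_integral(clf_scores[:, c], densities) for c in range(3)]
print(fused, "->", int(np.argmax(fused)))                  # fused scores and winning class
```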

Foreign Animation and Indian Kids Behavior: An Innovative Survey

Design Science and Innovation

CIT Kokrajhar Team: LSTM based Deep RNN Architecture for Hate Speech and Offensive Content (HASOC) Identification in Indo-European Languages

Recently, automated hate speech and offensive content identification has received significant attention due to the rapid propagation of cyberbullying, which undermines objective discussion on social media and adversely affects the outcome of online democratic processes. A special type of Recurrent Neural Network (RNN) based deep learning approach called Long Short Term Memory (LSTM) is implemented for automatic hate speech and offensive content identification. Separating offensive content is challenging because abusive language is subjective in nature and highly context dependent. This paper offers a language-agnostic solution for three Indo-European languages (English, German, and Hindi), since no pre-trained word embeddings are used. Experimental results offer attractive insights.
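
A minimal sketch of an LSTM classifier of the kind described, with the embedding layer learned from scratch rather than loaded from pre-trained vectors, is given below in Keras. The vocabulary size, sequence length, and layer sizes are illustrative assumptions, not the paper's settings.

```python
# Minimal LSTM text-classifier sketch (assumed hyper-parameters).
import tensorflow as tf

VOCAB_SIZE, MAX_LEN = 20000, 100                       # assumptions

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),        # embeddings learned from scratch
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),    # hate/offensive vs. not
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# Training would then be:
# model.fit(padded_token_ids, labels, epochs=5, batch_size=64, validation_split=0.1)
```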

Hate Speech and Offensive Content Identification: LSTM Based Deep Learning Approach @ HASOC 2020

The use of hate speech and offensive words is growing around the world. It includes expression, in vocal or written form, that attacks an individual or a community based on caste, religion, gender, ethnicity, physical appearance, etc. Popular social media such as Twitter, Facebook and WhatsApp, along with print and visual media, are being exploited as platforms for hate speech and offensive content, which is increasingly found on the web. This is a serious matter for a healthy democracy, social stability, and peace. As a consequence, social media platforms are trying to identify such content in posts as a preventive measure. FIRE 2020 organized a track aimed at developing systems that identify hate speech and offensive content in documents. In our system, we (CONCORDIA_CIT_TEAM) used Long Short Term Memory (LSTM) networks for automatic hate speech and offensive content identification. Experimental results demonstrate that LSTM can successfully identify hate speech and ...

An Overview of the Basic NLP Resources Towards Building the Assamese-English Machine Translation

Proceedings of Intelligent Computing and Technologies Conference, 2021

Machine Translation (MT) is the process of automatically converting one natural language into another while preserving the exact meaning of the input text in the output text. It is one of the classical problems in the Natural Language Processing (NLP) domain and has wide application in daily life. Though MT research for English and some other languages is at a relatively advanced stage, for most languages it is far from human-level performance in the translation task. From a computational point of view, MT requires a great deal of preprocessing and many basic NLP tools and resources. This study gives an overview of the available basic NLP resources in the context of Assamese-English machine translation.

Hiencor: On mining of a hi-en general purpose parallel corpus from the web

2017 International Conference on Asian Language Processing (IALP), 2017

This paper presents a simple, language-independent methodology for mining bilingual parallel corpora from the web. In particular, we extract a parallel corpus for the Hindi-English (Hi-En) language pair from previously unexplored web pages. Candidate websites containing Hindi and English pages are identified by supplying a list of Hindi stop words to the system. A small set of manually generated patterns and a state-of-the-art sentence aligner are then used to extract the Hindi-English parallel corpus from these candidate websites. The quality of the mined parallel corpus is also demonstrated empirically on a Hindi-English machine translation task.
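
A minimal sketch of the candidate-site filtering step, under the assumption that a page counts as Hindi when it contains enough known Hindi stop words. The stop-word list, threshold, and English check below are illustrative assumptions, not the paper's actual configuration.

```python
# Sketch: flag candidate websites that serve both Hindi-looking and English-looking pages.
import re

HINDI_STOP_WORDS = {"और", "का", "के", "की", "है", "में", "से", "यह", "को"}   # assumed list

def looks_like_hindi(page_text, min_hits=5):
    tokens = re.findall(r"[\u0900-\u097F]+", page_text)      # Devanagari token spans
    hits = sum(1 for t in tokens if t in HINDI_STOP_WORDS)
    return hits >= min_hits

def is_candidate_site(pages):
    """pages: list of plain-text page contents from one website."""
    has_hindi = any(looks_like_hindi(p) for p in pages)
    has_english = any(re.search(r"\b(the|and|of|is)\b", p, re.I) for p in pages)
    return has_hindi and has_english

print(is_candidate_site(["यह एक परीक्षण है और यह है में से को का",
                         "This is the English version of the page."]))
```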

Technical Domain Classification of Bangla Text using BERT

Proceedings of Intelligent Computing and Technologies Conference, 2021

Coarse-grained tasks are primarily based on text classification, one of the earliest problems in NLP, and these tasks are performed at the document and sentence levels. Here, our goal is to identify the technical domain of a given Bangla text. In coarse-grained technical domain classification, a piece of Bangla text provides information about a specific coarse-grained technical domain such as Biochemistry (bioche), Communication Technology (com-tech), Computer Science (cse), Management (mgmt), Physics (phy), etc. This paper uses a recent deep learning model, Bangla Bidirectional Encoder Representations from Transformers (Bangla BERT), to identify the domain of a given text. Bangla BERT (Bangla-Bert-Base) is a pretrained language model for the Bangla language. We then discuss the accuracy of Bangla BERT and compare it with other models that solve the same problem.
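
A minimal sketch of domain classification with a pretrained Bangla BERT checkpoint via Hugging Face transformers. The checkpoint identifier and the five-label head are assumptions about the setup, not details taken from the paper, and the classification head still needs fine-tuning before the prediction is meaningful.

```python
# Sketch: Bangla text domain classification with a pretrained Bangla BERT encoder.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "sagorsarker/bangla-bert-base"        # assumed public Bangla-Bert-Base checkpoint
LABELS = ["bioche", "com-tech", "cse", "mgmt", "phy"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=len(LABELS))

text = "কম্পিউটার বিজ্ঞান সম্পর্কিত একটি বাক্য"     # a sentence about computer science
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = model(**inputs).logits
print(LABELS[int(logits.argmax(dim=-1))])          # random until the head is fine-tuned
```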

Is There Any Further Scope for Improving the Efficiency of Modern Websites?

2021 6th International Conference for Convergence in Technology (I2CT), 2021

An interface is an intermediate medium of interaction between human and computer for effective communication. In recent years, with the advancement of the WWW, efficient user interface design has become essential for better user experiences. Efficiency is a key component in determining the key metric of a website: usability. The purpose of this study is to evaluate the efficiency of different websites popularly used in everyday life. To carry out this task, ten websites were considered. For each website, a representative task was selected, and ten participants completed the task following a Latin square design. Experimental results show that most of the interfaces are efficient; however, there is further scope for improving the efficiency of these websites.

Named-Entity Recognition in Bengali @FIRE NER 2013

This paper describes the performance of two systems for the Named Entity Recognition (NER) task of FIRE 2013. The first system is rule-based, whereas the second is statistical (based on CRF). The systems differ in other aspects too; for example, the first system works on untagged data (not even POS tagged) to identify named entities, whereas the second system makes use of a POS tagger and a chunker. The rules used by the first system are mined from the training data. The CRF-based classification does not require any explicit linguistic rules, but it uses a gazetteer built from Wiki and other sources.

Stock Price Prediction: LSTM Based Model

Proceedings of Intelligent Computing and Technologies Conference, 2021

Stock price prediction is a critical field used by business people as well as common or retail investors who try to grow their money in value over time. People can either gain money or lose their entire life savings through stock market activity; it is a chaotic system. Building an accurate model is complex, as price variation depends on multiple factors such as news, social media data, company fundamentals and production, government bonds, historical prices and the country's economic factors. A prediction model that considers only one factor may not be accurate; hence, incorporating multiple factors such as news, social media data and historical prices might increase the model's accuracy. This paper also addresses the issue that arises when someone trades according to the model's outcome: the model cannot give the proper result in real life, since capital market data are very sensitive and news-driven. To avoid such a situation, we use the hedging concept when ...

A novel framework for COVID-19 case prediction through piecewise regression in India

International Journal of Information Technology

The outbreak of COVID-19 created a disastrous situation in more than 200 countries around the world. Predicting the future trend of the disease in different countries can therefore be useful for managing the outbreak. Several data-driven works have been done for the prediction of COVID-19 cases, and they use features of past data for future prediction. In this study a machine learning (ML)-guided linear regression model is used to address different types of COVID-19 related issues. The linear regression model is fitted to the dataset to model the total number of positive cases and the number of recoveries for different states in India such as Maharashtra, West Bengal, Kerala, Delhi and Assam. From the analysis of COVID-19 data it has been observed that the trend of daily infections first grows linearly and then increases sharply, so piecewise linear regression is the best-suited model to capture this property, and it has been incorporated into our prediction. The experimental results show the superiority of the proposed scheme and, to the best of our knowledge, this is a new approach to the prediction of COVID-19.
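
A minimal sketch of piecewise linear regression with a single breakpoint, chosen by scanning candidate breakpoints and minimizing squared error. The synthetic case curve below is illustrative and not the paper's dataset.

```python
# Sketch: two-segment piecewise linear fit to a daily-case series.
import numpy as np

def fit_piecewise(t, y):
    best = None
    for k in range(2, len(t) - 2):                       # candidate breakpoint indices
        a1, b1 = np.polyfit(t[:k], y[:k], 1)             # left segment: slope, intercept
        a2, b2 = np.polyfit(t[k:], y[k:], 1)             # right segment
        pred = np.concatenate([a1 * t[:k] + b1, a2 * t[k:] + b2])
        sse = float(np.sum((y - pred) ** 2))
        if best is None or sse < best[0]:
            best = (sse, k, (a1, b1), (a2, b2))
    return best

t = np.arange(30, dtype=float)
y = np.where(t < 15, 10 * t, 150 + 40 * (t - 15))        # slow growth, then a steeper regime
sse, k, seg1, seg2 = fit_piecewise(t, y)
print(f"breakpoint at day {int(t[k])}, slopes {seg1[0]:.1f} -> {seg2[0]:.1f}")
```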

Bangla Morphological Analyzer using Finite Automata: ISI @FIRE MET 2012

This paper describes a finite-automata-based morphological analyzer for Bangla. As per the MET [1] requirement, the analyzer outputs only the root (surface) word, but the system is capable of producing full-fledged morphological information. The method can be used for any agglutinative language with minor changes.
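
As a rough illustration of the idea, the sketch below builds a small finite automaton (a trie over reversed suffixes) and strips the longest matching inflectional suffix to recover the root. The suffix inventory is an illustrative assumption, not the analyzer's actual suffix set.

```python
# Sketch: suffix-stripping automaton for a handful of assumed Bangla suffixes.
SUFFIXES = ["গুলো", "দের", "টা", "রা", "কে", "ের", "ে"]

def build_suffix_automaton(suffixes):
    """Trie over reversed suffixes; 'ACCEPT' marks a state completing a full suffix."""
    root = {}
    for suf in suffixes:
        state = root
        for ch in reversed(suf):
            state = state.setdefault(ch, {})
        state["ACCEPT"] = suf
    return root

AUTOMATON = build_suffix_automaton(SUFFIXES)

def strip_suffix(word):
    """Run the automaton from the end of the word; return (root, matched suffix or None)."""
    state, matched = AUTOMATON, None
    for ch in reversed(word):
        if ch not in state:
            break
        state = state[ch]
        if "ACCEPT" in state:
            matched = state["ACCEPT"]            # keep the longest suffix seen so far
    return (word[:-len(matched)], matched) if matched else (word, None)

print(strip_suffix("ছেলেদের"))   # -> ('ছেলে', 'দের')
```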

A Machine Learning Approach to Anaphora Resolution in Nepali Language

In this paper, we present a machine learning (ML) approach to an Anaphora Resolution (AR) system for the Nepali language. It is one of the pioneering approaches to anaphora resolution using machine learning in Nepali, which is a resource-limited language. For this work, we developed our own dataset in the standard format used in this domain. The data has been tagged with the necessary information such as part-of-speech (POS), named entity, chunking information, gender, number, person, etc. We divided the data into training and testing sets in an approximately 5:1 ratio, and ML classifiers were used for training and testing. The results are encouraging for further progress.
