Nasrin Taghizadeh | University of Tehran

Papers by Nasrin Taghizadeh

Cornell University - arXiv, Mar 19, 2020

NSURL-2019 Task 7 focuses on Named Entity Recognition (NER) in Farsi. This task was chosen to compare different approaches to finding phrases that specify Named Entities in Farsi texts, and to establish a standard testbed for future research on this task in Farsi. This paper describes the process of making the training and test data, the list of participating teams (6 teams), and the evaluation results of their systems. The best system obtained an F1 score of 85.4% based on phrase-level evaluation on seven classes of NEs: person, organization, location, date, time, money, and percent.
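As a rough illustration of what phrase-level evaluation means, the sketch below scores a prediction only when an entire entity phrase (label and boundaries) matches the gold annotation. It assumes BIO-style tags, which the abstract does not specify; the function names `extract_phrases` and `phrase_f1` are hypothetical and this is not the shared task's official scorer.

```python
def extract_phrases(tags):
    """Return the set of (label, start, end) spans in a BIO tag sequence.

    Assumption: tags look like "B-PER", "I-PER", "O". A stray "I-X" that
    does not continue an open X span is treated as the start of a new span.
    """
    phrases = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            phrases.add((label, start, i))  # end index is exclusive
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return phrases


def phrase_f1(gold_tags, pred_tags):
    """Precision, recall and F1 over exactly matching phrases."""
    gold = extract_phrases(gold_tags)
    pred = extract_phrases(pred_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the person phrase matches, the second phrase has the wrong label.
gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "I-PER", "O", "B-ORG", "O"]
print(phrase_f1(gold, pred))
```

Under this scheme a phrase with correct boundaries but the wrong class counts as both a false positive and a false negative, which is why phrase-level scores tend to be stricter than token-level ones.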

Semantic Relation Extraction aims to identify whether a semantic relation of a pre-defined type holds between two entities in a text. Relation extraction is a preliminary task in many applications such as knowledge base construction and information retrieval. To investigate the challenges and opportunities of relation extraction in Persian, we ran a shared task as part of the second workshop on NLP Solutions for Under-Resourced Languages (NSURL 2021). This paper presents the approaches of the participating teams, their results, and the findings of the shared task. The data set prepared for this task is made publicly available to support further research on Persian relation extraction.

Abstract files of SNPPhenA corpus. (ZIP 651 kb)

ArXiv, 2021

We have released SINA-BERT, a language model pre-trained on BERT (Devlin et al., 2018) to address the lack of a high-quality Persian language model in the medical domain. SINA-BERT utilizes pre-training on a large-scale corpus of medical contents including formal and informal texts collected from various online resources in order to improve the performance on health-care related tasks. We employ SINA-BERT to complete following representative tasks: categorization of medical questions, medical sentiment analysis, medical named entity recognition, and medical question retrieval. For each task, we have developed Persian annotated data sets for training and evaluation and learnt a representation for the data of each task especially complex and long medical questions. With the same architecture being used in each task, SINA-BERT outperforms BERT-based models that were previously made available in the Persian l...

Journal of Biomedical Semantics, 2017

Background: Single Nucleotide Polymorphisms (SNPs) are among the most important types of genetic variations influencing common diseases and phenotypes. Recently, some corpora and methods have been developed for extracting mutations and diseases from texts. However, there is no available corpus for extracting associations from texts that is annotated with linguistic-based negation, modality markers, neutral candidates, and the confidence level of associations. Method: In this research, several steps were followed to produce the SNPPhenA corpus. They include automatic Named Entity Recognition (NER) followed by manual annotation of SNP and phenotype names, annotation of the SNP-phenotype associations and their level of confidence, as well as modality markers. Moreover, the corpus was annotated with negation scopes and cues as well as neutral candidates, which play a crucial role in negation and modality phenomena in relation extraction tasks. Result: The agreement between annotators was measured by Cohen's Kappa coefficient, and the resulting scores indicated the reliability of the corpus. The Kappa score was 0.79 for annotating the associations and 0.80 for the confidence degree of associations. Also presented are the basic statistics of the annotated features of the corpus and the results of our first experiments on the extraction of ranked SNP-phenotype associations. The prepared guideline documents make the corpus easier to use. The corpus, guidelines and inter-annotator agreement analysis are available on the website of the corpus: http://nil.fdi.ucm.es/?q=node/639.
Conclusion: Specifying the confidence degree of SNP-phenotype associations from articles helps identify the strength of associations, which could in turn assist genomics scientists in determining phenotypic plasticity and the importance of environmental factors. Moreover, our first experiments with the corpus show that linguistic-based confidence, alongside other non-linguistic features, can be used to estimate the strength of the observed SNP-phenotype associations. Trial Registration: Not Applicable
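For readers unfamiliar with the agreement measure cited in this abstract, Cohen's Kappa compares the observed agreement between two annotators with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a minimal illustration on made-up labels; the function name `cohens_kappa` and the toy data are assumptions, not the authors' analysis code or corpus data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies, summed over labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations of four candidate associations.
annotator_1 = ["positive", "positive", "negative", "negative"]
annotator_2 = ["positive", "negative", "negative", "negative"]
print(cohens_kappa(annotator_1, annotator_2))
```

In this toy case the annotators agree on three of four items (p_o = 0.75) while chance agreement is 0.5, so Kappa is 0.5.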

Robotics and Autonomous Systems, 2013

Highlights
• A framework for automatic skill acquisition.
• Two algorithms for subgoal detection: the first algorithm incorporates k′-means algorithm with spectral graph clustering, and the second algorithm utilizes eigenvector centrality measure.

ACM Transactions on Asian and Low-Resource Language Information Processing

We describe a cross-lingual adaptation method based on syntactic parse trees obtained from the Universal Dependencies (UD), which are consistent across languages, to develop classifiers in low-resource languages. The idea of UD parsing is to capture similarities as well as idiosyncrasies among typologically different languages. In this article, we show that models trained using UD parse trees for complex NLP tasks can characterize very different languages. We study two tasks of paraphrase identification and relation extraction as case studies. Based on UD parse trees, we develop several models using tree kernels and show that these models trained on the English dataset can correctly classify data of other languages, e.g., French, Farsi, and Arabic. The proposed approach opens up avenues for exploiting UD parsing in solving similar cross-lingual tasks, which is very useful for languages for which no labeled data is available.

Computer Speech & Language

Journal of Artificial Intelligence Research

Wordnets are an effective resource for natural language processing and information retrieval, especially for semantic processing and meaning-related tasks. So far, wordnets have been constructed for many languages. However, the automatic development of wordnets for low-resource languages has not been well studied. In this paper, an Expectation-Maximization algorithm is used to create high-quality, large-scale wordnets for low-resource languages. The proposed method benefits from cross-lingual word sense disambiguation and develops a wordnet using only a bilingual dictionary and a monolingual corpus. The proposed method has been applied to the Persian language, and the resulting wordnet has been evaluated through several experiments. The results show that the induced wordnet has a precision score of 90% and a recall score of 35%.

Procedia Computer Science

Applied Soft Computing, 2017

Journal of Artificial Intelligence Research

Cellular learning automata (CLA) is a distributed computational model which was introduced in the last decade. This model combines the computational power of cellular automata with the learning power of learning automata. A cellular learning automaton is composed of a lattice of cells working together to accomplish their computational task, in which each cell is equipped with some learning automata. A wide range of applications use CLA, such as image processing, wireless networks, evolutionary computation and cellular networks. However, the only input to this model is a reinforcement signal, so it cannot receive another input such as the state of the environment. In this paper, we introduce a new model of CLA in which each cell receives extra information from the environment in addition to the reinforcement signal. The ability to receive an extra input from the environment increases the computational power and flexibility of the model. We have designed new algorithms for solving well-known problems in pattern recognition and machine learning such as classification, clustering and image segmentation, all of them based on the proposed CLA. We investigated the performance of these algorithms through several computer simulations. Results of the new clustering algorithm show acceptable performance on various data sets. The CLA-based classification algorithm achieves an average precision of 84% on eight data sets, in comparison with SVM, KNN and Naive Bayes with average precision of 88%, 84% and 75%, respectively. Similar results are obtained for semi-supervised classification based on the proposed CLA.

Recent research on automatic skill acquisition in reinforcement learning has focused on subgoal discovery methods. Among them, algorithms based on graph partitioning have achieved higher performance. In this paper, we propose a new automatic skill acquisition framework based on the graph partitioning approach. The main steps of this framework are identifying subgoals and discovering useful skills. We propose two subgoal discovery algorithms, which use spectral analysis on the transition graph of the learning agent. The first proposed algorithm incorporates the k′-means algorithm with spectral clustering. In the second algorithm, the eigenvector centrality measure is utilized and options are discovered. Moreover, we propose an algorithm for pruning useless options, which cause additional costs for the learning agent. The experimental results on various problems show significant improvement in the learning performance of the agent.
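To make the eigenvector centrality measure mentioned in this abstract concrete, the sketch below computes it by power iteration on a small undirected graph standing in for an agent's transition graph. The graph, the function name `eigenvector_centrality`, and the use of an identity shift for stable convergence are all illustrative assumptions; how the paper turns centrality scores into subgoals and options is not stated in the abstract and is not modelled here.

```python
import math


def eigenvector_centrality(adjacency, iterations=200, tol=1e-10):
    """Eigenvector centrality of a connected undirected graph.

    adjacency maps each node to a list of its neighbours (symmetric).
    Power iteration is run on A + I, which has the same leading eigenvector
    as A but does not oscillate on bipartite graphs.
    """
    nodes = sorted(adjacency)
    score = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        new = {v: score[v] + sum(score[u] for u in adjacency[v]) for v in nodes}
        norm = math.sqrt(sum(x * x for x in new.values()))
        new = {v: x / norm for v, x in new.items()}
        delta = max(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score


# Hypothetical transition graph: two densely connected regions (0-1-2 and 4-5-6)
# joined through state 3.
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4],
    4: [3, 5, 6], 5: [4, 6], 6: [4, 5],
}
scores = eigenvector_centrality(graph)
for state in sorted(scores, key=scores.get, reverse=True):
    print(state, round(scores[state], 3))
```

In this toy graph the states adjacent to the connecting state receive the highest scores, followed by the connecting state itself, since centrality rewards being linked to other well-connected states.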

Developing autonomous systems that can act in various environments and interactively perform their assigned tasks is highly desirable. Such systems could be applied in different fields such as medicine, controller robots and social life. Reinforcement learning is an attractive area of machine learning which addresses these concerns. At large scales, the learning performance of an agent can be improved by using hierarchical reinforcement learning techniques and temporally extended actions. The higher level of abstraction helps the learning agent approach lifelong learning goals. In this paper, a new method is presented for discovering subgoal states and constructing useful skills. The method utilizes the Ant System optimization algorithm to identify bottleneck edges, which act like bridges between different connected areas of the problem space. Using the discovered subgoals, the agent creates temporal abstractions, which enable it to explore more effectively. Experimental results show that the proposed method can significantly improve the learning performance of the agent.
