Anaphora Resolution Research Papers - Academia.edu
This paper presents the application of Hobbs' algorithm to pronominal resolution in Hindi. Hobbs' algorithm relies on syntactic rather than semantic information and can therefore serve as a baseline algorithm. The algorithm has been adapted for Hindi, taking into account the roles of subject and object and their impact on anaphora resolution for reflexive and possessive pronouns. When the subject is dative, possessive and reflexive pronouns lose their complementary distribution: the reflexive pronoun binds with a preceding nominative as well as a dative noun phrase within the sentence, while the possessive pronoun extends the binding to the previous sentence as well.
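As a rough illustration of the syntactic search that Hobbs' algorithm performs, the sketch below covers only one step: walking the parse trees of previous sentences, most recent first, in left-to-right breadth-first order and proposing the first NP that agrees with the pronoun. The tree encoding, the `LEXICON` table, the head-finding rule, and the English example are all assumptions made for illustration; the within-sentence steps and the Hindi-specific adaptations described in the abstract are not modelled.

```python
from collections import deque

# A parse tree node is (label, children); a leaf is (label, word).
# Agreement features come from a tiny hypothetical lexicon.
LEXICON = {
    "Sita": {"gender": "f", "number": "sg"},
    "Ram": {"gender": "m", "number": "sg"},
    "book": {"gender": "n", "number": "sg"},
}


def head_word(np_node):
    """Return the last leaf word under an NP (a crude head-finding rule)."""
    _, body = np_node
    if isinstance(body, str):
        return body
    return head_word(body[-1])


def np_nodes_bfs(tree):
    """Yield NP nodes in left-to-right, breadth-first order."""
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        label, body = node
        if label == "NP":
            yield node
        if not isinstance(body, str):
            queue.extend(body)


def propose_antecedent(previous_trees, pronoun_features):
    """Search previous sentences, most recent first, for an agreeing NP."""
    for tree in reversed(previous_trees):
        for np_node in np_nodes_bfs(tree):
            word = head_word(np_node)
            if LEXICON.get(word) == pronoun_features:
                return word
    return None


# "Ram gave Sita a book." followed by a sentence containing "she".
sentence = ("S", [
    ("NP", [("N", "Ram")]),
    ("VP", [
        ("V", "gave"),
        ("NP", [("N", "Sita")]),
        ("NP", [("Det", "a"), ("N", "book")]),
    ]),
])

she = {"gender": "f", "number": "sg"}
print(propose_antecedent([sentence], she))  # Sita
```

Because the traversal is breadth-first, shallow NPs such as the subject are proposed before more deeply embedded ones, which is what makes a purely syntactic search usable as a baseline.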
This dissertation explores Subject Pronoun Expression (SPE) in Cabo-Verdean Creole (CVC), a Portuguese-based language spoken in the Republic of Cabo Verde. The CVC subject domain has at least three types of nominative anaphora: a subject clitic, a null subject, and a double-subject construction. This study is the first to examine the distribution of these subject categories by combining a quantitative methodology with formal syntactic theory, as well as insights from functionalist, usage-based, cognitive linguistic, and typological approaches. In so doing, it offers a new perspective on this issue that is intended to move the field past protracted theoretical debates over the morphosyntactic status and discursive functions of these grammatical elements. For instance, the formal category underlying subject clitics has been contested in CVC and cross-linguistically; some have claimed that they are independent pronouns that cliticize at the phonological level (Déprez 1994; De Cat 2005; Costa & Pratas 2013), others have identified them as inflectional affixes in the VP layer (DeGraff 1993; Baptista 1995; Culbertson 2010), while in language typology they are analyzed as ‘person markers’ that can engage in local grammatical agreement or nonlocal anaphoric agreement (Bresnan & Mchombo 1987; Zribi-Hertz & Diagne 2002; Siewierska 2004; Creissels 2005; Kari 2017). Sociolinguistic interviews and picture description narratives were collected from native speakers of CVC from the islands of Santiago and Maio. Sampled speech was transcribed prosodically (Chafe 1993; Du Bois et al. 1993; Torres Cacoullos & Travis 2019) in order to evaluate several aspects of discourse organization. 
Data were submitted to descriptive and inferential inspection in four analyses using R (R Core Team 2019): one was an exploratory test that served to delimit the variable context for SPE in CVC, the second involved a fixed-effects multinomial logistic regression, and the third and fourth were based on mixed-effects binomial logistic regressions. Results revealed highly significant effects for linguistic structural priming: double-subject and singleton tonic pronouns primed subsequent double-subjects, while null subjects primed additional null subjects. Lexical Determiner Phrase (DP) antecedents that were semantically referentially deficient (i.e. they bore inanimate, indefinite, or nonspecific reference) also promoted anaphoric zeros. These results lend partial support to the claims regarding the semantic properties of strong pronominals proposed under the Typology of Structural Deficiency (Cardinaletti & Starke 1994, 1999), and suggest that, as in Brazilian Portuguese, there is an “avoid referentially deficient pronoun” constraint (Kato & Duarte 2003, 2005; Duarte & Soares da Silva 2016) that is probabilistically active in CVC. The zero-to-zero priming effect and the favoring effect from referentially deficient lexical DPs were only active at short anaphoric distances, and were promoted when adjacent intonational units were prosodically linked or simultaneously prosodically and syntactically linked (Torres Cacoullos & Travis 2019). The priming effect for double-subjects obtained at longer anaphoric distances; they are promoted when their antecedent is in a non-adjacent clause. Results suggest that double-subjects function as switch-reference devices, can establish contrastive focus, and reintroduce old discourse referents. These are much the same functional and discursive values that singleton tonic pronouns have cross-linguistically (Givón 1976; 2001[1984]; 2017). 
The realization of zero subjects is mostly contingent on antecedent accessibility (Givón 1976; 2017, Ariel 1990), but is also modulated by the aforementioned “avoid referentially deficient pronoun” constraint. Inferring from the results for zero and double-subjects, it appears that CVC subject clitics are ‘ambiguous person agreement markers’ (Bresnan & Mchombo 1987; Siewierska 2004): like independent pronouns, they engage in nonlocal anaphoric agreement, but like inflectional affixes, they also engage in local grammatical agreement. This in-between morphosyntactic status is related to the infinitival origin of CVC verbs (Quint 2008b): the absence of bound person-number inflection is likely to have initiated grammaticalization on tonic pronouns, causing them to be eroded into subject clitics, and eventually become ambiguous person agreement markers, which are probabilistically dropped according to the properties of their controllers and the dynamics of antecedent accessibility. In line with Wratil’s (2011) ‘Null Subject Cycle’, it could be argued that CVC subject clitics are grammatical elements that have stagnated at an early stage of a grammaticalization cline, which entails the transformation of independent pronouns into clitics, and then eventually into bound affixes.
This research contributes to the area of corpus annotation and text mining by developing novel domain specific language resources. Most practical text mining applications restrict their domain. This research restricts the domain to the Qur’anic Text.
Recognizing and generating textual entailment and paraphrases are regarded as important technologies in a broad range of NLP applications, including information extraction, summarization, question answering, information retrieval, machine translation, and text generation. Both textual entailment and paraphrasing address relevant aspects of natural language semantics. Entailment is a directional relation between two expressions in which one of them implies the other, whereas paraphrase is a relation in which two expressions convey essentially the same meaning. Indeed, paraphrase can be defined as bi-directional entailment. While it may be debatable how such semantic definitions can be made well-founded, in practice we have already seen evidence that such knowledge is essential for many applications.
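The definition of paraphrase as bi-directional entailment can be written down directly once some entailment predicate is available. The sketch below uses a deliberately naive stand-in, `entails`, which only checks word overlap; it is an assumption made for illustration (it ignores word order, negation, and synonymy) and is not a method taken from the work summarized above. Only the structure of `is_paraphrase` is the point.

```python
STOPWORDS = {"a", "an", "the", "is", "was", "by"}


def content_words(text):
    """Lowercase, strip punctuation, and drop a few function words."""
    tokens = (tok.strip(".,!?").lower() for tok in text.split())
    return {tok for tok in tokens if tok and tok not in STOPWORDS}


def entails(premise, hypothesis):
    """Toy stand-in: every content word of the hypothesis occurs in the premise."""
    return content_words(hypothesis) <= content_words(premise)


def is_paraphrase(a, b):
    """Paraphrase as entailment holding in both directions."""
    return entails(a, b) and entails(b, a)


# Entailment is directional: dropping a modifier is allowed, adding one is not.
print(entails("The cat chased a small mouse.", "The cat chased a mouse."))  # True
print(entails("The cat chased a mouse.", "The cat chased a small mouse."))  # False

# Paraphrase requires both directions to hold.
print(is_paraphrase("The cat chased the mouse.", "The mouse was chased by the cat."))  # True
```

Swapping in a stronger entailment model leaves `is_paraphrase` unchanged, which is the practical appeal of the bi-directional definition.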
Pronouns typically have explicit antecedents in the prior discourse; otherwise, processing difficulty is experienced. However, it has been argued [Gordon, P. C., & Hendrick, R. (1997). Intuitive knowledge of linguistic co-reference. Cognition, 62, 325–370; Gordon, P. C., & Hendrick, R. (1998). The representation and processing of co-reference in discourse. Cognitive Science, 22, 389–424] that when a pronoun appears in a preposed subordinate clause (as in "Before she began to sing, Susan stood up"), incremental interpretation is suspended and no antecedent is immediately sought, since the pronoun cannot be resolved until the main clause is encountered. We report results from an eye-tracking study showing that on encountering a pronoun that has no prior antecedent (compared to cases where there is an explicit prior antecedent), readers experience immediate difficulty whether or not the pronoun appears in a preposed subordinate clause, suggesting that attempted incremental interpretation is not suspended in these cases.
This paper examines the structure, types, and strategies of reciprocals in Malayalam. Reciprocals in Malayalam are identified and classified into four types. Structure and strategies of each type and the relation of reciprocals with other grammatical categories are examined. Based on the analysis, it is found that reciprocal constructions in Malayalam are constituted by four types of antecedents and two types of exponents. Compounding, reduplication, clausal, and lexical strategies are used to encode reciprocal meaning. Reciprocals in Malayalam are distinct from reflexives. They inflect for cases, reduce the valency of the verb and can function as interrogative reciprocals.
Lingüística XL. El lingüista del siglo XXI. I. Why ellipsis?
This paper presents an algorithm for identifying noun phrase antecedents of third person personal pronouns, demonstrative pronouns, reflexive pronouns, and omitted pronouns (zero pronouns) in unrestricted Spanish texts. We define a list of constraints and preferences for different types of pronominal expressions, and we document in detail the importance of each kind of knowledge (lexical, morphological, syntactic, and statistical) in anaphora resolution for Spanish. The paper also provides a definition for syntactic conditions on Spanish NP-pronoun noncoreference using partial parsing. The algorithm has been evaluated on a corpus of 1,677 pronouns and achieved a success rate of 76.8%. We have also implemented four competitive algorithms and tested their performance in a blind evaluation on the same test corpus. This new approach could easily be extended to other languages such as English,
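The general idea of combining constraints with preferences can be sketched as a two-stage procedure: hard constraints discard impossible candidates, then soft preferences rank the survivors. Everything in the sketch below is hypothetical, including the `Candidate` fields, the two agreement constraints, and the preference weights; it is not the constraint or preference list defined in the paper.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    gender: str              # "m" or "f"
    number: str              # "sg" or "pl"
    sentence_distance: int   # 0 = same sentence as the pronoun
    is_subject: bool


def passes_constraints(cand, pronoun_gender, pronoun_number):
    """Hard filter: discard candidates that disagree in gender or number."""
    return cand.gender == pronoun_gender and cand.number == pronoun_number


def preference_score(cand):
    """Soft ranking with hypothetical weights: favour subjects and recency."""
    score = 0
    if cand.is_subject:
        score += 2
    score -= cand.sentence_distance
    return score


def resolve(candidates, pronoun_gender, pronoun_number):
    """Apply constraints first, then pick the best-scoring survivor."""
    survivors = [
        c for c in candidates
        if passes_constraints(c, pronoun_gender, pronoun_number)
    ]
    if not survivors:
        return None
    return max(survivors, key=preference_score)


# Candidate antecedents for the pronoun "ella" (feminine singular).
candidates = [
    Candidate("María", "f", "sg", 1, True),
    Candidate("la carta", "f", "sg", 0, False),
    Candidate("Juan", "m", "sg", 0, True),
]
best = resolve(candidates, "f", "sg")
print(best.text)  # María
```

Keeping the two stages separate makes it easy to measure what each kind of knowledge contributes, for example by switching one preference off and re-running the evaluation.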
Two aspects of anaphora in Hittite are discussed in the paper. The first is syntactic means to mark immediate anaphora after first mention. Besides fronting a constituent hosting -a/ma and demonstrative phrases, it is shown that the specific type of anaphora is also marked by the seemingly redundant structure of enclitic pronoun + full NP in its canonical position. It is argued that the parallel syntactic behaviour of all three constructions provides evidence to distinguish some cases of enclitic pronoun + full NP from appositions and to consider them a taxonomically distinct category, clitic doubling.
The second part of the paper deals with non-standard anaphora in relative clauses. It explores the occasional associative anaphoric relation (bridging) between the relative phrase and its correlate from a cross-linguistic perspective. Building upon the work of Huggard and Belyaev, it is shown that this non-standard anaphora provides evidence that Hittite relative sentences are not standard relative sentences; rather, they constitute a separate taxonomic category, correlatives, pace Cinque. Along more general lines, the paper substantiates the language-specific claim of Belyaev contra Cinque that correlatives are a syntactic category distinct from relative clauses.
Quantifiers are ubiquitous in natural language and, in addition to providing information about quantity, they serve important discourse functions. We outline several theoretical accounts of the functions that quantifiers perform in a discourse and the factors governing their interpretation, focusing on two specific topics that have received substantial attention from researchers working in linguistics and psychology. The first topic concerns the interpretation of pronominal anaphora in different quantification contexts, and we review evidence showing that the focusing effects of positive and negative quantifiers license different patterns of pronominal reference. The second topic concerns the interpretation of quantifiers that function as anaphors in a discourse, and we consider recent experimental evidence in relation to two current and highly influential theories of semantic interpretation.
The paper summarises the work of the Research Group in Computational Linguistics at the University of Wolverhampton towards the production of much needed annotated resources for evaluation and training of anaphora resolution systems. In particular, it describes the annotating tools developed to support the annotation, the corpora annotated and the annotation strategy adopted. Finally, future plans are outlined.
Annotated resources are much needed for evaluation and training of anaphora resolution systems. Coreferential chain annotation is a difficult task that cannot be carried out without an appropriate tool. In this paper, we present our work on annotating Arabic corpora with anaphoric links (i.e., the annotation of the identity relation between anaphors and their antecedents). In particular, we propose an anaphoric annotation tool for Arabic. The tool automatically detects Arabic pronouns and allows the human annotator to select several anaphoric pronouns related to the same antecedent. Our aim is to build a real corpus to be used for anaphora resolution (i.e., for either system training or evaluation).
Preface: Welcome to the proceedings of the ACL Workshop "The People's Web Meets NLP: Collaboratively Constructed Semantic Resources". The workshop attracted 21 submissions, of which 9 are included in these proceedings. We are gratified by this level of interest. This workshop was motivated by the observation that the NLP community is currently considerably influenced by online resources, which are collaboratively constructed by ordinary users on the Web. In many works, such resources have been used as semantic resources overcoming the knowledge acquisition bottleneck and coverage problems pertinent to conventional lexical semantic resources. The resource that has gained the greatest popularity in this respect so far is Wikipedia. However, the scope of the workshop deliberately exceeded Wikipedia. We are happy that the proceedings include papers on resources such as Wiktionary, Mechanical Turk, or creating semantic resources through online games. This encourages us in our belief that collaboratively constructed semantic resources are of growing interest for the natural language processing community.
This paper proposes that a core semantic property of temporal locating adverbs is the ability (or the lack thereof) to introduce a new time discourse referent. The core data comes from 'that same day' in narrative discourse. I argue that unlike other previously studied temporal locating adverbs—which introduce a new time discourse referent and relate it to the speech time or a salient time introduced into the discourse context—'that same day' is twice anaphoric, i.e. it retrieves two salient times from the input context without introducing one of its own. Moreover, I argue that the adverb 'currently' is like 'that same day' in not introducing a new time discourse referent. Unlike 'that same day', however, 'currently' has both a deictic and an anaphoric usage analogous to 'on Sunday'. The analysis that I propose is implemented within Compositional Discourse Representation Theory. It illustrates how adverbial meaning can be integrated within a more general theory of temporal interpretation.
The paper reports on the recent forum RU-EVAL, a new initiative for the evaluation of Russian NLP resources, methods, and toolkits. The first two events were devoted to morphological and syntactic parsing, respectively. The third event was devoted to anaphora and coreference resolution. Seven participating IT companies and academic institutions submitted their results for the anaphora resolution task, and three of them also presented results for the coreference resolution task. The event was organized to estimate the state of the art for this NLP task in Russian and to compare the various methods and principles implemented for Russian.
We discuss the evaluation procedure and specify the anaphora and coreference tasks, describing the phenomena taken into consideration. We also give a brief overview of similar evaluation events on whose experience we draw. Finally, we formulate guidelines for constructing the training and Gold Standard corpora and present the measures used in evaluation.
In this paper, we discuss a data-driven approach to anaphora resolution for three Indian languages: Bengali, Hindi, and Tamil. The work consists of two steps: identifying markables and identifying links. Markable identification is done using a Conditional Random Field. The identification of links between markables is done using a Decision Tree algorithm. Both steps are evaluated, and results are reported in terms of F-value.
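For readers unfamiliar with the F-value, the sketch below computes precision, recall, and their harmonic mean for a set of predicted anaphoric links against a gold standard. It assumes the balanced F1 variant and exact-match scoring of (anaphor, antecedent) pairs; the abstract does not say which variant or matching scheme the authors used, and the example links are invented.

```python
def precision_recall_f(predicted, gold):
    """Return (precision, recall, F1) for predicted items against gold items."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-value: harmonic mean of precision and recall.
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value


# Hypothetical (anaphor, antecedent) links.
gold_links = {("he", "Ravi"), ("it", "book"), ("she", "Meena"), ("they", "students")}
predicted_links = {("he", "Ravi"), ("it", "table"), ("she", "Meena")}

p, r, f = precision_recall_f(predicted_links, gold_links)
print(f"{p:.2f} {r:.2f} {f:.2f}")  # 0.67 0.50 0.57
```

Here two of three predicted links are correct (precision 2/3) and two of four gold links are found (recall 1/2), giving an F-value of 4/7.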
This paper presents a descriptive characterization of intentionality in the performance of high school students as they try to resolve ambiguities in content and in intra- and extratextual anaphoric reference relations in their written commentaries on a literary text. Anaphora is the grammatical, referential, and content interrelationship between a lexical element and another element, known as the antecedent. The students wrote their texts in class and in tutoring sessions, to encourage reflective and self-regulated decisions when revising their writing. Text collection involved tracking the textual decisions made in trying to improve the writing. The method of analysis used criteria and expectations of textuality as categories for classification and interpretation. The results show that anaphoric relations involve grammatical cohesion, content, and punctuation. The main difficulties were incorporating and correcting information and the ambiguities in intratextual anaphora. Intentionality was manifested in the regulation of performance and in experimenting with compensatory strategies when students could not find solutions to a given difficulty.
The main aim of this paper is to analyse the effects of applying pronominal anaphora resolution to Question Answering (QA) systems. For this task, a complete QA system has been implemented. System evaluation measures the performance improvements obtained when information that is referenced anaphorically in documents is not ignored.
In the traditional E-type approach to d-(onkey)-pronouns, these pronouns are interpreted as definite descriptions that contain a bound i-(ndividual)-variable (Evans, 1977; Cooper, 1979; Heim, 1990). Elbourne (2001, 2002) and Büring (2004) have recently proposed a variant of the E-type approach which dispenses with i-variable binding for donkey anaphora resolution altogether, deriving covariance via s-(ituation)-variable binding. However, they still assume that d-pronouns contain descriptive content, while remaining somewhat vague with regard to what the content exactly is. I will clarify this point of vagueness, discussing examples of d-pronouns with multiple antecedents as well as ambiguous donkey-sentences. I will show that postulating descriptive content in d-pronouns is A) not necessary and B) problematic.
I. Evans/Cooper proposal: it = “the donkey owned by him_x”
II. Elbourne/Büring proposal: it = “the_s donkey”
III. Present proposal: it = “the_s entity”
I thus argue that d-pronouns ...
Pronouns that do not have explicit antecedents typically cause processing problems. We investigate a specific example in which this may not be the case, as in “At the interview, they asked really difficult questions,” where the plural pronoun they has no explicit antecedent, yet is intuitively easy to process. Some unspecified but constrained set of individuals (the interview panel or the company) can be inferred as the referent, but it is not crucial to determine specifically which entities are being referred to. We propose that this contrasts with the processing of singular pronouns (he or she), for which it is necessary to determine a specific referent. We used event-related brain potentials to investigate how readers process the pronoun (they vs. he/she) in these cases. Sentences were placed in a context that either did or did not contain an explicit antecedent for the pronoun. There were two key findings. Firstly, when there was no explicit antecedent, a larger fronto-central positivity was observed 750 msec after pronoun onset for he/she than they, possibly reflecting the additional difficulty involved in establishing a referent for he/she than for they when no explicit referent is available. Secondly, there was a larger N400-like deflection evoked by he/she than they, regardless of whether there was an explicit antecedent for the pronoun. We suggest that this is due to the singular pronouns bringing about a greater integration effort than the plural pronoun. This observation adds to a growing body of research revealing fundamental differences in the way these pronouns are handled by the language processor.
In this paper, we present a Paninian grammar-based heuristic model to resolve entity-pronoun references in Hindi dialogue. We explore the use of Paninian dependency structures as a source of syntactico-semantic information. Our experiments illustrate that the use of dependency and dialogue structures helps to resolve specific types of references. We also show that named entity features, discourse information such as subtopic boundaries, and animacy features increase the overall resolution accuracy to 64% for user-user interaction data and 59% for play-story corpora.
- by Darshan agarwal and +1
- Natural Language Processing, Dialogue, Hindi, Anaphora Resolution
The plural pronouns "they" and "them" are used to refer to individuals with unknown gender and when a random allocation of gender is undesirable. Despite this apparently felicitous usage, “singular they/them” should raise processing problems under the theory that pronouns seek gender- and number-matched antecedents. Using eye-tracking, we investigated whether there was any processing cost associated with using singular "they/them". There was a clear cost of number incompatibility for "they/them". Thus, although singular "they/them" is in current usage, it does not appear that "they/them" is immediately tolerant of a singular antecedent, though such may be rapidly accommodated. The data are consistent with the search account of pronoun resolution and preserve the semantics of "they/them" as denoting plurality.
The aim of this paper is twofold. On the one hand, it attempts to explore several machine learning models for pronoun resolution in Turkish, a language not sufficiently studied with respect to anaphora resolution and rarely being subjected to machine learning experiments. On the other hand, this paper offers an evaluation of the classification performances of the learning models in order to gain insight into the question of how to match a model to the task at hand. In addition to the expected observation that each model should be tuned to an optimum level of expressive power so as to avoid underfitting and overfitting, the results also suggest that non-linear models properly tuned to avoid overfitting outperform linear ones when applied to the data used in our experiments.
In this paper we present the first machine learning approach to resolving pronominal anaphora in the Basque language. We consider different classifiers in order to find the system that best fits the characteristics of the language under examination. We do not restrict our study to the classifiers typically used for this task; we also consider others, such as Random Forest and VFI, in order to make a general comparison. We determine the feature vector obtained with our linguistic processing system and analyze the contribution of different subsets of features, as well as the weight of each feature used in the task.
- by Basilio Sierra and +2
- Machine Learning, Anaphora Resolution, Random Forest
We propose a new method for using anaphoric information in Latent Semantic Analysis (LSA), and discuss its application to develop an LSA-based summarizer which achieves a significantly better performance than a system not using anaphoric information, and a better performance by the ROUGE measure than all but one of the single-document summarizers participating in DUC-2002. Anaphoric information is automatically extracted using a new release of our own anaphora resolution system, GUITAR, which incorporates proper noun resolution. Our summarizer also includes a new approach for automatically identifying the dimensionality reduction of a document on the basis of the desired summarization percentage. Anaphoric information is also used to check the coherence of the summary produced by our summarizer, by a reference checker module which identifies anaphoric resolution errors caused by sentence extraction.
The quality of discourse structure annotations is negatively influenced by the numerous difficulties that occur in the analysis process. In contrast, referential annotation resources are considerably more reliable, given the high precision of existing anaphora resolution systems. We present an approach based on the Veins Theory (Cristea, Ide, Romary, 1998), in which successful reference annotations of texts are exploited in order to improve arbitrary structural analyses; in this way, the large amount of corpora annotated at reference level can be used for the acquisition of discourse structure annotation resources.
Coreference resolution, the process of linking noun phrases (NPs) that refer to the same real-world entity mentioned in a document, is a difficult and important task in natural language processing. This paper introduces an "incremental" unsupervised coreference resolution algorithm that exploits the transitive property of a coreference chain as well as the dependencies among noun phrases. These advantages derive from the observation that the order in which noun phrases are examined matters. At each iteration, our algorithm re-ranks the order of clustering according to distinctness, based on an entropy estimate. Highly discriminative and confident links between clusters are processed first, to reduce ambiguity as much as possible and to expose additional useful clues for clustering the subsequent hard-to-cluster noun phrases. The experimental evaluation on the MUC-7 corpus demonstrates advantages over the previous clustering-based algorithm and competitiveness with previous supervised learning methods.
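The "most confident links first" idea in the abstract above can be illustrated with a small sketch. It is not the paper's algorithm: the entropy-based distinctness estimate is replaced here by a hand-written compatibility score, and the mention fields (`text`, `head`, `gender`, `pronoun`), the score values and the threshold are all assumptions made for illustration. Scoring two clusters by their weakest mention pair is one simple way to respect transitivity, since a merge is only allowed when every member is compatible with every other.

```python
from itertools import combinations

def mention_score(a, b):
    """Hypothetical compatibility score between two mentions (0.0 to 1.0)."""
    if a["gender"] != b["gender"]:
        return 0.0          # hard constraint: gender must agree
    if a["head"] == b["head"]:
        return 1.0          # same head word: very confident link
    if a["pronoun"] or b["pronoun"]:
        return 0.5          # pronoun with agreeing gender: weaker link
    return 0.0

def cluster_score(c1, c2):
    """Score two clusters by their weakest mention pair (transitivity check)."""
    return min(mention_score(a, b) for a in c1 for b in c2)

def cluster_mentions(mentions, threshold=0.4):
    """Greedily merge the most confident cluster pair until none passes the threshold."""
    clusters = [[m] for m in mentions]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_score(clusters[p[0]], clusters[p[1]]),
        )
        if cluster_score(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]   # i < j, so deleting j is safe
        del clusters[j]
    return clusters

mentions = [
    {"text": "Mary Smith", "head": "Smith", "gender": "f", "pronoun": False},
    {"text": "Smith", "head": "Smith", "gender": "f", "pronoun": False},
    {"text": "she", "head": "she", "gender": "f", "pronoun": True},
    {"text": "John", "head": "John", "gender": "m", "pronoun": False},
    {"text": "he", "head": "he", "gender": "m", "pronoun": True},
]
chains = [[m["text"] for m in c] for c in cluster_mentions(mentions)]
print(chains)
```

In this toy run the same-head link between "Mary Smith" and "Smith" is merged first, and the pronoun "she" is then attached to the merged cluster rather than to a single mention.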
In this paper we present a machine learning approach to resolving pronominal anaphora in Basque. We consider different classifiers in order to find the system that best fits the characteristics of the language under examination. We apply a combination of classifiers, which improves on the results obtained with single classifiers. The main contribution of the paper is the use of bagging with a non-soft base classifier for anaphora resolution in Basque.
- by Klara Ceberio Berger and +1
- Machine Learning, Anaphora Resolution
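Bagging, as named in the Basque abstract above, trains several copies of a base classifier on bootstrap resamples of the training data and combines their predictions by majority vote. The sketch below shows only that general scheme. The base learner (a 1-nearest-neighbour rule over Hamming distance), the three binary features and the toy training pairs are assumptions for illustration, not the classifiers or features used in the paper.

```python
import random
from collections import Counter

def hamming(a, b):
    """Number of positions at which two feature tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def train_1nn(sample):
    """Hypothetical base learner: 1-nearest-neighbour over the given sample."""
    def predict(features):
        nearest = min(sample, key=lambda ex: hamming(ex[0], features))
        return nearest[1]
    return predict

def bagging(data, train, n_models=11, seed=0):
    """Train n_models base classifiers, each on a bootstrap resample of data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]   # sample with replacement
        models.append(train(sample))
    return models

def majority_vote(labels):
    """Return the most frequent label."""
    return Counter(labels).most_common(1)[0][0]

def predict_bagged(models, features):
    """Combine the base classifiers' predictions by majority vote."""
    return majority_vote([m(features) for m in models])

# Toy (pronoun, candidate) pairs with assumed features:
# (number_agrees, same_sentence, candidate_is_subject) -> is_antecedent
data = [
    ((1, 1, 1), True), ((1, 1, 0), True), ((1, 0, 1), True), ((1, 1, 1), True),
    ((0, 0, 0), False), ((0, 1, 0), False), ((0, 0, 1), False), ((0, 0, 0), False),
]
models = bagging(data, train_1nn)
print(predict_bagged(models, (1, 1, 1)))
```

An odd number of models avoids ties for a binary label.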
A new natural language understanding method for disambiguation of difficult pronouns is described. Difficult pronouns are those for which a level of world or domain knowledge is needed in order to perform anaphoric or other types of resolution. Resolving difficult pronouns may in some cases require a prior step that applies inference to the situation represented by the natural language text. A general method is described: it performs entity resolution and pronoun resolution. An extension to the general pronoun resolution method performs inference as an embedded commonsense reasoning method. The general method and the embedded method use features of the ROSS representational scheme; in particular, they use ROSS ontology classes and the ROSS situation model. The overall method is a working solution that solves the following Winograd schemas: a) trophy and suitcase, b) person lifts person, c) person pays detective, and d) councilmen and demonstrators.
Zeyrek, D., Demirşahin, I., Sevdik Çallı, A. B., Ögel Balaban, H. (2010). Bu, şu, o and Their Referent types in Turkish Discourse Bank. In Proceedings of the ICTL2010 15th International Conference on Turkish Linguistics. This paper is... more
Series of linguistic and rhetorical studies edited by Bice Mortara Garavelli, 15. Volume published with contributions from the Dipartimento di Lingue e Letterature straniere e Culture moderne and the Dipartimento di Studi Umanistici (PRIN 2008 funds) of the Università degli Studi di Torino. Volumes published in the series undergo a peer review process that attests to their scientific validity. Edizioni dell'Orso, Alessandria. Unauthorized reproduction, even partial, by any means, including photocopying, even for internal or educational use, is prohibited. Violations are punishable under art. 171 of Law no. 633 of 22.04.41. ISBN 978-88-6274-462-1
Anaphora resolution attempts to determine the correct antecedent of an anaphor (the term pointing back). In what follows, we propose an algorithm for the resolution of anaphoric pronouns that relies on lexical and syntactic knowledge incorporated in a modular approach based on constraints and preferences. Our objective was to find the correct antecedent of the following subject pronouns (il, ils, elle, elles), object pronouns (l’, le, la, les) and possessive pronouns (son, sa, ses, leur, leurs) in unrestricted texts. We also identify and eliminate pleonastic pronouns and discard candidates appearing in appositions. Moreover, we use a focus mechanism to determine salient entities. The algorithm, implemented in Prolog, achieves a success rate of 68%, which was considered a good performance for unrestricted French texts.
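A constraints-and-preferences resolver of the kind described above first filters out candidates that violate hard constraints, then ranks the survivors by preference. The sketch below is in Python rather than the paper's Prolog and covers only the four subject pronouns. The `Candidate` fields, the salience weights and the strict gender agreement rule are simplifying assumptions (French "ils", for instance, can refer to mixed-gender groups), not the paper's actual rules.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    gender: str          # "m" or "f"
    number: str          # "sg" or "pl"
    is_subject: bool
    in_apposition: bool
    distance: int        # sentences back from the pronoun (0 = same sentence)

# Agreement features of the French subject pronouns (simplified).
PRONOUN_FEATURES = {
    "il": ("m", "sg"), "elle": ("f", "sg"),
    "ils": ("m", "pl"), "elles": ("f", "pl"),
}

def resolve(pronoun, candidates):
    """Return the preferred antecedent, or None if no candidate survives."""
    gender, number = PRONOUN_FEATURES[pronoun]
    # Constraints: gender and number agreement, and no appositive candidates.
    survivors = [
        c for c in candidates
        if c.gender == gender and c.number == number and not c.in_apposition
    ]
    if not survivors:
        return None
    # Preferences (assumed weights): subjects are more salient, recent is better.
    def salience(c):
        return (2 if c.is_subject else 0) - c.distance
    return max(survivors, key=salience)

candidates = [
    Candidate("le ministre", "m", "sg", True, False, 1),
    Candidate("la directrice", "f", "sg", False, False, 1),
    Candidate("le rapport", "m", "sg", False, False, 0),
]
print(resolve("il", candidates).text)
print(resolve("elle", candidates).text)
```

Keeping constraints and preferences in separate steps mirrors the modular design the abstract describes: a constraint can only eliminate a candidate, while a preference only reorders those that remain.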
In this paper, we discuss a data-driven approach to anaphora resolution for three Indian languages: Bengali, Hindi, and Tamil. The work consists of two steps: identifying markables and identifying links. Markable identification is done using a Conditional Random Field. Identification of links between markables is done using a decision tree algorithm. Both steps are evaluated, and results are reported in terms of F-value.
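The F-value mentioned above is conventionally the harmonic mean of precision and recall. A minimal sketch of how it could be computed for the link-identification step follows, assuming links are represented as (anaphor, antecedent) pairs; the example links are invented and are not data from the paper.

```python
def precision_recall_f(predicted, gold):
    """Return (precision, recall, F1) for predicted items against gold items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_value = 2 * precision * recall / (precision + recall)
    return precision, recall, f_value

# Hypothetical gold and system links, as (anaphor, antecedent) pairs.
gold_links = {("he", "Ram"), ("it", "book"), ("she", "Sita")}
predicted_links = {("he", "Ram"), ("it", "table"), ("she", "Sita"), ("they", "Ram")}

p, r, f = precision_recall_f(predicted_links, gold_links)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

The same function can score the markable-identification step if markables are passed in as, for example, (start, end) spans.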
The Cabo-Verdean Creole (CVC) subject domain has clitic and tonic pronouns that often amalgamate in double subject pronoun constructions; the possibility of a zero-subject and the formal category underlying subject clitics are disputed (Baptista 1995, 2002; Pratas 2004). This article discusses five variable constraints that condition subject expression across three descriptive and inferential analyses of a corpus of speech collected from 33 speakers from Santiago and Maio. Double subject pronoun constructions and zero-subjects were promoted by a persistence effect, though for the former this applied across nonadjacent clauses since double subject pronoun constructions are switch reference and contrastive devices resembling the doubling of agreement suffixes by independent pronouns in languages traditionally classified as pro-drop. Zero-subjects were favored in third-person contexts as previously observed by Baptista and Bayer (2013), and when a semantically referentially deficient (Duarte & Soares da Silva 2016) DP antecedent was in an Intonational Unit that was prosodically and syntactically linked to the Intonational Unit containing the target anaphor (Torres Cacoullos & Travis 2019). Results support reclassification of CVC subject clitics as ambiguous person agreement markers (Siewierska 2004) and suggest that CVC is developing a split-paradigm for person marking and subject expression (Wratil 2009; Baptista & Bayer 2013).
Unlike many other languages, Turkish has two different reflexive markers, 'kendi' and 'kendisi'. Although both markers refer to the third person singular, they cannot be used interchangeably. 'Kendisi' in particular has attracted much attention because of its dual nature: it can be used both locally and non-locally. 'Kendi' has received less attention, since it has been assumed that it can only be locally bound. Furthermore, although the psychological distance (intimacy) between the speaker and the referent has been claimed to affect the choice of reflexive, no experimental study has previously been designed to test this assumption. With this in mind, this research tests two main issues: whether the anaphor 'kendi' is perceived as a strict local anaphor by native Turkish speakers, and how the psychological distance (intimacy) between speaker and referent influences the way native Turkish speakers use anaphors. To address these questions, a two-phase experimental design was developed and applied to 65 participants in total, aged between 17 and 27. The first experiment was a Translation Task and the second a Forced-Choice Task. The first task showed that, although strict local anaphors were used in the English sentences, the participants did not consistently use 'kendi', which is supposedly a strict local anaphor. The results of the second task did not agree with the literature: whereas 'kendi' was expected in informal situations and 'kendisi' in formal situations, the t-test analysis showed no significant difference between the preferences for 'kendi' and 'kendisi'.
The aim of this paper is twofold. On the one hand, it explores several machine learning models for pronoun resolution in Turkish, a language not sufficiently studied with respect to anaphora resolution and rarely subjected to machine learning experiments. On the other hand, it evaluates the classification performance of the learning models in order to gain insight into the question of how to match a model to the task at hand. In addition to the expected observation that each model should be tuned to an optimum level of expressive power so as to avoid underfitting and overfitting, the results also suggest that non-linear models properly tuned to avoid overfitting outperform linear ones when applied to the data used in our experiments.