Computational Approaches to Discourse and Document Processing

Discourse Processing for Text Analysis: Recent Successes, Current Challenges

2019

Computational discourse processing has come a long way in the 10 years since I spoke at ACL’2009 on "Discourse: Early problems, current successes, future challenges". Much of this progress can be attributed to the vast amounts of textual data that have become available and to a concomitant weakening of theoretical commitments, so as to be able to use the data in information extraction, sentiment analysis, question answering, etc. Along with weakened commitments to the demands of particular theories has come a greater willingness to consider what can be learned from textual data and from various forms of annotation, in English and in other languages as well. This paper briefly summarizes (1) changing assumptions about discourse structure; (2) recent work on lexico-syntactic grounding of low-level discourse structure and frameworks for higher-level discourse structure that recognize differences in genre; and (3) suggestions for addressing some of the challenges still facing us. For mor...

Discourse Structure and Computation: Past, Present and Future

2012

The discourse properties of text have long been recognized as critical to language technology, and over the past 40 years, our understanding of and ability to exploit the discourse properties of text have grown in many ways. This essay briefly recounts these developments, the technology they employ, the applications they support, and the new challenges that each subsequent development has raised. We conclude with the challenges faced by our current understanding of discourse, and the applications that meeting these challenges will promote.

Computational Linguistics and Discourse Analysis

The profound use of the computer in discourse analysis must employ a theory of discourse comprehension and production with which to conduct the analysis. Models currently employed in computational linguistics have a semantic basis and are goal-directed. The basic model is an associative cognitive network. The basic inventory of concepts of the system is given in the systemic network, which is organized into paradigmatic, syntagmatic, and componential structures. Since events happen in particular places at particular times, there is also an episodic structure. The gnomonic system defines abstract concepts over episodes. According to Phillips (1975), discourse coherence must be considered on two levels, the episodic and the gnomic. A discourse which engenders episodic and/or gnomonic expectations which are not then fulfilled is incoherent. A lower limit on coherence may be defined as a discourse so ill-formed that it makes no sense even to its creator. The upper limit on coherence is set by the most powerful creative minds. Between the two limits, discourse analysis, from the point of view of the computational linguist, probably requires nothing less than a full-blown computational theory of the human mind. (JB)

IJERT-Discourse Analysis: Representation and Significance in Natural Language Understanding

International Journal of Engineering Research and Technology (IJERT), 2021

https://www.ijert.org/discourse-analysis-representation-and-significance-in-natural-language-understanding https://www.ijert.org/research/discourse-analysis-representation-and-significance-in-natural-language-understanding-IJERTV10IS070026.pdf Discourse analysis is one of the applications of Natural Language Processing. Discourse parsing is used to identify the connectedness and the specific discourse relations among the various units of a text. In this paper we focus on Rhetorical Structure Theory (RST) among the different theoretical frameworks of discourse parsing. It is a semantically valuable strategy for describing natural texts, characterizing their structure in terms of the relations that hold between parts of the text. This study is divided into two main parts. The first part examines discourse segmentation, and the second covers RST, rhetorical relations and an illustration of RST. The results demonstrate that discourse parsing can provide a suitable solution for discourse analysis.

Discourse Deixis: Reference to Discourse Segments

1988

Computational approaches to discourse understanding have a two-part goal: (1) to identify those aspects of discourse understanding that require process-based accounts, and (2) to characterize the processes and data structures they involve. To date, in the area of reference, process-based accounts have been developed for subsequent reference via anaphoric pronouns and reference via definite descriptors. In this paper, I propose and argue for a process-based account of subsequent reference via deictic expressions. A significant feature of this account is that it attributes distinct mental reality to units of text often called discourse segments, a reality that is distinct from that of the entities described therein.

Corpus based Discourse Analysis

The Routledge Handbook of Discourse Analysis, 2023

Synergy between corpus linguistics and discourse analysis
Discourse analysis covers a vast range of areas and is also one of the least clearly defined fields in applied linguistics. An early conceptualisation is provided in Schiffrin et al. (2001), who define discourse in the following terms: (1) language in use, (2) language structure beyond the sentence level, and (3) social practices and ideologies associated with language. Blommaert (2005: 2) notes that, traditionally, discourse has been treated in linguistic terms as 'language-in-use', informing areas such as pragmatics and speech act theory. However, for Blommaert discourse has a wider interpretation as 'language-in-action', i.e., 'meaningful symbolic behaviour', as representing social practices and ideologies. A useful distinction is made by Gee (2005), who defines the 'language-in-use' aspect as 'discourse' (with a little 'd') and the more 'language-in-action' orientation as 'Discourse' (with a capital D), involving not only linguistic practices but other semiotic elements. Discourses are created through recognition work of 'ways with words, actions, beliefs, emotions, values, interactions, people, objects, tools and technologies' that constitute a way of being a member of a particular discourse community (Gee, 2005: 20). Corpus linguistics is a field of enquiry whose essential nature, like that of discourse analysis, has also come under scrutiny. The main contention revolves around 'corpus-driven' vs. 'corpus-based' linguistics and whether corpus linguistics is a theory or a methodology. The 'corpus-driven' approach sees corpus linguistics as essentially a theory, seeking to describe the corpus as comprehensively as possible without being influenced by preconceptions and existing theories (Tognini Bonelli, 2001). The corpus-based approach, on the other hand, views corpus linguistics as a methodology for validating existing descriptions of language and making adjustments where necessary.
In spite of these different theoretical positions, corpus linguistics is generally regarded as a methodology, and 'corpus-based' is used as an umbrella term for a range of corpus enquiries, which is the sense adopted in this chapter. Criticisms have been levelled against both corpus linguistic analysis and critical discourse analysis (CDA), with those of the latter applying equally to discourse analysis in general. CDA may rely on a small set of arbitrarily selected texts which lack representativeness (Stubbs, 1996), the analysis may be overly informed by the analyst's subjective preconceptions (Widdowson, 2000) and the approach is mainly qualitative. The main limitations of corpus-based analyses are that the texts present decontextualised examples of language use and the field does not easily...

Invited Paper: Discourse Structures and Language Technologies

2011

I want to tell a story about computational approaches to discourse structure. Like all such stories, it takes some liberty with actual events and times, but I think stories put things into perspective, and make it easier to understand where we are and how we might progress. Part 1 of the story (Section 2) is the past. Here we see early computational work on discourse structure aiming to assign a simple tree structure to a discourse. At issue was what its internal nodes corresponded to. The debate was fierce, and suggestions that other structures might be more appropriate were ignored or subjected to ridicule. The main uses of discourse structure were text generation and summarization, but mostly in small-scale experiments. Part 2 of the story (Section 3) is the present. We now see different types of discourse structure being recognized, though perhaps not always clearly distinguished. An increasing number of credible efforts are aimed at recognizing these structures automatically, t...

An overview of the approaches and methods for analysing a text from a discursive viewpoint

Onomazein, 2014

Concepción Hernández-Guerra
Discourse analysis is a modern discipline that has inherited lots of attributes from text analysis. So broad is the range of texts and so different the perspectives under which these texts can be analyzed that we can easily understand the different fields that discourse analysis as a whole can cover. The aim of this paper is twofold: firstly, to clarify the different terms included in the discourse in order to separate the features of this discipline from others, and, secondly, to offer a general vision of all the methods that can be carried out under this umbrella to make the most of the data to be analyzed.

Towards a Corpus-Lexicographical Discourse Analysis [Winner of Best Student Paper]

Proceedings of EUROPHRAS 2017, London, 2017

This working paper presents the progress made thus far in the development of a corpus-lexicographical approach to discourse analysis, more specifically the application of Hanks' [5, 6] Corpus Pattern Analysis (CPA) procedure to a (critical) discourse analysis task. The theoretical basis of CPA is explained, followed by some practical applications of CPA, namely lexicography and the proposed method of discourse analysis. Examples are taken from an ongoing investigation into the use of 'killing' verbs in contemporary British English, which draws upon two corpora: the British National Corpus (BNC) and the animal-themed 'People', 'Products', 'Pests' and 'Pets' (PPPP) corpus [8]. Preliminary findings suggest that a CPA-assisted, or corpus-lexicographical, discourse analysis is one with a strong theoretical basis, whose transparency and systematicity empower the analyst to make precise and persuasive arguments.