A new approach for extracting the conceptual schema of texts based on the linguistic Thematic Progression theory

Thematic progression in English literary and legislative texts

Master Thesis, 2016

The current research focuses on Thematic Progression in English literary and legislative texts. It aims to identify and describe the Thematic Progression patterns characteristic of these two text types, and to explore the Theme-Rheme structure, which plays an important role in organizing and analyzing discourse. Thematic Progression patterns deal with the set of meaningful sense groups that constitute a coherent discourse. Daneš (1974) was the first to introduce Thematic Progression, presenting it as a skeleton of the plot. He proposed four Thematic Progression types: Simple Linear thematic progression, Constant thematic progression, thematic progression with Derived Themes, and Split Rheme. Thematic realization contributes to the cohesive development of the text, so Thematic Progression patterns can characterize a specific genre. The research combined two approaches: the data were analyzed both qualitatively and quantitatively. The qualitative functional analysis of the texts focused on the thematic-rhematic sequences in text sentences in order to establish Thematic Progression patterns. Contrastive analysis was employed to highlight the similarities and differences in the use of Thematic Progression patterns in the English literary and legislative texts. Finally, calculating the relative frequency of occurrence of Thematic Progression patterns in the two types of texts revealed the characteristic ways of ordering ideas in the two genres. The research demonstrated that Thematic Progression characteristics can be claimed to be genre specific.
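
The relative-frequency step described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's procedure: it assumes each clause has already been annotated by hand as a (theme, rheme) pair of referent labels, and it only recognizes the two Daneš types that can be detected by comparing adjacent clauses (Constant and Simple Linear). Derived Themes and Split Rheme need semantic knowledge and are lumped under "other" here. The function names and labels are hypothetical.

```python
from collections import Counter


def classify_link(prev, curr):
    """Label the link between two adjacent clauses, each a (theme, rheme) pair."""
    prev_theme, prev_rheme = prev
    curr_theme, _ = curr
    if curr_theme == prev_theme:
        return "constant"        # the same theme is carried over
    if curr_theme == prev_rheme:
        return "simple_linear"   # the previous rheme becomes the new theme
    return "other"               # derived theme, split rheme, or a break


def relative_frequencies(clauses):
    """Return the share of each link type over all adjacent clause pairs."""
    labels = [classify_link(a, b) for a, b in zip(clauses, clauses[1:])]
    if not labels:
        return {}
    counts = Counter(labels)
    return {label: n / len(labels) for label, n in counts.items()}


# Toy annotation (assumed, not taken from the thesis corpus).
clauses = [
    ("thesis", "patterns"),
    ("patterns", "genres"),
    ("patterns", "texts"),
    ("law", "obligations"),
]
freqs = relative_frequencies(clauses)
```

Comparing such frequency tables across two corpora is one simple way to express the kind of genre contrast the abstract reports.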

Text Schema Mining Using Graphs and Formal Concept Analysis

2002

This paper presents an investigation into finding and evaluating schemata through formal concept analysis. Schemata are used in conceptual authoring support to provide proven building blocks of text structures. Since only a few schemata are available so far, mining them from the structures of existing texts seems worthwhile. The general process begins with the structure of a text as a graph, transforms this into a formal context, and examines the formal concept lattice for this context. Formal concepts with large extents, in particular, may be candidates for schemata. Three alternative kinds of transformation are presented: Wille’s natural transformation, which produces contexts based mainly on type and connection information; schema-derived transformations, which derive attributes that identify partial or complete instances from a set of schemata; and informal transformations, in which one starts from a set of schemata and manually formulates conditions that may be present in the instance graph and contribute to the presence of such schemata. We have considered document structures consisting of a hierarchy of sections and subsections, which may import and export topics. The topics are interconnected in a conceptual graph called the topic map. Results of processing two such structures with the natural transformation and an informal one are reported. Some notes on the implementation in the Chasid prototype are given.
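
To make the step from formal context to formal concepts concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the Chasid implementation: the context is a hand-written dictionary mapping hypothetical section nodes to hypothetical attributes, and concepts are enumerated naively by closing every attribute subset, which is only practical for tiny contexts.

```python
from itertools import combinations


def extent(context, attrs):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def intent(context, objs):
    """Attributes shared by every object in objs."""
    shared = set().union(*context.values())
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing each attribute subset."""
    all_attrs = sorted(set().union(*context.values()))
    concepts = set()
    for size in range(len(all_attrs) + 1):
        for subset in combinations(all_attrs, size):
            ext = extent(context, frozenset(subset))
            concepts.add((ext, intent(context, ext)))
    return concepts


# Hypothetical context: sections of a document and their structural attributes.
context = {
    "sec1": {"section", "exports_topic"},
    "sec2": {"section", "exports_topic", "imports_topic"},
    "sec3": {"section", "imports_topic"},
}
concepts = formal_concepts(context)

# Concepts with large extents come first, as schema candidates.
by_extent = sorted(concepts, key=lambda c: len(c[0]), reverse=True)
```

For larger contexts a dedicated algorithm such as NextClosure would be the usual choice; the exhaustive loop is used here only because it is short and easy to check.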

Combining Formal Concept Analysis and semantic information for building ontological structures from texts: an exploratory study

This work studies conceptual structures based on the Formal Concept Analysis method. We build these structures based on lexicosemantic information extracted from texts, among which we highlight the semantic roles. In our research, we propose ways to include semantic roles in concepts produced by this formal method. We analyze the contribution of semantic roles and verb classes in the composition of these concepts through structural measures. In these studies, we use the Penn Treebank Sample and SemLink 1.1 corpora, both in English.

Towards a Computational Model to Thematic Typology of Literary Texts: A Concept Mining Approach

International Journal of Advanced Computer Science and Applications, 2021

In recent years, computational linguistic methods have been widely used in literary studies, where they have proved useful both in entering the mainstream of literary critical scholarship and in addressing challenges long associated with the field. Such computational approaches have revolutionized literary studies through their potential for dealing with large datasets. They have bridged the gap between literary studies and computational and digital applications, most notably data mining, and have prompted a reconsideration of how literary texts are analyzed and processed. Accordingly, this study uses computational linguistic methods to propose a computational model for the thematic typology of literary texts. The study adopts concept mining methods using semantic annotators to generate a thematic typology of literary texts and to explore their thematic interrelationships by arranging the texts by topic. The prose fiction of Thomas Hardy is taken as an example. The findings indicate that concept mining was effective in extracting the distinctive concepts and revealing the thematic patterns within the selected texts. These thematic patterns are best described by the following categories: class conflict, Wessex, religion, female suffering, and social realities. It can be concluded that computational approaches, as well as scientific and empirical methodologies, are useful adjuncts to literary criticism. Nevertheless, conventional literary criticism and human reasoning remain crucial and cannot be replaced by computer-assisted systems.

Thematic Annotation: extracting concepts out of documents

Arxiv preprint cs/0412117, 2004

Abstract: Contrary to standard approaches to topic annotation, the technique used in this work does not centrally rely on some sort of--possibly statistical--keyword extraction. Instead, the proposed annotation algorithm uses a large-scale semantic database--the EDR Electronic Dictionary--that provides a concept hierarchy based on hyponym and hypernym relations. This concept hierarchy is used to generate a synthetic representation of the document by aggregating the words present in topically homogeneous document ...
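
The general idea of aggregating words through a hypernym hierarchy can be sketched in Python as follows. This is a generic illustration, not the paper's algorithm (the abstract is truncated and does not specify it), and it does not use the EDR Electronic Dictionary: the `HYPERNYM` table, the `min_coverage` threshold, and all function names are assumptions made for the example. The sketch picks the most specific concept that still subsumes a chosen share of the words.

```python
from collections import Counter

# Toy child -> parent hypernym table (hypothetical, stands in for a real hierarchy).
HYPERNYM = {
    "dog": "mammal", "cat": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal", "animal": "entity",
    "car": "vehicle", "vehicle": "entity",
}


def ancestors(concept):
    """Return the concept followed by all of its hypernyms up to the root."""
    chain = [concept]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def depth(concept):
    """Distance from the concept to the root of the hierarchy."""
    return len(ancestors(concept)) - 1


def annotate(words, min_coverage=0.75):
    """Pick the deepest concept that subsumes at least min_coverage of the known words."""
    known = [w for w in words if w in HYPERNYM]
    if not known:
        return None
    coverage = Counter()
    for w in known:
        coverage.update(ancestors(w))
    candidates = [c for c, n in coverage.items() if n / len(known) >= min_coverage]
    return max(candidates, key=depth) if candidates else None


label = annotate(["dog", "cat", "sparrow", "dog"])
```

Lowering `min_coverage` yields more specific labels that describe fewer of the words; raising it pushes the label toward the root of the hierarchy.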

ThemePro: A Toolkit for the Analysis of Thematic Progression

Language Resources and Evaluation, 2020

This paper introduces ThemePro, a toolkit for the automatic analysis of thematic progression. Thematic progression is relevant to natural language processing (NLP) applications dealing, among others, with discourse structure, argumentation structure, natural language generation, summarization and topic detection. A web platform demonstrates the potential of this toolkit and provides a visualization of the results including syntactic trees, hierarchical thematicity over propositions and thematic progression over whole texts.

Thematic Analysis and Visualization of Textual Corpus

2011

The semantic analysis of documents is currently a domain of intense research. Work in this domain can take several directions and address several levels of granularity. In the present work we focus specifically on the thematic analysis of textual documents. Our approach studies how the relevance of each theme varies within a text in order to identify the major theme and all the minor themes evoked in the text. At a second level of analysis, this allows us to identify thematic association relations in a textual corpus. By identifying and analyzing these association relations, we propose generating thematic paths that allow users of an information search system to explore the corpus according to their themes of interest and to discover new knowledge by navigating the thematic association relations.

Framework for Conceptual Modeling on Natural Language Texts

2016

The paper presents a framework for conceptual modeling that has been used in an ongoing project to develop fact extraction technology for textual data. The modeling technique combines conceptual graphs and Formal Concept Analysis. Conceptual graphs serve as semantic models of text sentences and as the data source for the formal context of a concept lattice. Several ways of creating formal contexts from a set of conceptual graphs have been investigated, and the resulting solution is proposed. It is based on an analysis of the use cases of semantic roles applied in conceptual graphs and of their structural patterns. A concept lattice built on textual data is interpreted as a store of facts, which can be extracted by navigating the lattice and interpreting its concepts and the hierarchical links between them. The modeling technique was investigated experimentally on an annotated textual corpus consisting of descriptions of biotopes of bacteria.
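
One possible way to turn conceptual graphs into a formal context is sketched below in Python. The paper investigates several constructions and the abstract does not say which one was adopted, so this is purely an assumed encoding: each graph is reduced to hypothetical (predicate, role, entity) triples, entities become the objects of the context, and each "role:predicate" pair becomes an attribute.

```python
from collections import defaultdict


def build_context(triples):
    """Map each entity to the set of role:predicate attributes it appears with."""
    context = defaultdict(set)
    for predicate, role, entity in triples:
        context[entity].add(f"{role}:{predicate}")
    return dict(context)


# Hypothetical triples, loosely in the spirit of a bacteria biotope corpus.
triples = [
    ("live", "agent", "bacterium"),
    ("live", "location", "soil"),
    ("infect", "agent", "bacterium"),
    ("infect", "patient", "plant"),
]
context = build_context(triples)
```

The resulting dictionary has the object-to-attributes shape that standard Formal Concept Analysis algorithms take as input.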

Thematic Progressions Of The

2015

Coherence is one of the characteristics of good academic writing. This includes the abstract, which represents the whole content of a research article and must convey the messages the article intends to express. This study investigated the coherence of English abstracts of TEFLIN articles in applied linguistics written by Indonesian speakers by analyzing theme and rheme. The study focuses on (1) identifying the themes that are dominantly used in 2015 TEFLIN article abstracts written by non-native speakers of English, (2) identifying the thematic progressions that are dominantly used in 2015 TEFLIN article abstracts written by non-native speakers of English, and (3) determining the quality of coherence of the abstract section in 2015 TEFLIN articles in applied linguistics based on thematic progression. The study used a descriptive qualitative design. The results show that the type of theme dominantly used in 2015 TEFLIN article abstracts written by non-native sp...