The Tesserae Project: Intertextual Analysis of Latin Poetry (poster)

NOTE: THIS IS A PRE-PRINT DRAFT VERSION OF: "The Tesserae Project: intertextual analysis of Latin poetry" published in Literary and Linguistic Computing. The published version contains several editorial changes. Interested readers are advised to consult the published version of this paper.

2012

Tesserae is a web-based tool for automatically detecting allusions in Latin poetry. Although still in the start-up phase, it is already capable of identifying significant numbers of known allusions, as well as similar numbers of allusions previously unnoticed by scholars. In this paper we use the tool to examine allusions to Vergil's Aeneid in the first book of Lucan's Civil War. Approximately 3,000 linguistic parallels returned by the program were compared with a list of known allusions drawn from commentaries. Each was examined individually and graded for its literary significance, in order to benchmark the program's performance. All allusions from the program and commentaries were then pooled in order to examine broad patterns in Lucan's allusive techniques which were largely unapproachable without digital methods. While Lucan draws relatively constantly from Vergil's generic language in order to maintain the epic idiom, this baseline is punctuated by clusters of pointed allusions, in which Lucan frequently subverts or distorts Vergil's original meaning. These clusters not only attend the most significant characters and events, but also play a role in structuring scene transitions. We are working to incorporate the ability to match on word meaning, phrase context, and metrical and phonological features into future versions of the program.

The Tesserae Project: Intertextual Analysis of Latin Poetry

Literary and Linguistic Computing, 2013

Tesserae is a web-based tool for automatically detecting allusions in Latin poetry. Although still in the start-up phase, it is already capable of identifying significant numbers of known allusions, as well as similar numbers of allusions previously unnoticed by scholars. In this article, we use the tool to examine allusions to Vergil's Aeneid in the first book of Lucan's Civil War. Approximately 3,000 linguistic parallels returned by the program were compared with a list of known allusions drawn from commentaries. Each was examined individually and graded for its literary significance, in order to benchmark the program's performance. All allusions from the program and commentaries were then pooled in order to examine broad patterns in Lucan's allusive techniques which were largely unapproachable without digital methods. Although Lucan draws relatively constantly from Vergil's generic language in order to maintain the epic idiom, this baseline is punctuated by clusters of pointed allusions, in which Lucan frequently subverts Vergil's original meaning. These clusters not only attend the most significant characters and events but also play a role in structuring scene transitions. Work is under way to incorporate the ability to match on word meaning, phrase context, and metrical and phonological features into future versions of the program.

Automatic Detection of Reuses and Citations in Literary Texts

Literary and Linguistic Computing

For more than forty years now, modern theories of literature (Compagnon, 1979) have insisted on the role of paraphrases, rewritings, citations, reciprocal borrowings and mutual contributions of any kind. The notions of intertextuality, transtextuality and hypertextuality/hypotextuality were introduced in the seventies and eighties to approach these phenomena. The careful analysis of these references is of particular interest in evaluating the distance that the creator deliberately puts between his/her own work and that of his/her masters. Phoebus is a collaborative project that brings together computer scientists from the University Pierre and Marie Curie (LIP6-UPMC) and the literary teams of Paris-Sorbonne University, with the aim of developing efficient tools for literary studies that take advantage of modern computer science techniques. In this context, we have developed a piece of software that automatically detects and explores networks of textual reuses in classical literature. This paper describes the principles ...

An Automated Approach to Model the Transformation Process of the Reuse of Bernard de Clairvaux: How Do Lexical Resources help?

2017

To strengthen research on automated historical text reuse detection, it is necessary to investigate the way in which a text is reused (e.g., verbatim, paraphrased) in order to understand the broader context of a reuse. Our long-term goal is to build a formal theory behind reuse transformations. We have previously investigated two datasets of Bible reuse to analyze how reuse is modified and how linguistic resources support this. In this work, we investigate the ratio of non-literal text reuse, and we measure the extent to which the Ancient Greek WordNet (which also contains the Latin WordNet) and BabelNet can support identifying lexical relations in Latin reuse excerpts. In doing so, we also show the lack of, and need for, resources for ancient data.

Modeling the Scholars: Detecting Intertextuality through Enhanced Word-Level N-Gram Matching

Literary and Linguistic Computing (LLC), 2014

The study of intertextuality, or how authors make artistic use of other texts in their works, has a long tradition, and has in recent years benefited from a variety of applications of digital methods. This paper describes an approach to detecting the sorts of intertexts that literary scholars have found most meaningful, as embodied in the free Tesserae website http://tesserae.caset.buffalo.edu/. Tests of Tesserae Versions 1 and 2 showed that word-level n-gram matching could recall a majority of parallels identified by scholarly commentators in a benchmark set. But these versions lacked precision, so that the meaningful parallels could only be found among long lists of those that were not meaningful. The Version 3 search described here adds a second-stage scoring system that sorts found parallels by a formula accounting for word frequency and phrase density. Testing against a benchmark set of intertexts in Latin epic poetry shows that the scoring system overall succeeds in ranking parallels of greater significance more highly, allowing site users to find meaningful parallels more quickly. Users can also choose to adjust recall and precision by focusing only on results above given score levels. As a theoretical matter, these tests establish that lemma identity, word frequency, and phrase density are important constituents of what makes a phrase parallel a meaningful intertext.
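The two-stage idea in this abstract (match phrases on shared lemmas, then rank the matches by word frequency and phrase density) can be illustrated with a minimal Python sketch. The abstract does not state the scoring formula, so the one used here is an assumption for illustration only: the log of summed inverse frequencies of the shared lemmas divided by the span they occupy in each phrase. The function names, the two-lemma threshold, and the toy word lists are all hypothetical, and the input is assumed to be lemmatized already.

```python
import math
from collections import Counter


def relative_frequencies(tokens):
    """Map each lemma to its share of all tokens in a text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {lemma: n / total for lemma, n in counts.items()}


def span(phrase, shared):
    """Number of tokens from the first to the last shared lemma, inclusive."""
    positions = [i for i, lemma in enumerate(phrase) if lemma in shared]
    return max(positions) - min(positions) + 1


def score_parallel(source_phrase, target_phrase, source_freq, target_freq,
                   min_shared=2):
    """Stage 1: require min_shared lemmas in common (assumed threshold).
    Stage 2: score = ln(sum of inverse frequencies / sum of spans).
    This formula is an illustrative assumption, not the published one."""
    shared = set(source_phrase) & set(target_phrase)
    if len(shared) < min_shared:
        return None
    rarity = (sum(1 / source_freq[w] for w in shared)
              + sum(1 / target_freq[w] for w in shared))
    density = span(source_phrase, shared) + span(target_phrase, shared)
    return math.log(rarity / density)


# Toy lemma lists standing in for two texts (hypothetical data).
source_freq = relative_frequencies("arma vir cano et et et et in in in".split())
target_freq = relative_frequencies("arma vir bellum et et et in in in in".split())

rare = score_parallel(["arma", "vir", "cano"], ["bellum", "arma", "vir"],
                      source_freq, target_freq)
common = score_parallel(["et", "in", "cano"], ["et", "in", "bellum"],
                        source_freq, target_freq)
print(rare > common)  # rarer shared lemmas rank higher
```

Under these assumptions, a pair of phrases sharing two rare lemmas outranks a pair sharing two frequent function words, which is the behavior the abstract attributes to the scoring stage; filtering on a minimum score would then trade recall for precision.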

Enjambment Detection in a Large Diachronic Corpus of Spanish Sonnets

Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 2017

Enjambment takes place when a syntactic unit is broken up across two lines of poetry, giving rise to different stylistic effects. In Spanish literary studies, there are unclear points about the types of stylistic effects that can arise, and under which linguistic conditions. To systematically gather evidence about this, we developed a system to automatically identify enjambment (and its type) in Spanish. For evaluation, we manually annotated a reference corpus covering different periods. As a scholarly corpus to apply the tool, from public HTML sources we created a diachronic corpus covering four centuries of sonnets (3750 poems), and we analyzed the occurrence of enjambment across stanzaic boundaries in different periods. Besides, we found examples that highlight limitations in current definitions of enjambment.

Enjambment Detection in a Large Diachronic Corpus of Spanish Sonnets (LaTeCH-CLFL 2017)

Enjambment takes place when a syntactic unit is broken up across two lines of poetry, giving rise to different stylistic effects. In Spanish literary studies, there are unclear points about the types of stylistic effects that can arise, and under which linguistic conditions. To systematically gather evidence about this, we developed a system to automatically identify enjambment (and its type) in Spanish. For evaluation, we manually annotated a reference corpus covering different periods. As a scholarly corpus to apply the tool, from public HTML sources we created a diachronic corpus covering four centuries of sonnets (3750 poems), and we analyzed the occurrence of enjambment across stanzaic boundaries in different periods. Besides, we found examples that highlight limitations in current definitions of enjambment.

Distant Rhythm: Automatic Enjambment Detection On Four Centuries Of Spanish Sonnets

2017

Enjambment takes place when a syntactic unit is broken up across two lines of poetry, giving rise to different stylistic effects. In Spanish literary studies, detailed case-studies of the phenomenon based on single authors exist. However, a larger-scale study spanning hundreds of major and minor authors, across several centuries, is not available so far. Towards that need, we have developed software based on Natural Language Processing (NLP), to automatically identify enjambment (and its type) in Spanish. To evaluate the system, we manually annotated two reference corpora (one diachronic, one from the 20th century). Results are satisfactory for the system's first version, with F1 varying depending on period and enjambment type. As a scholarly corpus to apply the tool, from public HTML sources we created a diachronic corpus covering four centuries of sonnets (3750 poems). We applied the tool to analyze the occurrence of enjambment across stanzaic boundaries in different periods.

Non-Literal Text Reuse in Historical Texts: An Approach to Identify Reuse Transformations and its Application to Bible Reuse

Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016

Text reuse refers to citing, copying or alluding to text excerpts from a text resource in a new context. While detecting reuse in contemporary languages is well supported (given extensive research, techniques, and corpora), automatically detecting historical text reuse is much more difficult. Corpora of historical languages are less documented and often encompass various genres, linguistic varieties, and topics. In fact, historical text reuse detection is much less understood and empirical studies are necessary to enable and improve its automation. We present a linguistic analysis of text reuse in two ancient data sets. We contribute an automated approach to analyze how an original text was transformed into its reuse, taking linguistic resources into account to understand how they help characterize the transformation. It is complemented by a manual analysis of a subset of the reuse. Our results show the limitations of approaches focusing on literal reuse detection. Yet, linguistic resources can effectively support understanding the non-literal text reuse transformation process. Our results support practitioners and researchers working on understanding and detecting historical reuse.

ViS-À-ViS: Detecting Similar Patterns in Annotated Literary Text

IEEE Visualization Conference / VIS4DH Papers, 2020

We present a web-based system called ViS-À-ViS aiming to assist literary scholars in detecting repetitive patterns in an annotated textual corpus. Pattern detection is made possible using distant reading visualizations that highlight potentially interesting patterns. In addition, the system uses time-series alignment algorithms, and in particular, dynamic time warping (DTW), to detect patterns automatically. We present a case-study where an ancient Hebrew poetry corpus was manually annotated with figurative language devices such as metaphors and similes and then loaded into the system. Preliminary results confirm the effectiveness of the system in analyzing the annotated data and in detecting literary patterns and similarities.
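Dynamic time warping, named in the abstract above, can be sketched in a few lines of Python. This is the textbook dynamic-programming formulation, not the ViS-À-ViS implementation, which the abstract does not describe. The encoding of a poem as a numeric series (here, a hypothetical count of annotated devices per line) and the function name are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW: minimum cumulative |a[i] - b[j]| over all monotonic
    alignments of the two sequences, allowing elements to be repeated."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # a advances alone
                                    cost[i][j - 1],      # b advances alone
                                    cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Hypothetical series: number of annotated devices per line in two poems.
poem_a = [0, 1, 1, 0]
poem_b = [0, 1, 0]
print(dtw_distance(poem_a, poem_b))  # same shape, different pacing
```

Because DTW lets one series linger while the other advances, two passages with the same rise and fall of annotations align at low cost even when one is longer, which is the property that makes it a natural fit for finding similar patterns across poems of different lengths.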