Visualizing mappings of semantic and syntactic functions

Developing an XML-based, exploitable linguistic database of the Hebrew text of Gen. 1:1-2:3

University of Pretoria Electronic Theses and Dissertations, 2008

The thesis discusses a series of related techniques that prepare and transform raw linguistic data for advanced processing in order to unveil hidden grammatical patterns. A three-dimensional array is identified as a suitable data structure for building a data cube that captures multidimensional linguistic data in a computer's temporary storage facility. It also enables online analytical processing operations, such as slicing, to be executed on this data cube in order to reveal various subsets and presentations of the data. XML is investigated as a suitable mark-up language to permanently store such an exploitable databank of Biblical Hebrew linguistic data. This concept is illustrated by tagging a phonetic transcription of Genesis 1:1-2:3 on various linguistic levels and manipulating this databank. Transferring the data set between an XML file and a three-dimensional array creates a stable environment allowing editing and advanced processing of the data in order to confirm existing knowledge or to mine for new, as yet undiscovered, linguistic features. Two experiments are executed to demonstrate possible text-mining procedures. Finally, visualisation is discussed as a technique that enhances interaction between the human researcher and the computerised technologies supporting the process of knowledge creation. Although the data set is very small, there are exciting indications that the compilation and analysis of aggregate linguistic data may assist linguists to perform rigorous research, for example regarding the definitions of semantic functions and the mapping of these functions onto the syntactic module.
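The data cube and the slicing operation described in this abstract can be pictured with a small sketch. It is only an illustration under stated assumptions: the axis layout (clause x phrase x linguistic level), the level names, the helper functions `slice_level` and `function_mappings`, and the tags given to the phrases of Gen. 1:1 are all hypothetical and are not taken from the thesis.

```python
from collections import Counter

# Assumed axis layout (not from the thesis): clause x phrase x linguistic level.
LEVELS = ["transcription", "syntactic_function", "semantic_function"]

# One clause (Gen. 1:1) with four phrases. The tags are illustrative
# placeholders, not the annotation used in the thesis.
cube = [
    [
        ["bereshit", "Adjunct", "Time"],
        ["bara", "Main verb", "Action"],
        ["elohim", "Subject", "Agent"],
        ["et hashamayim ve'et ha'arets", "Object", "Patient"],
    ],
]


def slice_level(cube, level):
    """Return a 2-D slice: every clause and phrase at one linguistic level."""
    k = LEVELS.index(level)
    return [[phrase[k] for phrase in clause] for clause in cube]


def function_mappings(cube):
    """Count how often each syntactic function co-occurs with each semantic function."""
    syn = LEVELS.index("syntactic_function")
    sem = LEVELS.index("semantic_function")
    pairs = Counter()
    for clause in cube:
        for phrase in clause:
            pairs[(phrase[syn], phrase[sem])] += 1
    return pairs


if __name__ == "__main__":
    print(slice_level(cube, "syntactic_function"))
    print(function_mappings(cube))
```

Slicing along the level axis yields a flat view of one kind of annotation, while counting pairs across two levels gives the sort of aggregate mapping between semantic and syntactic functions that the abstract mentions.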

A text-Semantic Study of the Hebrew Bible, Illustrated by Noach and Job

Journal of Biblical Literature, 1994

Francis I. Andersen and A. Dean Forbes, 2012 Biblical Hebrew Grammar Visualized, Linguistic Studies in Ancient West Semitic 6, Winona Lake: Eisenbrauns, ISBN 978-1-57506-229-7, xvii + 394pp, many tables and figures, Index of Authors, Scripture, and Topics, $69.50

Deleted Journal, 2016

Round-tripping Biblical Hebrew linguistic data

Proceedings of 2007 Information Resources Management Association, International Conference, 2007

In processing language electronically, one can either concentrate on the digital simulation of human understanding and language production, or on the most appropriate way of storing and using existing knowledge. Both are valid and important. This paper falls in the second category, by assuming that it is useful to capture the results of linguistic analyses in well-designed, exploitable, electronic databanks. The paper focuses on the conversion of linguistic data of Genesis 1 between an XML data cube and a multidimensional array structure in Visual Basic 6 in order to facilitate data access and manipulation.
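The round trip between an XML file and an in-memory array might look roughly like the sketch below. The paper itself works in Visual Basic 6; Python is used here only for illustration. The element names (`cube`, `clause`, `phrase`, `level`), the level names, and the functions `cube_to_xml` and `xml_to_cube` are assumptions made for this example and do not reproduce the paper's schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical level names; the paper's actual schema is not given in the abstract.
LEVELS = ["transcription", "syntactic_function", "semantic_function"]


def cube_to_xml(cube):
    """Serialise a clause x phrase x level nested list to an XML string."""
    root = ET.Element("cube")
    for clause in cube:
        clause_el = ET.SubElement(root, "clause")
        for phrase in clause:
            phrase_el = ET.SubElement(clause_el, "phrase")
            for name, value in zip(LEVELS, phrase):
                level_el = ET.SubElement(phrase_el, "level", name=name)
                level_el.text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_cube(xml_text):
    """Rebuild the nested list from the XML produced by cube_to_xml."""
    root = ET.fromstring(xml_text)
    cube = []
    for clause_el in root.findall("clause"):
        clause = []
        for phrase_el in clause_el.findall("phrase"):
            # Empty elements parse with text None, so normalise to "".
            by_name = {el.get("name"): el.text or "" for el in phrase_el.findall("level")}
            clause.append([by_name.get(name, "") for name in LEVELS])
        cube.append(clause)
    return cube


if __name__ == "__main__":
    # Placeholder tags for two phrases of Gen. 1:1, for illustration only.
    cube = [[["bara", "Main verb", "Action"], ["elohim", "Subject", "Agent"]]]
    xml_text = cube_to_xml(cube)
    print(xml_text)
    print(xml_to_cube(xml_text) == cube)
```

Keeping the level name in an attribute rather than relying on element order means the reader can tolerate reordered or missing levels, which is one plausible way to keep such a round trip stable.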

LAF-Fabric: a data analysis tool for Linguistic Annotation Framework with an application to the Hebrew Bible

The Linguistic Annotation Framework (LAF) provides a general, extensible stand-off markup system for corpora. This paper discusses LAF-Fabric, a new tool to analyse LAF resources in general with an extension to process the Hebrew Bible in particular. We first walk through the history of the Hebrew Bible as text database in decade-wide steps. Then we describe how LAF-Fabric may serve as an analysis tool for this corpus. Finally, we describe three analytic projects/workflows that benefit from the new LAF representation: 1) the study of linguistic variation: extract co-occurrence data of common nouns between the books of the Bible (Martijn Naaijer); 2) the study of the grammar of Hebrew poetry in the Psalms: extract clause typology (Gino Kalkman); 3) construction of a parser of classical Hebrew by Data Oriented Parsing: generate tree structures from the database (Andreas van Cranenburgh).
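As a rough picture of what extracting co-occurrence data of common nouns between books could involve, the sketch below counts noun lemmas per book and lists the lemmas each pair of books shares. It does not use LAF-Fabric or its API. The input format, the function names `noun_counts_per_book` and `shared_nouns`, and the placeholder book and lemma labels are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def noun_counts_per_book(tokens):
    """Count noun lemmas per book from (book, lemma) pairs."""
    counts = {}
    for book, lemma in tokens:
        counts.setdefault(book, Counter())[lemma] += 1
    return counts


def shared_nouns(counts):
    """For every pair of books, list the noun lemmas that occur in both."""
    shared = {}
    for a, b in combinations(sorted(counts), 2):
        shared[(a, b)] = sorted(set(counts[a]) & set(counts[b]))
    return shared


if __name__ == "__main__":
    # Placeholder tokens, not real corpus data.
    tokens = [
        ("book_1", "noun_a"), ("book_1", "noun_b"),
        ("book_2", "noun_b"), ("book_2", "noun_c"),
        ("book_3", "noun_a"), ("book_3", "noun_b"),
    ]
    counts = noun_counts_per_book(tokens)
    for pair, lemmas in shared_nouns(counts).items():
        print(pair, lemmas)
```

In a real workflow the (book, lemma) pairs would come from the annotated corpus, and the shared-lemma sets could feed a similarity measure between books.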

A Computational Approach to Syntactic Diversity in the Hebrew Bible

Journal of Biblical Text Research, 2019

For more than four decades, the Eep Talstra Centre for Bible and Computer (ETCBC) has been building a richly-annotated linguistic database of the Hebrew Bible. This contribution describes the processes of data creation of this database and its underlying methodological principles. These principles, which can be labeled “bottom-up” and “form-to-function”, stem from a deep concern to do justice to the biblical text itself and to prevent it from being overruled by thematic or theological considerations. The database facilitates the application of computational linguistics and digital humanities to the Hebrew Bible and supports biblical exegesis, Bible translation as well as the study of the Bible as a language corpus. In recent years the ETCBC database has been transformed into an open tool, which can be consulted online and which can be downloaded as a package by anyone who wants to use it for more advanced computational analysis of the Hebrew Bible. A research project on syntactic variation in the Hebrew Bible demonstrated the interaction of presumed date of origin (early versus late texts), genre (e.g. prose or poetry), text type (e.g. narrative and direct speech) and syntactic environment (e.g. main versus subordinate clauses). Regarding the realization of the copula “to be”, for example, it can be observed that the narrative text type and the direct speech sections differ considerably in the alleged early texts of the Bible and that the direct speech in the early corpus shows similarities with the Late Biblical Hebrew corpus. Regarding the complexity of tree structures, it can be observed that changes in the average size of tree structures take place in main clauses, and only later, or not at all, in subordinate clauses. This agrees with a well-known principle in linguistics, the so-called Penthouse Principle, that accounts for the distinction between “innovative” main clauses and “conservative” subordinate clauses. Such distribution patterns, which can only be discovered with a computational full corpus analysis, are helpful to get a better understanding of diachronic language development of Classical Hebrew in the intersection of oral and written text transmission.

XML Annotation of Hebrew Elements in Judeo-Arabic Texts

2018

The main aim of this study is to introduce a model of TEI (Text Encoding Initiative) annotation of Hebrew elements in Judeo-Arabic texts, i.e., code switching (CS), borrowing, and Hebrew quotations. This article will provide an introduction to using XML (Extensible Markup Language) to investigate sociolinguistic aspects in medieval Judeo-Arabic texts. Accordingly, it will suggest to what extent using XML is useful for investigating linguistic and sociolinguistic features in the Judeo-Arabic paradigm. To provide an example of how XML annotation could be applied to Judeo-Arabic texts, a corpus of 300 pages selected from three Judeo-Arabic books has been manually annotated using TEI P5. The annotation covers all instances of CS, borrowing, and Hebrew quotations in that corpus.
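To make the idea of querying such annotation concrete, the following sketch counts Hebrew-tagged elements in a small TEI-style fragment. It is an assumption-laden illustration, not the study's encoding model: the abstract does not say which TEI elements mark code switching, borrowing, or quotations, so the use of `foreign` and `quote` with `xml:lang="he"` here is a guess based on common TEI practice, the bracketed text is placeholder content, and `count_hebrew_elements` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Attribute key ElementTree uses for xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Placeholder fragment; the element choices are assumptions, not the study's scheme.
SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0" xml:lang="jrb">
  [Judeo-Arabic text] <foreign xml:lang="he">[Hebrew phrase]</foreign>
  [Judeo-Arabic text] <quote xml:lang="he">[Hebrew quotation]</quote>
  [Judeo-Arabic text] <foreign xml:lang="he">[Hebrew phrase]</foreign>
</p>"""


def count_hebrew_elements(tei_xml):
    """Count elements marked xml:lang="he", grouped by local element name."""
    root = ET.fromstring(tei_xml)
    counts = Counter()
    for el in root.iter():
        if el.get(XML_LANG) == "he":
            # Strip the "{namespace}" prefix that ElementTree adds to tags.
            local_name = el.tag.split("}", 1)[-1]
            counts[local_name] += 1
    return counts


if __name__ == "__main__":
    print(count_hebrew_elements(SAMPLE))
```

Once a corpus is annotated consistently, counts of this kind can be broken down by book, page, or author to support the sociolinguistic questions the abstract raises.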

A New Methodology for Ascertaining the Semantic Potential of Biblical Hebrew Prepositions

2013

Recently, there has been a mounting realization that lexical semantic analysis of Biblical Hebrew has, on the whole, both historically and characteristically lacked an explicit methodological framework that might facilitate the assessment of a lexeme’s range of meaning. This necessarily has resulted in lexica that are underpinned by implicit, intuition-driven analyses. In light of this lacuna, the present article proposes a methodology for ascertaining the semantic potential of a particular Biblical Hebrew word class, which happens to showcase the versatility of meaning: prepositions. The aim of the proposal is not to provide a better method for the presentation of semantic content (i.e., lexicography), but rather, and more fundamentally, to explore the ways in which this content might more systematically be derived (i.e., lexicology). The proposal is anchored primarily in the cognitive linguistic enterprise and secondarily, in the theory of grammaticalization. Employing insights from the two fields, the author suggests that a replicable and rigorous assessment of a preposition’s semantic potential might be offered. In particular, such assessment would afford a well-endowed identification and explanation of the various elements comprising a target lexeme’s semantic network: both its constituent parts and the composite structure as a whole. Specifically, this would be accomplished through the provision of criteria and parameters aimed at determining sense-distinction, -primacy, and -contingency as well as the employment of alternate modes of network-viewing that allow, among other things, the dynamic nature of meaning to surface.