Dorothee Beermann | Norwegian University of Science and Technology
Papers by Dorothee Beermann
...in the form of a handout: This handout complements our demo of the TypeCraft (TC) system. We would like to raise questions concerning the design and use of an application that combines online (and offline) databasing and data annotation with a knowledge-sharing tool. Do we really need all this functionality, and if so, how can we design a tool like TC so that it becomes a useful device rather than just another gadget that consumes more of our time than it saves?
Linguistik Aktuell/Linguistics Today, 2017
Non-finite verb form = head A° of an AP → semantic selection of status. Status I, bare infinitive: Die Zuschauer mussten gähnen. Anna ließ Otto arbeiten. Otto geht arbeiten. Status I, participle I: die gähnenden Zuschauer; Er starrte gähnend an die Decke. Status II, infinitive with zu: Die Kinder scheinen zu schlafen. Otto hat noch einen Brief zu schreiben. Die Tonerkassette ist zu ersetzen. Status II, participle I with zu: die zu ersetzende Tonerkassette. Status III, participle II: Otto hat gegähnt. Der Zettel ist verschwunden. Der Spion wurde entlarvt. Ich bekomme die Reise bezahlt. Plötzlich kam Julia angerannt. Status III, participle II: der verschwundene Zettel; Der Zettel blieb verschwunden; das verliebte Pärchen; der entlarvte Spion.
White (vol. eds.) and Stephen Wechsler (series ed.).
This is a presentation of a linguistic cloud service called TypeCraft that allows morpheme-to-morpheme annotation of linguistic text in the form of Interlinear Glossed Text.
Endangered Languages and New Technologies
Today, it is unthinkable to undertake language documentation without using modern linguistic technologies. This makes it crucial to know which tools are available and to understand their strengths and weaknesses. This chapter discusses the role that online linguistic tools can play in the creation of linguistic data. It describes the linguistic workflow from the perspective of a tool user and argues that Internet-based tools can assist linguists with data management and are well suited to making data from endangered and less-described languages available for further linguistic research. It also presents a methodology demonstrating how one can advance from flat morpheme-level annotations to linguistic analysis.
Language Resources and Evaluation, 2013
We present a linguistic application that uses web technologies to promote the reuse of research data in the form of Interlinear Glossed Text (IGT), a well-established data format within philology and the structural and generative fields of linguistics. Here we present the modules and procedures of the online database TypeCraft. IGT is a sought-after commodity in NLP and an integral part of scholarly linguistic work. It is often the only structured data available for less-resourced or endangered languages. While archiving of structured data from endangered languages is already well under way [2], the free creation and exchange of linguistic data in the form of linked IGTs still needs to gain in popularity.
Department of Language and Communication Studies
Dynamic language documentation is among the essential tasks of Modern Linguistics and one of its central concerns. It explores and redefines the borderlines between 'field linguistics', computational linguistics, and theoretical linguistic research, and if we were to name just one important aspect of the enterprise we would point to the potential that lies in the combination of traditional field methods with new technologies. With TypeCraft we present a project that focuses on the documentary and exploratory mode of research. Its ...
Texas Linguistics Society 10: Computational Linguistics for Less-Studied Languages. Nicholas Gaylord, Stephen Hilderbrand, Heeyoung Lyu, Alexis Palmer and Elias Ponvert eds. Copyright © 2008. CSLI Publications ... Figure 1: Home of TypeCraft - www.typecraft.org ... Modern language documentation is among the essential tasks of linguistics. It redefines the borderlines between 'field linguistics', 'computational lin- ... The build-up of linguistically annotated language data faces two problems. The first problem concerns data preservation, including archiving, ...
Creating Infrastructure for Canonical Typology. Conference hosted by the Surrey Morphological Group. Surrey University. England, 2009
of Linguistics, Oct 1, 2008
In a manuscript from 1987, William Labov questions the relation between quantitative and qualitative methods in linguistics: “…, the number, variety and complexity of linguistic relations are very great, and it is not likely that a large proportion can be investigated by quantitative means. At present, we do not know the correct balance between the two modes of analysis: how far we can go with unsupported qualitative analysis based on introspection, before the proposals must be confirmed by quantitative studies based on observation and ...
Verb valence information can be derived from corpora by using subcorpora of typical sentences that are constructed in a language-independent manner based on frequent POS structures. Inspecting typical sentences with a fixed verb in a certain position can reveal the valence information directly. Using verb fingerprints, which consist of the most typical sentence patterns a verb appears in, we are able to identify standard valence patterns and compare them against a language’s valence profile. With a very limited amount of training data per language, valence information for other verbs can be derived as well. Based on the Norwegian valence patterns, we are able to find comparable patterns in German, where typical sentences express the same situation in an equivalent way, and can thus construct verb valence pairs for a bilingual PolyVal dictionary. This contribution discusses this application with a focus on the Norwegian valence dictionary NorVal.