The Creagest Project: a Digitized and Annotated Corpus for French Sign Language (LSF) and Natural Gestural Languages
Related papers
Moving Heads and Moving Hands: Developing a Digital Corpus of Irish Sign Language.
Ireland, …, 2006
This paper outlines the establishment of the first digital corpus of Irish Sign Language (ISL) using a software programme called ELAN. The Signs of Ireland corpus comprises 40 signers, making it the largest digital annotated corpus of a signed language in Europe. The paper describes how such software enhances sign linguistic research and outlines some of the limitations that arise, in great part, from the lack of a standardized notation system for signed languages, from the need for human consistency in annotation, and from the fact that you will 'get out what you put in' when working with a digital corpus: that is, the decisions made regarding the annotations influence analysis results.
Towards a corpus of French Belgian Sign Language (LSFB) discourses
Meurant, Laurence & Aurélie Sinte. 2013. Towards a corpus of French Belgian Sign Language (LSFB) discourses. In C. Bolly & L. Degand (eds). Text-Structuring. Across the Line of Speech and Writing Variation. Louvain-la-Neuve: Presses Universitaires de Louvain.
Recent advances in technology, tools and methodologies, which have facilitated the gathering and annotating of an extensive body of digital film footage, have opened the way to the study of discourse in sign languages (SL), which has remained relatively unexplored so far. This paper gives an account of what the previous experiences revealed about the specificities and issues of collecting SL corpora, with regard to data, participants and the annotation process. It then presents the project that has recently been initiated for the collection of a large-scale corpus of French Belgian Sign Language (LSFB) discourses.
Keywords: sign language, French Belgian Sign Language, corpus linguistics, discourse
The French Belgian Sign Language Corpus: A User-Friendly Searchable Online Corpus
This paper presents the first large-scale corpus of French Belgian Sign Language (LSFB) available via an open access website (www.corpus-lsfb.be). Visitors can search within the data and the metadata. Various tools allow the users to find sign language video clips by searching through the annotations and the lexical database, and to filter the data by signer, by region, by task or by keyword. The website includes a lexicon linked to an online LSFB dictionary.
Issues underlying a common Sign Language Corpora annotation scheme
Corpus-based Sign Language linguistics has emerged as a new linguistic domain, and as a consequence large-scale and controlled video data repositories are under construction for different Sign Languages. Nevertheless, as pointed out by Johnston (2008), no unified annotation scheme is yet available, which compromises any chance of comparing or reusing corpora across research teams. Another related issue is the comparability of descriptions and formalizations between SL linguistics and mainstream linguistics. In this paper, we address the issue of the definition of a common annotation scheme for Sign Language corpora annotation, distribution, exchange and comparison. In Section 2, we discuss the challenge of building inter-operable corpora for corpus-based linguistics. We also examine existing annotation schemes or strategies proposed for SL linguistics. In Section 3, we propose a small set of annotation tiers, based on Frame Semantics, as a common annotation scheme. We also propose to add text-level as well as utterance-level metadata to this common annotation scheme, in order to broaden the range of future uses of SL corpora.
Documentary and corpus approaches to sign language research (2015)
2015
In this chapter, we discuss some key aspects of methodology associated with sign language documentation and corpus-based approaches to sign language research. We first introduce the field of sign language corpus linguistics, carefully defining the term ‘corpus’ in this context, and discussing the emergence of technology that has made this new approach to sign language research possible. We then discuss specific details of the methodology involved in corpus building, such as the recruitment of participants, the selection of language activities for the corpus, and the setup for filming. We move on to a discussion of annotation for corpora, with a focus on the use of ID glossing. We close with a brief discussion of online archiving and accessibility.
The Sign Linguistics Corpora Network: Towards Standards for Signed Language Resources
Proceedings of LREC 2010, 2010
The Sign Linguistics Corpora Network is a three-year network initiative that aims to collect existing knowledge and practices on the creation and use of signed language resources. The concrete goals are to organise a series of four workshops in 2009 and 2010, create a stable Internet location for such knowledge, and generate new ideas for employing the most recent technologies for the study of signed languages. The network covers a wide range of subjects: data collection, metadata, annotation, and exploitation; these are the topics of the four workshops. The outcomes of the first two workshops are summarised in this paper; both workshops demonstrated that the need for dedicated knowledge on sign language corpora is especially salient in countries where researchers work alone or in small groups, which is still quite common in many places in Europe. While the original goal of the network was primarily to focus on corpus linguistics and language documentation, human language technology has gradually been incorporated as a user group of signed language resources.
Sign Language Corpora for Analysis, Processing and Evaluation
2010
Sign Languages (SLs) are the visuo-gestural languages practised by deaf communities. Research on SLs requires building, analysing and using corpora. The aim of this paper is to present various new uses of SL corpora. These uses take advantage of the new capabilities of annotation software for visualisation, numerical annotation, and processing. The data can be video-based or motion capture-based. The aims of the studies include language analysis, animation processing, and evaluation. We describe here some of LIMSI's studies, along with studies from other laboratories, as examples.