Automating Error-Tag Annotation of a Japanese Learner Composition Corpus for Language Education Research

An Environment for Learner Corpus Research and Error Analysis

International Journal of Computer-Assisted Language Learning and Teaching, 2014

This article presents an environment developed for Learner Corpus Research and Error Analysis which makes it possible to deal with language errors from different points of view and with several aims. In the field of Intelligent Computer Assisted Language Learning (ICALL), our objective is to gain a better understanding of the language learning process. In the field of Natural Language Processing (NLP), we work on the development of applications that will help both language learners and teachers in their learning and teaching processes. Using this environment, several studies and experiments on error analysis have been carried out, and thanks to an in-depth study of determiner-related errors in Basque, some contributions to the above-mentioned fields of research have been made.

Error Tagging Systems for Learner Corpora

Ana Díaz-Negrillo, Jesús Fernández-Domínguez

Learner corpora are used to investigate computerised learner language so as to gain insights into foreign language learning. One of the methodologies that can be applied to this type of research is computer-aided error analysis (CEA), which, in general terms, consists in the study of learner errors as contained in a learner corpus. Surveys of current learner corpora and of issues of learner corpus research have been published in the last few years, where information on CEA research can be found, although it is usually limited. This article is centred on CEA research and is intended as a review of error tagging systems, including error categorizations, dimensions and levels of description.

KEYWORDS. Second language acquisition, learner corpus research, computer-aided error analysis.

RESUMEN (translated from Spanish). Learner corpora are used to investigate learner language in electronic format in order to shed light on the process of foreign language acquisition. One of the methodologies used in this field is computer-aided error analysis which, in general terms, consists in studying the errors collected in a learner corpus. Surveys of existing learner corpora and of issues related to the field of learner corpus research have been published in recent years

Using the web as a linguistic resource to automatically correct lexico-syntactic errors

The 6th Edition of …, 2008

This paper presents an algorithm for correcting language errors typical of second-language learners. We focus on preposition errors, which are very common among second-language learners but are not addressed well by current commercial grammar correctors and editing aids. The algorithm takes as input a sentence containing a preposition error (and possibly other errors as well), and outputs the correct preposition for that particular sentence context. We use a two-phase hybrid rule-based and statistical approach. In the first phase, rule-based processing is used to generate a short expression that captures the context of use of the preposition in the input sentence. In the second phase, Web searches are used to evaluate the frequency of this expression, when alternative prepositions are used instead of the original one. We tested this algorithm on a corpus of 133 French sentences written by intermediate second-language learners, and found that it could address 69.9% of those cases. In contrast, we found that the best French grammar and spell checker currently on the market, Antidote, addressed only 3% of those cases. We also showed that performance degrades gracefully when using a corpus of frequent n-grams to evaluate frequencies.
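The second phase described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the candidate list, the function names, the `{prep}` template convention, and the toy counts are all assumptions made for illustration, and the first (rule-based) phase is skipped by supplying the short context expression by hand. The `frequency` argument stands in for whatever returns a Web hit count or an n-gram count.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical candidate set; the paper's actual preposition list is not given here.
CANDIDATES: Tuple[str, ...] = ("à", "de", "en", "dans", "sur", "pour")


def rank_prepositions(
    template: str,
    candidates: Sequence[str],
    frequency: Callable[[str], int],
) -> List[Tuple[str, int]]:
    """Fill the {prep} slot with each candidate and sort by observed frequency."""
    scored = [(prep, frequency(template.format(prep=prep))) for prep in candidates]
    # sorted() is stable, so ties keep the order of the candidate list.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def suggest_preposition(
    template: str,
    original: str,
    candidates: Sequence[str],
    frequency: Callable[[str], int],
) -> str:
    """Return the most frequent candidate, or the original if nothing is attested."""
    best, best_count = rank_prepositions(template, candidates, frequency)[0]
    return best if best_count > 0 else original


# Invented counts standing in for Web search or n-gram frequencies.
TOY_COUNTS = {"penser à toi": 900, "penser de toi": 40}


def toy_frequency(expression: str) -> int:
    """Look up an expression in the toy table; unseen expressions count as 0."""
    return TOY_COUNTS.get(expression, 0)


if __name__ == "__main__":
    # Learner wrote "penser de toi"; the short context expression is given by hand.
    print(suggest_preposition("penser {prep} toi", "de", CANDIDATES, toy_frequency))
```

Falling back to the learner's original preposition when no variant is attested is a design choice of this sketch, intended to avoid replacing a preposition on the basis of no evidence.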

Classification and Generation of Grammatical Errors

Proceedings of the 2014 International C* Conference on Computer Science & Software Engineering - C3S2E '14, 2008

The grammatical structure of natural language shapes and defines nearly every mode of communication, especially in the digital and written form; the misuse of grammar is a common and natural nuisance, and a strategy for automatically detecting mistakes in grammatical syntax presents a challenge worth solving. This thesis research seeks to address the challenge, and in doing so, defines and implements a unique approach that combines machine-learning and statistical natural language processing techniques. Several important methods are established by this research: (1) the automated and systematic generation of grammatical errors and parallel error corpora; (2) the definition and extraction of over 150 features of a sentence; and (3) the application of various machine-learning classification algorithms on extracted feature data, in order to classify and predict the grammaticality of a sentence.

I express my greatest gratitude to my supervisor, Dr. Eric Harley, for introducing and piquing my interest in the topic; I am humbled and grateful for his enduring assistance, tireless patience, and thoughtful encouragement. He has provided advice and direction, especially where I have encountered pause or hesitation, and has inspired new ideas and avenues for exploration within this research. I am thankful for his endless support. I also extend thanks to the members of my thesis dissertation committee, Dr. Alex Ferworn, Dr. Cherie Ding, and Dr. Isaac Woungang, for their time and effort in reviewing my work. Their valuable feedback and insights have served to improve the relevancy and composition of this thesis, as well as my academic mettle. Lastly, I wish to convey my appreciation to the Department of Computer Science at Ryerson University, the faculty and staff, who have instructed and encouraged me to pursue my academic goals along the way.
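As a rough illustration of method (1) in the thesis abstract above, the following Python sketch generates a small parallel corpus of (correct, erroneous) sentence pairs. The two error operations shown here, article deletion and adjacent-word swapping, are assumptions chosen for illustration; the abstract does not say which error types the thesis generates, and all function names are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

ARTICLES = {"a", "an", "the"}
Tokens = List[str]


def drop_article(tokens: Tokens, rng: random.Random) -> Optional[Tokens]:
    """Delete one randomly chosen article; return None if there is none."""
    positions = [i for i, tok in enumerate(tokens) if tok.lower() in ARTICLES]
    if not positions:
        return None
    i = rng.choice(positions)
    return tokens[:i] + tokens[i + 1:]


def swap_adjacent(tokens: Tokens, rng: random.Random) -> Optional[Tokens]:
    """Swap one pair of neighbouring tokens that differ from each other."""
    positions = [i for i in range(len(tokens) - 1) if tokens[i] != tokens[i + 1]]
    if not positions:
        return None
    i = rng.choice(positions)
    swapped = list(tokens)
    swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    return swapped


# Hypothetical operation set; a fuller generator would cover many more error types.
OPERATIONS: Sequence[Callable[[Tokens, random.Random], Optional[Tokens]]] = (
    drop_article,
    swap_adjacent,
)


def build_parallel_corpus(
    sentences: Sequence[str], seed: int = 0
) -> List[Tuple[str, str, str]]:
    """Return (correct, erroneous, operation_name) triples for each sentence."""
    rng = random.Random(seed)  # fixed seed keeps the generated corpus reproducible
    pairs = []
    for sentence in sentences:
        tokens = sentence.split()
        for operation in OPERATIONS:
            corrupted = operation(tokens, rng)
            if corrupted is not None:
                pairs.append((sentence, " ".join(corrupted), operation.__name__))
    return pairs


if __name__ == "__main__":
    for correct, erroneous, name in build_parallel_corpus(["The cat sat on the mat"]):
        print(f"{name}: {correct!r} -> {erroneous!r}")
```

Each triple keeps the name of the operation that produced the error, so the erroneous side can later serve as labelled negative data for a grammaticality classifier such as the one described in method (3).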

An Error Analysis of English

ABSTRACT. Jiko, M, Humaidi. 2015. Students' Errors in Forming Interrogative and Negative Sentences in the Present Tense at SMA Negeri 8 Kota Ternate. Undergraduate thesis (Skripsi), English Education Study Program, FKIP Universitas Khairun. Supervisors: (I) Asrul M. Syawal S.Pd.,M.Pd (II) Naniek Jusnita, S.Pd., M.Pd

Corpus use by student writers: error correction by Thai learners of English

2018

Researchers in corpus linguistics and applied linguistics have recommended the use of corpus data by language learners to promote independent learning (Bernardini, 2004; Yoon & Hirvela, 2004; O’Keeffe et al., 2007). However, it is not clear to what extent learners are able to use corpus resources independently, and how they can be trained to use a corpus more effectively. This thesis reports a study of learners using a corpus for error correction. The learners recorded their processes using a think-aloud protocol. The thesis records three main findings. Firstly, the learners found it easiest to spot and correct errors of clause structure, noun class, adjective pattern, and collocation; they found verb-pattern errors the most difficult to correct. Secondly, the learners most frequently searched for information about colligation, collocation, acceptability/occurrence of strings in a corpus, and determiner-noun agreement; they searched for information about lexical pattern relatively in...

Integrating learner corpora and natural language processing: A crucial step towards reconciling technological sophistication and pedagogical effectiveness

ReCALL, 2007

Learner corpora, electronic collections of spoken or written data from foreign language learners, offer unparalleled access to many hitherto uncovered aspects of learner language, particularly in their error-tagged format. This article aims to demonstrate the role that the learner corpus can play in CALL, particularly when used in conjunction with web-based interfaces which provide flexible access to error-tagged corpora that have been enhanced with simple NLP techniques such as POS-tagging or lemmatization and linked to a wide range of learner and task variables such as mother tongue background or activity type. This new resource is of interest to three main types of users: teachers wishing to prepare pedagogical materials that target learners' attested difficulties; learners themselves, for editing or language awareness purposes; and NLP researchers, for whom it serves as a benchmark for testing automatic error detection systems.
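The kind of access described in this abstract, error-tagged data enriched with POS tags and lemmas and linked to learner and task variables, can be pictured with a small Python sketch. The record layout, the field names, the tag labels, and the sample records are all hypothetical and invented for illustration; they do not reproduce the tagset or interface of any actual learner corpus.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ErrorAnnotation:
    """One error-tagged span plus the variables it is linked to (all hypothetical)."""
    text: str            # erroneous span as written by the learner
    correction: str      # annotator's proposed correction
    error_tag: str       # illustrative label, not a real tagset
    pos: str             # part-of-speech tag of the span
    lemma: str           # lemma of the span
    mother_tongue: str   # learner variable
    activity_type: str   # task variable


def query(
    records: Iterable[ErrorAnnotation],
    *,
    error_tag: Optional[str] = None,
    mother_tongue: Optional[str] = None,
    activity_type: Optional[str] = None,
) -> List[ErrorAnnotation]:
    """Keep the records that match every criterion that was actually supplied."""
    wanted = {
        "error_tag": error_tag,
        "mother_tongue": mother_tongue,
        "activity_type": activity_type,
    }
    active = {field: value for field, value in wanted.items() if value is not None}
    return [
        record
        for record in records
        if all(getattr(record, field) == value for field, value in active.items())
    ]


# Invented sample records, for illustration only.
SAMPLE = [
    ErrorAnnotation("informations", "information", "noun_number", "NOUN",
                    "information", "French", "essay"),
    ErrorAnnotation("depend of", "depend on", "preposition", "ADP",
                    "of", "French", "email"),
    ErrorAnnotation("discuss about", "discuss", "preposition", "ADP",
                    "about", "Japanese", "essay"),
]

if __name__ == "__main__":
    for hit in query(SAMPLE, error_tag="preposition", mother_tongue="French"):
        print(hit.text, "->", hit.correction)
```

A teacher preparing materials might filter by mother tongue and error tag as above, while an NLP researcher could use the same records, with their corrections, as reference data when testing an error detection system.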