Alexis Mitchell | Strayer University
Address: Morrow, Georgia, United States
Papers by Alexis Mitchell
Progress in natural language processing requires increasing amounts of data and annotation in a growing variety of languages, and research in named entity extraction is no exception. While the value of richly annotated, large-scale multilingual corpora is undeniable, costs for producing such data are high, underscoring the value of shared resources. As part of the US Government-sponsored Automatic Content Extraction Program (ACE), the University of Pennsylvania's Linguistic Data Consortium has recently created a number of shared resources to support technology evaluations in multilingual information extraction. This paper discusses the challenges of multilingual corpus development, with a particular focus on Chinese named entities. It concludes with a description of the corpora developed to support this research.