Evaluation of natural language processing systems: Issues and approaches

2000, Proceedings of the IEEE

This paper encompasses two main topics: a broad and general analysis of the issue of performance evaluation of NLP systems, and a report on a specific approach developed by the authors and tested on a sample case. More precisely, it first presents a brief survey of the major works in the area of NLP system evaluation. Then, after introducing the notion of the life cycle of an NLP system, it focuses on the concept of performance evaluation and analyzes the scope and the major problems of the investigation. The tools generally used within computer science to assess the quality of a software system are briefly reviewed, and their applicability to the task of evaluating NLP systems is discussed. Particular attention is devoted to the concepts of efficiency, correctness, reliability, and adequacy, and to how all of them basically fail to capture the peculiar features of performance evaluation of an NLP system. Two main approaches to performance evaluation are then introduced, namely black-box and model-based, and their most important characteristics are presented. Finally, a specific model for performance evaluation proposed by the authors is illustrated, and the results of an experiment with a sample application are reported. The paper concludes with a discussion on research perspectives, open problems, and the importance of performance evaluation to industrial applications.
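To make the black-box approach mentioned in the abstract concrete: the system under test is treated as an opaque function from input to output and judged only on how its outputs compare with expected ones. The following is a minimal sketch, not the authors' model; all names (`black_box_score`, `toy_tagger`, `TEST_SUITE`) are hypothetical placeholders.

```python
# Minimal black-box evaluation harness: the system under test is an
# opaque callable; we score it purely on input/output behavior.
# All names here are illustrative, not from the paper.

def black_box_score(nlp_system, test_suite):
    """Return the fraction of test cases whose output matches expectation."""
    passed = 0
    for input_text, expected_output in test_suite:
        if nlp_system(input_text) == expected_output:
            passed += 1
    return passed / len(test_suite)

# Example usage with a trivial stand-in part-of-speech tagger:
TEST_SUITE = [("the cats sleep", ["DET", "NOUN", "VERB"])]
toy_tagger = lambda text: ["DET", "NOUN", "VERB"]
print(black_box_score(toy_tagger, TEST_SUITE))  # 1.0
```

A model-based approach, by contrast, would also inspect the system's internal representations and intermediate results rather than scoring end-to-end behavior alone.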


Workshop on the evaluation of natural language processing systems

1990

In the past few years, the computational linguistics research community has begun to wrestle with the problem of how to evaluate its progress in developing natural language processing systems. With the exception of natural language interfaces, there are few working systems in existence, and they tend to focus on very different tasks using equally different techniques.

On Evaluation of Natural Language Processing Tasks - Is Gold Standard Evaluation Methodology a Good Solution?

Proceedings of the 8th International Conference on Agents and Artificial Intelligence, 2016

The paper discusses problems in state-of-the-art evaluation methods used in natural language processing (NLP). Usually, some form of gold standard data is used for the evaluation of various NLP tasks, ranging from morphological annotation to semantic analysis. We discuss the problems and validity of this type of evaluation for various tasks and illustrate the problems with examples. We then propose using application-driven evaluations wherever possible. Although this is more expensive, more complicated, and less precise, it is the only way to find out whether a particular tool is useful at all.
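As an illustration of the gold-standard methodology this paper critiques, evaluation typically reduces to comparing a system's predicted items against reference annotations with precision, recall, and F1. A minimal sketch, assuming set-valued predictions such as entity spans; the function and data names are illustrative, not from the paper.

```python
def precision_recall_f1(predicted, gold):
    """Score predicted items (e.g. entity spans) against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                       # items the system got right
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: entity spans encoded as (start, end, label) tuples
gold = {(0, 2, "PER"), (5, 7, "LOC")}
pred = {(0, 2, "PER"), (5, 7, "ORG")}
print(precision_recall_f1(pred, gold))  # (0.5, 0.5, 0.5)
```

The paper's point is that a high score here only certifies agreement with the gold annotations, not that the tool is useful in a downstream application.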

Evaluating natural language systems

Proceedings of the 12th Conference on Computational Linguistics, 1988

This paper reports progress in the development of evaluation methodologies for natural language systems. Without a common classification of the problems in natural language understanding, authors have no way to specify clearly what their systems do, potential users have no way to compare different systems, and researchers have no way to judge the advantages or disadvantages of different approaches to developing systems.

Principles of Evaluation in Natural Language Processing

In this special issue of TAL, we look at the fundamental principles underlying evaluation in natural language processing. We adopt a global point of view that goes beyond the horizon of a single evaluation campaign or a particular protocol. After a brief review of history and terminology, we address the topics of a gold standard for natural language processing, annotation quality, the amount of data, the difference between technology evaluation and usage evaluation, dialog systems, and standards, before concluding with a short discussion of the articles in this special issue and some prospective remarks.

Reducing Subjectivity of Natural Language Processing System Evaluation

In this article we investigate problems with the current means of evaluating natural language systems. We find that apart from the practical problems, there is a more fundamental one: the evaluation standards we measure against may not be objectively defined. In a sense, the very evaluation problems we set ourselves may not be well posed. We speculate on reasons for this, on ways to contain it, on evaluation standards that may more accurately reflect the underlying nature of language, and indeed on the appropriateness of a narrow focus on evaluation alone at our current stage of understanding the language process.
