Sophie Rosset - Academia.edu

Papers by Sophie Rosset

For the AMITIÉS multilingual human-computer dialogue project [1], we have developed new methods for the manual annotation of spoken dialogue transcriptions from European financial call centers on multiple levels. We have modified the DAMSL schema [2] to create a dialogue act taxonomy appropriate to the functions of call center dialogues. We use a domain-independent framework populated with domain-specific lists to capture the semantics of spoken dialogues. Our new flexible, platform-independent Java annotation tool, called XDMLTool, takes plain-text dialogue files as input, and yields annotated files in the widely used XML format. To date, XDMLTool has been used to annotate several hundred call-center dialogues in France, the UK and the US. We present definitions of each tag as well as examples in English and French. These annotation methods are developed for an experimental system that automates financial call centers in Europe. The multi-level annotation scheme has been used...

Proceedings of the EACL 2003 Workshop on Evaluation Initiatives in Natural Language Processing: are evaluation methods, metrics and resources reusable? - Evalinitiatives '03, 2003

This paper presents a paradigm for evaluating the context-sensitive under-

Lecture Notes in Computer Science, 2010

... SECTION: AdHoc-persian. Evaluation of perstem: a simple and efficient stemming algorithm for Persian. Amir Hossein Jadidinejad, Fariborz Mahmoudi, Jon Dehdari. Pages: 98-101. Persian is a challenging language in the field of NLP. ...

Lecture Notes in Computer Science, 2008

In this paper, we present the LIMSI question-answering systems on speech transcripts which participated in the QAst 2008 evaluation. These systems are based on a complete and multilevel analysis of both queries and documents. These systems use an automatically generated research descriptor. A score based on those descriptors is used to select documents and snippets. The extraction and scoring of candidate answers is based on proximity measurements within the research descriptor elements and a number of secondary factors. We participated in all the subtasks and submitted 18 runs (for 16 sub-tasks). The evaluation results for manual transcripts range from 31% to 45% accuracy depending on the task, and from 16% to 41% for automatic transcripts.

Lecture Notes in Computer Science, 2009

In this paper, we present the LIMSI question-answering system which participated in the Question Answering on speech transcripts 2008 evaluation. This system is based on a complete and multi-level analysis of both queries and documents. It uses an automatically ...

2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), 2007

In this paper, we present two different question-answering systems on speech transcripts. These two systems are based on a complete and multi-level analysis of both queries and documents. The first system uses handcrafted rules for selecting small text fragments (snippets) and extracting answers. The second one replaces the handcrafting with an automatically generated research descriptor. A score based on those descriptors is used to select documents and snippets. The extraction and scoring of candidate answers is based on proximity measurements within the research descriptor elements and a number of secondary factors. The preliminary results obtained on QAst (QA on speech transcripts) development data are promising, ranging from 72% correct answers at first rank on manually transcribed meeting data to 94% on manually transcribed lecture data.

Lecture Notes in Computer Science, 2008

This paper describes QAST, a pilot track of CLEF 2007 aimed at evaluating the task of Question Answering in Speech Transcripts. The paper summarizes the evaluation framework, the systems that participated and the results achieved. These results have shown that question answering technology can be useful for dealing with spontaneous speech transcripts, both for manually transcribed speech and for automatically recognized speech. The loss in accuracy when moving from manual transcripts to automatic ones implies that there is room for future research in this area.

The aim of the Multimodal-Multimedia Automated Service Kiosk (MASK) project is to pave the way for more advanced public service applications by user interfaces employing multimodal, multi-media input and output. The project has analyzed the technological requirements in the context of users and the tasks they perform in carrying out travel enquiries, and developed a prototype information kiosk that will be installed in the

Proceedings of IVTTA '96. Workshop on Interactive Voice Technology for Telecommunications Applications, 1996

This paper reports on the RAILTEL field trial carried out by LIMSI to assess the technical adequacy of available speech technology for interactive vocal access to static train timetable information. The data collection system used to carry out the field trials is based on the LIMSI MASK spoken language system and runs on a Unix workstation with a high quality

Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, 1996

Dialog management is of particular importance in telephone-based services. In this paper we describe our recent activities in dialog management and natural language generation in the LIMSI RAILTEL system for access to rail travel information. The aim of the LE-MLAP project RAILTEL was to assess the capabilities of spoken language technology for interactive telephone information services. Because all interaction is over the telephone, oral dialog management and response generation are very important aspects of the overall system design and usability. Each dialog is analysed to determine the source of any errors (speech recognition, understanding, information retrieval, processing, or dialog management). An analysis is provided for 100 dialogs taken from the RAILTEL field trials with naive subjects accessing timetable information.

This paper describes the experience of QAST 2008, the second time a pilot track of CLEF has been held to evaluate the task of Question Answering in Speech Transcripts. Five sites submitted results for at least one of the five scenarios (lectures in English, meetings in English, broadcast news in French, and European Parliament debates in English and Spanish). To assess the impact of potential automatic speech recognition errors, each task includes contrastive conditions with manual and automatically produced transcripts. The QAST 2008 evaluation framework is described, along with descriptions of the five scenarios and their associated data, the system submissions for this pilot track and the official evaluation results.

This article presents a study of a corpus of answers formulated by humans to factual questions. Qualitative and quantitative observations on the reuse of elements of the question in the answers are reported. The notion of answer information (information-réponse) is introduced, and a study of the presence of this element in the corpus is proposed. Finally, the formulations of the answers are studied.

This paper deals with the contextual analysis of the vocalic hesitation euh in French in a corpus of human elicited answers. Through the analysis of the contextual combinatorial patterns, the role of this vocalic hesitation in introducing new information is investigated. The observations support trends noticed in other languages and suggest potential optimizations for automatic question-answering systems.