Bogdan Sacaleanu - Academia.edu
Papers by Bogdan Sacaleanu
LDV Forum, 2004
Most approaches in sense disambiguation have been restricted to supervised training over manually annotated, non-technical, English corpora. Application to a new language or technical domain requires extensive manual annotation of appropriate training corpora. As this is both expensive and inefficient, unsupervised methods are to be preferred, specifically in technical domains such as medicine. In the context of a project in
International Journal of Medical Informatics, 2002
... In: Angelika Storrer, Alexander Geyken, Alexander Siebert and Kay-Michael Würzner (eds.): Text Resources and Lexical Knowledge: Selected Papers from the 9th Conference on Natural Language Processing KONVENS 2008, Mouton de Gruyter, Berlin, New York ...
The general aim of the third CLEF Multilingual Question Answering Track was to set up a common and replicable evaluation framework to test both monolingual and cross-language Question Answering (QA) systems that process queries and documents in several European languages. Nine target languages and ten source languages were exploited to enact 8 monolingual and 73 cross-language tasks. Twenty-four groups participated in the exercise. Overall results showed a general increase in performance in comparison to last year. The best performing monolingual system irrespective of target language answered 64.5% of the questions correctly (in the monolingual Portuguese task), while the average of the best performances for each target language was 42.6%. The cross-language step instead entailed a considerable drop in performance. In addition to accuracy, the organisers also measured the relation between the correctness of an answer and a system’s stated confidence in it, showing that the best systems did not always provide the most reliable confidence score. We provide an overview of the 2005 QA track, detail the procedure followed to build the test sets and present a general analysis of the results.
We describe the extensions made to our 2004 QA@CLEF German/English QA-system, toward a fully German-English/English-German cross-language system with answer validation through web usage. Details concerning the processing of factoid, definition and temporal questions are given and the results obtained in the monolingual German, bilingual English-German and German-English tasks are briefly presented and discussed.
... Computational Intelligence 18 (2002) 451-476. 4. Brants, T.: TnT - a statistical part-of-speech tagger. ... A Cross-Language Question/Answering-System for German and English ... 7. Collins, M., Singer, Y.: Unsupervised models for named entity classification. ...
The paper describes QUANTICO, a cross-language open domain question answering system for German and English. The main features of the system are: use of preemptive off-line document annotation with syntactic information like chunk structures, apposition constructions and abbreviation-extension pairs for the passage retrieval; use of online translation services, language models and alignment methods for the cross-language scenarios; use of redundancy as an indicator of good answer candidates; selection of the best answers based on distance metrics defined over graph representations. Based on the question type two different strategies of answer extraction are triggered: for factoid questions answers are extracted from best IR-matched passages and selected by their redundancy and distance to the question keywords; for definition questions answers are considered to be the most redundant normalized linguistic structures with explanatory role (i.e., appositions, abbreviation’s extensions). The results of evaluating the system’s performance by CLEF were as follows: for the best German-German run we achieved an overall accuracy (ACC) of 42.33% and a mean reciprocal rank (MRR) of 0.45; for the best English-German run 32.98% (ACC) and 0.35 (MRR); for the German-English run 17.89% (ACC) and 0.17 (MRR).
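For readers unfamiliar with the two figures quoted in the abstract above, the following is a minimal sketch of how accuracy and mean reciprocal rank are commonly computed for a QA run. It is not taken from the paper or from the CLEF evaluation tooling; the function names, the input encoding (rank of the first correct answer per question, or `None` if no returned answer was correct) and the sample data are assumptions made purely for illustration.

```python
from typing import Optional, Sequence


def accuracy(first_answer_correct: Sequence[bool]) -> float:
    """Fraction of questions whose top-ranked answer was judged correct."""
    if not first_answer_correct:
        raise ValueError("need at least one question")
    return sum(first_answer_correct) / len(first_answer_correct)


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank of the first correct answer per question.

    A question with no correct answer (rank None) contributes 0.
    Ranks are 1-based.
    """
    if not ranks:
        raise ValueError("need at least one question")
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


# Hypothetical judgements for four questions (illustrative data only).
ranks = [1, 2, None, 1]
top1 = [r == 1 for r in ranks]

acc = accuracy(top1)
mrr = mean_reciprocal_rank(ranks)
print(f"ACC = {acc:.2%}, MRR = {mrr:.3f}")
```

Under this definition MRR is never lower than accuracy, since a correct answer at rank 1 contributes 1 to both, while correct answers at lower ranks add only to MRR.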
This report describes the work done by the QA group of the Language Technology Lab at DFKI for the 2004 edition of the Cross-Language Evaluation Forum (CLEF). Based on the experience we obtained through our participation at QA@CLEF-2003 with our initial cross-lingual QA prototype system BiQue (cf. [1]), the focus of the system extension for this year’s task was on a) robust NL question interpretation using advanced linguistic-based components, b) flexible interface strategies to IR-search engines, and c) strategies for off-line annotation of the data collection, which support query-specific indexing and answer selection. The overall architecture of the extended system, as well as the results obtained in the CLEF-2004 Monolingual German and Bilingual German/English QA tracks, will be presented and discussed throughout the paper.