Olivier Ferret - Academia.edu

Papers by Olivier Ferret

Deliberate word access: an intuition, a roadmap and some preliminary empirical results

Int J Speech Technol

No doubt, words play a major role in language production, hence finding them is of vital importance, be it for writing or for speaking (spontaneous discourse production, simultaneous translation). Words are stored in a dictionary, and the general belief holds that the more entries, the better. Yet, to be truly useful the resource should contain not only many entries and a lot of information concerning each one of them, but also adequate navigational means to reveal the stored information. Information access depends crucially on the organization of the data (words) and the access keys (meaning/form), two factors largely overlooked. We will present here some ideas of how an existing electronic dictionary could be enhanced to help a speaker/writer find the word s/he is looking for. To this end we suggest adding to an existing electronic dictionary an index based on the notion of association, i.e. words co-occurring in a well-balanced corpus, the latter being supposed to represent the average citizen's knowledge of the world. Before describing our approach, we will briefly take a critical look at is, computer-generated language, simulation of the mental lexicon, or WordNet (WN),-to see how adequate they are with regard to our goal.

QALC–the Question-Answering system of LIMSI-CNRS

… of TREC9, NIST, …, Jan 1, 2000

Using Temporal Cues for Segmenting Texts into Events

Lecture Notes in Computer Science, 2010

One of the early applications of Information Extraction, motivated by the need for intelligence tools, is the detection of events in news articles. But this detection may be difficult when news articles mention several occurrences of events of the same kind, which is often done for comparison purposes. We propose in this article new approaches to segment the text of news articles into units that each relate to a single event, in order to help the identification of relevant information associated with the main event of the news. We present two approaches that use statistical machine learning models (HMM and CRF) exploiting temporal information extracted from the texts as a basis for this segmentation. The evaluation of these approaches in the domain of seismic events shows that with a robust and generic approach, we can achieve results at least as good as those obtained with a specialized heuristic approach.

Segmenter et structurer thématiquement des textes par l'utilisation conjointe de collocations et de la récurrence lexicale

We present in this paper a method for achieving in an integrated way two tasks of topic analysis: segmentation and link detection. This method combines the lexical recurrence in texts and the relations from a collocation network to compensate for the respective weaknesses of the two approaches. We report its evaluation for segmentation on a corpus in French and another in English and we propose an evaluation measure specifically adapted to this kind of system.

Un système qui s'appuie sur son expérience pour segmenter des textes

Utiliser des sens de mots pour la segmentation thématique

Compounds and Distributional Thesauri

Utiliser un modèle neuronal générique pour la substitution lexicale

Identifying Bad Semantic Neighbors for Improving Distributional Thesauri

Sélection non supervisée de relations sémantiques pour améliorer un thésaurus distributionnel

Intégration de l'indexation conceptuelle dans l'expression du besoin d'information

Typing Relations in Distributional Thesauri

Text, Speech and Language Technology, 2014

How to Thematically Segment Texts by Using Lexical Cohesion?

Meeting of the Association for Computational Linguistics, 1998

This article outlines a quantitative method for segmenting texts into thematically coherent units. This method relies on a network of lexical collocations to compute the thematic coherence of the different parts of a text from the lexical cohesiveness of their words. We also present the results of an experiment on locating boundaries between a series of concatenated texts.
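The idea of scoring thematic coherence from a collocation network can be illustrated with a minimal sketch. It is not the paper's algorithm: the toy `COLLOCATIONS` table, its weights, the block `size`, and the helper names `weight`, `gap_cohesion` and `boundaries` are all assumptions made for illustration. The sketch scores each gap between two words by the mean collocation weight between the words just before and just after it, and proposes a boundary wherever that score is a strict local minimum.

```python
# Hypothetical toy collocation network: unordered word pair -> association weight.
COLLOCATIONS = {
    frozenset(("bank", "loan")): 0.9,
    frozenset(("loan", "interest")): 0.8,
    frozenset(("bank", "interest")): 0.7,
    frozenset(("river", "boat")): 0.9,
    frozenset(("boat", "sail")): 0.8,
    frozenset(("river", "sail")): 0.6,
}


def weight(a, b):
    """Collocation weight of two words; 0.0 when the network has no such link."""
    return COLLOCATIONS.get(frozenset((a, b)), 0.0)


def gap_cohesion(words, gap, size=2):
    """Mean collocation weight between the `size` words before position `gap`
    and the `size` words starting at `gap` (an assumed scoring scheme)."""
    left = words[max(0, gap - size):gap]
    right = words[gap:gap + size]
    pairs = [(a, b) for a in left for b in right if a != b]
    if not pairs:
        return 0.0
    return sum(weight(a, b) for a, b in pairs) / len(pairs)


def boundaries(words, size=2):
    """Return the gaps whose cohesion is strictly lower than both neighbouring gaps."""
    scores = {g: gap_cohesion(words, g, size) for g in range(1, len(words))}
    return [
        g for g in range(2, len(words) - 1)
        if scores[g] < scores[g - 1] and scores[g] < scores[g + 1]
    ]


if __name__ == "__main__":
    text = ["bank", "loan", "interest", "river", "boat", "sail"]
    print(boundaries(text))  # a gap index g means "cut before text[g]"
```

A real system would work on lemmatized content words and a network built from a large corpus; the strict local-minimum rule is only one simple way to turn a cohesion profile into cuts.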

Discovering word senses from a network of lexical cooccurrences

Proceedings of the 20th international conference on Computational Linguistics - COLING '04, 2004

Lexico-semantic networks such as WordNet have been criticized for the nature of the senses they distinguish as well as for the way they define these senses. In this article, we present a possible solution to overcome these limits by defining the sense of words from the way they are used. More precisely, we propose to differentiate the senses of a word from a network of lexical cooccurrences built from a large corpus. This method was tested both for French and English and was evaluated for English by comparing its results with WordNet.
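As a rough illustration of differentiating senses from a cooccurrence network, the sketch below groups the neighbours of a target word into connected components once the target itself is set aside. The abstract does not say which clustering procedure the paper uses, so connected components are a stand-in chosen for brevity; the toy `EDGES` list and the function names `build_graph` and `senses` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy cooccurrence network, given as undirected edges.
EDGES = [
    ("mouse", "cat"), ("mouse", "cheese"), ("cat", "cheese"),
    ("mouse", "keyboard"), ("mouse", "screen"), ("keyboard", "screen"),
]


def build_graph(edges):
    """Adjacency sets for an undirected graph."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def senses(graph, target):
    """Group the neighbours of `target` into connected components of the
    subgraph restricted to those neighbours; each group stands for one usage."""
    neighbours = set(graph[target])
    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            # Only follow links that stay among the target's neighbours.
            stack.extend((graph[node] & neighbours) - seen)
        groups.append(sorted(group))
    return groups


if __name__ == "__main__":
    print(senses(build_graph(EDGES), "mouse"))
```

On a network built from a real corpus, spurious links would merge most neighbours into one component, which is why a weighted clustering method is normally needed in place of plain components.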

How to thematically segment texts by using lexical cohesion?

Proceedings of the 36th annual meeting on Association for Computational Linguistics, 1998

This article outlines a quantitative method for segmenting texts into thematically coherent units. This method relies on a network of lexical collocations to compute the thematic coherence of the different parts of a text from the lexical cohesiveness of their words. We also present the results of an experiment on locating boundaries between a series of concatenated texts.

Using collocations for topic segmentation and link detection

Proceedings of the 19th international conference on Computational linguistics, 2002

We present in this paper a method for achieving in an integrated way two tasks of topic analysis: segmentation and link detection. This method combines word repetition and the lexical cohesion expressed by a collocation network to compensate for the respective weaknesses of the two approaches. We report an evaluation of our method for segmentation on two corpora, one in French and one in English, and we propose an evaluation measure that specifically suits that kind of system.

Découvrir les thèmes d'un document pour en améliorer la segmentation thématique

Thematic segmentation and the identification of a document's topics are often treated as separate problems, even though both belong to topic analysis. In this article, we examine how topic identification can help improve document segmentation when segmentation relies only on lexical recurrence. We first present an unsupervised method for discovering the topics of a document; we then detail how these topics are used during segmentation to help recognize thematic similarities between document segments. Finally, through an evaluation carried out for both French and English, we show the actual benefit of the proposed method.

Improving Text Segmentation by Combining Endogenous and Exogenous Methods

Combining Bootstrapping and Feature Selection for Improving a Distributional Thesaurus

Bag of senses versus bag of words: comparing semantic and lexical approaches on sentence extraction

... [BFB+07] Florian Boudin, Benoît Favre, Frédéric Béchet, Marc El-Bèze, Laurent Gillard, and Juan-Manuel Torres-Moreno ... In SIGIR'98: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pages 335–336 ...
