Benjamin Piwowarski | Sorbonne University
Papers by Benjamin Piwowarski
Abstract: Standard Information Retrieval (IR) metrics are not well suited for new paradigms like XML or Web IR in which retrievable information units are document elements and/or sets of related documents. Part of the problem stems from the classical hypotheses on the user models: they do not take into account the structural or logical context of document elements or the possibility of navigation between units. This article proposes an explicit and formal user model that encompasses a large variety of user behaviors.
Information Retrieval (IR) systems make it possible to search large electronic corpora for the documents most relevant to information needs that users generally express as keywords. These systems, popularized by search engines such as Google or Yahoo, give access to information whose smallest indivisible unit is the document.
Abstract: In the XML retrieval paradigm, document fragments may be returned as answers to a user query. Because this information is more specific than whole documents, it may reduce the user effort needed to find relevant information. However, since XML documents are composed of nested elements, many of which may be relevant to the user's information need, retrieval systems must take care of the overlap between returned elements. In this paper, we consider this filtering problem.
Abstract: Quantum-inspired models have recently attracted increasing attention in Information Retrieval. An intriguing characteristic of the mathematical framework of quantum theory is the presence of complex numbers. However, it is unclear what such numbers could or would actually represent or mean in Information Retrieval. The goal of this paper is to discuss the role of complex numbers within the context of Information Retrieval. First, we introduce how complex numbers are used in quantum probability theory.
Multidocument summarization (MDS) aims to extract, for each given query, compressed and relevant information with respect to the different query-related themes present in a set of documents. Many approaches operate in two steps: themes are first identified from the set, and a summary is then formed by extracting salient sentences from the documents of each identified theme. Among these approaches, those based on latent semantic analysis (LSA) rely on spectral decomposition techniques to identify the themes.
The representation of documents and queries in Information Retrieval (IR) has remained predominantly one-dimensional (i.e., a vector). This representation has limits: how, for example, can one represent a document that covers several topics, or an ambiguous query? These problems are important for developing IR systems that are interactive or that seek to diversify results.
Abstract: In this paper, we define a new metric family based on two concepts: the definition of the stopping criterion and the notion of satisfaction, where the former depends on the willingness and expectation of a user exploring search results. Both concepts have already been discussed in the IR literature, but we argue in this paper that defining a proper single-valued metric depends on merging them into a single conceptual framework.
Quantum probabilities are one generalisation of standard probabilities. They are founded on the mathematical theory underlying Quantum Physics. This framework was developed in the 1930s by von Neumann and Dirac. It was recently further developed and generalised by the so-called “sequential effect algebra” [3].
Abstract: Specificity is a relevance dimension that describes the extent to which a document part focuses on the topic of the request. In the context of semi-structured text (XML) retrieval, a document part corresponds to an XML element. Specificity is defined as the length ratio, typically in number of characters, of contained relevant to irrelevant text in the document part. Different Specificity values can be associated with a document part. These values are drawn from the Specificity relevance scale, which has evolved from a discrete multi- ...
Journées francophones d'Extraction et de Gestion des …, Jan 1, 2003
3rd XML and Information Retrieval …, Jan 1, 2004
International Journal of …, Jan 1, 2009
We propose a method which, given a document to be classified, automatically generates an ordered set of appropriate descriptors extracted from a thesaurus. The method creates a Bayesian network to model the thesaurus and uses probabilistic inference to select the set of descriptors having high posterior probability of being relevant given the available evidence (the document to be classified). Our model can be used without preclassified training documents, although its performance improves as more training data become available. We have ...
Abstract: Most recent document standards like XML rely on structured representations. On the other hand, current information retrieval systems have been developed for flat document representations and cannot be easily extended to cope with more complex document types. The design of such systems is still an open problem. We present here a new model for structured document retrieval which allows computing scores of document parts. This model is based on Bayesian networks whose conditional probabilities are learned from the ...
Proceedings of the 34th …, Jan 1, 2011
… in Information Retrieval, Jan 1, 2011
Proceedings of The Sixth Asia …, Jan 1, 2011