Benjamin Piwowarski | Sorbonne University

Papers by Benjamin Piwowarski

Precision recall with user modeling (PRUM): Application to structured information retrieval

Standard Information Retrieval (IR) metrics are not well suited for new paradigms like XML or Web IR in which retrievable information units are document elements and/or sets of related documents. Part of the problem stems from the classical hypotheses on the user models: They do not take into account the structural or logical context of document elements or the possibility of navigation between units. This article proposes an explicit and formal user model that encompasses a large variety of user behaviors.

An extension of precision-recall with user modelling (PRUM): Application to XML retrieval. Transactions on Information Systems

Standard Information Retrieval (IR) metrics are not well suited for new paradigms like XML or Web IR in which retrievable information units are document elements and/or sets of related documents. Part of the problem stems from the classical hypotheses on the user models: They do not take into account the structural or logical context of document elements or the possibility of navigation between units. This article proposes an explicit and formal user model that encompasses a large variety of user behaviors.

Rapport outilex

Information Retrieval (IR) systems make it possible to search large electronic corpora for the documents most relevant to information needs, which users generally express as keywords. These systems, popularized by search engines such as Google or Yahoo, provide access to information whose smallest indivisible unit is the document.

Ranking document fragments in XML Retrieval: an empirical study

In the XML retrieval paradigm, document fragments may be returned as answers to a user query. Because this information is more specific than whole documents, it may reduce the effort the user needs to find relevant information. However, since XML documents are composed of nested elements, many of which may be relevant to the user's information need, retrieval systems must take care of the overlap between returned elements. In this paper, we consider this filtering problem.

On the use of complex numbers in quantum models for information retrieval

Quantum-inspired models have recently attracted increasing attention in Information Retrieval. An intriguing characteristic of the mathematical framework of quantum theory is the presence of complex numbers. However, it is unclear what such numbers could or would actually represent or mean in Information Retrieval. The goal of this paper is to discuss the role of complex numbers within the context of Information Retrieval. First, we introduce how complex numbers are used in quantum probability theory.

How quantum theory is developing the field of Information Retrieval

On using a Quantum Physics formalism for Multi-document Summarisation

Multidocument summarization (MDS) aims, for each given query, to extract compressed and relevant information with respect to the different query-related themes present in a set of documents. Many approaches operate in two steps. Themes are first identified from the set, and then a summary is formed by extracting salient sentences within the different documents of each of the identified themes. Among these approaches, those based on latent semantic analysis (LSA) rely on spectral decomposition techniques to identify the themes.

Méthodologie pour une représentation multi-dimensionnelle des documents

The representation of documents and queries in Information Retrieval (IR) has remained predominantly one-dimensional (i.e., a vector). This representation has limits: how, for example, can one represent a document that covers several topics, or an ambiguous query? These problems matter for developing IR systems that are interactive or that seek to diversify results.

Beyond Cumulated Gain and Average Precision: Including Willingness and Expectation in the User Model

In this paper, we define a new metric family based on two concepts: The definition of the stopping criterion and the notion of satisfaction, where the former depends on the willingness and expectation of a user exploring search results. Both concepts have been discussed so far in the IR literature, but we argue in this paper that defining a proper single valued metric depends on merging them into a single conceptual framework.

The Kernel Quantum Probabilities (KQP) Library

Quantum probabilities are one of the generalisations of standard probabilities. They are founded on the mathematical theory underlying quantum physics. This framework was developed in the 1930s by von Neumann and Dirac. It was recently further developed and generalised by the so-called “sequential effect algebra” [3].

Specificity

Specificity is a relevance dimension that describes the extent to which a document part focuses on the topic of request. In the context of semi-structured text (XML) retrieval, a document part corresponds to an XML element. Specificity is defined as the length ratio, typically in number of characters, of contained relevant to irrelevant text in the document part. Different Specificity values can be associated with a document part. These values are drawn from the Specificity relevance scale, which has evolved from a discrete multi- ...

Structure, recherche d'information et apprentissage

Journées francophones d'Extraction et de Gestion des …, Jan 1, 2003

Filtering in XML Retrieval: a Prospective Analysis

3rd XML and Information Retrieval …, Jan 1, 2004

Editorial: Introduction to the special issue on Graphical Models and Information Retrieval

International Journal of …, Jan 1, 2009

We propose a method which, given a document to be classified, automatically generates an ordered set of appropriate descriptors extracted from a thesaurus. The method creates a Bayesian network to model the thesaurus and uses probabilistic inference to select the set of descriptors having high posterior probability of being relevant given the available evidence (the document to be classified). Our model can be used without preclassified training documents, although its performance improves as more training data become available. We have ...

Evaluation Metrics

A Stochastic Model for XML Information Retrieval: Searching and Learning with the INEX collection

Most recent document standards like XML rely on structured representations. On the other hand, current information retrieval systems have been developed for flat document representations and cannot be easily extended to cope with more complex document types. The design of such systems is still an open problem. We present here a new model for structured document retrieval which allows computing scores of document parts. This model is based on Bayesian networks whose conditional probabilities are learned from the ...

Towards a science of user engagement (Position Paper)

Handling data sparsity in collaborative filtering using emotion and semantic based features

Proceedings of the 34th …, Jan 1, 2011

Processing queries in session in a quantum-inspired IR framework

… in Information Retrieval, Jan 1, 2011

A Query Algebra for Quantum Information Retrieval

Proceedings of The Sixth Asia …, Jan 1, 2011
