Marc Sloan | University College London
Papers by Marc Sloan
Theoretical frameworks like the Probability Ranking Principle and its more recent Interactive Information Retrieval variant have guided the development of ranking and retrieval algorithms for decades, yet they are not capable of helping us model problems in Dynamic Information Retrieval, which exhibit the following three properties: an observable user signal, retrieval over multiple stages and an overall search intent. In this paper a new theoretical framework for retrieval in these scenarios is proposed. We derive a general dynamic utility function for optimizing over these types of tasks that takes into account the utility of each stage and the probability of observing user feedback. We apply our framework to experiments over TREC data in the dynamic multi page search scenario as a practical demonstration of its effectiveness, to frame the discussion of its use and limitations, and to compare it against the existing frameworks.
Information Retrieval Journal
Key to any research involving session search is the understanding of how a user’s queries evolve throughout the session. When a user creates a query reformulation, he or she is consciously retaining terms from their original query, removing others and adding new terms. By measuring the similarity between queries we can make inferences about the user’s information need and how successful their new query is likely to be. By identifying the origins of added terms we can infer the user’s motivations and gain an understanding of their interactions. In this paper we present a novel term-based methodology for understanding and interpreting query reformulation actions. We use TREC Session Track data to demonstrate how our technique is able to learn from query logs, and we make use of click data to test user interaction behavior when reformulating queries. We identify and evaluate a range of term-based query reformulation strategies and show that our methods provide valuable insight into understanding query reformulation in session search.
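As a minimal sketch of the term-level view described in this abstract, the following Python splits a reformulation into retained, removed and added terms and scores the similarity between the two queries. Lowercased whitespace tokenisation, the Jaccard coefficient and both function names are illustrative assumptions, not the methodology from the paper.

```python
def reformulation_terms(previous_query: str, new_query: str) -> dict:
    """Split a reformulation into retained, removed and added terms.

    Assumption: lowercased whitespace tokens stand in for whatever
    term processing the paper actually applies.
    """
    prev_terms = set(previous_query.lower().split())
    new_terms = set(new_query.lower().split())
    return {
        "retained": prev_terms & new_terms,  # kept from the earlier query
        "removed": prev_terms - new_terms,   # dropped by the user
        "added": new_terms - prev_terms,     # newly introduced
    }


def jaccard_similarity(previous_query: str, new_query: str) -> float:
    """Term-overlap similarity in [0, 1]; a hypothetical choice of measure."""
    prev_terms = set(previous_query.lower().split())
    new_terms = set(new_query.lower().split())
    union = prev_terms | new_terms
    if not union:
        return 0.0  # two empty queries: define similarity as 0
    return len(prev_terms & new_terms) / len(union)


if __name__ == "__main__":
    print(reformulation_terms("cheap flights london", "cheap hotels london"))
    print(jaccard_similarity("cheap flights london", "cheap hotels london"))
```

Sets keep the sketch short but discard term order and repetition; a fuller treatment would also need to trace where each added term came from, which this example does not attempt.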
WWW 2015
Queries issued to a search engine are often under-specified or ambiguous. The user’s search context or background may provide information that disambiguates their information need in order to automatically predict and issue a more effective query. The disambiguation can take place at different stages of the retrieval process. For instance, contextual query suggestions may be computed and recommended to users on the result page when appropriate, an approach that does not require modifying the original query’s results. Alternatively, the search engine can attempt to provide efficient access to new relevant documents by injecting these documents directly into search results based on the user’s context.
In this paper, we explore these complementary approaches and how they might be combined. We first develop a general framework for mining context-sensitive query reformulations for query suggestion. We evaluate our context-sensitive suggestions against a state-of-the-art baseline using a click-based metric. The resulting query suggestions generated by our approach outperform the baseline by 13% overall and by 16% on an ambiguous query subset.
While the query suggestions generated by our approach have higher quality than the existing baselines, we demonstrate that using them naïvely for injecting new documents into search results can lead to inferior rankings. To remedy this issue, we develop a classifier that decides when to inject new search results using features based on suggestion quality and user context. We show that our context-sensitive result fusion approach (Corfu) improves retrieval quality for ambiguous queries by up to 2.92%. Our approaches can efficiently scale to massive search logs, enabling a data-driven strategy that benefits from observing how users issue and reformulate queries in different contexts.
WSDM 2015
Dynamic aspects of Information Retrieval (IR), including changes found in data, users and systems, are increasingly being utilized in search engines and information filtering systems. Existing IR techniques are limited in their ability to optimize over changes, learn with minimal computational footprint and be responsive and adaptive. The objective of this tutorial is to provide a comprehensive and up-to-date introduction to Dynamic Information Retrieval Modeling, the statistical modeling of IR systems that can adapt to change. It will cover techniques ranging from classic relevance feedback to the latest applications of partially observable Markov decision processes (POMDPs) and a handful of useful algorithms and tools for solving IR problems incorporating dynamics.
SIGIR 2014
Dynamic aspects of Information Retrieval (IR), including changes found in data, users and systems, are increasingly being utilized in search engines and information filtering systems. Existing IR techniques are limited in their ability to optimize over changes, learn with minimal computational footprint and be responsive and adaptive. The objective of this tutorial is to provide a comprehensive and up-to-date introduction to Dynamic Information Retrieval Modeling, the statistical modeling of IR systems that can adapt to change. It will cover techniques ranging from classic relevance feedback to the latest applications of partially observable Markov decision processes (POMDPs) and a handful of useful algorithms and tools for solving IR problems incorporating dynamics.
WWW 2013, May 2013
Modern information retrieval interfaces typically involve multiple pages of search results, and users who are recall minded or engaging in exploratory search using ad hoc queries are likely to access more than one page. Document rankings for such queries can be improved by allowing additional context to the query to be provided by the user herself using explicit ratings or implicit actions such as clickthroughs. Existing methods using this information usually involve detrimental UI changes that can lower user satisfaction. Instead, we propose a new feedback scheme that makes use of existing UIs and does not alter the user's browsing behaviour. To maximise retrieval performance over multiple result pages, we propose a novel retrieval optimisation framework and show that the optimal ranking policy should choose a diverse, exploratory ranking to display on the first page. Then, a personalised re-ranking of the next pages can be generated based on the user's feedback from the first page. We show that document correlations used in result diversification have a significant impact on relevance feedback and its effectiveness over a search session. TREC evaluations demonstrate that our optimal rank strategy (including approximative Monte Carlo Sampling) can naturally optimise the trade-off between exploration and exploitation and maximise the user's overall satisfaction over time against a number of similar baselines.
WSCD 2013, Feb 4, 2013
Many Information Retrieval (IR) models make use of offline statistical techniques to score documents for ranking over a single period, rather than use an online, dynamic system that is responsive to users over time. In this paper, we explicitly formulate a general Multi Period Information Retrieval problem, where we consider retrieval as a stochastic yet controllable process. The ranking action during the process continuously controls the retrieval system's dynamics, and an optimal ranking policy is found in order to maximise the overall users' satisfaction over the multiple periods as much as possible. Our derivations show interesting properties about how the posterior probability of the documents' relevancy evolves from user feedback through clicks, and provide a plug-in framework for incorporating different click models. Based on the Multi-Armed Bandit theory, we propose a simple implementation of our framework using a dynamic ranking rule that takes rank bias and exploration of documents into account. We use TREC data to learn a suitable exploration parameter for our model, and then analyse its performance and a number of variants using a search log data set; the experiments suggest an ability to explore document relevance dynamically over time using user feedback in a way that can handle rank bias.
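To illustrate the general idea of a bandit-style dynamic ranking rule with an exploration parameter, here is a minimal Python sketch that uses the standard UCB1 score. It is not the ranking rule derived in the paper: the function name `ucb_rank`, the `exploration` weight and the click and impression counts are hypothetical, and the sketch ignores rank bias, which the paper's rule accounts for.

```python
import math


def ucb_rank(clicks: dict, impressions: dict, exploration: float = 1.0, k: int = 10) -> list:
    """Rank document ids by a UCB1-style score.

    clicks / impressions map a document id to counts observed so far.
    `exploration` is a hypothetical stand-in for an exploration parameter:
    0 gives a purely exploitative ranking by click-through rate.
    """
    total_shown = sum(impressions.values())
    scores = {}
    for doc, shown in impressions.items():
        if shown == 0:
            # Never displayed: give it top priority so it gets explored.
            scores[doc] = math.inf
        else:
            mean_ctr = clicks.get(doc, 0) / shown
            bonus = exploration * math.sqrt(2 * math.log(total_shown) / shown)
            scores[doc] = mean_ctr + bonus
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:k]


if __name__ == "__main__":
    impressions = {"a": 1000, "b": 1}
    clicks = {"a": 300, "b": 0}
    print(ucb_rank(clicks, impressions, exploration=1.0))
    print(ucb_rank(clicks, impressions, exploration=0.0))
```

With a positive exploration weight the rarely shown document is promoted despite having no clicks yet, while setting the weight to zero falls back to ranking by observed click-through rate.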
The dynamic nature of document relevance is largely ignored by traditional Information Retrieval (IR) models, which assume that scores (relevance) for documents given an information need are static. In this paper, we formulate a general Dynamical Information Retrieval problem, where we consider retrieval as a stochastic, controllable process. The ranking action continuously controls the retrieval system’s dynamics and an optimal ranking policy is found that maximises the overall users’ satisfaction during each period. Through deriving the posterior probability of the documents’ evolving relevancy from user clicks, we can provide a plugin framework for incorporating a number of click models, which can be combined with Multi-Armed Bandit theory and Portfolio Theory of IR to create a dynamic ranking rule that takes rank bias and click dependency into account. We verify the versatility of our algorithms in a number of experiments and demonstrate significant performance gains over strong baselines.
Internet advertising is a fast-growing business which has proved to be significantly important in digital economics. It is vitally important for both web search engines and online content providers and publishers because web advertising provides them with major sources of revenue. Its presence is increasingly important for the whole media industry due to the influence of the Web. For advertisers, it is a smarter alternative to traditional marketing media such as TV and newspapers. As the web evolves and data collection continues, the design of methods for more targeted, interactive, and friendly advertising may have a major impact on the way our digital economy evolves, and may aid societal development.
Towards this goal, mathematically well-grounded Computational Advertising methods are becoming necessary and will continue to develop as a fundamental tool for the Web. As a vibrant new discipline, Internet advertising requires effort from different research domains including Information Retrieval, Machine Learning, Data Mining and Analytics, Statistics, Economics, and even Psychology to predict and understand user behaviours. In this paper, we provide a comprehensive survey on Internet advertising, discussing and classifying the research issues, identifying the recent technologies, and suggesting its future directions. To have a comprehensive picture, we first start with a brief history, introduction, and classification of the industry and present a schematic view of the new advertising ecosystem. We then introduce four major participants, namely advertisers, online publishers, ad exchanges and web users; and through analysing and discussing the major research problems and existing solutions from their respective perspectives, we discover and aggregate the fundamental problems that characterise the newly-formed research field and capture its potential future prospects.
CAFE (Channel Access interFacE) is a C++ library that provides a modern, multifaceted interface to the EPICS-based control system. CAFE makes extensive use of templates and containers with multiple STL-compatible access methods to enhance efficiency, flexibility and performance. Stability and robustness are accomplished by ensuring that connectivity to EPICS channels remains in a well-defined state in every eventuality, and results of all synchronous and asynchronous operations are captured and reported with integrity. CAFE presents the user with a number of options for writing and retrieving data to and from the control system. In addition to basic read and write operations, a further abstraction layer provides transparency to more intricate functionality involving logical sets of data; such object sequences are easily instantiated through an XML-based configuration mechanism. CAFE's suitability for use in a broad spectrum of applications is demonstrated. These range from high-performance Qt GUI (Graphical User Interface) control widgets, to event processing agents that propagate data through the Object Management Group's Data Distribution Service (OMG-DDS), to script-like frameworks such as MATLAB. The methodology for the modular use of CAFE serves to improve maintainability by enforcing a logical boundary between the channel access components and the programming extensions of the application framework at hand.