6319 Omar mohamed Khattab - Academia.edu

Papers by 6319 Omar mohamed Khattab

Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval

Cornell University - arXiv, Jan 2, 2021

Multi-hop reasoning (i.e., reasoning across two or more documents) is a key ingredient for NLP models that leverage large corpora to exhibit broad knowledge. To retrieve evidence passages, multi-hop models must contend with a fast-growing search space across the hops, represent complex queries that combine multiple information needs, and resolve ambiguity about the best order in which to hop between training passages. We tackle these problems via Baleen, a system that improves the accuracy of multi-hop retrieval while learning robustly from weak training signals in the many-hop setting. To tame the search space, we propose condensed retrieval, a pipeline that summarizes the retrieved passages after each hop into a single compact context. To model complex queries, we introduce a focused late interaction retriever that allows different parts of the same query representation to match disparate relevant passages. Lastly, to infer the hopping dependencies among unordered training passages, we devise latent hop ordering, a weak-supervision strategy in which the trained retriever itself selects the sequence of hops. We evaluate Baleen on retrieval for two-hop question answering and many-hop claim verification, establishing state-of-the-art performance.

This three-hop claim illustrates three major challenges in multi-hop retrieval. First, multi-hop queries encompass multiple information needs; the claim above referenced facts from three disparate passages. Second, retrieval errors in each hop propagate to subsequent hops. This can happen if the model directly retrieves information about the Baseball Hall of Fame, confuses Red Flaherty with, say, Robert Flaherty, or singles out the MVP of, say, the 1958 World Series, which Flaherty also umpired. Third, due to the dependency between hops, retrievers must learn an effective sequence of hops, where previously retrieved clues lead to other relevant passages. These inter-passage dependencies [...]

35th Conference on Neural Information Processing Systems (NeurIPS 2021).
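The condensed retrieval pipeline described in the Baleen abstract above can be sketched as a loop that, after every hop, folds the newly retrieved passages into one compact context and uses that context to form the next hop's query. This is only a minimal illustration under stated assumptions: `condensed_retrieval`, `toy_retrieve`, `toy_condense`, and the three-sentence corpus are hypothetical names and data, the retriever is plain word overlap rather than a late interaction model, and the condenser merely concatenates passages where a real condenser would extract the few relevant facts.

```python
from typing import Callable, List

# Hypothetical toy corpus, used only for this illustration.
CORPUS = ["alpha links to beta", "beta links to gamma", "delta stands alone"]


def toy_retrieve(hop_query: str) -> List[str]:
    """Return the single unseen passage sharing the most words with the query."""
    words = set(hop_query.split())
    # Skip passages whose text is already part of the condensed context.
    fresh = [p for p in CORPUS if p not in hop_query]
    return [max(fresh, key=lambda p: len(words & set(p.split())))]


def toy_condense(query: str, context: str, passages: List[str]) -> str:
    """Stand-in condenser: append the new passages to the running context."""
    return " ; ".join(part for part in [context] + passages if part)


def condensed_retrieval(
    query: str,
    retrieve: Callable[[str], List[str]],
    condense: Callable[[str, str, List[str]], str],
    hops: int,
) -> str:
    """Run `hops` retrieval rounds, keeping one compact context between hops."""
    context = ""
    for _ in range(hops):
        # The next hop sees the original query plus the condensed context only.
        hop_query = f"{query} {context}".strip()
        passages = retrieve(hop_query)
        context = condense(query, context, passages)
    return context


result = condensed_retrieval("alpha", toy_retrieve, toy_condense, hops=2)
print(result)
```

In this toy run the first hop finds the passage mentioning "alpha", and the words it contributes to the context are what let the second hop reach the passage about "gamma", which shares no word with the original query.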

PLAID

Proceedings of the 31st ACM International Conference on Information & Knowledge Management

Pre-trained language models are increasingly important components across multiple information retrieval (IR) paradigms. Late interaction, introduced with the ColBERT model and recently refined in ColBERTv2, is a popular paradigm that holds state-of-the-art status across many benchmarks. To dramatically speed up the search latency of late interaction, we introduce the Performance-optimized Late Interaction Driver (PLAID) engine. Without impacting quality, PLAID swiftly eliminates low-scoring passages using a novel centroid interaction mechanism that treats every passage as a lightweight bag of centroids. PLAID uses centroid interaction as well as centroid pruning, a mechanism for sparsifying the bag of centroids, within a highly-optimized engine to reduce late interaction search latency by up to 7× on a GPU and 45× on a CPU against vanilla ColBERTv2, while continuing to deliver state-of-the-art retrieval quality. This allows the PLAID engine with ColBERTv2 to achieve latency of tens of milliseconds on a GPU and tens or just a few hundred milliseconds on a CPU, even at the largest scale we evaluate (140M passages).

CCS CONCEPTS • Information systems → Top-k retrieval in databases; Document filtering.
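The two mechanisms named in the PLAID abstract above can be illustrated with a small sketch. It assumes each passage is stored as a list of centroid IDs (its "bag of centroids"), that query-to-centroid similarities are computed once per query, and that a passage's approximate score is the sum, over query tokens, of the best-matching centroid in its bag. The function names, the pruning threshold, and the 2-dimensional vectors are hypothetical and chosen for readability; this is not the engine's actual implementation.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def centroid_scores(query_vecs: List[Vector], centroids: List[Vector]) -> List[List[float]]:
    """Score every query token against every centroid, once per query."""
    return [[dot(q, c) for c in centroids] for q in query_vecs]


def prune_centroids(bag: List[int], scores: List[List[float]], threshold: float) -> List[int]:
    """Centroid pruning: keep only centroids that some query token scores highly."""
    return [c for c in bag if max(row[c] for row in scores) >= threshold]


def centroid_interaction(bag: List[int], scores: List[List[float]]) -> float:
    """Approximate passage score using centroid IDs instead of full embeddings."""
    if not bag:
        return 0.0
    return sum(max(row[c] for c in bag) for row in scores)


# Toy data (assumed): three centroids and a two-token query.
centroids = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
query = [(1.0, 0.0), (0.0, 1.0)]
scores = centroid_scores(query, centroids)

passage_bags = {"p1": [0, 1, 2], "p2": [2]}
approx = {
    pid: centroid_interaction(prune_centroids(bag, scores, threshold=0.5), scores)
    for pid, bag in passage_bags.items()
}
print(approx)
```

Passages whose approximate score is low can be discarded before any full-precision late interaction is computed, which is where the latency savings would come from in a design of this kind.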

Relevance-guided Supervision for OpenQA with ColBERT

Transactions of the Association for Computational Linguistics, 2021

Systems for Open-Domain Question Answering (OpenQA) generally depend on a retriever for finding candidate passages in a large corpus and a reader for extracting answers from those passages. In much recent work, the retriever is a learned component that uses coarse-grained vector representations of questions and passages. We argue that this modeling choice is insufficiently expressive for dealing with the complexity of natural language questions. To address this, we define ColBERT-QA, which adapts the scalable neural retrieval model ColBERT to OpenQA. ColBERT creates fine-grained interactions between questions and passages. We propose an efficient weak supervision strategy that iteratively uses ColBERT to create its own training data. This greatly improves OpenQA retrieval on Natural Questions, SQuAD, and TriviaQA, and the resulting system attains state-of-the-art extractive OpenQA performance on all three datasets.

ColBERT

Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020

Recent progress in Natural Language Understanding (NLU) is driving fast-paced advances in Information Retrieval (IR), largely owed to fine-tuning deep language models (LMs) for document ranking. While remarkably effective, the ranking models based on these LMs increase computational cost by orders of magnitude over prior approaches, particularly as they must feed each query-document pair through a massive neural network to compute a single relevance score. To tackle this, we present ColBERT, a novel ranking model that adapts deep LMs (in particular, BERT) for efficient retrieval. ColBERT introduces a late interaction architecture that independently encodes the query and the document using BERT and then employs a cheap yet powerful interaction step that models their fine-grained similarity. By delaying and yet retaining this fine-granular interaction, ColBERT can leverage the expressiveness of deep LMs while simultaneously gaining the ability to pre-compute document representations offline, considerably speeding up query processing. Crucially, ColBERT's pruning-friendly interaction mechanism enables leveraging vector-similarity indexes for end-to-end retrieval directly from millions of documents. We extensively evaluate ColBERT using two recent passage search datasets. Results show that ColBERT's effectiveness is competitive with existing BERT-based models (and outperforms every non-BERT baseline), while executing two orders-of-magnitude faster and requiring up to four orders-of-magnitude fewer FLOPs per query.
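The late interaction step described in the ColBERT abstract above can be sketched in a few lines. The sketch assumes the query and each document have already been encoded independently into per-token vectors, and it scores a document by summing, over query tokens, the maximum similarity with any document token (the MaxSim form of late interaction). The tiny 2-dimensional vectors stand in for BERT embeddings, and the names `maxsim_score` and `rank` are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """For each query token take its best-matching document token, then sum."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank(query_vecs: List[Vector], docs: Dict[str, List[Vector]]) -> List[str]:
    """Order document IDs by descending late interaction score."""
    scores = {doc_id: maxsim_score(query_vecs, vecs) for doc_id, vecs in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy embeddings (assumed); document vectors could be pre-computed offline.
query = [(1.0, 0.0), (0.0, 1.0)]
docs = {
    "B": [(1.0, 0.0), (0.5, 0.5)],
    "A": [(1.0, 0.0), (0.0, 1.0)],
}
print(rank(query, docs))
```

Because the document vectors do not depend on the query, they can be computed and indexed ahead of time, leaving only the inexpensive max-and-sum step for query time.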
