DensePhrases Examples (princeton-nlp/DensePhrases)
We describe how to use DensePhrases for different applications. For instance, based on the passages retrieved by DensePhrases, you can train a state-of-the-art open-domain question answering model called Fusion-in-Decoder (Izacard and Grave, 2021), or you can run entity linking with DensePhrases.
- Basics: Multi-Granularity Text Retrieval
- Create a Custom Phrase Index
- Open-Domain QA with Fusion-in-Decoder
- Entity Linking
- Knowledge-grounded Dialogue
- Slot Filling
Basics: Multi-Granularity Text Retrieval
The most basic use of DensePhrases is to retrieve phrases, sentences, paragraphs, or documents for your query.
from densephrases import DensePhrases
Load DensePhrases
model = DensePhrases(
    load_dir='princeton-nlp/densephrases-multi-query-multi',
    dump_dir='/path/to/densephrases-multi_wiki-20181220/dump'
)
Search phrases
model.search('Who won the Nobel Prize in peace?', retrieval_unit='phrase', top_k=5)
['Denis Mukwege,', 'Theodore Roosevelt', 'Denis Mukwege', 'John Mott', 'Mother Teresa']
Search sentences
model.search('Why is the sky blue', retrieval_unit='sentence', top_k=1)
['The blue color is sometimes wrongly attributed to Rayleigh scattering, which is responsible for the color of the sky.']
Search paragraphs
model.search('How to become a great researcher', retrieval_unit='paragraph', top_k=1)
['... Levine said he believes the key to being a great researcher is having passion for research in and working on questions that the researcher is truly curious about. He said: "Have patience, persistence and enthusiasm and you’ll be fine."']
Search documents (Wikipedia titles)
model.search('What is the history of internet', retrieval_unit='document', top_k=3)
['Computer network', 'History of the World Wide Web', 'History of the Internet']
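To compare granularities side by side, the four calls above can be wrapped in a small loop. The following is a minimal sketch, not part of the DensePhrases API: search_all_units is a hypothetical helper, and it assumes only an object whose search method accepts retrieval_unit and top_k as shown above. FakeModel is a stand-in so that the snippet runs without a phrase index.

```python
RETRIEVAL_UNITS = ('phrase', 'sentence', 'paragraph', 'document')


def search_all_units(model, query, top_k=1, units=RETRIEVAL_UNITS):
    """Run one query at each retrieval unit and collect the results by unit.

    Hypothetical helper: `model` is any object with a
    search(query, retrieval_unit=..., top_k=...) method, as in the calls above.
    """
    return {
        unit: model.search(query, retrieval_unit=unit, top_k=top_k)
        for unit in units
    }


class FakeModel:
    """Stand-in for a loaded DensePhrases model (assumption, illustration only)."""

    def search(self, query, retrieval_unit='phrase', top_k=10):
        # Return placeholder strings instead of real retrieval results.
        return [f'{retrieval_unit} result {i} for {query!r}' for i in range(top_k)]


results = search_all_units(FakeModel(), 'Why is the sky blue', top_k=2)
for unit, hits in results.items():
    print(unit, hits)
```

With a real model, pass the loaded DensePhrases object in place of FakeModel().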
For batch queries, simply pass a list of queries as the query argument. To get more detailed search results, set return_meta=True as follows:
Search phrases and get detailed results
phrases, metadata = model.search(
    ['Who won the Nobel Prize in peace?', 'Name products of Apple.'],
    retrieval_unit='phrase',
    return_meta=True
)
phrases[0]
['Denis Mukwege,', 'Theodore Roosevelt', 'Denis Mukwege', 'John Mott', 'Muhammad Yunus', ...]
metadata[0]
[{'context': '... The most recent as of 2018, Denis Mukwege, was awarded his Peace Prize in 2018. ...', 'title': ['List of black Nobel laureates'], 'doc_idx': 5433697, 'start_pos': 558, 'end_pos': 572, 'start_idx': 15, 'end_idx': 16, 'score': 99.670166015625, ..., 'answer': 'Denis Mukwege,'}, ...]
Note that when the model returns phrases, it also returns passages in its metadata as described in our EMNLP paper.
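The phrase list above contains near-duplicates such as 'Denis Mukwege,' and 'Denis Mukwege'. One possible way to merge them using the metadata is sketched below. This is post-processing that we assume for illustration, not something DensePhrases does itself: normalize and dedupe_phrases are hypothetical helpers, and they rely only on the 'answer' and 'score' keys shown in metadata[0].

```python
import string


def normalize(phrase):
    """Lowercase and strip surrounding whitespace and punctuation."""
    return phrase.strip(string.whitespace + string.punctuation).lower()


def dedupe_phrases(metadata):
    """Keep the highest-scoring entry per normalized answer, best first.

    `metadata` is a list of dicts with at least 'answer' and 'score' keys,
    like metadata[0] above.
    """
    best = {}
    for entry in metadata:
        key = normalize(entry['answer'])
        if key not in best or entry['score'] > best[key]['score']:
            best[key] = entry
    return sorted(best.values(), key=lambda e: e['score'], reverse=True)


# Illustrative entries: only the first score comes from the output above;
# the other two scores are made up for this example.
sample = [
    {'answer': 'Denis Mukwege,', 'score': 99.670166015625},
    {'answer': 'Theodore Roosevelt', 'score': 99.0},
    {'answer': 'Denis Mukwege', 'score': 98.5},
]

unique = dedupe_phrases(sample)
print([e['answer'] for e in unique])
```

The surviving entry keeps its original surface form, so the remaining metadata fields (context, title, positions) stay consistent with the returned answer.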
CPU-only Mode
Load DensePhrases in CPU-only mode
model = DensePhrases(
    load_dir='princeton-nlp/densephrases-multi-query-multi',
    dump_dir='/path/to/densephrases-multi_wiki-20181220/dump',
    device='cpu',
    max_query_length=24,  # reduce the maximum query length for a faster query encoding (optional)
)
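If the same script has to run on machines with and without a GPU, the constructor arguments can be assembled conditionally. The sketch below uses only the arguments shown above (load_dir, dump_dir, device, max_query_length). densephrases_kwargs and its use_gpu flag are hypothetical names introduced for illustration.

```python
def densephrases_kwargs(load_dir, dump_dir, use_gpu, cpu_max_query_length=24):
    """Build keyword arguments for DensePhrases(...).

    Hypothetical helper: on CPU it adds device='cpu' and a shorter
    max_query_length, mirroring the CPU-only example above. On GPU it
    leaves both unset so the library defaults apply.
    """
    kwargs = {'load_dir': load_dir, 'dump_dir': dump_dir}
    if not use_gpu:
        kwargs['device'] = 'cpu'
        kwargs['max_query_length'] = cpu_max_query_length
    return kwargs


kwargs = densephrases_kwargs(
    load_dir='princeton-nlp/densephrases-multi-query-multi',
    dump_dir='/path/to/densephrases-multi_wiki-20181220/dump',
    use_gpu=False,
)
print(kwargs)
# With the library installed: model = DensePhrases(**kwargs)
```

If PyTorch is available, use_gpu could be set from torch.cuda.is_available().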
Changing the Index or the Encoder
Load DensePhrases with a smaller phrase index
model = DensePhrases(
    load_dir='princeton-nlp/densephrases-multi-query-multi',
    dump_dir='/path/to/densephrases-multi_wiki-20181220/dump',
    index_name='start/1048576_flat_OPQ96_small'
)
Change the DensePhrases encoder to 'princeton-nlp/densephrases-multi-query-tqa' (trained on TriviaQA)
model.set_encoder('princeton-nlp/densephrases-multi-query-tqa')
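Because set_encoder swaps the query encoder in place, the same loaded index can be queried with several encoders in turn. A minimal sketch of such a comparison follows. compare_encoders is a hypothetical helper, and FakeModel is a stand-in that mimics only the set_encoder and search calls shown above so that the snippet runs on its own.

```python
def compare_encoders(model, encoder_names, query, top_k=5):
    """Return the top phrases for `query` under each encoder.

    Hypothetical helper. Note that the model is left with the last
    encoder in `encoder_names` set.
    """
    results = {}
    for name in encoder_names:
        model.set_encoder(name)
        results[name] = model.search(query, retrieval_unit='phrase', top_k=top_k)
    return results


class FakeModel:
    """Stand-in for a loaded DensePhrases model (assumption, illustration only)."""

    def __init__(self):
        self.encoder = None

    def set_encoder(self, name):
        self.encoder = name

    def search(self, query, retrieval_unit='phrase', top_k=10):
        # Tag each placeholder result with the active encoder name.
        return [f'{self.encoder}:{i}' for i in range(top_k)]


encoders = [
    'princeton-nlp/densephrases-multi-query-multi',
    'princeton-nlp/densephrases-multi-query-tqa',
]
results = compare_encoders(FakeModel(), encoders, 'Who won the Nobel Prize in peace?', top_k=2)
for name, phrases in results.items():
    print(name, phrases)
```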
Evaluation
import os
Evaluate loaded DensePhrases on Natural Questions
model.evaluate(test_path=os.path.join(os.environ['DATA_DIR'], 'open-qa/nq-open/test_preprocessed.json'))
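The call above raises a bare KeyError if DATA_DIR is not set. As a small convenience, the path can be resolved with a clearer error message. nq_test_path is a hypothetical helper, and '/tmp/dph-data' is a placeholder directory. Only the relative path 'open-qa/nq-open/test_preprocessed.json' comes from the example above.

```python
import os


def nq_test_path(data_dir=None):
    """Resolve the Natural Questions test file under DATA_DIR.

    Hypothetical helper: uses `data_dir` if given, otherwise falls back
    to the DATA_DIR environment variable.
    """
    data_dir = data_dir or os.environ.get('DATA_DIR')
    if not data_dir:
        raise RuntimeError('Set DATA_DIR or pass data_dir explicitly.')
    return os.path.join(data_dir, 'open-qa/nq-open/test_preprocessed.json')


path = nq_test_path(data_dir='/tmp/dph-data')  # placeholder directory
print(path)
# With a loaded model: model.evaluate(test_path=nq_test_path())
```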