Synthesizer Quickstart

Welcome to the Synthesizer quickstart guide! Synthesizer, or ΨΦ, is your portal to combining Retrieval-Augmented Generation (RAG) with large language models (LLMs) served through providers and backends such as OpenAI, Anthropic, HuggingFace, and vLLM.

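This guide covers setting up your environment, running the Synthesizer command-line scripts, and developing with Synthesizer in Python.

Let’s get started!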

Setting Up Your Environment

Before you start, ensure you’ve installed Synthesizer:

pip install sciphi-synthesizer

For additional details, refer to the installation guide.

Using Synthesizer

  1. Generate synthetic question-answer pairs
    export SCIPHI_API_KEY=MY_SCIPHI_API_KEY
    python -m synthesizer.scripts.data_augmenter run --dataset="wiki_qa"
    tail augmented_output/config_name_eq_answer_question__dataset_name_eq_wiki_qa.jsonl
    { "formatted_prompt": "... ### Question:\nwhat country did wine originate in\n\n### Input:\n1. URL: https://en.wikipedia.org/wiki/History%20of%20wine (Score: 0.85)\nTitle:History of wine....",
    { "completion": Wine originated in the South Caucasus, which is now part of modern-day Armenia ...
  2. Evaluate RAG pipeline performance
    export SCIPHI_API_KEY=MY_SCIPHI_API_KEY
    python -m synthesizer.scripts.rag_harness --rag_provider="agent-search" --llm_provider_name="sciphi" --n_samples=25
    ...
    INFO:main:Now generating completions...
    100%|██████████| 100/100 [00:29<00:00, 3.40it/s]
    INFO:main:Final Accuracy=0.42
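
The output file written in step 1 can be inspected from Python as well as with tail. The following is a minimal sketch, assuming the file is in JSON Lines format (one JSON object per line) and that records may carry the "formatted_prompt" and "completion" keys seen in the sample output above. The helper names load_jsonl and summarize are hypothetical and are not part of Synthesizer.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file and return a list of parsed records."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def summarize(records, n=3, width=80):
    """Return (prompt, completion) previews for the first n records.

    Uses .get() so that records missing either key (an assumption about
    the file layout) do not raise.
    """
    previews = []
    for record in records[:n]:
        prompt = record.get("formatted_prompt", "")
        completion = record.get("completion", "")
        previews.append((prompt[:width], completion[:width]))
    return previews


if __name__ == "__main__":
    # Path taken from the tail command in step 1.
    output_path = Path(
        "augmented_output/"
        "config_name_eq_answer_question__dataset_name_eq_wiki_qa.jsonl"
    )
    if output_path.exists():
        for prompt, completion in summarize(load_jsonl(output_path)):
            print("PROMPT:", prompt)
            print("COMPLETION:", completion)
```

Truncating each field keeps the preview readable, since the formatted prompts include the full retrieved context.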

Note

This is a basic introduction to Synthesizer. More detailed documentation covering advanced features and customization options will follow.

Developing with Synthesizer

Here’s how you can use Synthesizer to quickly set up RAG-augmented generation without diving into intricate configurations:

This example requires a valid SCIPHI_API_KEY in your environment.
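
Because the key is read from the environment, it can help to fail early with a clear message when it is missing. The following is a minimal sketch using only the standard library. The helper require_env is hypothetical and not part of Synthesizer, and how Synthesizer itself reads the key is not shown in this guide.

```python
import os


def require_env(name):
    """Return the value of an environment variable.

    Raises RuntimeError if the variable is unset or contains only whitespace.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it before running the example."
        )
    return value
```

Calling require_env("SCIPHI_API_KEY") at the top of a script surfaces a missing key before any network request is attempted.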

Imports

    from synthesizer.core import LLMProviderName, RAGProviderName
    from synthesizer.interface import (
        LLMInterfaceManager,
        RAGInterfaceManager,
    )
    from synthesizer.llm import GenerationConfig

RAG Provider Settings

    # `query` and the two `rag_limit_*` values are supplied by the caller.
    rag_interface = RAGInterfaceManager.get_interface_from_args(
        RAGProviderName("agent-search"),
        limit_hierarchical_url_results=rag_limit_hierarchical_url_results,
        limit_final_pagerank_results=rag_limit_final_pagerank_results,
    )
    rag_context = rag_interface.get_rag_context(query)

LLM Provider Settings

    llm_interface = LLMInterfaceManager.get_interface_from_args(
        LLMProviderName("openai"),
    )

    generation_config = GenerationConfig(
        model_name=llm_model_name,
        max_tokens_to_sample=llm_max_tokens_to_sample,
        temperature=llm_temperature,
        top_p=llm_top_p,
        # other generation params here ...
    )

    # `raw_prompt` is a caller-defined template containing a {rag_context} placeholder.
    formatted_prompt = raw_prompt.format(rag_context=rag_context)
    completion = llm_interface.get_completion(
        formatted_prompt, generation_config
    )
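
The snippet above does not define raw_prompt. The following is a minimal sketch of one way to build it, assuming a template modeled loosely on the "### Question:" and "### Input:" layout visible in the sample output earlier in this guide. The template text and the build_raw_prompt helper are hypothetical. Only the {rag_context} placeholder is implied by the original code.

```python
# Hypothetical template: the section markers mirror the sample output,
# but the exact prompt used by Synthesizer is not shown in this guide.
RAW_PROMPT_TEMPLATE = (
    "### Question:\n{query}\n\n"
    "### Input:\n{rag_context}\n"
)


def build_raw_prompt(query):
    """Embed the query now and leave {rag_context} to be filled in later."""
    # Escape braces in the query so the later .format() call treats them
    # as literal characters rather than as placeholders.
    escaped = query.replace("{", "{{").replace("}", "}}")
    return RAW_PROMPT_TEMPLATE.format(
        query=escaped, rag_context="{rag_context}"
    )


raw_prompt = build_raw_prompt("what country did wine originate in")

# Later, once retrieval has produced a context string:
formatted_prompt = raw_prompt.format(rag_context="(retrieved context here)")
```

Formatting in two stages lets the same raw_prompt be reused with different retrieval results for a single query.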