GitHub - deepset-ai/hayhooks: Easily deploy Haystack pipelines as REST APIs and MCP Tools. (original) (raw)

Hayhooks

Hayhooks makes it easy to deploy and serve Haystack pipelines.

With Hayhooks, you can:

Deploy your Haystack pipelines as REST APIs with maximum flexibility and minimal boilerplate code.
Expose your Haystack pipelines over the MCP protocol, making them available as tools in AI dev environments like Cursor or Claude Desktop. Under the hood, Hayhooks runs as an MCP Server, exposing each pipeline as an MCP Tool.
Expose your Haystack pipelines as OpenAI-compatible chat completion backends with streaming support (to be used with open-webui or any other OpenAI compatible client).
Control Hayhooks core APIs through chat - deploy, undeploy, list, or run Haystack pipelines by chatting with Claude Desktop, Cursor, or any other MCP client.

Table of Contents

Quick Start with Docker Compose
Quick Start
Install the package
Configuration
- Environment Variables
- CORS Settings
Logging
- Using the logger
- Changing the log level
CLI Commands
Start hayhooks
Deploy a pipeline
Support file uploads
Run pipelines from the CLI
- Run a pipeline from the CLI JSON-compatible parameters
- Run a pipeline from the CLI uploading files
MCP support
Hayhooks as an OpenAPI Tool Server in open-webui
- Example: Deploy a Haystack pipeline from open-webui chat interface
OpenAI Compatibility
Advanced Usage
- Run Hayhooks Programmatically
- Sharing code between pipeline wrappers
Deployment Guidelines
Legacy Features
- Deploy Pipeline Using YAML
License

Quick start with Docker Compose

To quickly get started with Hayhooks, we provide a ready-to-use Docker Compose 🐳 setup with pre-configured integration with open-webui.

It's available here.

Quick start

Install the package

Start by installing the package:

If you want to use the MCP Server, you need to install the hayhooks[mcp] package:

pip install hayhooks[mcp]

NOTE: You'll need to run at least Python 3.10+ to use the MCP Server.

Configuration

Currently, you can configure Hayhooks by:

Set the environment variables in an .env file in the root of your project.
Pass the supported arguments and options to hayhooks run command.
Pass the environment variables to the hayhooks command.

Environment variables

The following environment variables are supported:

HAYHOOKS_HOST: The host on which the server will listen.
HAYHOOKS_PORT: The port on which the server will listen.
HAYHOOKS_MCP_PORT: The port on which the MCP Server will listen.
HAYHOOKS_MCP_HOST: The host on which the MCP Server will listen.
HAYHOOKS_PIPELINES_DIR: The path to the directory containing the pipelines.
HAYHOOKS_ROOT_PATH: The root path of the server.
HAYHOOKS_ADDITIONAL_PYTHON_PATH: Additional Python path to be added to the Python path.
HAYHOOKS_DISABLE_SSL: Boolean flag to disable SSL verification when making requests from the CLI.
HAYHOOKS_USE_HTTPS: Boolean flag to use HTTPS when using CLI commands to interact with the server (e.g. hayhooks status will call https://HAYHOOKS_HOST:HAYHOOKS_PORT/status).
HAYHOOKS_SHOW_TRACEBACKS: Boolean flag to show tracebacks on errors during pipeline execution and deployment.
LOG: The log level to use (default: INFO).

CORS Settings

HAYHOOKS_CORS_ALLOW_ORIGINS: List of allowed origins (default: ["*"])
HAYHOOKS_CORS_ALLOW_METHODS: List of allowed HTTP methods (default: ["*"])
HAYHOOKS_CORS_ALLOW_HEADERS: List of allowed headers (default: ["*"])
HAYHOOKS_CORS_ALLOW_CREDENTIALS: Allow credentials (default: false)
HAYHOOKS_CORS_ALLOW_ORIGIN_REGEX: Regex pattern for allowed origins (default: null)
HAYHOOKS_CORS_EXPOSE_HEADERS: Headers to expose in response (default: [])
HAYHOOKS_CORS_MAX_AGE: Maxium age for CORS preflight responses in seconds (default: 600)

Logging

Using the logger

Hayooks comes with a default logger based on loguru.

To use it, you can import the log object from the hayhooks package:

Changing the log level

To change the log level, you can set the LOG environment variable to one of the levels supported by loguru.

For example, to use the DEBUG level, you can set:

LOG=DEBUG hayhooks run

or

LOG=debug hayhooks run

or in an .env file

LOG=debug

CLI commands

The hayhooks package provides a CLI to manage the server and the pipelines. Any command can be run with hayhooks <command> --help to get more information.

CLI commands are basically wrappers around the HTTP API of the server. The full API reference is available at //HAYHOOKS_HOST:HAYHOOKS_PORT/docs or //HAYHOOKS_HOST:HAYHOOKS_PORT/redoc.

hayhooks run # Start the server hayhooks status # Check the status of the server and show deployed pipelines

hayhooks pipeline deploy-files # Deploy a pipeline using PipelineWrapper hayhooks pipeline deploy # Deploy a pipeline from a YAML file hayhooks pipeline undeploy # Undeploy a pipeline hayhooks pipeline run # Run a pipeline

Start Hayhooks

Let's start Hayhooks:

This will start the Hayhooks server on HAYHOOKS_HOST:HAYHOOKS_PORT.

Deploy a pipeline

Now, we will deploy a pipeline to chat with a website. We have created an example in the examples/chat_with_website_streaming folder.

In the example folder, we have two files:

chat_with_website.yml: The pipeline definition in YAML format.
pipeline_wrapper.py (mandatory): A pipeline wrapper that uses the pipeline definition.

Why a pipeline wrapper?

The pipeline wrapper provides a flexible foundation for deploying Haystack pipelines by allowing users to:

Choose their preferred pipeline initialization method (YAML files, Haystack templates, or inline code)
Define custom pipeline execution logic with configurable inputs and outputs
Optionally expose OpenAI-compatible chat endpoints with streaming support for integration with interfaces like open-webui

The pipeline_wrapper.py file must contain an implementation of the BasePipelineWrapper class (see here for more details).

A minimal PipelineWrapper looks like this:

from pathlib import Path from typing import List from haystack import Pipeline from hayhooks import BasePipelineWrapper

class PipelineWrapper(BasePipelineWrapper): def setup(self) -> None: pipeline_yaml = (Path(file).parent / "chat_with_website.yml").read_text() self.pipeline = Pipeline.loads(pipeline_yaml)

def run_api(self, urls: List[str], question: str) -> str:
    result = self.pipeline.run({"fetcher": {"urls": urls}, "prompt": {"query": question}})
    return result["llm"]["replies"][0]

It contains two methods:

setup()

This method will be called when the pipeline is deployed. It should initialize the self.pipeline attribute as a Haystack pipeline.

You can initialize the pipeline in many ways:

Load it from a YAML file.
Define it inline as a Haystack pipeline code.
Load it from a Haystack pipeline template.

run_api(...)

This method will be used to run the pipeline in API mode, when you call the {pipeline_name}/run endpoint.

You can define the input arguments of the method according to your needs.

def run_api(self, urls: List[str], question: str, any_other_user_defined_argument: Any) -> str: ...

The input arguments will be used to generate a Pydantic model that will be used to validate the request body. The same will be done for the response type.

NOTE: Since Hayhooks will dynamically create the Pydantic models, you need to make sure that the input arguments are JSON-serializable.

run_api_async(...)

This method is the asynchronous version of run_api. It will be used to run the pipeline in API mode when you call the {pipeline_name}/run endpoint, but handles requests asynchronously for better performance under high load.

You can define the input arguments of the method according to your needs, just like with run_api.

async def run_api_async(self, urls: List[str], question: str, any_other_user_defined_argument: Any) -> str: # Use async/await with AsyncPipeline or async operations result = await self.pipeline.run_async({"fetcher": {"urls": urls}, "prompt": {"query": question}}) return result["llm"]["replies"][0]

This is particularly useful when:

Working with AsyncPipeline instances that support async execution
Integrating with async-compatible Haystack components (e.g., OpenAIChatGenerator with async support)
Handling I/O-bound operations more efficiently
Deploying pipelines that need to handle many concurrent requests

NOTE: You can implement either run_api, run_api_async, or both. Hayhooks will automatically detect which methods are implemented and route requests accordingly.

You can find complete working examples of async pipeline wrappers in the test files and async streaming examples.

To deploy the pipeline, run:

hayhooks pipeline deploy-files -n chat_with_website examples/chat_with_website

This will deploy the pipeline with the name chat_with_website. Any error encountered during development will be printed to the console and show in the server logs.

PipelineWrapper development with `overwrite` option

During development, you can use the --overwrite flag to redeploy your pipeline without restarting the Hayhooks server:

hayhooks pipeline deploy-files -n {pipeline_name} --overwrite {pipeline_dir}

This is particularly useful when:

Iterating on your pipeline wrapper implementation
Debugging pipeline setup issues
Testing different pipeline configurations

The --overwrite flag will:

Remove the existing pipeline from the registry
Delete the pipeline files from disk
Deploy the new version of your pipeline

For even faster development iterations, you can combine --overwrite with --skip-saving-files to avoid writing files to disk:

hayhooks pipeline deploy-files -n {pipeline_name} --overwrite --skip-saving-files {pipeline_dir}

This is useful when:

You're making frequent changes during development
You want to test a pipeline without persisting it
You're running in an environment with limited disk access

Additional dependencies

After installing the Hayhooks package, it might happen that during pipeline deployment you need to install additional dependencies in order to correctly initialize the pipeline instance when calling the wrapper's setup() method. For instance, the chat_with_website pipeline requires the trafilatura package, which is not installed by default.

⚠️ Sometimes you may need to enable tracebacks in hayhooks to see the full error message. You can do this by setting the HAYHOOKS_SHOW_TRACEBACKS environment variable to true or 1.

Then, assuming you've installed the Hayhooks package in a virtual environment, you will need to install the additional required dependencies yourself by running:

Support file uploads

Hayhooks can easily handle uploaded files in your pipeline wrapper run_api method by adding files: Optional[List[UploadFile]] = None as an argument.

Here's a simple example:

def run_api(self, files: Optional[List[UploadFile]] = None) -> str: if files and len(files) > 0: filenames = [f.filename for f in files if f.filename is not None] file_contents = [f.file.read() for f in files]

    return f"Received files: {', '.join(filenames)}"

return "No files received"

This will make Hayhooks handle automatically the file uploads (if they are present) and pass them to the run_api method. This also means that the HTTP request needs to be a multipart/form-data request.

Note also that you can handle both files and parameters in the same request, simply adding them as arguments to the run_api method.

def run_api(self, files: Optional[List[UploadFile]] = None, additional_param: str = "default") -> str: ...

You can find a full example in the examples/rag_indexing_query folder.

Run pipelines from the CLI

Run a pipeline from the CLI JSON-compatible parameters

You can run a pipeline by using the hayhooks pipeline run command. Under the hood, this will call the run_api method of the pipeline wrapper, passing parameters as the JSON body of the request. This is convenient when you want to do a test run of the deployed pipeline from the CLI without having to write any code.

To run a pipeline from the CLI, you can use the following command:

hayhooks pipeline run --param 'question="is this recipe vegan?"'

Run a pipeline from the CLI uploading files

This is useful when you want to run a pipeline that requires a file as input. In that case, the request will be a multipart/form-data request. You can pass both files and parameters in the same request.

NOTE: To use this feature, you need to deploy a pipeline which is handling files (see Support file uploads and examples/rag_indexing_query for more details).

Upload a whole directory

hayhooks pipeline run --dir files_to_index

Upload a single file

hayhooks pipeline run --file file.pdf

Upload multiple files

hayhooks pipeline run --dir files_to_index --file file1.pdf --file file2.pdf

Upload a single file passing also a parameter

hayhooks pipeline run --file file.pdf --param 'question="is this recipe vegan?"'

MCP support

NOTE: You'll need to run at least Python 3.10+ to use the MCP Server.

MCP Server

Hayhooks now supports the Model Context Protocol and can act as a MCP Server.

It will:

Expose Core Tools to make able to control Hayhooks directly from an IDE like Cursor or any other MCP client.
Expose the deployed Haystack pipelines as usable MCP Tools, using both Server-Sent Events (SSE) and (stateless) Streamable HTTP MCP transports.

(Note that SSE transport is deprecated and it's maintained only for backward compatibility).

To run the Hayhooks MCP Server, you can use the following command:

hayhooks mcp run

Hint: check --help to see all the available options

This will start the Hayhooks MCP Server on HAYHOOKS_MCP_HOST:HAYHOOKS_MCP_PORT.

Create a PipelineWrapper for exposing a Haystack pipeline as a MCP Tool

A MCP Tool requires the following properties:

name: The name of the tool.
description: The description of the tool.
inputSchema: A JSON Schema object describing the tool's input parameters.

For each deployed pipeline, Hayhooks will:

Use the pipeline wrapper name as MCP Tool name (always present).
Parse run_api method docstring:
- If you use Google-style or reStructuredText-style docstrings, use the first line as MCP Tool description and the rest as parameters (if present).
- Each parameter description will be used as the description of the corresponding Pydantic model field (if present).
Generate a Pydantic model from the inputSchema using the run_api method arguments as fields.

Here's an example of a PipelineWrapper implementation for the chat_with_website pipeline which can be used as a MCP Tool:

from pathlib import Path from typing import List from haystack import Pipeline from hayhooks import BasePipelineWrapper

class PipelineWrapper(BasePipelineWrapper): def setup(self) -> None: pipeline_yaml = (Path(file).parent / "chat_with_website.yml").read_text() self.pipeline = Pipeline.loads(pipeline_yaml)

def run_api(self, urls: List[str], question: str) -> str:
    #
    # NOTE: The following docstring will be used as MCP Tool description
    #
    """
    Ask a question about one or more websites using a Haystack pipeline.
    """
    result = self.pipeline.run({"fetcher": {"urls": urls}, "prompt": {"query": question}})
    return result["llm"]["replies"][0]

Skip MCP Tool listing

You can skip the MCP Tool listing by setting the skip_mcp class attribute to True in your PipelineWrapper class. This way, the pipeline will be deployed on Hayhooks but will not be listed as a MCP Tool when you run the hayhooks mcp run command.

class PipelineWrapper(BasePipelineWrapper): # This will skip the MCP Tool listing skip_mcp = True

def setup(self) -> None:
    ...

def run_api(self, urls: List[str], question: str) -> str:
    ...

Using Hayhooks MCP Server with Claude Desktop

As stated in Anthropic's documentation, Claude Desktop supports SSE and Streamable HTTP as MCP Transports only on "Claude.ai & Claude for Desktop for the Pro, Max, Teams, and Enterprise tiers".

If you are using the free tier, only STDIO transport is supported, so you need to use supergateway to connect to the Hayhooks MCP Server via SSE or Streamable HTTP.

After starting the Hayhooks MCP Server, open Settings → Developer in Claude Desktop and update the config file with the following examples:

Using supergateway to bridge Streamable HTTP transport

{ "mcpServers": { "hayhooks": { "command": "npx", "args": [ "-y", "supergateway", "--streamableHttp", "http://HAYHOOKS_MCP_HOST:HAYHOOKS_MCP_PORT/mcp" ] } } }

Using supergateway to bridge SSE transport

{ "mcpServers": { "hayhooks": { "command": "npx", "args": [ "-y", "supergateway", "--sse", "http://HAYHOOKS_MCP_HOST:HAYHOOKS_MCP_PORT/sse" ] } } }

Make sure Node.js is installed, as the npx command depends on it.

Using Hayhooks Core MCP Tools in IDEs like Cursor

Since Hayhooks MCP Server provides by default a set of Core MCP Tools, the MCP server will enable one to interact with Hayhooks in an agentic manner using IDEs like Cursor.

The exposed tools are:

get_all_pipeline_statuses: Get the status of all pipelines and list available pipeline names.
get_pipeline_status: Get status of a specific pipeline. Requires pipeline_name as an argument.
undeploy_pipeline: Undeploy a pipeline. Removes a pipeline from the registry, its API routes, and deletes its files. Requires pipeline_name as an argument.
deploy_pipeline: Deploy a pipeline from files (pipeline_wrapper.py and other files). Requires name (pipeline name), files (list of file contents), save_files (boolean), and overwrite (boolean) as arguments.

From Cursor Settings -> MCP, you can add a new MCP Server by specifying the following parameters (assuming you have Hayhooks MCP Server running on http://localhost:1417 with Streamable HTTP transport):

{ "mcpServers": { "hayhooks": { "url": "http://localhost:1417/mcp" } } }

Or if you need to use the SSE transport:

{ "mcpServers": { "hayhooks": { "url": "http://localhost:1417/sse" } } }

After adding the MCP Server, you should see the Hayhooks Core MCP Tools in the list of available tools:

Now in the Cursor chat interface you can use the Hayhooks Core MCP Tools by mentioning them in your messages.

Development and deployment of Haystack pipelines directly from Cursor

Here's a video example of how to develop and deploy a Haystack pipeline directly from Cursor:

Hayhooks as an OpenAPI Tool Server in `open-webui`

Since Hayhooks expose openapi-schema at /openapi.json, it can be used as an OpenAPI Tool Server.

open-webui has recently added support for OpenAPI Tool Servers, meaning that you can use the API endpoints of Hayhooks as tools in your chat interface.

You simply need to configure the OpenAPI Tool Server in the Settings -> Tools section, adding the URL of the Hayhooks server and the path to the openapi.json file:

Example: Deploy a Haystack pipeline from `open-webui` chat interface

Here's a video example of how to deploy a Haystack pipeline from the open-webui chat interface:

OpenAI compatibility

OpenAI-compatible endpoints generation

Hayhooks now can automatically generate OpenAI-compatible endpoints if you implement the run_chat_completion method in your pipeline wrapper.

This will make Hayhooks compatible with fully-featured chat interfaces like open-webui, so you can use it as a backend for your chat interface.

Using Hayhooks as `open-webui` backend

Requirements:

Ensure you have open-webui up and running (you can do it easily using docker, check their quick start guide).
Ensure you have Hayhooks server running somewhere. We will run it locally on http://localhost:1416.

Configuring `open-webui`

First, you need to turn off tags and title generation from Admin settings -> Interface:

Then you have two options to connect Hayhooks as a backend.

Add a Direct Connection from Settings -> Connections:

NOTE: Fill a random value as API key as it's not needed

Alternatively, you can add an additional OpenAI API Connections from Admin settings -> Connections:

Even in this case, remember to Fill a random value as API key.

run_chat_completion(...)

To enable the automatic generation of OpenAI-compatible endpoints, you need only to implement the run_chat_completion method in your pipeline wrapper.

def run_chat_completion(self, model: str, messages: List[dict], body: dict) -> Union[str, Generator]: ...

Let's update the previous example to add a streaming response:

from pathlib import Path from typing import Generator, List, Union from haystack import Pipeline from hayhooks import get_last_user_message, BasePipelineWrapper, log

URLS = ["https://haystack.deepset.ai", "https://www.redis.io", "https://ssi.inc"]

class PipelineWrapper(BasePipelineWrapper): def setup(self) -> None: ... # Same as before

def run_api(self, urls: List[str], question: str) -> str:
    ...  # Same as before

def run_chat_completion(self, model: str, messages: List[dict], body: dict) -> Union[str, Generator]:
    log.trace(f"Running pipeline with model: {model}, messages: {messages}, body: {body}")

    question = get_last_user_message(messages)
    log.trace(f"Question: {question}")

    # Plain pipeline run, will return a string
    result = self.pipeline.run({"fetcher": {"urls": URLS}, "prompt": {"query": question}})
    return result["llm"]["replies"][0]

Differently from the run_api method, the run_chat_completion has a fixed signature and will be called with the arguments specified in the OpenAI-compatible endpoint.

model: The name of the Haystack pipeline which is called.
messages: The list of messages from the chat in the OpenAI format.
body: The full body of the request.

Some notes:

Since we have only the user messages as input here, the question is extracted from the last user message and the urls argument is hardcoded.
In this example, the run_chat_completion method is returning a string, so the open-webui will receive a string as response and show the pipeline output in the chat all at once.
The body argument contains the full request body, which may be used to extract more information like the temperature or the max_tokens (see the OpenAI API reference for more information).

Finally, to use non-streaming responses in open-webui you need also to turn of Stream Chat Response chat settings.

Here's a video example:

run_chat_completion_async(...)

This method is the asynchronous version of run_chat_completion. It handles OpenAI-compatible chat completion requests asynchronously, which is particularly useful for streaming responses and high-concurrency scenarios.

async def run_chat_completion_async(self, model: str, messages: List[dict], body: dict) -> Union[str, AsyncGenerator]: log.trace(f"Running pipeline with model: {model}, messages: {messages}, body: {body}")

question = get_last_user_message(messages)
log.trace(f"Question: {question}")

# For async streaming responses
return async_streaming_generator(
    pipeline=self.pipeline,
    pipeline_run_args={"fetcher": {"urls": URLS}, "prompt": {"query": question}},
)

Like run_chat_completion, this method has a fixed signature and will be called with the same arguments. The key differences are:

It's declared as async and can use await for asynchronous operations
It can return an AsyncGenerator for streaming responses using async_streaming_generator
It provides better performance for concurrent chat requests
It's required when using async streaming with components that support async streaming callbacks

NOTE: You can implement either run_chat_completion, run_chat_completion_async, or both. When both are implemented, Hayhooks will prefer the async version for better performance.

You can find complete working examples combining async chat completion with streaming in the async streaming test examples.

Streaming responses in OpenAI-compatible endpoints

Hayhooks provides streaming_generator and async_streaming_generator utility functions that can be used to stream the pipeline output to the client.

Let's update the run_chat_completion method of the previous example:

from pathlib import Path from typing import Generator, List, Union from haystack import Pipeline from hayhooks import get_last_user_message, BasePipelineWrapper, log, streaming_generator

URLS = ["https://haystack.deepset.ai", "https://www.redis.io", "https://ssi.inc"]

class PipelineWrapper(BasePipelineWrapper): def setup(self) -> None: ... # Same as before

def run_api(self, urls: List[str], question: str) -> str:
    ...  # Same as before

def run_chat_completion(self, model: str, messages: List[dict], body: dict) -> Union[str, Generator]:
    log.trace(f"Running pipeline with model: {model}, messages: {messages}, body: {body}")

    question = get_last_user_message(messages)
    log.trace(f"Question: {question}")

    # Streaming pipeline run, will return a generator
    return streaming_generator(
        pipeline=self.pipeline,
        pipeline_run_args={"fetcher": {"urls": URLS}, "prompt": {"query": question}},
    )

Now, if you run the pipeline and call one of the following endpoints:

{pipeline_name}/chat
/chat/completions
/v1/chat/completions

You will see the pipeline output being streamed in OpenAI-compatible format to the client and you'll be able to see the output in chunks.

Since output will be streamed to open-webui there's no need to change Stream Chat Response chat setting (leave it as Default or On).

You can find a complete working example of streaming_generator usage in the examples/pipeline_wrappers/chat_with_website_streaming directory.

Here's a video example:

async_streaming_generator

For asynchronous pipelines or when you need better concurrency handling, Hayhooks also provides an async_streaming_generator utility function:

from pathlib import Path from typing import AsyncGenerator, List, Union from haystack import AsyncPipeline from hayhooks import get_last_user_message, BasePipelineWrapper, log, async_streaming_generator

URLS = ["https://haystack.deepset.ai", "https://www.redis.io", "https://ssi.inc"]

class PipelineWrapper(BasePipelineWrapper): def setup(self) -> None: pipeline_yaml = (Path(file).parent / "chat_with_website.yml").read_text() self.pipeline = AsyncPipeline.loads(pipeline_yaml) # Note: AsyncPipeline

async def run_chat_completion_async(self, model: str, messages: List[dict], body: dict) -> AsyncGenerator:
    log.trace(f"Running pipeline with model: {model}, messages: {messages}, body: {body}")

    question = get_last_user_message(messages)
    log.trace(f"Question: {question}")

    # Async streaming pipeline run, will return an async generator
    return async_streaming_generator(
        pipeline=self.pipeline,
        pipeline_run_args={"fetcher": {"urls": URLS}, "prompt": {"query": question}},
    )

The async_streaming_generator function:

Works with both Pipeline and AsyncPipeline instances
Requires components that support async streaming callbacks (e.g., OpenAIChatGenerator instead of OpenAIGenerator)
Provides better performance for concurrent streaming requests
Returns an AsyncGenerator that yields chunks asynchronously
Automatically handles async pipeline execution and cleanup

NOTE: The streaming component in your pipeline must support async streaming callbacks. If you get an error about async streaming support, either use the sync streaming_generator or switch to async-compatible components.

Integration with haystack OpenAIChatGenerator

Since Hayhooks is OpenAI-compatible, it can be used as a backend for the haystack OpenAIChatGenerator.

Assuming you have a Haystack pipeline named chat_with_website_streaming and you have deployed it using Hayhooks, here's an example script of how to use it with the OpenAIChatGenerator:

from haystack.components.generators.chat.openai import OpenAIChatGenerator from haystack.utils import Secret from haystack.dataclasses import ChatMessage from haystack.components.generators.utils import print_streaming_chunk

client = OpenAIChatGenerator( model="chat_with_website_streaming", api_key=Secret.from_token("not-relevant"), # This is not used, you can set it to anything api_base_url="http://localhost:1416/v1/", streaming_callback=print_streaming_chunk, )

client.run([ChatMessage.from_user("Where are the offices or SSI?")])

> The offices of Safe Superintelligence Inc. (SSI) are located in Palo Alto, California, and Tel Aviv, Israel.

> {'replies': [ChatMessage(_role=<ChatRole.ASSISTANT: 'assistant'>, _content=[TextContent(text='The offices of Safe >Superintelligence Inc. (SSI) are located in Palo Alto, California, and Tel Aviv, Israel.')], _name=None, _meta={'model': >'chat_with_website_streaming', 'index': 0, 'finish_reason': 'stop', 'completion_start_time': '2025-02-11T15:31:44.599726', >'usage': {}})]}

Advanced usage

Run Hayhooks programmatically

A Hayhooks app instance can be run programmatically created by using the create_app function. This is useful if you want to add custom routes or middleware to Hayhooks.

Here's an example script:

import uvicorn from hayhooks.settings import settings from fastapi import Request from hayhooks import create_app

Create the Hayhooks app

hayhooks = create_app()

Add a custom route

@hayhooks.get("/custom") async def custom_route(): return {"message": "Hi, this is a custom route!"}

Add a custom middleware

@hayhooks.middleware("http") async def custom_middleware(request: Request, call_next): response = await call_next(request) response.headers["X-Custom-Header"] = "custom-header-value" return response

if name == "main": uvicorn.run("app:hayhooks", host=settings.host, port=settings.port)

Hayhooks allows you to use your custom code in your pipeline wrappers adding a specific path to the Hayhooks Python Path.

You can do this in three ways:

Set the HAYHOOKS_ADDITIONAL_PYTHON_PATH environment variable to the path of the folder containing your custom code.
Add HAYHOOKS_ADDITIONAL_PYTHON_PATH to the .env file.
Use the --additional-python-path flag when launching Hayhooks.

For example, if you have a folder called common with a my_custom_lib.py module which contains the my_function function, you can deploy your pipelines by using the following command:

export HAYHOOKS_ADDITIONAL_PYTHON_PATH='./common' hayhooks run

Then you can use the custom code in your pipeline wrappers by importing it like this:

from my_custom_lib import my_function

Note that you can use both absolute and relative paths (relative to the current working directory).

You can check out a complete example in the examples/shared_code_between_wrappers folder.

Deployment guidelines

We have some dedicated documentation for deployment:

Docker-based deployments: https://docs.haystack.deepset.ai/docs/docker
Kubernetes-based deployments: https://docs.haystack.deepset.ai/docs/kubernetes

We also have some additional deployment guidelines, see deployment_guidelines.md.

Legacy Features

Deploy a pipeline using only its YAML definition

⚠️ This way of deployment is not maintained anymore and will be deprecated in the future.

We're still supporting the Hayhooks former way to deploy a pipeline.

The former command hayhooks deploy is now changed to hayhooks pipeline deploy and can be used to deploy a pipeline only from a YAML definition file.

For example:

hayhooks pipeline deploy -n chat_with_website examples/chat_with_website/chat_with_website.yml

This will deploy the pipeline with the name chat_with_website from the YAML definition file examples/chat_with_website/chat_with_website.yml. You then can check the generated docs at http://HAYHOOKS_HOST:HAYHOOKS_PORT/docs or http://HAYHOOKS_HOST:HAYHOOKS_PORT/redoc, looking at the POST /chat_with_website endpoint.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

GitHub - deepset-ai/hayhooks: Easily deploy Haystack pipelines as REST APIs and MCP Tools. (original) (raw)

Hayhooks

Quick start with Docker Compose

Quick start

Install the package

Configuration

Environment variables

CORS Settings

Logging

Using the logger

Changing the log level

or

or in an .env file

CLI commands

Start Hayhooks

Deploy a pipeline

Why a pipeline wrapper?

setup()

run_api(...)

run_api_async(...)

PipelineWrapper development with overwrite option

Additional dependencies

Support file uploads

Run pipelines from the CLI

Run a pipeline from the CLI JSON-compatible parameters

Run a pipeline from the CLI uploading files

Upload a whole directory

Upload a single file

Upload multiple files

Upload a single file passing also a parameter

MCP support

MCP Server

Hint: check --help to see all the available options

Create a PipelineWrapper for exposing a Haystack pipeline as a MCP Tool

Skip MCP Tool listing

Using Hayhooks MCP Server with Claude Desktop

Using supergateway to bridge Streamable HTTP transport

Using supergateway to bridge SSE transport

Using Hayhooks Core MCP Tools in IDEs like Cursor

Development and deployment of Haystack pipelines directly from Cursor

Hayhooks as an OpenAPI Tool Server in open-webui

Example: Deploy a Haystack pipeline from open-webui chat interface

OpenAI compatibility

OpenAI-compatible endpoints generation

Using Hayhooks as open-webui backend

Configuring open-webui

run_chat_completion(...)

run_chat_completion_async(...)

Streaming responses in OpenAI-compatible endpoints

async_streaming_generator

Integration with haystack OpenAIChatGenerator

> The offices of Safe Superintelligence Inc. (SSI) are located in Palo Alto, California, and Tel Aviv, Israel.

Advanced usage

Run Hayhooks programmatically

Create the Hayhooks app

Add a custom route

Add a custom middleware

Sharing code between pipeline wrappers

Deployment guidelines

Legacy Features

Deploy a pipeline using only its YAML definition

License

PipelineWrapper development with `overwrite` option

Hayhooks as an OpenAPI Tool Server in `open-webui`

Example: Deploy a Haystack pipeline from `open-webui` chat interface

Using Hayhooks as `open-webui` backend

Configuring `open-webui`