Data Classes

Haystack has a handful of core classes that appear throughout the framework. These classes carry data through the system, and you are likely to interact with them as the input or output of your pipeline.

Haystack uses data classes so that components can communicate with each other in a simple and modular way, passing data between pipeline components in well-defined shapes. This page goes over the available data classes in Haystack: ByteStream, Answer (along with its variants ExtractedAnswer and GeneratedAnswer), ChatMessage, Document, StreamingChunk, SparseEmbedding, and Tool, and explains how each one is used.

You can check out the detailed parameters in our Data Classes API reference.

The Answer class serves as the base for responses generated within Haystack, containing the answer's data, the originating query, and additional metadata.

@dataclass(frozen=True)
class Answer:
    data: Any
    query: str
    meta: Dict[str, Any]
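
As a minimal sketch of what `frozen=True` means in practice, the snippet below re-declares Answer locally from the definition shown above instead of importing it from Haystack, so it runs with the standard library alone. The sample values are made up for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Any, Dict


# Local stand-in mirroring the definition above; not imported from Haystack.
@dataclass(frozen=True)
class Answer:
    data: Any
    query: str
    meta: Dict[str, Any]


answer = Answer(data="Berlin", query="What is the capital of Germany?", meta={"source": "demo"})

# frozen=True blocks attribute reassignment on the instance.
try:
    answer.data = "Paris"
    reassigned = True
except FrozenInstanceError:
    reassigned = False

# The meta dict is still an ordinary mutable object, so its contents can change.
answer.meta["reviewed"] = True
```

Note that freezing the dataclass protects the attribute bindings only. If you need fully immutable metadata, you would have to copy or wrap the dictionary yourself.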

ExtractedAnswer is a subclass of Answer that deals explicitly with answers derived from Documents, offering more detailed attributes.

@dataclass
class ExtractedAnswer:
    query: str
    score: float
    data: Optional[str] = None
    document: Optional[Document] = None
    context: Optional[str] = None
    document_offset: Optional["Span"] = None
    context_offset: Optional["Span"] = None
    meta: Dict[str, Any] = field(default_factory=dict)
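
The following sketch shows how the optional fields and offsets might be used together. It is self-contained: ExtractedAnswer is re-declared from the definition above, Document is reduced to a single field, and the `Span` class with `start` and `end` character offsets (end exclusive) is an assumption made for this example rather than something this page defines.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Span:  # assumed shape: character offsets, end exclusive
    start: int
    end: int


@dataclass
class Document:  # reduced stand-in with only the field used here
    content: Optional[str] = None


@dataclass
class ExtractedAnswer:  # local copy of the definition shown above
    query: str
    score: float
    data: Optional[str] = None
    document: Optional[Document] = None
    context: Optional[str] = None
    document_offset: Optional["Span"] = None
    context_offset: Optional["Span"] = None
    meta: Dict[str, Any] = field(default_factory=dict)


doc = Document(content="Haystack is an open source framework by deepset.")
start = doc.content.index("deepset")
extracted = ExtractedAnswer(
    query="Who maintains Haystack?",
    score=0.93,
    data="deepset",
    document=doc,
    document_offset=Span(start, start + len("deepset")),
)

# Recover the answer text from the source document using the offset.
span = extracted.document_offset
recovered = doc.content[span.start:span.end]

# default_factory=dict gives every instance its own meta dict.
other = ExtractedAnswer(query="q", score=0.0)
other.meta["seen"] = True
```

Only `query` and `score` are required; every other field falls back to `None` or an empty dictionary.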

GeneratedAnswer extends the Answer class to accommodate answers generated from multiple Documents.

@dataclass
class GeneratedAnswer:
    data: str
    query: str
    documents: List[Document]
    meta: Dict[str, Any] = field(default_factory=dict)

ByteStream represents a binary object abstraction in the Haystack framework and is used to handle binary data in various formats.

@dataclass(frozen=True)
class ByteStream:
    data: bytes
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)
    mime_type: Optional[str] = field(default=None)

from haystack.dataclasses.byte_stream import ByteStream

image = ByteStream.from_file_path("dog.jpg")
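
To illustrate the idea behind loading a file into a ByteStream, here is a standard-library sketch. The class is a local stand-in built from the fields shown above, and the body of `from_file_path` is an assumption about what such a constructor could do (read the bytes, record the path, guess the MIME type from the extension). It is not Haystack's implementation.

```python
import mimetypes
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ByteStream:  # local stand-in mirroring the fields shown above
    data: bytes
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)
    mime_type: Optional[str] = field(default=None)

    @classmethod
    def from_file_path(cls, filepath: str) -> "ByteStream":
        # Assumed behavior: read raw bytes and guess the MIME type from the file name.
        path = Path(filepath)
        guessed, _ = mimetypes.guess_type(path.name)
        return cls(data=path.read_bytes(), metadata={"file_path": str(path)}, mime_type=guessed)


# Write a small temporary file so the example does not depend on "dog.jpg" existing.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "note.txt"
    sample.write_bytes(b"hello haystack")
    stream = ByteStream.from_file_path(str(sample))
```

Because the stream holds plain `bytes`, it stays usable after the temporary file is deleted.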

ChatMessage is the central abstraction to represent a message for an LLM. It contains a role, metadata, and several types of content, including text, tool calls, and tool call results.

Read the detailed documentation for the ChatMessage data class on a dedicated ChatMessage page.

Document represents a central data abstraction in Haystack, capable of holding text, tables, and binary data.

@dataclass
class Document(metaclass=_BackwardCompatible):
    id: str = field(default="")
    content: Optional[str] = field(default=None)
    blob: Optional[ByteStream] = field(default=None)
    meta: Dict[str, Any] = field(default_factory=dict)
    score: Optional[float] = field(default=None)
    embedding: Optional[List[float]] = field(default=None)
    sparse_embedding: Optional[SparseEmbedding] = field(default=None)

from haystack import Document

document = Document(content="Here are the contents of your document", embedding=[0.1]*768)
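
Because `score` is optional, code that ranks Documents has to decide what to do with unscored ones. The sketch below uses a reduced local stand-in for Document (the `blob` and `sparse_embedding` fields are omitted) and treats a missing score as the lowest possible value. That policy is a choice made for this example, not something Haystack prescribes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Document:  # reduced stand-in: blob and sparse_embedding omitted
    id: str = field(default="")
    content: Optional[str] = field(default=None)
    meta: Dict[str, Any] = field(default_factory=dict)
    score: Optional[float] = field(default=None)
    embedding: Optional[List[float]] = field(default=None)


docs = [
    Document(id="a", content="first", score=0.2),
    Document(id="b", content="second"),  # not scored yet
    Document(id="c", content="third", score=0.9),
]


def sort_key(doc: Document) -> float:
    # Treat a missing score as the lowest possible value when ranking.
    return doc.score if doc.score is not None else float("-inf")


ranked = sorted(docs, key=sort_key, reverse=True)
ranked_ids = [d.id for d in ranked]
```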

StreamingChunk represents a partially streamed LLM response, which lets you process the response in real time as it is generated.

@dataclass
class StreamingChunk:
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)
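
As a rough sketch of how streamed chunks can be consumed, the example below collects chunks in a callback and joins them into the full response. StreamingChunk is a local stand-in based on the fields shown above. `fake_stream` and `on_chunk` are hypothetical names invented for this illustration, standing in for an LLM client and a streaming callback.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List


@dataclass
class StreamingChunk:  # local stand-in based on the fields shown above
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict, hash=False)


def fake_stream() -> Iterator[StreamingChunk]:
    # Hypothetical generator standing in for an LLM that streams its reply.
    for index, piece in enumerate(["Hay", "stack ", "streams."]):
        yield StreamingChunk(content=piece, metadata={"index": index})


received: List[StreamingChunk] = []


def on_chunk(chunk: StreamingChunk) -> None:
    # A real callback might print or forward the text; here we only collect it.
    received.append(chunk)


for chunk in fake_stream():
    on_chunk(chunk)

# Joining the partial contents in arrival order rebuilds the complete response.
full_text = "".join(c.content for c in received)
```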

The SparseEmbedding class represents a sparse embedding: a vector where most values are zeros.
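
To make the idea concrete, a sparse vector can be stored as the positions of its non-zero entries plus their values. The sketch below assumes a layout with two parallel lists named `indices` and `values`. Check the API reference for the actual fields of SparseEmbedding before relying on these names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SparseEmbedding:  # assumed layout: parallel lists of positions and non-zero values
    indices: List[int]
    values: List[float]


def to_sparse(dense: List[float]) -> SparseEmbedding:
    # Keep only the non-zero entries together with their positions.
    pairs = [(i, v) for i, v in enumerate(dense) if v != 0.0]
    return SparseEmbedding(indices=[i for i, _ in pairs], values=[v for _, v in pairs])


def to_dense(sparse: SparseEmbedding, size: int) -> List[float]:
    # Rebuild the full vector by filling zeros everywhere else.
    dense = [0.0] * size
    for i, v in zip(sparse.indices, sparse.values):
        dense[i] = v
    return dense


dense_vector = [0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 1.25, 0.0]
sparse = to_sparse(dense_vector)
restored = to_dense(sparse, len(dense_vector))
```

Here eight stored floats shrink to two index and value pairs, and the saving grows with the share of zeros in the vector.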

Tool is a data class representing a tool that Language Models can prepare a call for.

Read the detailed documentation for the Tool data class on a dedicated Tool page.
