Introduction to ChromaDB

Last Updated : 15 Apr, 2026

ChromaDB is an open-source vector database for storing, searching and managing vector embeddings, the numeric representations used in AI and machine learning for tasks such as semantic search and recommendation systems. It supports fast similarity search and exposes a simple API for developers.

Architecture of ChromaDB

Key Features

Working

  1. **Embedding Generation:** Data such as text or images is converted into vector embeddings using a pre-trained or custom model. For example, a sentence like "The cat is on the mat" can be transformed into a numerical vector using a model like BERT or SentenceTransformers.
  2. **Storing Embeddings:** The embeddings are stored in a ChromaDB collection together with unique identifiers and optional metadata such as document ID, category or timestamp.
  3. **Querying:** Users query the database by providing either a vector or raw data, which is first converted to a vector. ChromaDB then performs a similarity search and returns the most relevant embeddings based on metrics like cosine similarity or Euclidean distance.
  4. **Filtering with Metadata:** Queries can include metadata filters to narrow down results. For example, a search might only return embeddings from a specific category or time range.
  5. **Retrieval:** The database finds the top-k most similar embeddings and returns their details and identifiers, which can then be used for search or recommendations.
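The five steps above can be sketched in plain Python to show what a similarity search does conceptually. This is a toy illustration, not ChromaDB's implementation: the 3-dimensional vectors are hand-written stand-ins for real model embeddings, and the names `store`, `cosine_similarity` and `top_k_query` are hypothetical helpers invented for this sketch.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Steps 1-2: hand-written 3-d "embeddings" stored with ids and metadata.
# A real embedding model would produce vectors with hundreds of dimensions.
store = [
    {"id": "doc1", "embedding": [0.9, 0.1, 0.0], "metadata": {"category": "pets"}},
    {"id": "doc2", "embedding": [0.8, 0.2, 0.1], "metadata": {"category": "pets"}},
    {"id": "doc3", "embedding": [0.1, 0.9, 0.2], "metadata": {"category": "finance"}},
    {"id": "doc4", "embedding": [0.7, 0.3, 0.0], "metadata": {"category": "finance"}},
]

def top_k_query(store, query_embedding, k=2, where=None):
    """Steps 3-5: filter by metadata, score by cosine similarity, keep the top k."""
    candidates = [
        item for item in store
        if where is None
        or all(item["metadata"].get(key) == value for key, value in where.items())
    ]
    scored = [
        (item["id"], cosine_similarity(query_embedding, item["embedding"]))
        for item in candidates
    ]
    # Higher cosine similarity means a closer match.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

query_embedding = [1.0, 0.0, 0.0]  # stand-in for an embedded query
top_all = top_k_query(store, query_embedding, k=2)
top_finance = top_k_query(store, query_embedding, k=2, where={"category": "finance"})
print(top_all)
print(top_finance)
```

Without a filter, the two closest entries are `doc1` and `doc2`. With the `category` filter, only the finance entries are scored, so the result is `doc4` followed by `doc3`. A real vector database avoids scanning every entry like this by using an index, but the inputs and outputs follow the same pattern.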

ChromaDB Hierarchy

In ChromaDB, data is typically stored locally using storage backends like SQLite for persistence in single-node setups, though storage configuration may vary depending on deployment. Below is a breakdown of this hierarchy:

Hierarchy in ChromaDB

Implementation

Let's walk through a step-by-step implementation of ChromaDB.

Step 1: Install ChromaDB Library

We need to install the ChromaDB library to interact with the vector database.

```python
# Notebook syntax; from a terminal, run the command without the leading "!"
!pip install chromadb
```

Step 2: Import ChromaDB Library

Import the ChromaDB library to begin using it in the script.

```python
import chromadb
```

Step 3: Initialize the ChromaDB Client and create a Collection

Create a client instance to interact with ChromaDB, then create a collection that will store documents along with their metadata. In this case, the collection is named `personal_collection`.

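```python
# In-memory client: data is not kept after the process exits
chroma_client = chromadb.Client()

# Create the collection that will hold the documents
collection = chroma_client.create_collection(name="personal_collection")
```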

Step 4: Add Documents to the Collection

Add documents to the collection with their respective metadata and unique IDs. Each document is tagged with source information in the metadata.

```python
collection.add(
    documents=[
        "This is a document about machine learning",
        "This is another document about data science",
        "A third document about artificial intelligence"
    ],
    metadatas=[
        {"source": "test1"},
        {"source": "test2"},
        {"source": "test3"}
    ],
    ids=[
        "id1",
        "id2",
        "id3"
    ]
)
```
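The `documents`, `metadatas` and `ids` arguments are parallel lists: the first document pairs with the first metadata entry and the first ID, and so on. A small pre-flight check can catch mismatched lengths or repeated IDs before calling `collection.add`. The helper below is a hypothetical convenience written for this tutorial, not part of the ChromaDB API.

```python
def check_batch(documents, metadatas, ids):
    """Return a list of problems found in a batch; an empty list means it looks consistent."""
    problems = []

    # The three lists are matched by position, so they must be the same length.
    if not (len(documents) == len(metadatas) == len(ids)):
        problems.append(
            f"length mismatch: {len(documents)} documents, "
            f"{len(metadatas)} metadatas, {len(ids)} ids"
        )

    # Each ID is expected to identify exactly one document.
    seen = set()
    duplicates = []
    for doc_id in ids:
        if doc_id in seen and doc_id not in duplicates:
            duplicates.append(doc_id)
        seen.add(doc_id)
    if duplicates:
        problems.append(f"duplicate ids: {duplicates}")

    return problems

documents = [
    "This is a document about machine learning",
    "This is another document about data science",
    "A third document about artificial intelligence",
]
metadatas = [{"source": "test1"}, {"source": "test2"}, {"source": "test3"}]
ids = ["id1", "id2", "id3"]

problems = check_batch(documents, metadatas, ids)
print(problems)
```

For the batch used in this step the helper returns an empty list, so the data is safe to pass to `collection.add` as written.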

Step 5: Query the Collection and Display Result

Query the collection to retrieve the documents most similar to the query text. The `n_results=2` parameter limits the response to the two closest matches, and `print` displays the result.

```python
results = collection.query(
    query_texts=[
        "This is a query about machine learning and data science"
    ],
    n_results=2
)

print(results)
```

**Output:**

Output

**Note:** The output shown is the raw result structure returned by the query.
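The raw result can be hard to read when printed directly. The sketch below assumes the result is a dictionary of parallel, nested lists with one inner list per query text, under the keys `ids`, `documents`, `metadatas` and `distances`. The `sample_results` dictionary is a hand-written stand-in so the snippet runs on its own: its distance values are invented, and `flatten_results` is a hypothetical helper, not a ChromaDB function.

```python
# Hypothetical sample shaped like a query result for ONE query text.
# The distance values are invented for illustration only.
sample_results = {
    "ids": [["id1", "id2"]],
    "documents": [[
        "This is a document about machine learning",
        "This is another document about data science",
    ]],
    "metadatas": [[{"source": "test1"}, {"source": "test2"}]],
    "distances": [[0.42, 0.57]],
}

def flatten_results(results, query_index=0):
    """Turn the parallel lists for one query into a list of per-hit dictionaries."""
    return [
        {"id": doc_id, "document": doc, "metadata": meta, "distance": dist}
        for doc_id, doc, meta, dist in zip(
            results["ids"][query_index],
            results["documents"][query_index],
            results["metadatas"][query_index],
            results["distances"][query_index],
        )
    ]

hits = flatten_results(sample_results)
for hit in hits:
    print(
        f'{hit["id"]}  distance={hit["distance"]:.2f}  '
        f'source={hit["metadata"]["source"]}  {hit["document"]}'
    )
```

In a real session, passing the `results` object from Step 5 in place of `sample_results` should work the same way, provided it contains these four keys. Smaller distances indicate closer matches.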

Use Cases

Advantages

Limitations