GitHub - goutamadwant/vectormigrate: Safe embedding and vector-index migration tooling for retrieval systems (original) (raw)

💡 TL;DR

Changing your AI embedding model usually means downtime, full re-embedding costs, or silent ranking corruption. vectormigrate makes this transition safe, structured, and mathematical. It provides a formal ABI (Application Binary Interface) for vectors, allowing you to seamlessly test, evaluate, and transition between different embedding models in live production systems (like OpenSearch, Weaviate, Qdrant, and pgvector).

🎯 Why This Library Exists

When a team upgrades an embedding stack, it often changes more than just a model name. You might change:

📏 Vector Dimension
📐 Similarity Metric
⚖️ Normalization Policy
✂️ Chunking and Preprocessing
🗄️ Backend Index Shape

In practice, teams fall into three failure modes:

💥 Full re-embed & hard cutover: Expensive, risky, and causes downtime.
🎭 Mixing old & new vectors: Silently corrupts the ranking math.
📜 Vendor-specific throwaway scripts: Weak testing and no reusable governance.

vectormigrate treats every embedding configuration as an explicit compatibility contract and turns migration into a staged, testable workflow.

📦 Dependencies

By design, vectormigrate is lightweight and keeps your production environment lean:

Core: numpy >= 1.26 (The exact mathematical framework needed; minimal bloat)
Integration (Optional): psycopg[binary] >= 3.2.0 (for pgvector target databases)
Dev/Test (Optional): pytest, ruff, mypy, build

🚀 Quick Start

Installation

Core install

pip install vectormigrate

With live backend integrations (e.g., pgvector)

pip install "vectormigrate[integration]"

1️⃣ Register an Embedding ABI (The Contract)

from vectormigrate import EmbeddingABI, SQLiteRegistry

registry = SQLiteRegistry("/tmp/vectormigrate.sqlite") abi = EmbeddingABI( model_id="text-embedding-3-large", provider="openai", version="2026.03", dimensions=3072, ) registry.register_abi(abi) print(f"Registered ABI: {abi.abi_id}")

2️⃣ Create a Migration Plan

from vectormigrate import MigrationPlan

plan = MigrationPlan( source_abi_id="openai/text-embedding-3-large@2026.03#v1", target_abi_id="openai/text-embedding-3-large@2026.04#v1", alias_name="retrieval_active", ) registry.create_plan(plan) print(f"Active Plan ID: {plan.plan_id}")

3️⃣ Run a Live Demo CLI

Watch the orchestrator securely manage a dual-write and backfill migration locally:

python3 -m vectormigrate.cli demo --db /tmp/vectormigrate-demo.sqlite

📚 Feature Guide

🔀 Compatibility Adapters

Don't want to re-embed everything right away? Use our built-in mathematical space adapters to query old vectors with new models during the transition window:

from vectormigrate import OrthogonalProcrustesAdapter, LowRankAffineAdapter, ResidualMLPAdapter

procrustes = OrthogonalProcrustesAdapter() affine = LowRankAffineAdapter(rank=4) mlp = ResidualMLPAdapter(hidden_dim=16, epochs=50, learning_rate=0.01)

📊 Artifact & Report Export

Prove to your team that the migration was safe with exported dashboards and artifacts:

from vectormigrate import export_run_artifact_bundle

manifest = export_run_artifact_bundle( registry=registry, plan_id="plan-123", output_dir="/tmp/vectormigrate-artifacts", )

🏗️ Architecture & Formal Model

vectormigrate separates the migration problem into four robust planes:

Control plane: ABI manifests, migration plans, audit events.
Execution plane: Provisioning, dual-write, backfill, alias swap, rollback.
Compatibility plane: Mathematical projections and confidence-gated routing.
Evaluation plane: Offline metrics (Recall@k, nDCG@k), shadow hooks.

Supported Live Backends

The library includes native adapters to safely orchestrate migrations on the following engines:

✅ OpenSearch
✅ Weaviate
✅ Qdrant
✅ pgvector
✅ In-Memory (for testing)

📖 Deep Dive Documentation

🤝 Contributing & Security

We welcome contributions! Please see: