GitHub - vaadin/vaadin-mcp: A complete solution for ingesting, indexing, and retrieving Vaadin documentation through semantic search.

Vaadin Documentation RAG Service

A hierarchy-aware Retrieval-Augmented Generation (RAG) system for Vaadin documentation. It understands document structure, filters results by framework, and lets agents navigate between parent and child documentation sections.

🎯 Project Overview

This project provides an advanced RAG system with enhanced hybrid search.

🏗️ Architecture

vaadin-documentation-services/
├── packages/
│   ├── core-types/              # Shared TypeScript interfaces
│   ├── 1-asciidoc-converter/    # AsciiDoc → Markdown + metadata extraction
│   ├── 2-embedding-generator/   # Markdown → Vector database with hierarchical chunking
│   ├── rest-server/             # Enhanced REST API with hybrid search + reranking
│   └── mcp-server/              # MCP server with hierarchical navigation
├── package.json                 # Bun workspace configuration
└── PROJECT_PLAN.md              # Complete project documentation

Data Flow

flowchart TD
subgraph "Step 1: Documentation Processing"
    VaadinDocs["📚 Vaadin Docs<br/>(AsciiDoc)"]
    Converter["🔄 AsciiDoc Converter<br/>• Framework detection<br/>• URL generation<br/>• Markdown output"]
    Processor["⚡ Embedding Generator<br/>• Hierarchical chunking<br/>• Parent-child relationships<br/>• OpenAI embeddings"]
end

subgraph "Step 2: Enhanced Retrieval"
    Pinecone["🗄️ Pinecone Vector DB<br/>• Rich metadata<br/>• Hierarchical relationships<br/>• Framework tags"]
    RestAPI["🌐 REST API<br/>• Enhanced hybrid search<br/>• Native Pinecone reranking<br/>• Framework filtering"]
end

subgraph "Step 3: Agent Integration"
    MCP["🤖 MCP Server<br/>• search_vaadin_docs<br/>• get_full_document<br/>• Full document retrieval"]
    IDEs["💻 IDE Assistants<br/>• Context-aware search<br/>• Hierarchical exploration<br/>• Framework-specific help"]
end

VaadinDocs --> Converter
Converter --> Processor
Processor --> Pinecone
Pinecone <--> RestAPI
RestAPI <--> MCP
MCP <--> IDEs

classDef processing fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef storage fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
classDef api fill:#e8f5e8,stroke:#2e7d32,stroke-width:2px
classDef agent fill:#fff3e0,stroke:#e65100,stroke-width:2px

class VaadinDocs,Converter,Processor processing
class Pinecone,RestAPI storage
class MCP api
class IDEs agent


✨ Key Features

🌳 Hierarchical Navigation

🎛️ Developer Experience

🚀 Quick Start

Prerequisites

Installation

    # Clone and install dependencies
    git clone https://github.com/vaadin/vaadin-documentation-services
    cd vaadin-documentation-services
    bun install

Environment Setup

    # Create .env file with your API keys
    echo "OPENAI_API_KEY=your_openai_api_key" > .env
    echo "PINECONE_API_KEY=your_pinecone_api_key" >> .env
    echo "PINECONE_INDEX=your_pinecone_index" >> .env
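A missing key tends to surface only as an authentication error deep inside a run, so it can help to check the environment up front. The following is a minimal sketch, not code from this repository: the variable names come from the `.env` example above, while `missingEnvVars` and `assertEnv` are hypothetical helper names.

```typescript
// Variable names taken from the .env example; the helpers are illustrative.
const REQUIRED_ENV_VARS = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_INDEX"] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank,
// in the order they are declared above.
function missingEnvVars(env: Env): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Throws one error that lists every missing variable.
function assertEnv(env: Env): void {
  const missing = missingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

In a Bun or Node script this could be called as `assertEnv(process.env)` before any processing starts.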

Running the System

1. Process Documentation (One-time setup)

    # Convert AsciiDoc to Markdown with metadata
    cd packages/1-asciidoc-converter
    bun run convert

    # Generate embeddings and populate vector database
    cd ../2-embedding-generator
    bun run generate

2. Start REST API Server

    cd packages/rest-server
    bun run start

Server runs at http://localhost:3001

3. Use MCP Server with IDE Assistant

The MCP server is deployed and available remotely via HTTP transport at: https://mcp.vaadin.com/

Configure your IDE assistant to use the Streamable HTTP transport:

    import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

    const transport = new StreamableHTTPClientTransport(
      new URL("https://mcp.vaadin.com/")
    );
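Once a transport is connected, an MCP client invokes tools through JSON-RPC 2.0 `tools/call` requests. The SDK normally builds these messages for you; the sketch below only illustrates their shape. The tool name `search_vaadin_docs` appears in the data-flow diagram, but the `question` argument and the `buildToolCall` helper are assumptions made for illustration, so check the tool's published input schema for the real parameter names.

```typescript
// Shape of an MCP tool invocation as a JSON-RPC 2.0 request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in a request.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// `question` is an assumed argument name, not confirmed by this README.
const request = buildToolCall(1, "search_vaadin_docs", {
  question: "How do I add a column to a Grid?",
});
```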

📦 Package Details

Core Types (packages/core-types/)

Shared TypeScript interfaces used across all packages:

AsciiDoc Converter (packages/1-asciidoc-converter/)

Converts Vaadin AsciiDoc documentation to Markdown with metadata:

    cd packages/1-asciidoc-converter
    bun run convert    # Convert all documentation
    bun run test       # Run framework detection tests
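Framework detection tags each document so that later stages can separate Flow and Hilla content while still including content common to both. The sketch below shows one way such tagging and filtering could work. It assumes a path-based heuristic and the tag names `flow`, `hilla`, and `common`; the converter's actual detection rules are not described here, and both function names are hypothetical.

```typescript
type Framework = "flow" | "hilla" | "common";

// Assumed heuristic: a path segment named "flow" or "hilla" decides the tag;
// everything else is treated as common content.
function detectFramework(path: string): Framework {
  const segments = path.toLowerCase().split("/");
  if (segments.indexOf("flow") !== -1) return "flow";
  if (segments.indexOf("hilla") !== -1) return "hilla";
  return "common";
}

// A document matches a framework-specific query if it carries that tag
// or is common to both frameworks.
function matchesFramework(doc: Framework, requested: "flow" | "hilla"): boolean {
  return doc === requested || doc === "common";
}
```

Keeping `common` as its own tag, rather than duplicating shared pages under both frameworks, means a filter only needs one extra comparison.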

Embedding Generator (packages/2-embedding-generator/)

Creates vector embeddings with hierarchical relationships:

    cd packages/2-embedding-generator
    bun run generate    # Generate embeddings from Markdown
    bun run test        # Run chunking and relationship tests
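To illustrate what hierarchical chunking with parent-child relationships can mean in practice, here is a minimal sketch that splits Markdown on headings and links each section to its nearest shallower heading. It is not the generator's actual algorithm: the `Chunk` shape, the `chunk-N` ids, and `chunkMarkdown` are all hypothetical, and the sketch ignores details such as `#` lines inside fenced code blocks.

```typescript
interface Chunk {
  id: string;
  parentId: string | null; // null for top-level sections
  level: number;           // heading depth: 1 for "#", 2 for "##", ...
  title: string;
  content: string;
}

function chunkMarkdown(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  const stack: Chunk[] = []; // currently open ancestor sections, shallowest first

  for (const line of markdown.split("\n")) {
    const heading = /^(#{1,6})\s+(.*)$/.exec(line);
    if (heading) {
      const level = heading[1].length;
      // Close every open section at the same or a deeper level.
      while (stack.length > 0 && stack[stack.length - 1].level >= level) {
        stack.pop();
      }
      const parent = stack.length > 0 ? stack[stack.length - 1] : null;
      const chunk: Chunk = {
        id: `chunk-${chunks.length}`,
        parentId: parent ? parent.id : null,
        level,
        title: heading[2].trim(),
        content: "",
      };
      chunks.push(chunk);
      stack.push(chunk);
    } else if (stack.length > 0) {
      // Body text belongs to the innermost open section.
      // Text before the first heading is dropped in this sketch.
      stack[stack.length - 1].content += line + "\n";
    }
  }
  return chunks;
}
```

Storing `parentId` alongside each chunk is what would let a retrieval layer move from a matched detail section up to its broader context.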

REST Server (packages/rest-server/)

Enhanced API server with hybrid search capabilities:

    cd packages/rest-server
    bun run start           # Start production server
    bun run test            # Run comprehensive test suite
    bun run test:verbose    # Detailed test output
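Hybrid search combines more than one ranking of the same documents, typically a semantic (vector) ranking and a keyword ranking. This README does not say how the server merges them beyond mentioning native Pinecone reranking, so the sketch below substitutes a well-known generic technique, reciprocal rank fusion (RRF), purely to show the idea. The function name and the constant `k = 60` are illustrative choices, not values taken from this project.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) to a
// document's score, with rank starting at 1. Documents that rank well
// in several lists rise to the top.
function fuseRankings(lists: string[][], k = 60): string[] {
  const scores: Record<string, number> = {};
  for (const list of lists) {
    list.forEach((id, index) => {
      scores[id] = (scores[id] || 0) + 1 / (k + index + 1);
    });
  }
  // Highest fused score first.
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}

// Usage with two hypothetical result lists of document ids.
const semanticHits = ["a", "b", "c"];
const keywordHits = ["c", "a", "d"];
const fused = fuseRankings([semanticHits, keywordHits]);
```

RRF needs only ranks, not comparable scores, which is why it is a common default when the two retrievers score on different scales.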

API Endpoints:

MCP Server (packages/mcp-server/)

Model Context Protocol server for IDE assistant integration:

    cd packages/mcp-server
    bun run build    # Build for distribution
    bun run test     # Run document-based tests

Available Tools: search_vaadin_docs and get_full_document.

🧪 Testing

Each package includes comprehensive test suites:

    # Test individual packages
    cd packages/1-asciidoc-converter && bun run test
    cd packages/2-embedding-generator && bun run test
    cd packages/rest-server && bun run test
    cd packages/mcp-server && bun run test

    # Run REST server against live endpoint
    cd packages/rest-server && bun run test:server

📈 Performance & Metrics

Search Quality

System Performance

Production Readiness

🌐 Deployment

REST Server

The REST server is available at:

MCP Server

The MCP server is available at: https://mcp.vaadin.com/

Documentation Processing

Automated via GitHub Actions:

🔧 Development

Workspace Structure

This project uses Bun workspaces for package management:

    bun install      # Install all dependencies
    bun run build    # Build all packages
    bun run test     # Test all packages

Adding New Features

  1. Core Types: Add interfaces to packages/core-types/
  2. Processing: Extend converters in packages/1-asciidoc-converter/ or packages/2-embedding-generator/
  3. API: Enhance search in packages/rest-server/
  4. Integration: Update MCP tools in packages/mcp-server/

Architecture Principles

📚 Documentation

🏆 Project Success

This project successfully delivered:

✅ Sophisticated RAG System: Replaced the naive implementation with hierarchy-aware search
✅ Enhanced User Experience: Agents can now navigate from specific details to broader context
✅ Production Quality: Clean architecture, comprehensive testing, and error handling
✅ Framework Intelligence: Accurate Flow/Hilla content separation with common content inclusion
✅ Developer Integration: Seamless IDE assistant integration via MCP

The system now provides intelligent, context-aware documentation search that understands the hierarchical structure of Vaadin documentation and enables sophisticated agent interactions.

📄 License

MIT - See license file for details.


Built with ❤️ for the Vaadin developer community