🎨 NeMo Data Designer

👋 Welcome! Data Designer is an orchestration framework for generating high-quality synthetic data. You provide LLM endpoints (NVIDIA, OpenAI, vLLM, etc.), and Data Designer handles batching, parallelism, validation, and more.
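The following is a minimal, library-free sketch that makes "batching, parallelism, validation" concrete by showing roughly what an orchestration layer does around an LLM endpoint. It is not Data Designer's API. `call_llm` is a hypothetical stub standing in for a real endpoint. `generate_records`, `batch_size`, `max_workers`, and `max_retries` are illustrative names chosen for this sketch.

```python
import concurrent.futures
import json
from typing import Callable, List, Optional


def call_llm(prompt: str) -> str:
    """Hypothetical stub for an LLM endpoint (NVIDIA, OpenAI, vLLM, ...).

    A real endpoint would be a network call; this returns a JSON string offline.
    """
    return json.dumps({"prompt": prompt, "text": f"Response to: {prompt}"})


def is_valid(raw: str) -> bool:
    """Validation step: accept only JSON objects with a non-empty 'text' field."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and bool(obj.get("text"))


def generate_one(prompt: str, llm: Callable[[str], str], max_retries: int) -> Optional[dict]:
    """Call the model, retrying until the output validates or retries run out."""
    for _ in range(max_retries + 1):
        raw = llm(prompt)
        if is_valid(raw):
            return json.loads(raw)
    return None  # give up on this record instead of keeping bad output


def generate_records(prompts: List[str], llm: Callable[[str], str] = call_llm,
                     batch_size: int = 4, max_workers: int = 4,
                     max_retries: int = 2) -> List[dict]:
    """Split prompts into batches and run each batch's calls in parallel threads."""
    records: List[dict] = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        for start in range(0, len(prompts), batch_size):
            batch = prompts[start:start + batch_size]
            # Executor.map yields results in input order, so rows line up with prompts.
            results = pool.map(lambda p: generate_one(p, llm, max_retries), batch)
            records.extend(r for r in results if r is not None)
    return records


if __name__ == "__main__":
    rows = generate_records([f"Greeting #{i}" for i in range(10)])
    print(len(rows))
```

Threads fit here because LLM calls are I/O-bound, and records that never pass validation are dropped rather than silently kept.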

Configure columns and models → Preview samples and iterate → Create your full dataset at scale.

Unlike raw LLM calls, Data Designer gives you statistical diversity, field correlations, automated validation, and reproducible workflows. For details, see Architecture & Performance.
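One common way to get statistical diversity and field correlations is to sample column values from explicit distributions instead of relying on an LLM to vary its output. Below is a minimal standard-library sketch of that idea under made-up assumptions. `language` is drawn from a weighted distribution, and `region` is drawn conditionally on `language`, which is the correlation. A seeded `random.Random` makes the result reproducible. `sample_rows`, `LANGUAGE_WEIGHTS`, and `REGIONS_BY_LANGUAGE` are hypothetical and do not reflect Data Designer's config format.

```python
import random
from collections import Counter

# Hypothetical marginal distribution for the "language" column.
LANGUAGE_WEIGHTS = {"English": 0.4, "Spanish": 0.3, "French": 0.2, "Japanese": 0.1}

# Hypothetical conditional table: "region" depends on "language" (a field correlation).
REGIONS_BY_LANGUAGE = {
    "English": ["US", "UK", "Australia"],
    "Spanish": ["Spain", "Mexico", "Argentina"],
    "French": ["France", "Canada", "Belgium"],
    "Japanese": ["Japan"],
}


def sample_rows(n: int, seed: int) -> list:
    """Draw n rows; the same seed always yields the same rows."""
    rng = random.Random(seed)  # local RNG, independent of global random state
    languages = list(LANGUAGE_WEIGHTS)
    weights = list(LANGUAGE_WEIGHTS.values())
    rows = []
    for _ in range(n):
        language = rng.choices(languages, weights=weights, k=1)[0]
        region = rng.choice(REGIONS_BY_LANGUAGE[language])  # conditioned on language
        rows.append({"language": language, "region": region})
    return rows


if __name__ == "__main__":
    print(Counter(r["language"] for r in sample_rows(1000, seed=7)))
```

With 1,000 rows, the language counts should land roughly near the 40/30/20/10 split, and a region never appears with a language it does not belong to.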

πŸ“ Want to hear from the team? Check out our Dev Notes for deep dives, best practices, and insights.

Install

Setup

Get an API key from one of the default providers and set it as an environment variable:

Verify your configuration is ready:

This displays the pre-configured model providers and models. See CLI Configuration to customize.
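If you also want a quick sanity check from inside a Python script, the sketch below reports which provider keys are present in the environment. The variable names `NVIDIA_API_KEY` and `OPENAI_API_KEY` are assumptions based on common provider conventions. Confirm the names Data Designer actually reads on the CLI Configuration page. `configured_providers` is a hypothetical helper, not part of the library.

```python
import os
from typing import List, Mapping, Optional

# Assumed environment variable names; confirm against the CLI Configuration docs.
PROVIDER_ENV_VARS = {
    "nvidia": "NVIDIA_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def configured_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return providers whose API-key variable is set to a non-blank value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var, "").strip()]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        print("API keys found for:", ", ".join(found))
    else:
        print("No provider API key found; set one before generating data.")
```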

Your First Dataset

Let’s generate multilingual greetings to see Data Designer in action:

🎉 That’s it! You’ve just designed your first synthetic dataset.
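To see conceptually how a greeting dataset like this fits together, the offline sketch below declares a sampled `language` column, then a prompt column whose template references it, and finally runs the same config as a small preview and as a full-size run. Everything here is hypothetical and is not Data Designer's API. This includes `SamplerColumn`, `PromptColumn`, `generate`, and the canned `fake_llm` stub that replaces a real model.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SamplerColumn:
    """Hypothetical column type: picks a value uniformly from fixed categories."""
    name: str
    values: List[str]


@dataclass
class PromptColumn:
    """Hypothetical column type: fills a template from earlier columns, then calls a model."""
    name: str
    template: str  # str.format placeholders, e.g. "{language}"


def fake_llm(prompt: str) -> str:
    """Stub model so the sketch runs offline; a real setup would call an LLM endpoint."""
    canned = {"English": "Hello!", "Spanish": "Hola!", "French": "Bonjour!", "Japanese": "Konnichiwa!"}
    for language, greeting in canned.items():
        if language in prompt:
            return greeting
    return "Hi!"


def generate(columns: list, num_records: int, seed: int,
             llm: Callable[[str], str] = fake_llm) -> List[dict]:
    """Build records column by column, so later columns can reference earlier ones."""
    rng = random.Random(seed)
    records = []
    for _ in range(num_records):
        row = {}
        for col in columns:
            if isinstance(col, SamplerColumn):
                row[col.name] = rng.choice(col.values)
            else:
                row[col.name] = llm(col.template.format(**row))
        records.append(row)
    return records


columns = [
    SamplerColumn("language", ["English", "Spanish", "French", "Japanese"]),
    PromptColumn("greeting", "Write a short, friendly greeting in {language}."),
]

preview = generate(columns, num_records=3, seed=0)     # inspect, tweak columns, repeat
dataset = generate(columns, num_records=1000, seed=0)  # same config at full size
```

Because only the sampler draws from the seeded RNG, the preview is exactly the first three rows of the full run. This makes it safe to iterate on a small sample before scaling up.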

🚀 Next Steps

Learn More