Knowledge Catalog (formerly Dataplex)
Always-on context and governance for your agents
Knowledge Catalog is a universal context engine for your enterprise, helping agents execute complex tasks with accuracy.
Features
Aggregate context across your data estate
Knowledge Catalog aggregates native context across your Google and partner data platforms, semantic models, and third-party catalogs, unifying them into a single, governed source of truth. Automatically harvest technical metadata across your foundational systems—including BigQuery, AlloyDB, Spanner, Cloud SQL, Firestore (Preview), and Looker (Preview). It also supports integrations with third-party databases and partner catalogs like Ab Initio, Anomalo, Atlan, Collibra, and Datahub.
Enrich data and generate meaning through continuous learning
Automate data enrichment by mining schemas, logs, and BI models alongside unstructured entity extraction. Smart Storage and Object Context APIs auto-tag and embed files, while Gemini maps business entities from raw data. Context curation builds natural language glossaries and SQL patterns to capture intent. Finally, verified queries provide semantic guardrails and pre-validated logic to eliminate hallucinations and join errors, ensuring a dynamic, accurate map of your business.
Power search and unleash agents with secure retrieval
Unlock semantic search for your AI agents with sub-second latency and pinpoint relevance. Knowledge Catalog instantly ranks and returns the right context to agents in real-time. To gain trust, our global search respects metadata access permissions so agents can only retrieve and act on the assets they are explicitly authorized to see. Get measurable context evaluation with a robust evaluation framework to continuously optimize relevance and quality of the context feeding your agents.
Automated governance across your entire data ecosystem
Deliver scalable, automated governance by transforming raw metadata into a trusted foundation for AI. Enforce policy-based quality checks and anomaly detection across distributed data sources and multimodal data. By combining auto-captured lineage with automated metadata generation, Knowledge Catalog uses Gemini to translate all of these signals into clear business context. This unified control plane standardizes definitions and ensures every data asset is governed, traceable, and ready for agentic workflows.
Defined business semantics
Sync business logic into a single, governed layer. Knowledge Catalog automatically synthesizes context from schemas, query logs, and Looker models to define the relationships and measures AI agents need to reason effectively. Whether you are vibe coding semantic models or extracting logic from a sheet, everything flows into a unified glossary. This ensures consistent, policy-based governance across your entire enterprise.
Context retrieval for your agents
Enable your agents to retrieve holistic context from your enterprise data. Through semantic search, Context APIs, and MCP tools, agents can instantly discover data assets and extract pre-generated, enriched metadata. By using pre-vetted "golden queries" and semantic insights, agents can execute complex tasks, retrieve accurate information, and navigate your data ecosystem with precision and at scale.
How It Works
Knowledge Catalog unifies fragmented data into agent-ready context. It provides always-on cataloging and metadata harvesting across Google's Data Cloud, third-party databases, and partners. This real-time enterprise context enables AI agents to move safely from insights to autonomous, governed action.
Common Uses
Data to AI governance
Tutorials, quickstarts, & labs
Build your data to AI governance foundation
In a single search experience, you can discover AI models, datasets, and related data artifacts org-wide, spanning projects and regions while adhering to IAM permissions. You can also enrich these assets with business metadata, such as ownership, key attributes, and relevant context, for informed decision-making.
Multimodal data discovery
Tutorials, quickstarts, & labs
Make multimodal data insights easily discoverable by agents
Critical business knowledge often remains locked in unstructured sources like design docs, wikis, PDFs and images. You can instantly turn thousands of PDF contracts, design docs, and wikis into a structured knowledge graph. AI agents can then query this to answer complex questions—like "What are the common liability clauses in our 2025 vendor agreements?"—with grounded, traceable facts.
Automated data product creation
Tutorials, quickstarts, & labs
Automatically create governed data products for agents
Move beyond simple tables to create data products: self-contained units of intelligence that include built-in intent, SLAs, and governance constraints. By automatically inferring relationships across the data estate, Knowledge Catalog packages these assets into data products so they can be easily distributed and scaled across cross-functional AI teams and agents.
Govern your open lakehouse
Tutorials, quickstarts, & labs
Unified governance for your open lakehouse
Knowledge Catalog is deeply integrated with Google Cloud's Lakehouse and its catalog, supporting governance policies that are centrally defined and enforced across multiple engines like BigQuery and Google Cloud Managed Service for Apache Spark. Knowledge Catalog also enriches governance across Lakehouse by supporting semantic search, data lineage, profiling, and quality checks, providing a managed foundation for your open data lakehouse.
Pricing
Knowledge Catalog pricing is based on pay-as-you-go usage.

| Service and usage | Description | Price (USD) |
|---|---|---|
| Knowledge Catalog processing | Knowledge Catalog standard and premium processing are metered by the Data Compute Unit (DCU). DCU-hour is an abstract billing unit, and the actual metering depends on the individual features you use. | |
| Free tier Knowledge Catalog processing | First 100 DCU-hours per month for Knowledge Catalog standard processing. | No charge |
| Standard Knowledge Catalog processing | The Knowledge Catalog standard tier covers the data discovery functionality that automatically discovers table and fileset metadata from Cloud Storage. | Starting at $0.060 per DCU-hour |
| Premium Knowledge Catalog processing | The Knowledge Catalog premium processing tier covers the data exploration workbench, data lineage, data quality, and data profiling capabilities of Knowledge Catalog. | Starting at $0.089 per DCU-hour |
| Metadata storage pricing | Knowledge Catalog measures the average amount of stored metadata during a short time interval. For billing, these measurements are combined into a one-month average, which is multiplied by the monthly rate. | |
| Knowledge Catalog free tier | First 1 MiB monthly average storage. | No charge |
| Metadata storage | Over 1 MiB monthly average storage. | Starting at $2 per GiB per month |
| API charges | Knowledge Catalog charges for API calls made to the Data Catalog API and Data Lineage API. | |
| API calls | First 1 million in a month. | No charge |
| API calls | Over 1 million in a month. | Starting at $10 per 100,000 API calls |
| Knowledge Catalog shuffle storage pricing | Shuffle storage pricing covers any disk storage specified in the environments configured for the data exploration workbench. | Starting at $0.040 per GB-month |
| Other usage | Data organization features in Knowledge Catalog (lake, zone, or asset setup) and security policy application and propagation are provided free of charge. | |
| Other usage | Some Knowledge Catalog functionalities trigger job execution using Google Cloud Managed Service for Apache Spark, BigQuery, and Dataflow. Usage of those services is charged according to their respective pricing models, and the charges appear under those services. | |
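To make the tiered pricing concrete, here is a rough estimator written as a minimal sketch. It uses only the "starting at" rates and free-tier thresholds listed in the table, so regional prices may differ. The function names are hypothetical, and two behaviors are assumptions rather than documented billing rules: free-tier amounts are simply subtracted before billing, and API calls beyond the free tier are prorated instead of rounded up to whole blocks of 100,000.

```python
# "Starting at" list prices from the pricing table; regional prices may differ.
STANDARD_RATE = 0.060          # USD per DCU-hour, standard processing
PREMIUM_RATE = 0.089           # USD per DCU-hour, premium processing
FREE_STANDARD_DCU_HOURS = 100  # free standard DCU-hours per month
STORAGE_RATE = 2.0             # USD per GiB per month of metadata storage
FREE_STORAGE_GIB = 1 / 1024    # 1 MiB free tier, expressed in GiB
API_RATE = 10.0                # USD per 100,000 API calls
API_BLOCK = 100_000
FREE_API_CALLS = 1_000_000     # free API calls per month


def processing_cost(standard_dcu_hours: float, premium_dcu_hours: float) -> float:
    """Monthly processing cost; the free tier applies to standard processing only."""
    billable_standard = max(0.0, standard_dcu_hours - FREE_STANDARD_DCU_HOURS)
    return billable_standard * STANDARD_RATE + premium_dcu_hours * PREMIUM_RATE


def storage_cost(samples_gib: list[float]) -> float:
    """Average the periodic storage measurements, then apply the monthly rate."""
    if not samples_gib:
        return 0.0
    monthly_average = sum(samples_gib) / len(samples_gib)
    return max(0.0, monthly_average - FREE_STORAGE_GIB) * STORAGE_RATE


def api_cost(calls: int) -> float:
    """Cost of API calls beyond the free tier (assumption: prorated per call)."""
    billable = max(0, calls - FREE_API_CALLS)
    return billable / API_BLOCK * API_RATE


# Example month: 250 standard and 40 premium DCU-hours, two storage
# measurements, and 1.5 million API calls.
total = processing_cost(250, 40) + storage_cost([0.5, 1.5]) + api_cost(1_500_000)
print(f"Estimated monthly charge: ${total:.2f}")
```

The estimate leaves out shuffle storage and any jobs that run on Google Cloud Managed Service for Apache Spark, BigQuery, or Dataflow, since those are billed under their own pricing models.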
Explore pricing
Visit the Knowledge Catalog pricing page to see pricing per region and more.
Custom Quote
Connect with our sales team to get a custom quote for your organization.
Start your proof of concept
New customers get $300 in free credits