Knowledge Catalog (formerly Dataplex)

Always-on context and governance for your agents

Knowledge Catalog is a universal context engine for your enterprise, helping agents execute complex tasks with accuracy.

Features

Aggregate context across your data estate

Knowledge Catalog aggregates native context across your Google and partner data platforms, semantic models, and third-party catalogs, unifying them into a single, governed source of truth. It automatically harvests technical metadata across your foundational systems, including BigQuery, AlloyDB, Spanner, Cloud SQL, Firestore (Preview), and Looker (Preview). It also supports integrations with third-party databases and partner catalogs like Ab Initio, Anomalo, Atlan, Collibra, and DataHub.

Enrich data and generate meaning through continuous learning

Automate data enrichment by mining schemas, logs, and BI models alongside unstructured entity extraction. Smart Storage and Object Context APIs auto-tag and embed files, while Gemini maps business entities from raw data. Context curation builds natural language glossaries and SQL patterns to capture intent. Finally, verified queries provide semantic guardrails and pre-validated logic to eliminate hallucinations and join errors, ensuring a dynamic, accurate map of your business.

Power search and unleash agents with secure retrieval

Unlock semantic search for your AI agents with sub-second latency and pinpoint relevance. Knowledge Catalog ranks and returns the right context to agents in real time. To build trust, global search respects metadata access permissions, so agents can only retrieve and act on the assets they are explicitly authorized to see. An evaluation framework lets you measure and continuously optimize the relevance and quality of the context feeding your agents.
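As a rough illustration of permission-aware retrieval, the sketch below filters assets by the caller's access before ranking them, so unauthorized assets never reach the ranking step. It is a toy model, not Knowledge Catalog's implementation: the `Asset` class, the `search` function, the principal names, and the term-overlap scoring are all hypothetical stand-ins for IAM checks and semantic ranking.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    """Hypothetical catalog entry with an explicit allow-list of principals."""
    name: str
    description: str
    allowed_principals: frozenset[str] = field(default_factory=frozenset)


def search(assets: list[Asset], query: str, principal: str, limit: int = 5) -> list[Asset]:
    """Return assets visible to `principal`, ranked by naive term overlap with `query`."""
    terms = set(query.lower().split())
    # Filter first: assets the principal cannot see are never scored or returned.
    visible = [a for a in assets if principal in a.allowed_principals]
    scored = []
    for asset in visible:
        words = set(f"{asset.name} {asset.description}".lower().split())
        score = len(terms & words)  # stand-in for a real semantic relevance score
        if score > 0:
            scored.append((score, asset))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [asset for _, asset in scored[:limit]]


catalog = [
    Asset("sales.orders", "daily customer orders", frozenset({"agent-a", "agent-b"})),
    Asset("hr.salaries", "employee salaries by customer region", frozenset({"agent-b"})),
]

print([a.name for a in search(catalog, "customer orders", "agent-a")])
print([a.name for a in search(catalog, "customer orders", "agent-b")])
```

Filtering before ranking, rather than after, keeps restricted assets from influencing scores or leaking through truncated result lists.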

Automated governance across your entire data ecosystem

Deliver scalable, automated governance by transforming raw metadata into a trusted foundation for AI. Enforce policy-based quality checks and anomaly detection across distributed data sources and multimodal data. By combining auto-captured lineage with automated metadata generation, Knowledge Catalog uses Gemini to translate all of these signals into clear business context. This unified control plane standardizes definitions and ensures every data asset is governed, traceable, and ready for agentic workflows.
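To make "policy-based quality checks" concrete, here is a minimal sketch of how a rule with a pass-ratio threshold could be evaluated over a batch of rows. The `Rule` class, the `evaluate` function, and the column names are hypothetical and are not part of any Knowledge Catalog API; anomaly detection is out of scope for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass(frozen=True)
class Rule:
    """Hypothetical quality rule: a per-row predicate plus a required pass ratio."""
    name: str
    predicate: Callable[[Row], bool]
    min_pass_ratio: float = 1.0


def evaluate(rows: list[Row], rules: list[Rule]) -> dict[str, bool]:
    """Return, per rule, whether the share of passing rows meets the threshold."""
    results = {}
    for rule in rules:
        passed = sum(1 for row in rows if rule.predicate(row))
        # An empty batch trivially passes; a real policy might treat it as a failure.
        ratio = passed / len(rows) if rows else 1.0
        results[rule.name] = ratio >= rule.min_pass_ratio
    return results


rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": -5.0},
    {"order_id": None, "amount": 3.0},
    {"order_id": 4, "amount": 7.5},
]
rules = [
    Rule("order_id_present", lambda r: r.get("order_id") is not None),
    Rule("amount_non_negative", lambda r: r.get("amount", -1) >= 0, min_pass_ratio=0.75),
]

print(evaluate(rows, rules))
```

Expressing each policy as data (a name, a predicate, a threshold) is what lets the same checks be applied uniformly across many sources.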

Defined business semantics

Sync business logic into a single, governed layer. Knowledge Catalog automatically synthesizes context from schemas, query logs, and Looker models to define the relationships and measures AI agents need to reason effectively. Whether you are vibe coding semantic models or extracting logic from a sheet, everything flows into a unified glossary. This ensures consistent, policy-based governance across your entire enterprise.

Context retrieval for your agents

Enable your agents to retrieve holistic context from your enterprise data. Through semantic search, Context APIs, and MCP tools, agents can instantly discover data assets and extract pre-generated, enriched metadata. By using pre-vetted "golden queries" and semantic insights, agents can execute complex tasks, retrieve accurate information, and navigate your data ecosystem with precision and at scale.

How It Works

Knowledge Catalog unifies fragmented data into agent-ready context. It provides always-on cataloging and metadata harvesting across Google's Data Cloud, third-party databases, and partners. This real-time enterprise context enables AI agents to move safely from insights to autonomous, governed action.

Common Uses

Data to AI governance

Build your data to AI governance foundation

In a single search experience, you can discover AI models, datasets, and related data artifacts across your organization, spanning projects and regions while adhering to IAM permissions. You can also enrich these assets with business metadata, such as ownership, key attributes, and relevant context, to support informed decision-making.

Multimodal data discovery

Make multimodal data insights easily discoverable by agents

Critical business knowledge often remains locked in unstructured sources like design docs, wikis, PDFs and images. You can instantly turn thousands of PDF contracts, design docs, and wikis into a structured knowledge graph. AI agents can then query this to answer complex questions—like "What are the common liability clauses in our 2025 vendor agreements?"—with grounded, traceable facts.

Automated data product creation

Automatically create governed data products for agents

Move beyond simple tables to create data products: self-contained units of intelligence that include built-in intent, SLAs, and governance constraints. By automatically inferring relationships across the data estate, Knowledge Catalog packages these assets into data products so they can be easily distributed and scaled across cross-functional AI teams and agents.

Govern your open lakehouse

Unified governance for your open lakehouse

Knowledge Catalog is deeply integrated with Google Cloud's Lakehouse and with its catalog, so governance policies can be centrally defined and enforced across multiple engines, such as BigQuery and Google Cloud Managed Service for Apache Spark. Knowledge Catalog also enriches governance across Lakehouse with semantic search, data lineage, profiling, and quality checks, providing a managed foundation for your open data lakehouse.

Pricing

Knowledge Catalog pricing is based on pay-as-you-go usage.

| Service and usage | Description | Price (USD) |
| --- | --- | --- |
| Knowledge Catalog processing | Knowledge Catalog standard and premium processing are metered by the Data Compute Unit (DCU). DCU-hour is an abstract billing unit and the actual metering depends on the individual features you use. | |
| Free tier Knowledge Catalog processing | First 100 DCU-hours per month for Knowledge Catalog standard processing. | No charge |
| Standard Knowledge Catalog processing | Knowledge Catalog standard tier covers the data discovery functionality that automatically discovers table and fileset metadata from Cloud Storage. | Starting at $0.060 per DCU-hour |
| Premium Knowledge Catalog processing | The Knowledge Catalog premium processing tier covers the data exploration workbench, data lineage, data quality, and data profiling capabilities of Knowledge Catalog. | Starting at $0.089 per DCU-hour |
| Knowledge Catalog metadata and API pricing: metadata storage | Knowledge Catalog measures the average amount of the stored metadata during a short time interval. For billing, these measurements are combined into a one-month average, which is multiplied by the monthly rate. | |
| Knowledge Catalog free tier | First 1 MiB monthly average storage. | No charge |
| Metadata storage | Over 1 MiB monthly average storage. | Starting at $2 per GiB per month |
| Knowledge Catalog metadata and API pricing: API charges | Knowledge Catalog charges for API calls made to the Data Catalog API and Data Lineage API. | |
| API calls | First 1 million in a month. | No charge |
| API calls | Over 1 million in a month. | Starting at $10 per 100,000 API calls |
| Knowledge Catalog shuffle storage pricing | Shuffle storage pricing covers any disk storage specified in the environments configured for the data exploration workbench. | Starting at $0.040 per GB-month |
| Other usage | Data organization features in Knowledge Catalog (lake, zone, or asset setup) and security policy application and propagation are provided free of charge. | |

Some Knowledge Catalog functionalities trigger job execution using Google Cloud Managed Service for Apache Spark, BigQuery, and Dataflow. Usage of those services is charged according to their respective pricing models, and the charges appear under those services.
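The listed rates can be combined into a rough monthly estimate. The sketch below uses only the "starting at" prices and free tiers quoted above; the `MonthlyUsage` and `estimate_monthly_cost` names are hypothetical, and it assumes API calls beyond the free tier are billed pro rata rather than in whole blocks of 100,000. Regional prices and charges from Spark, BigQuery, or Dataflow jobs are not included.

```python
from dataclasses import dataclass

# "Starting at" list prices from the pricing section (USD); regional prices may differ.
STANDARD_PER_DCU_HOUR = 0.060
PREMIUM_PER_DCU_HOUR = 0.089
STORAGE_PER_GIB_MONTH = 2.0
API_PER_100K_CALLS = 10.0
SHUFFLE_PER_GB_MONTH = 0.040

# Free tiers from the pricing section.
FREE_STANDARD_DCU_HOURS = 100
FREE_STORAGE_GIB = 1 / 1024  # 1 MiB expressed in GiB
FREE_API_CALLS = 1_000_000


@dataclass
class MonthlyUsage:
    standard_dcu_hours: float = 0.0
    premium_dcu_hours: float = 0.0
    avg_metadata_gib: float = 0.0  # one-month average of the storage measurements
    api_calls: int = 0
    shuffle_gb_months: float = 0.0


def estimate_monthly_cost(u: MonthlyUsage) -> dict[str, float]:
    """Estimate charges per line item; assumes pro-rata billing above each free tier."""
    costs = {
        "standard_processing": max(0.0, u.standard_dcu_hours - FREE_STANDARD_DCU_HOURS)
        * STANDARD_PER_DCU_HOUR,
        "premium_processing": u.premium_dcu_hours * PREMIUM_PER_DCU_HOUR,
        "metadata_storage": max(0.0, u.avg_metadata_gib - FREE_STORAGE_GIB)
        * STORAGE_PER_GIB_MONTH,
        "api_calls": max(0, u.api_calls - FREE_API_CALLS) / 100_000 * API_PER_100K_CALLS,
        "shuffle_storage": u.shuffle_gb_months * SHUFFLE_PER_GB_MONTH,
    }
    costs["total"] = sum(costs.values())
    return costs


# Metadata storage is billed on the monthly average of periodic measurements.
storage_samples_gib = [4.0, 5.0, 6.0]
usage = MonthlyUsage(
    standard_dcu_hours=250,
    premium_dcu_hours=40,
    avg_metadata_gib=sum(storage_samples_gib) / len(storage_samples_gib),
    api_calls=1_500_000,
    shuffle_gb_months=100,
)
for item, cost in estimate_monthly_cost(usage).items():
    print(f"{item}: ${cost:.2f}")
```

Note that the 100 free DCU-hours apply to standard processing only, so premium usage is charged from the first DCU-hour in this estimate.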

Explore pricing

Visit the Knowledge Catalog pricing page to see pricing per region and more.

Custom Quote

Connect with our sales team to get a custom quote for your organization.

Start your proof of concept

New customers get $300 in free credits

What is data governance?

How Knowledge Catalog works

Knowledge Catalog best practices

Learn more about governance for your lakehouse

Partners & Integration

Partnering with industry leaders

Partners
