Knowledge Catalog overview

Knowledge Catalog is a Gemini-powered data catalog that provides universal business context and governance for your entire data estate. By automatically extracting semantics from structured and unstructured data, it builds a dynamic context graph that grounds AI agents in enterprise truth and reduces hallucinations. Data teams and AI developers use Knowledge Catalog to discover data, enforce policies, and retrieve rich context for both analytics and autonomous applications.

Dataplex Universal Catalog is now Knowledge Catalog

To better reflect the vision of unifying data governance with generative AI capabilities, Dataplex Universal Catalog is now Knowledge Catalog. This evolution of the product name represents a shift from a conventional, passive metadata registry to an active, AI-powered context graph.

Why did Dataplex become Knowledge Catalog?

As organizations accelerate their generative AI adoption, AI agents need deep business context to provide accurate, grounded responses. Knowledge Catalog bridges the gap between enterprise data governance and AI agent workflows.

What is the difference between Dataplex and Knowledge Catalog?

The Knowledge Catalog name reflects new AI-centric capabilities. Unlike conventional passive catalogs, Knowledge Catalog automatically curates metadata, business logic, and data relationships into a unified context graph. This graph provides the reliable enterprise truth that AI agents need to run complex tasks accurately. It relies on features such as automatic context curation, verified example queries, and local and remote Model Context Protocol (MCP) integrations.
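To make the idea of a context graph more concrete, the following is a minimal sketch of how metadata, business logic, and data relationships could be linked and then retrieved together. It is not the product's data model or API: the `ContextGraph` class, its methods, the relation names, and the sample values (such as the `finance` owner) are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ContextGraph:
    """Toy context graph (hypothetical): nodes hold metadata, edges hold relationships."""

    nodes: dict = field(default_factory=dict)  # node id -> metadata dict
    edges: list = field(default_factory=list)  # (source id, relation, target id)

    def add_node(self, node_id: str, **metadata) -> None:
        self.nodes[node_id] = metadata

    def relate(self, source: str, relation: str, target: str) -> None:
        self.edges.append((source, relation, target))

    def context_for(self, node_id: str) -> dict:
        """Return a node's metadata plus every node directly linked from it."""
        related = {}
        for source, relation, target in self.edges:
            if source == node_id:
                entry = {"id": target, **self.nodes.get(target, {})}
                related.setdefault(relation, []).append(entry)
        return {"id": node_id, **self.nodes[node_id], "related": related}


# Illustrative, made-up entries: a table, a glossary term, and an example query.
graph = ContextGraph()
graph.add_node("sales.processed_transactions", kind="table", owner="finance")
graph.add_node(
    "total_revenue",
    kind="glossary_term",
    definition="Sum of completed transaction amounts",
)
graph.add_node("revenue_by_customer", kind="example_query")
graph.relate("sales.processed_transactions", "defines", "total_revenue")
graph.relate("sales.processed_transactions", "queried_by", "revenue_by_customer")

# An agent asking about the table gets the linked business context in one lookup.
context = graph.context_for("sales.processed_transactions")
print(context["related"]["defines"][0]["definition"])
```

The design point of the sketch is that a single lookup returns the table together with its business definition and a known-good query, rather than leaving the agent to infer them from column names.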

What is not changing

Your existing Dataplex deployments, APIs, and configurations remain operational. Core features like data discovery, lineage, data quality, and business glossaries are unchanged and supported. Your existing metadata, aspects, and configurations transition to the new Knowledge Catalog experience without any manual migration, data movement, or downtime.

APIs and client libraries

The rebranding to Knowledge Catalog doesn't change existing API endpoints, gcloud dataplex commands, or client libraries. You can continue to use the existing APIs and client libraries to interact with Knowledge Catalog.
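As a rough illustration of what an unchanged endpoint means in practice, the sketch below builds (but does not send) a REST request URL that still targets the `dataplex.googleapis.com` host. The `searchEntries` method path, the `query` and `pageSize` parameters, and the project and location values are assumptions made for this example; check them against the API reference before relying on them.

```python
from urllib.parse import urlencode

# Assumption: the Dataplex REST host and version, which the rebrand leaves unchanged.
API_ROOT = "https://dataplex.googleapis.com/v1"


def build_search_url(project: str, location: str, query: str, page_size: int = 10) -> str:
    """Return the URL for a catalog search request without sending it."""
    parent = f"projects/{project}/locations/{location}"
    params = urlencode({"query": query, "pageSize": page_size})
    return f"{API_ROOT}/{parent}:searchEntries?{params}"


# Hypothetical project ID and search text.
url = build_search_url("example-project", "global", "customer revenue")
print(url)
```

A real call would also need an OAuth 2.0 access token and an HTTP client; both are left out here to keep the sketch self-contained.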

How Knowledge Catalog works

Knowledge Catalog unifies governance and context through three core pillars:

The following diagram illustrates the architecture of Knowledge Catalog and how it unifies data governance with generative AI workflows:

Architecture of Knowledge Catalog showing the curation of metadata, business logic, and data relationships into a unified context graph for AI agents.

Figure 1. Architecture of Knowledge Catalog

Common use cases

Knowledge Catalog helps data engineers, data scientists, and AI developers solve challenges across data management and AI development:

Sample workflows in Knowledge Catalog

To see how you can build your context graph and manage your data estate, consider how an online retail company might use the following Knowledge Catalog features:

Knowledge Catalog in the Google Cloud ecosystem

When building a data foundation, it is important to understand how Knowledge Catalog integrates with related Google Cloud services:

| Service | Primary role | When to use |
| --- | --- | --- |
| Knowledge Catalog | Agentic context and data governance | Use to catalog metadata, manage data quality, and provide semantic grounding for AI agents. |
| BigQuery | Enterprise data warehouse | Use to store, query, and analyze massive datasets. Knowledge Catalog enriches BigQuery data with business context. |
| Vertex AI | AI and machine learning platform | Use to build and deploy ML models and AI agents. Agents use Knowledge Catalog APIs to retrieve accurate enterprise context. |
| Cloud Storage | Unstructured data storage | Use to store raw files. Knowledge Catalog scans Cloud Storage buckets to extract searchable metadata and entities. |

Core concepts

To use Knowledge Catalog effectively, understand the following key concepts:

-- Example: An example query retrieved by an AI agent to ensure accurate revenue calculation
SELECT customer_id, SUM(transaction_amount) AS total_revenue
FROM `sales.processed_transactions`
WHERE transaction_status = 'COMPLETED'
GROUP BY customer_id;
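
To see what the example query above computes, the following sketch runs it verbatim against a small in-memory SQLite table. This is only a local stand-in: the rows are made up, the column types are assumptions, and SQLite is used in place of BigQuery because it also accepts backtick-quoted identifiers, so the dotted table name is treated as a single table name here.

```python
import sqlite3

# The example query, verbatim from the text above.
QUERY = """
SELECT customer_id, SUM(transaction_amount) AS total_revenue
FROM `sales.processed_transactions`
WHERE transaction_status = 'COMPLETED'
GROUP BY customer_id;
"""

conn = sqlite3.connect(":memory:")
# Assumed column types; the source does not define the table schema.
conn.execute(
    "CREATE TABLE `sales.processed_transactions` ("
    "customer_id TEXT, transaction_amount REAL, transaction_status TEXT)"
)
# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO `sales.processed_transactions` VALUES (?, ?, ?)",
    [
        ("c1", 20.0, "COMPLETED"),
        ("c1", 5.0, "COMPLETED"),
        ("c1", 99.0, "REFUNDED"),
        ("c2", 12.5, "COMPLETED"),
        ("c3", 40.0, "PENDING"),
    ],
)

# Map each customer to revenue from completed transactions only.
revenue = dict(conn.execute(QUERY).fetchall())
print(revenue)
conn.close()
```

The status filter is the part an agent is most likely to get wrong without a verified example: refunded and pending transactions are excluded, so a customer with no completed transactions does not appear in the result at all.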

Ingestions

Knowledge Catalog automatically ingests metadata from the following Google Cloud sources. For some services, such as AlloyDB for PostgreSQL and Cloud SQL, you must first enable Knowledge Catalog integration before metadata can be ingested:

To import metadata from a third-party source into Knowledge Catalog, you can use a managed connectivity pipeline. For more information, see Managed connectivity overview.

Limitations

When planning your deployment, consider the following limitations:

What's next