data mesh

What is a data mesh?

Data mesh is a decentralized data management architecture for analytics and data science. Traditional data architectures often centralize data, which creates challenges in scalability, flexibility and governance. Data mesh instead treats data as a product and assigns ownership of it to domain teams within an organization, such as marketing, sales and customer service.

Zhamak Dehghani coined the term in 2019 while working at the consultancy Thoughtworks. She proposed it to address some fundamental shortcomings of traditional centralized architectures such as data warehouses and data lakes.

Why use a data mesh?

Data mesh is an emerging concept in data architecture that offers several benefits for organizations. Here are some key reasons why organizations might consider using data mesh:

How does data mesh work?

Traditionally, a centralized infrastructure team manages data ownership across domains. A data mesh shifts this ownership to the data producers, because they are the subject matter experts in their field. Producers understand how the domain's operational and analytical data is used, so they can design APIs with the interests of the main data consumers in mind.
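As a rough illustration of a producer-designed API, the sketch below assumes a hypothetical sales domain whose operational system records individual orders. The team that owns those orders exposes a consumer-facing view (daily revenue per region) instead of handing over raw rows. The class `SalesDataProduct`, the `daily_revenue` method and the `DailyRevenue` record are invented for this example and are not part of any data mesh standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DailyRevenue:
    """One row of a hypothetical analytical view published by the sales domain."""
    day: date
    region: str
    revenue: float


class SalesDataProduct:
    """Hypothetical consumer-facing interface owned by the sales domain team.

    The producers know their operational data (individual orders), so they
    decide how to aggregate it into the shape analysts actually query.
    """

    def __init__(self, orders):
        # orders: iterable of (day, region, amount) tuples from the operational system
        self._orders = list(orders)

    def daily_revenue(self, region=None):
        """Total revenue per (day, region), optionally filtered to one region."""
        totals = {}
        for day, order_region, amount in self._orders:
            if region is not None and order_region != region:
                continue
            key = (day, order_region)
            totals[key] = totals.get(key, 0.0) + amount
        # Sorted by day, then region, so consumers get a stable ordering.
        return [DailyRevenue(d, r, v) for (d, r), v in sorted(totals.items())]
```

Consumers call `daily_revenue("EU")` and never see the order-level schema, which leaves the producing team free to change its operational tables without breaking downstream users.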

In a data mesh, domain teams take responsibility for cataloging their data, establishing usage and permission policies and defining semantics. A centralized data governance team remains in place to enforce the standards and practices that surround the data.

Figure: Top reasons to have a data governance program.

Core principles of data mesh architecture

Dehghani advocates four core principles that underlie data mesh architecture for data analytics and data science applications.

1. Domain-oriented data ownership and architecture

A data mesh builds on author Eric Evans' theory of domain-driven design, which explores how to decompose applications into distributed services aligned around business capabilities. Data ownership is distributed among different teams or domains, each responsible for managing its own extract, transform and load (ETL) pipelines and for sharing data related to its domain expertise.

But instead of thinking only about services, data teams also need to host and serve domain data sets in a way that others across the organization can easily consume. Rather than pushing data into a central platform for ingestion, these teams need to host data that different users can pull.
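The pull model can be sketched as a domain-owned host that keeps immutable, versioned snapshots and lets each consumer read the version and columns it needs. This is a minimal in-memory illustration under stated assumptions: `DomainDataHost`, `publish` and `pull` are hypothetical names, and a real deployment would sit on the organization's own storage and access controls.

```python
class DomainDataHost:
    """Hypothetical host where a single domain serves its data sets for others to pull."""

    def __init__(self, domain):
        self.domain = domain
        self._datasets = {}  # data set name -> list of versions; each version is a list of dict rows

    def publish(self, name, rows):
        """Store a new snapshot and return its version number (starting at 1)."""
        versions = self._datasets.setdefault(name, [])
        # Copy the rows so later changes by the producer cannot alter a published snapshot.
        versions.append([dict(row) for row in rows])
        return len(versions)

    def pull(self, name, version=None, columns=None):
        """Return the latest (or a specific) version, optionally projected to some columns."""
        versions = self._datasets[name]
        snapshot = versions[-1] if version is None else versions[version - 1]
        if columns is None:
            return [dict(row) for row in snapshot]
        return [{col: row[col] for col in columns} for row in snapshot]
```

The producing domain only decides what to publish; each consumer decides when to pull and which version to pin, so no central team has to schedule ingestion on their behalf.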

The core principle is that data should be the responsibility of the business teams closest to the data. Domain teams should have access to tools that create analytics data, its metadata and all the computations required to serve it.

Figure: Data mesh builds on domain-driven design, which breaks down a platform into subdomains that map to microservices with their own endpoints.

2. Data as a product

The software industry has been transitioning from project management to product management. A data mesh applies the same concept to data products. Domain experts must focus on improving various aspects of these data products, such as data quality, lead time of data consumption and user satisfaction.
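To make "data quality" and "lead time of data consumption" measurable, a domain team might track something like the two metrics below. The definitions are assumptions chosen for illustration: lead time is taken as the worst-case gap between when an event occurred and when the data product made it available, and completeness as the share of rows whose required fields are filled. The targets in `meets_targets` are hypothetical defaults that an owning team would set for itself.

```python
from datetime import datetime, timedelta


def lead_time(event_times: list[datetime], available_at: datetime) -> timedelta:
    """Worst-case delay between when events occurred and when the product served them."""
    return max(available_at - t for t in event_times)


def completeness(rows: list[dict], required_fields: list[str]) -> float:
    """Share of rows in which every required field is present and not None."""
    if not rows:
        return 1.0
    complete = sum(all(row.get(f) is not None for f in required_fields) for row in rows)
    return complete / len(rows)


def meets_targets(lead: timedelta, complete: float,
                  max_lead: timedelta = timedelta(hours=24),
                  min_complete: float = 0.99) -> bool:
    """Compare measured values against targets chosen by the owning team (defaults are hypothetical)."""
    return lead <= max_lead and complete >= min_complete
```

User satisfaction is harder to compute from the data alone; in practice it would come from consumer feedback rather than a function like these.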

Data products must be the following to be successful:

A fundamental principle is that accountability shifts as close to the data source as possible rather than to a data engineering team that may be less familiar with how the data was collected, what it means and how it might be used. Data engineering teams need to focus on setting up the infrastructure that works across business domains so it's easier to create and manage these products through capabilities such as discoverability, explorability, security, trustworthiness and understandability.
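One way platform tooling could support these capabilities is a catalog that refuses to register a data product unless its descriptor covers each one. In the sketch below, the mapping of fields to capabilities (description for understandability, schema for explorability, allowed roles for security, quality checks for trustworthiness, keyword search for discoverability) is an assumption made for illustration, and `DataProductDescriptor` and `Catalog` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class DataProductDescriptor:
    name: str
    owner_domain: str
    description: str                                   # understandability
    schema: dict                                       # column -> type; explorability
    allowed_roles: set = field(default_factory=set)    # security
    quality_checks: list = field(default_factory=list) # trustworthiness


class Catalog:
    """Hypothetical shared catalog run by the platform team; search provides discoverability."""

    def __init__(self):
        self._entries = {}

    def register(self, d: DataProductDescriptor):
        required = [
            ("description", d.description),
            ("schema", d.schema),
            ("allowed_roles", d.allowed_roles),
            ("quality_checks", d.quality_checks),
        ]
        missing = [label for label, value in required if not value]
        if missing:
            raise ValueError(f"{d.name} is missing: {', '.join(missing)}")
        self._entries[d.name] = d

    def search(self, keyword: str):
        """Names of registered products whose name or description mentions the keyword."""
        kw = keyword.lower()
        return sorted(
            name for name, d in self._entries.items()
            if kw in name.lower() or kw in d.description.lower()
        )
```

Because the check runs in shared tooling, domain teams get the same bar for every product without a central team reviewing each one by hand.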

A data product is built on several structural components, including the following:

3. Self-service data platform

Business teams aren't data engineers or data scientists, nor should they be. Data engineers must build the appropriate infrastructure to provide these domain experts with domain autonomy. This infrastructure might take advantage of existing data platforms and tools, but it also needs to support self-service provisioning capabilities for data products that are accessible to a broader audience. These users should be able to work with data storage formats, create data product schemas, set up data pipelines, manage data product lineage and automate governance.
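Self-service provisioning is often built around a declarative spec: the domain team states what it wants, and the platform validates the request and works out the steps. The sketch below assumes a hypothetical spec with `name`, `format`, `schema`, `sources` and `schedule` keys, plus a made-up list of supported formats and types. It covers the tasks listed above (storage format, schema, pipeline, lineage and governance) without claiming to match any particular product's configuration.

```python
# Assumptions: the formats and column types this hypothetical platform supports.
SUPPORTED_FORMATS = {"parquet", "csv"}
SUPPORTED_TYPES = {"string", "int", "double", "date"}


def provision(spec: dict) -> list[str]:
    """Validate a domain team's declarative spec and return the steps the platform would run."""
    errors = []
    if spec.get("format") not in SUPPORTED_FORMATS:
        errors.append(f"unsupported format: {spec.get('format')}")
    schema = spec.get("schema", {})
    if not schema:
        errors.append("schema must declare at least one column")
    for column, col_type in schema.items():
        if col_type not in SUPPORTED_TYPES:
            errors.append(f"unsupported type for {column}: {col_type}")
    if errors:
        raise ValueError("; ".join(errors))

    name = spec["name"]
    steps = [
        f"create storage for {name} as {spec['format']}",
        f"register schema for {name} with {len(schema)} columns",
    ]
    # Lineage is recorded from the declared upstream products.
    for upstream in spec.get("sources", []):
        steps.append(f"record lineage {upstream} -> {name}")
    if "schedule" in spec:
        steps.append(f"schedule pipeline for {name}: {spec['schedule']}")
    # Governance is applied automatically rather than requested by the domain team.
    steps.append(f"apply governance policies to {name}")
    return steps
```

Returning a plan instead of executing it keeps the example self-contained; a real platform would hand each step to its own infrastructure tooling.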

One approach is to set up a multiplane data platform analogous to the different planes in network routing. A data infrastructure provisioning plane sets up the underlying infrastructure. A data product developer experience plane simplifies development workflows with tools to create, read, version, secure and build data products. A data mesh supervision plane provides services that operate across the whole mesh, such as discovering data products or correlating multiple data products.
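The layering of the three planes can be sketched as classes in which each plane uses only the one below it. Everything here is hypothetical and in-memory: the infrastructure plane merely allocates storage slots, the developer experience plane creates and versions products, and the supervision plane offers mesh-wide discovery and a simple correlation check that finds keys two products share.

```python
class InfrastructureProvisioningPlane:
    """Lowest plane: allocates raw resources (here, in-memory storage slots)."""

    def __init__(self):
        self.storage = {}

    def allocate(self, key):
        self.storage.setdefault(key, [])
        return key


class DeveloperExperiencePlane:
    """Middle plane: lets domain teams create and version products without touching infrastructure."""

    def __init__(self, infra):
        self.infra = infra
        self.products = {}  # name -> {"domain": ..., "keys": set of join keys, "version": int}

    def create(self, name, domain, keys):
        self.infra.allocate(name)
        self.products[name] = {"domain": domain, "keys": set(keys), "version": 0}

    def publish(self, name, rows):
        self.infra.storage[name] = list(rows)
        self.products[name]["version"] += 1
        return self.products[name]["version"]


class MeshSupervisionPlane:
    """Top plane: services across all products, such as discovery and correlation."""

    def __init__(self, dev):
        self.dev = dev

    def discover(self, domain=None):
        return sorted(
            name for name, p in self.dev.products.items()
            if domain is None or p["domain"] == domain
        )

    def correlate(self, left, right):
        """Declared keys shared by two products, i.e. columns they could be joined on."""
        return sorted(self.dev.products[left]["keys"] & self.dev.products[right]["keys"])
```

Domain teams would work only with the middle plane, while the supervision plane sees every product, which mirrors the separation of concerns the multiplane approach describes.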

4. Federated computational governance

A data mesh needs a decentralized governance model that can automate the execution of decisions across the platform. This model ensures interoperability across the different data sources. It can also help correlate, join and perform other operations across multiple data products at scale.

That differs from traditional data governance approaches for analytics, which try to centralize all decision-making. In a data mesh, each domain is responsible for some decisions, such as its domain data model and quality assurance. A centralized data engineering team shifts its focus to automating many aspects of governance, such as setting up tools to detect and recover from errors, automating processes and establishing service-level objectives for the enterprise.
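"Computational" governance can be read as expressing shared rules as code that runs automatically against every data product. The sketch below assumes three hypothetical global policies: every product has an owner, columns named `email` or `phone` carry a `pii` tag, and a freshness service-level objective is declared. It also shows how a domain could add its own local rules on top. The policy names and product fields are illustrative, not a standard.

```python
def has_owner(product):
    return bool(product.get("owner"))


def pii_columns_are_tagged(product):
    # Hypothetical global rule: columns with these names must carry a "pii" tag.
    sensitive = {"email", "phone"}
    tags = product.get("tags", {})
    return all("pii" in tags.get(col, set()) for col in product["schema"] if col in sensitive)


def declares_freshness_slo(product):
    return product.get("freshness_slo_hours") is not None


GLOBAL_POLICIES = [has_owner, pii_columns_are_tagged, declares_freshness_slo]


def evaluate(products, policies=GLOBAL_POLICIES):
    """Run every policy against every product and report failures by product name."""
    report = {}
    for product in products:
        failed = [policy.__name__ for policy in policies if not policy(product)]
        if failed:
            report[product["name"]] = failed
    return report
```

A domain can call `evaluate(products, GLOBAL_POLICIES + [its_own_rule])` to layer local checks on the shared ones, which reflects the split in which domains own some decisions while the central team automates enforcement of the rest.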

Data mesh vs. data lake

The main difference between a data lake and a data mesh lies in their architectural approaches and organizational principles for managing data.

Figure: Data mesh's four architectural principles bring data accountability closest to the source.

Data lake

Data mesh

Data mesh vs. data fabric

Data mesh and data fabric are both approaches to managing and using data within organizations, but they differ in their architectural focus and execution.

Data fabric

Data mesh

Data mesh design challenges

Despite its benefits, data mesh can present several challenges in design and execution. Some challenges with data mesh design include the following:

Best practices for data mesh design and execution

Some best practices for designing and executing data meshes include the following:


This was last updated in June 2024
