DBT Labs updates Semantic Layer, adds data mesh enablement
DBT Labs unveiled a set of new and improved capabilities aimed at helping customers transform data to prepare it for analysis, including an updated version of its Semantic Layer, a new governance tool designed for data mesh architectures and an integration with Tableau.
The vendor introduced the capabilities on Oct. 17 during Coalesce 2023, its user conference in San Diego.
Based in Philadelphia, DBT Labs -- DBT stands for "data build tool" -- began in 2016 as an open source set of tools to help engineers transform data. DBT Labs still offers a free open source version of its platform, but it is now also a for-profit vendor with a Team version that costs $100 per developer seat per month and an Enterprise version that's customized based on an organization's needs.
Over the past year, DBT Labs has formed numerous partnerships to expose its platform to potential new users as well as to make integrations with data management and analytics platforms as easy as possible for its existing customers.
Included are partnerships with Alation, Starburst and ThoughtSpot. Now, analytics vendor Tableau is also a partner.
In addition, DBT Labs acquired Transform in February to improve the capabilities of its Semantic Layer tool. The updated version of Semantic Layer -- now generally available -- reflects the improvements that resulted from the acquisition.
Semantics
As organizations collect more and more data from an ever-growing number of sources, and as that data gets ingested by different departments within those organizations, semantic layers are becoming critical.
Semantic layers are tools that enable organizations to create common definitions for data and key metrics, no matter which department collects and consumes the data or uses the metric to measure progress.
Without semantic layers, different departments such as finance and marketing might collect the same data -- for example, a point-of-sale transaction -- but the finance and marketing departments might put different labels on that same transaction. If there's never a need for that data beyond the finance department's need to calculate revenue or the marketing department's need to track sales in a given region, then defining the same data in different ways is no problem.
But, for example, should that transaction be needed to create a complete customer profile, it's vital that the data be defined the same way across departments. Without the common definition, the data appears to be different, and amid perhaps millions of other data points, it never gets properly identified and joined.
The same is true for KPIs. If departments define "revenue" differently or use different data to inform metrics that might have the same title, those metrics won't translate across the organization and can't be used together to inform decisions.
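As a rough illustration of the point-of-sale example, the sketch below shows two departments storing the same transaction under different labels, with a shared definition that maps both onto common field names so the records can be joined. Every field name and the mapping itself are hypothetical, chosen only for illustration; this is not how DBT Labs' Semantic Layer or any other product is implemented.

```python
# Hypothetical records: the same point-of-sale transaction as stored by two departments.
finance_rows = [{"txn_id": "T-1001", "amt_usd": 59.90}]
marketing_rows = [{"order_ref": "T-1001", "region": "West"}]

# A shared definition maps each department's labels onto common field names.
COMMON_FIELDS = {
    "finance": {"txn_id": "transaction_id", "amt_usd": "revenue"},
    "marketing": {"order_ref": "transaction_id", "region": "sales_region"},
}


def normalize(rows, department):
    """Rename department-specific labels to the shared field names."""
    mapping = COMMON_FIELDS[department]
    return [{mapping[key]: value for key, value in row.items()} for row in rows]


def join_on_transaction(left, right):
    """Join two normalized record sets on the shared transaction_id."""
    right_by_id = {row["transaction_id"]: row for row in right}
    return [
        {**row, **right_by_id[row["transaction_id"]]}
        for row in left
        if row["transaction_id"] in right_by_id
    ]


profile = join_on_transaction(
    normalize(finance_rows, "finance"),
    normalize(marketing_rows, "marketing"),
)
print(profile)
```

Before normalization, the two record sets share no field name, so there is nothing to join on. Once both use `transaction_id`, the finance and marketing views of the transaction merge into a single record.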
Semantic layers, however, have mostly been provided by analytics vendors rather than other vendors within the data stack, according to Stewart Bond, an analyst at IDC. In addition, most of those semantic layers have focused solely on defining data and not defining metrics, and they have been largely for on-premises environments, he continued.
Looker and MicroStrategy are among the analytics vendors that provide their own semantic layer.
As more organizations migrate their data to the cloud, there's now a need for semantic layers to be independent of analytics tools. DBT Labs is not only serving that need, but also expanding beyond data engineering with Semantic Layer, Bond said.
"DBT Labs disrupted data engineering technologies in the modern data stack, and with its new semantic layer being defined by metrics -- which is a departure from the semantic layers of the past -- DBT Labs is looking to further disrupt analytics in the modern data stack as well," he said.
Mike Leone, an analyst at TechTarget's Enterprise Strategy Group, likewise noted that Semantic Layer was a significant addition for DBT Labs when first introduced and that added capabilities only make it more so.
"This enables DBT Labs to serve as the trusted foundation for all data initiatives," he said. "Organizations want consistency in their decision-making. By enabling organizations to centrally define their business metrics and then consume them across business units ... means the organizations can gain confidence that everyone is operating with the same information."
DBT Labs, which specializes in enabling users to prepare and orchestrate data, first unveiled Semantic Layer in late 2022. From its inception, the tool enabled users to define data and metrics in DBT Cloud and then run queries from any integrated analytics platform, such as ThoughtSpot and now Tableau.
Now, with the integration of Transform's capabilities, the tool also enables the following:
- Support for dynamic joins so that users can combine an unlimited number of tables to develop metrics on top of an existing database.
- Automated generation of joins, filters and aggregations using SQL code.
- Support for new and complex metrics to better enable organizations to measure success.
- An integration with Tableau so that Tableau users can develop consistent metrics.
- Improved connectivity with Amazon Redshift, Databricks, Google BigQuery and Snowflake.
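To make the idea of generating joins, filters and aggregations from a metric definition more concrete, here is a minimal sketch. It assumes a metric is declared once as data and then rendered to SQL on request. The `Metric` and `Join` classes, the `compile_metric` function and the table and column names are all hypothetical and do not represent DBT Labs' actual API or file format.

```python
from dataclasses import dataclass, field


@dataclass
class Join:
    table: str  # table to join in
    on: str     # join condition


@dataclass
class Metric:
    name: str
    expression: str  # aggregation, e.g. SUM(orders.amount)
    base_table: str
    joins: list = field(default_factory=list)
    filters: list = field(default_factory=list)


def compile_metric(metric, group_by):
    """Render a metric definition as a SQL query string."""
    lines = [
        f"SELECT {group_by}, {metric.expression} AS {metric.name}",
        f"FROM {metric.base_table}",
    ]
    lines += [f"JOIN {j.table} ON {j.on}" for j in metric.joins]
    if metric.filters:
        lines.append("WHERE " + " AND ".join(metric.filters))
    lines.append(f"GROUP BY {group_by}")
    return "\n".join(lines)


# Hypothetical metric: completed-order revenue, grouped by a column
# that lives in a different table than the amount being summed.
revenue = Metric(
    name="revenue",
    expression="SUM(orders.amount)",
    base_table="orders",
    joins=[Join("customers", "orders.customer_id = customers.id")],
    filters=["orders.status = 'completed'"],
)

sql = compile_metric(revenue, group_by="customers.region")
print(sql)
```

Because the definition lives in one place, any analytics tool that requests `revenue` would receive SQL built from the same joins and filters, which is the consistency the analysts describe.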
DBT Labs' impetus for improving Semantic Layer came largely from the vendor's users, according to Luis Maldonado, DBT Labs' vice president of product.
He noted that the vendor has a large network of users, given its start as fully open source and continued ties to the open source community. In particular, support for joins between tables was a feature users wanted in addition to the initial Semantic Layer capabilities.
"We put the first version out, and there was a lot of feedback," Maldonado said. "Our customers made it clear that in order to make things work, many of their metrics were going to need to be joined across tables, and [Semantic Layer] had to support that. The big star [of the update] is those dynamic joins."
In fact, feedback from users played a role in DBT Labs' acquisition of Transform, he continued. Rather than internally develop many of the new capabilities, DBT Labs was instead able to add them through the acquisition and subsequent integration of Transform's capabilities.
Meshing together
Just as organizations can significantly benefit from semantic modeling as their data volume grows, they can likewise benefit from a decentralized architecture.
Even before data volume exploded over the past few years, centralized IT teams were often overwhelmed by the myriad tasks of managing a large enterprise's data. Even self-service analytics couldn't eliminate bottlenecks and free data engineers and other data workers enough to prevent lengthy delays between the time data is ingested and the time it can be used to inform analysis.
However, a decentralized data architecture eases that burden. Rather than make data the domain of a single team within an organization, many organizations are now adopting a data mesh approach in which each domain -- or department -- is in charge of its own data.
Departmental data stewards are tasked with oversight, while tools such as data catalogs connect the different departments to enable data sharing and collaboration across domains.
To help organizations as they decentralize, DBT Labs revealed the public preview of DBT Mesh as part of the vendor's latest version of DBT Cloud, a managed service with which developers can build and deploy data products.
DBT Mesh enables domain teams to develop and service their own data products and includes governance capabilities such as access and version control.
In doing so, DBT Mesh represents an expansion for DBT Labs, according to Bond. The vendor has historically focused solely on enabling users to build data pipelines.
"DBT Mesh is an interesting development that takes DBT from being a data pipeline build tool to being used for the creation of data products, which could also be a combination of several pipelines, models and data transformation products," Bond said.
Initially, many DBT Labs users were developers working independently of their organizations' data stacks, using the open source tool to experiment beyond what they could do with legacy tools, he continued. As a result, their work sometimes didn't meet corporate guidelines and standards that had to be added in after data products were developed.
Making governance part of DBT Mesh is therefore important, according to Bond.
"As the number of DBT projects and data products increases, IT organizations and chief data officers will be looking for governance," he said.
In addition to DBT Mesh, the DBT Cloud update includes the following:
- DBT Explorer, a tool that enables users to track and view data lineage, including across domains in a data mesh architecture, to better understand how data is being consumed and ensure data quality.
- Cloud CLI, a feature that lets developers write code from not only their integrated development environment, but also a command-line interface so that they can work from different devices as well as use the IDE software of their choice.
- Adapters for Microsoft Azure Synapse and Microsoft Fabric that enable Azure Synapse and Fabric users to access DBT Labs' data transformation capabilities.
DBT Explorer displays a data set's lineage.
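Data lineage of the kind described above is commonly modeled as a directed graph of dependencies. The sketch below assumes a hypothetical mapping from each data set to the data sets it is built from, with domain prefixes such as `finance.` and `marketing.` standing in for data mesh domains. It illustrates the concept only and does not reflect how DBT Explorer is implemented.

```python
# Hypothetical lineage: each data set maps to the data sets it is built from.
UPSTREAM = {
    "finance.revenue_report": ["finance.orders_clean"],
    "finance.orders_clean": ["raw.orders"],
    "marketing.campaign_roi": ["finance.orders_clean", "raw.ad_spend"],
    "raw.orders": [],
    "raw.ad_spend": [],
}


def lineage(dataset, graph):
    """Return every data set upstream of `dataset`, via depth-first traversal."""
    seen = set()
    stack = list(graph.get(dataset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


print(sorted(lineage("marketing.campaign_roi", UPSTREAM)))
```

In this example, the marketing domain's data set depends on one maintained by finance, which is the sort of cross-domain dependency that lineage tracking is meant to surface.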
Taken in concert with DBT Mesh and Semantic Layer, the new capabilities represent a significant evolution for DBT Labs, according to Bond.
"With these new features, DBT Labs is moving their products into a platform, which will help DBT Labs ... establish a broader footprint across departments within organizations and enterprises, including being more acceptable to IT and data governance," he said.
Maldonado called DBT Mesh the highlight of the DBT Cloud update.
Leone, however, singled out the integrations as important to DBT Labs' growth beyond its open source beginnings.
"I think the continued focus on integrations remains critical to enabling a broader set of customers to embrace the value of DBT Labs," he said. "Enterprises don't rely on a single analytics tool -- they use several. While DBT Labs doesn't care what tool is being used, they know the business needs help ensuring that whatever tool it is, the underlying data is consistent and trusted across all of them."
Looking ahead
DBT Mesh is just the beginning of DBT Labs' focus on data mesh, according to Maldonado.
During his time at AWS, where he most recently served as head of product for Amazon Athena -- he joined DBT Labs in July -- Maldonado saw data mesh adoption growing as enterprises' centralized data operations stalled under the weight of demand.
"[Data mesh] is the only way they can move forward and not have IT crumble," he said.
As a result, data mesh will be a prominent part of DBT Labs' roadmap.
In particular, DBT Labs plans to better enable organizations to coordinate and orchestrate workflows across projects and domains. Also, more governance capabilities are in the works.
Beyond data mesh, Maldonado said DBT Labs plans to delve more into data observability. Because the vendor sits between data storage and analysis in the data stack, it plays a part in much of the analytics pipeline. Using information gathered as data moves from one stage to another, DBT Labs plans to do more to help organizations understand the health and freshness of their data.
"All the data observability we have can be very useful," Maldonado said.
Bond, meanwhile, suggested that the most important task for DBT Labs going forward is proving the necessity of its new features. Doing so could draw customers who already use its engineering tools away from data platform vendors that have long offered data management and governance capabilities.
"DBT Labs will need to demonstrate the value of platform features to win over data engineers that first turned to DBT as an open alternative for data engineering, avoiding larger, proprietary and complex platforms," he said.
Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.