Clean Database Design: Strategies to Increase Data Agility (original) (raw)

by Scott W. Ambler

The goal of this article is to describe important database design heuristics that enable greater agility in the use and evolution of a database. The goal is not to describe the process of agile design, but instead strategies to apply while designing. This article explores the following topics:

  1. What is clean database design?
  2. Why clean database design?
  3. Data normalization
  4. Clean database design is fit-for-purpose
  5. Clean database designs are technically future proofed
  6. Clean database design in context
  7. Related resources

1. What is Clean Database Design?

Clean database design is the application of proven heuristics that lead to high quality. In clean database design data entities (such as tables) are loosely coupled and appropriately cohesive. Loosely coupled means that there is the minimal number of relationships from one data entity to others. Appropriately cohesive means that the right data is captured by a data entity for the desired level of normalization. Normalization refers to the level of redundancy within a data entity (more on this below).

2. Why Clean Database Design?

A cleanly designed data source is easy to understand, test, and evolve. This in turn enables agility.

3. Data Normalization

A critical consideration in clean database design is normalization. There are many ways to describe data normalization, from straightforward to highly mathematical. Naturally I lean towards straightforward strategies. In a nutshell, here’s what you need to aim for:

Figure 1. A denormalized schema for an order – Optimized for reporting (UML Notation). Click to enlarge.

Figure 2. A normalized schema for an order – Optimized for editing (UML Notation). Click to enlarge.

4. Clean Database Design is Fit-for-Purpose

Clean database design requires a contextualized, fit-for-purpose approach. By that I mean that the primary purpose of your database should drive its overall design strategy. There are two categories of database to consider:

  1. Online analytical processing (OLAP). Reporting databases, such as data warehouses, typically reqire a denormalized design to be performant. Figure 1 depicts a table from such a database, in this case for orders.
  2. Online transaction processing (OLTP). Transactional databases typically require a highly normalized design to be accurate. This means that the data for a single entity, in this case an order, is stored in multiple tables and then recombined as needed. In Figure 2 entity types such as order, item, and contact are stored in individual tables. This makes it easier to update the information pertinent to each entity type.

5. Clean Database Designs are Technically Future Proofed

“Future proofing” means that you architect or design something to stand the test of time. In the case of a database, you want to design it to meet your future requirements. Unfortunately, you don’t know how the business requirements will evolve in the future, and trying to predict them tends to result in big design up front (BDUF) which results in solutions that are difficult (slow and expensive) to evolve. So don’t do that. Instead, focus on addressing common change cases and common technical challenges that aren’t specific to your business domain. These common change cases are:

There are several strategies for future proofing your database design to ensure you address the common change cases that you (will) face:

Figure 3. A future-proofed data schema (UML Notation). Click to enlarge.

6. Clean Database Design in Context

As you see in Figure 4 clean architecture and database design enables you to thinly slice what you deliver to your stakeholders into small increments. Clean database design is evolved via agile data modeling. Your database design is kept clean, or is cleaned up, via database refactoring.

Figure 4. The agile database techniques stack (click to enlarge).

The following table summarizes the trade-offs associated with clean database design and provides advice for when to adopt it.

Table 1. Clean database design in context.

Advantages Easier to understand and evolve Easier to test Increases ability to evolve data hosting strategy
Disadvantages Requires investment to keep clean, including in agile design modeling and database refactoring Existing legacy designs often have significant technical debt that needs to be addressed before your design is sufficiently clean
When to Adopt This Practice My knee-jerk reaction is to say always, but that wouldn’t be accurate. Sometimes time is of the essence and it makes sense to accept technical debt now and decide to pay it down in the future. Hopefully that is rare decision that when it is made is a prudent and deliberate one.