Data Modeling

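Data modeling refers to the organization of data within a database and the links between related entities. Data in MongoDB has a flexible schema model, which means that documents within a single collection are not required to have the same set of fields, and a field's data type can differ between documents in a collection.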

Generally, documents in a collection share a similar structure. To ensure consistency in your data model, you can create schema validation rules.
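As a minimal sketch of what this can look like, the two documents below share most of their structure but differ in one optional field, and the dictionary after them is a `$jsonSchema` validator that requires the common fields. The field names and values are hypothetical, and the Python dictionaries only illustrate document shapes; nothing here connects to a database.

```python
# Two hypothetical documents that could live in the same collection.
# They share a similar structure, but only the first has "nickname".
user_a = {"name": "Ada", "email": "ada@example.com", "nickname": "ad"}
user_b = {"name": "Grace", "email": "grace@example.com"}

# A $jsonSchema validator document (illustrative field names).
# With a driver such as PyMongo, a dictionary like this could be passed
# as the `validator` option when creating the collection.
user_validator = {
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"bsonType": "string"},
            "email": {"bsonType": "string"},
        },
    }
}


def has_required_fields(doc, validator):
    """Check only the `required` list; a rough stand-in for server-side validation."""
    required = validator["$jsonSchema"]["required"]
    return all(field in doc for field in required)
```

Both sample documents pass the check even though their shapes differ, which is the combination of flexibility and consistency the paragraph above describes.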

The flexible data model lets you organize your data to match your application's needs. MongoDB is a document database, meaning you can embed related data in object and array fields.

A flexible schema is useful in the following scenarios:

When you design a schema for a document database like MongoDB, there are a couple of important differences from relational databases to consider.

| Relational Database Behavior | Document Database Behavior |
| --- | --- |
| You must determine a table's schema before you insert data. | Your schema can change over time as the needs of your application change. |
| You often need to join data from several different tables to return the data needed by your application. | The flexible data model lets you store data to match the way your application returns data, and avoid joins. Avoiding joins across multiple collections improves performance and reduces your deployment's workload. |

To ensure that your data model has a logical structure and achieves optimal performance, plan your schema prior to using your database at a production scale. To determine your data model, use the following schema design process:

  1. Identify your application's workload.
  2. Map relationships between objects in your collections.
  3. Apply design patterns.

When you design your data model in MongoDB, consider the structure of your documents and the ways your application uses data from related entities.

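To link related data, you can either embed the related data within a single document, or store it in a separate collection and access it with a reference.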

Embedded documents store related data in a single document structure. A document can contain arrays and sub-documents with related data. These denormalized data models allow applications to retrieve related data in a single database operation.

Data model with embedded fields that contain all related information.
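A minimal sketch of an embedded data model, written as a plain Python dictionary: the document shape, field names, and values are hypothetical and chosen only to show a sub-document and an array of sub-documents inside one parent document.

```python
# Hypothetical user document with related data embedded as a
# sub-document ("contact") and an array of sub-documents ("addresses").
user = {
    "_id": 1,
    "username": "123xyz",
    "contact": {"phone": "123-456-7890", "email": "xyz@example.com"},
    "addresses": [
        {"type": "home", "city": "Springfield"},
        {"type": "work", "city": "Shelbyville"},
    ],
}

# Everything related to the user is available from this one document,
# so no second lookup is needed.
email = user["contact"]["email"]
cities = [address["city"] for address in user["addresses"]]
```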

For many use cases in MongoDB, the denormalized data model is optimal.

To learn about the strengths and weaknesses of embedding documents, see Embedded Data Models.

References store relationships between data by including links, called references, from one document to another. For example, a customerId field in an orders collection indicates a reference to a document in a customers collection.

Applications can resolve these references to access the related data. Broadly, these are normalized data models.

Data model using references to link documents. Both the ``contact`` document and the ``access`` document contain a reference to the ``user`` document.
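The referenced layout can be sketched with in-memory stand-ins for three collections. The `user_id` field name, the helper `resolve_user`, and all values are assumptions made for illustration; a real application would resolve the reference with a second query or an aggregation stage rather than a dictionary lookup.

```python
# In-memory stand-ins for three collections (hypothetical values).
users = {101: {"_id": 101, "username": "123xyz"}}
contacts = [{"_id": 201, "user_id": 101, "phone": "123-456-7890"}]
access = [{"_id": 301, "user_id": 101, "level": 5, "group": "dev"}]


def resolve_user(doc, users_by_id):
    """Follow the user_id reference; returns None if the target is missing."""
    return users_by_id.get(doc["user_id"])


# Both the contact document and the access document point at the same user.
contact_owner = resolve_user(contacts[0], users)
access_owner = resolve_user(access[0], users)
```

Because the user data lives in exactly one place, a change to the username is a single write, at the cost of an extra lookup on every read that needs it.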

To learn about the strengths and weaknesses of using references, see References.

The following factors can impact how you plan your data model.

When you embed related data in a single document, you may duplicate data between two collections. Duplicating data lets your application query related information about multiple entities in a single query while logically separating entities in your model.

For example, a products collection stores the five most recent reviews in a product document. Those reviews are also stored in a reviews collection, which contains all product reviews. When a new review is written, two writes occur: the review is inserted into the reviews collection, and the array of recent reviews in the product document is updated.

If the duplicated data is not updated often, then there is minimal additional work required to keep the two collections consistent. However, if the duplicated data is updated often, using a reference to link related data may be a better approach.
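The product review example can be sketched as a small in-memory simulation of the two writes. The `recentReviews` field name and the helper functions are hypothetical. The update document returned by `recent_reviews_update` uses the `$push` operator with its `$each` and `$slice` modifiers, which is one possible way to express the second write; the source does not prescribe a specific operator here.

```python
MAX_RECENT = 5  # the product document keeps only the five newest reviews


def add_review(product, reviews, review):
    """Simulate the two writes with plain Python containers."""
    # Write 1: insert the review into the reviews collection.
    reviews.append(review)
    # Write 2: append to the product's recent-reviews array and trim it.
    product["recentReviews"].append(review)
    del product["recentReviews"][:-MAX_RECENT]


def recent_reviews_update(review):
    """Build an update document that could express write 2 in one operation."""
    return {"$push": {"recentReviews": {"$each": [review], "$slice": -MAX_RECENT}}}


product = {"_id": "p1", "name": "Widget", "recentReviews": []}
reviews = []
for n in range(7):
    add_review(product, reviews, {"productId": "p1", "text": f"review {n}"})
```

After seven reviews, the reviews list holds all seven while the product document holds only the last five, so every new review costs two writes to keep the duplicate current.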

Before you duplicate data, consider how often the duplicated data needs to be updated and how much read performance improves when the data is duplicated.

To learn more, see Handle Duplicate Data.

To improve performance for queries that your application runs frequently, create indexes on commonly queried fields. As your application grows, monitor your deployment's index use to ensure that your indexes are still supporting relevant queries.

When you design your schema, consider your deployment's hardware, especially the amount of available RAM. Larger documents use more RAM, which may cause your application to read from disk and degrade performance. When possible, design your schema so only relevant fields are returned by queries. This practice ensures that your application's working set does not grow unnecessarily large.
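Returning only relevant fields is usually done with a projection. The helper below is a simplified, in-memory imitation of an inclusion projection over top-level fields; the function name `project` and the sample document are hypothetical, and the sketch does not cover nested paths or exclusion projections.

```python
def project(doc, fields):
    """Return a copy containing only the named top-level fields."""
    return {key: doc[key] for key in fields if key in doc}


product = {
    "_id": "p1",
    "name": "Widget",
    "price": 9.99,
    "description": "A long marketing description ...",
    "recentReviews": [{"text": "review"}] * 5,
}

# The equivalent projection document for a driver query would be
# {"name": 1, "price": 1, "_id": 0}.
summary = project(product, ["name", "price"])
```

The projected result omits the large description and review fields, which is the kind of reduction that keeps less data in memory per query.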

In MongoDB, a write operation is atomic on the level of a single document, even if the operation modifies multiple embedded documents within a single document. This means that if an update operation affects several sub-documents, either all of those sub-documents are updated, or the operation fails entirely and no updates occur.

A denormalized data model with embedded data combines all related data in a single document instead of normalizing across multiple documents and collections. This data model allows atomic operations, in contrast to a normalized model where operations affect multiple documents.
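The all-or-nothing behavior of a single-document write can be illustrated with a small simulation. This is not how MongoDB implements atomicity; it is only a sketch of the observable effect, and the `atomic_update` helper, the `items` array, and the validation rule are assumptions made for the example.

```python
import copy


def atomic_update(doc, changes):
    """Apply every change or none of them.

    `changes` maps an index in doc["items"] to a new quantity.
    Work happens on a copy, so a failure leaves `doc` untouched.
    """
    draft = copy.deepcopy(doc)
    for index, qty in changes.items():
        if qty < 0:
            raise ValueError("quantity must not be negative")
        draft["items"][index]["qty"] = qty  # IndexError if index is invalid
    # Only reached when every change succeeded: swap in the new state.
    doc.clear()
    doc.update(draft)


order = {"_id": 1, "items": [{"sku": "a", "qty": 1}, {"sku": "b", "qty": 2}]}
atomic_update(order, {0: 5, 1: 7})  # both sub-documents change together

try:
    atomic_update(order, {0: 9, 1: -1})  # second change is invalid
except ValueError:
    pass  # order is unchanged: the first change was not applied either
```

Because both embedded items live in one document, a single operation covers them; in a normalized model the same change would touch two documents and would need additional coordination to get the same guarantee.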

For more information, see Atomicity.