Clustered Collections

New in version 5.3.

Clustered collections store indexed documents in the same WiredTiger file as the index specification. Storing the collection's documents and index in the same file provides storage and performance benefits compared to regular indexes.

Clustered collections are created with a clustered index. The clustered index specifies the order in which documents are stored.

To create a clustered collection, see the examples later on this page.

Important

Backward-Incompatible Feature

You must drop clustered collections before you can downgrade to a version of MongoDB earlier than 5.3.

Clustered collections have the following benefits compared to non-clustered collections:

Clustered collections store documents ordered by the clustered index key value. The clustered index key must be { _id: 1 }.

You can only have one clustered index in a collection because the documents can be stored in only one order. Only collections with a clustered index store the data in sorted order.
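As a rough mental model (not how WiredTiger actually stores data), a clustered collection behaves like a list of documents kept sorted by _id. Each insert goes to the position that preserves key order, so reading the collection front to back returns documents in clustered key order. The sketch below is plain JavaScript that runs in Node.js or mongosh. The ClusteredStore class is a hypothetical name, and the sketch assumes numeric _id values so that < and === compare them correctly.

```javascript
// Hypothetical in-memory model of a clustered collection: documents are kept
// sorted by _id. Illustration only, not MongoDB's storage engine.
// Assumes numeric _id values.
class ClusteredStore {
  constructor() {
    this.docs = []; // always sorted ascending by _id
  }

  // Binary search for the first position whose _id is >= key.
  lowerBound(key) {
    let lo = 0;
    let hi = this.docs.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.docs[mid]._id < key) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  insert(doc) {
    const pos = this.lowerBound(doc._id);
    // Clustered index key values are unique, so reject duplicates.
    if (pos < this.docs.length && this.docs[pos]._id === doc._id) {
      throw new Error(`duplicate _id: ${doc._id}`);
    }
    this.docs.splice(pos, 0, doc); // insert while preserving key order
  }

  // Iterating the store yields documents in clustered key order.
  *[Symbol.iterator]() {
    yield* this.docs;
  }
}

const store = new ClusteredStore();
for (const _id of [30, 10, 20]) store.insert({ _id });
console.log([...store].map((d) => d._id)); // [ 10, 20, 30 ]
```

Documents come back in key order regardless of insertion order, which is the property the clustered index key provides.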

You can have a clustered index and add secondary indexes to a clustered collection. Clustered indexes differ from secondary indexes:

Starting in MongoDB 6.0.7, if a usable clustered index exists, the MongoDB query planner evaluates the clustered index against secondary indexes in the query planning process. When a query uses a clustered index, MongoDB performs a bounded collection scan.

Prior to MongoDB 6.0.7, if a secondary index existed on a clustered collection and the secondary index was usable by your query, the query planner selected the secondary index instead of the clustered index by default. In MongoDB 6.1 and prior, to use the clustered index, you must provide a hint because the query optimizer does not automatically select the clustered index.
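To show what a bounded collection scan saves, the following sketch compares a full scan with a bounded scan over documents sorted by _id. It is a simplified model, not the query planner's implementation. It assumes 1,000 documents with numeric _id values, and the function names fullScan and boundedScan are hypothetical. The full scan examines every document. The bounded scan binary-searches to the lower bound and stops at the first key past the upper bound.

```javascript
// Simplified model: documents sorted ascending by numeric _id,
// as in a clustered collection. Names are hypothetical.
const docs = Array.from({ length: 1000 }, (_, i) => ({ _id: i, value: i * 2 }));

// Full collection scan: examine every document, keep those with lo <= _id < hi.
function fullScan(docs, lo, hi) {
  let examined = 0;
  const out = [];
  for (const d of docs) {
    examined++;
    if (d._id >= lo && d._id < hi) out.push(d);
  }
  return { out, examined };
}

// Bounded scan: binary-search the start position, then scan forward and
// stop at the first key >= hi. Binary-search probes are not counted.
function boundedScan(docs, lo, hi) {
  let left = 0;
  let right = docs.length;
  while (left < right) {
    const mid = (left + right) >> 1;
    if (docs[mid]._id < lo) left = mid + 1;
    else right = mid;
  }
  let examined = 0;
  const out = [];
  for (let i = left; i < docs.length; i++) {
    examined++;
    if (docs[i]._id >= hi) break; // past the upper bound: stop scanning
    out.push(docs[i]);
  }
  return { out, examined };
}

console.log(fullScan(docs, 100, 110).examined);    // 1000
console.log(boundedScan(docs, 100, 110).examined); // 11
```

Both functions return the same 10 documents. The bounded scan touches only the matching range plus one document that ends the scan.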

Clustered collection limitations:

By default, the clustered index key values are the unique document object identifiers.

You can set your own clustered index key values. Your key values must follow the standard constraints of the _id field.
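According to the MongoDB documentation on the _id field, an _id value must be unique within the collection and may be any BSON type other than an array, a regular expression, or undefined. A client-side pre-check for custom key values might look like the sketch below. The function name checkClusteredKeys is hypothetical. It only checks uniqueness within the batch you pass in, not against documents already stored, and the server remains the authority on what it accepts.

```javascript
// Hypothetical pre-check for custom clustered index key values, mirroring the
// documented _id rules: no arrays, regexes, or undefined, and no duplicates.
// Uniqueness is checked only within this batch; the server still validates.
function checkClusteredKeys(docs) {
  const seen = new Set();
  const problems = [];
  docs.forEach((doc, i) => {
    // Documents without _id are skipped: mongosh or the server assigns an ObjectId.
    if (!("_id" in doc)) return;
    const id = doc._id;
    if (id === undefined) {
      problems.push(`doc ${i}: _id is undefined`);
    } else if (Array.isArray(id)) {
      problems.push(`doc ${i}: _id is an array`);
    } else if (id instanceof RegExp) {
      problems.push(`doc ${i}: _id is a regex`);
    } else {
      // Compare Dates by instant, other values by type plus JSON form
      // (a simplification of BSON equality).
      const key = id instanceof Date
        ? `date:${id.getTime()}`
        : `${typeof id}:${JSON.stringify(id)}`;
      if (seen.has(key)) problems.push(`doc ${i}: duplicate _id`);
      seen.add(key);
    }
  });
  return problems;
}

const problems = checkClusteredKeys([
  { _id: new Date("2022-03-18T12:45:20Z") },
  { _id: new Date("2022-03-18T12:45:20Z") }, // same instant: duplicate key
  { _id: [1, 2] },                           // arrays are not allowed
]);
console.log(problems); // reports doc 1 as a duplicate and doc 2 as an array
```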

Additionally, use the following practices to optimize performance:

Warning

Randomly generated key values may decrease a clustered collection's performance.
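One simplified way to picture the difference: because documents are stored in clustered key order, keys that increase over time (such as timestamps, or default ObjectId values, which begin with a timestamp) land at the end of that order, while random keys land at arbitrary positions among existing documents. The sketch below only counts where each new key would be placed in a sorted sequence. It is an illustration under that assumption, not a measurement of MongoDB performance. The function names are hypothetical, and a Park-Miller generator (a multiplicative linear congruential generator) stands in for a random key source so the output is deterministic.

```javascript
// Count how many inserts land at the end of a sorted key sequence versus
// somewhere in the middle. Illustration only; names are hypothetical.
function countPlacements(keys) {
  const sorted = [];
  let appends = 0;
  let middle = 0;
  for (const k of keys) {
    // Binary search for the insert position that keeps `sorted` ordered.
    let lo = 0;
    let hi = sorted.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (sorted[mid] < k) lo = mid + 1;
      else hi = mid;
    }
    if (lo === sorted.length) appends++;
    else middle++;
    sorted.splice(lo, 0, k);
  }
  return { appends, middle };
}

// Deterministic pseudo-random keys from the Park-Miller generator.
// 16807 * (2^31 - 1) stays below 2^53, so the arithmetic is exact.
function pseudoRandomKeys(n, seed = 42) {
  const keys = [];
  let x = seed;
  for (let i = 0; i < n; i++) {
    x = (x * 16807) % 2147483647;
    keys.push(x);
  }
  return keys;
}

const n = 1000;
const sequential = Array.from({ length: n }, (_, i) => i);
console.log(countPlacements(sequential));          // { appends: 1000, middle: 0 }
console.log(countPlacements(pseudoRandomKeys(n))); // most inserts land in the middle
```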

This section shows clustered collection examples.

The following create command example adds a clustered collection named products:


db.runCommand( {
   create: "products",
   clusteredIndex: { "key": { _id: 1 }, "unique": true, "name": "products clustered key" }
} )

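In the example, clusteredIndex specifies the clustered index key { _id: 1 }, requires unique clustered index key values with "unique": true, and sets the index name to "products clustered key".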

The following db.createCollection() example adds a clustered collection named stocks:


db.createCollection(
   "stocks",
   { clusteredIndex: { "key": { _id: 1 }, "unique": true, "name": "stocks clustered key" } }
)

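In the example, clusteredIndex specifies the clustered index key { _id: 1 }, requires unique clustered index key values with "unique": true, and sets the index name to "stocks clustered key".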

The following db.createCollection() example adds a clustered collection named orders:


db.createCollection(
   "orders",
   { clusteredIndex: { "key": { _id: 1 }, "unique": true, "name": "orders clustered key" } }
)

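In the example, clusteredIndex specifies the clustered index key { _id: 1 }, requires unique clustered index key values with "unique": true, and sets the index name to "orders clustered key".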

The following example adds documents to the orders collection:


db.orders.insertMany( [
   { _id: ISODate( "2022-03-18T12:45:20Z" ), "quantity": 50, "totalOrderPrice": 500 },
   { _id: ISODate( "2022-03-18T12:47:00Z" ), "quantity": 5, "totalOrderPrice": 50 },
   { _id: ISODate( "2022-03-18T12:50:00Z" ), "quantity": 1, "totalOrderPrice": 10 }
] )

The _id clustered index key stores the order date.

Because the documents are stored in _id order, a range query on _id can use the clustered index and performs better. For example, the following query uses _id and $gt to return the orders whose order date is later than the supplied date:


db.orders.find( { _id: { $gt: ISODate( "2022-03-18T12:47:00.000Z" ) } } )

Example output:


[
   {
      _id: ISODate( "2022-03-18T12:50:00.000Z" ),
      quantity: 1,
      totalOrderPrice: 10
   }
]

To determine if a collection is clustered, use the listCollections command:


db.runCommand( { listCollections: 1 } )

For a clustered collection, the output includes the clusteredIndex details. For example, the following output shows the details for the orders clustered collection:


...
name: 'orders',
type: 'collection',
options: {
   clusteredIndex: {
      v: 2,
      key: { _id: 1 },
      name: 'orders clustered key',
      unique: true
   }
},
...

v is the index version.
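Instead of reading the listCollections output by eye, you can filter it. The sketch below assumes collection info documents shaped like the output above (name, type, and options, with options.clusteredIndex present only for clustered collections). The helper clusteredCollections and the sample inventory collection are hypothetical.

```javascript
// Hypothetical helper: given collection info documents as returned by
// listCollections, return each clustered collection with its clustered index name.
function clusteredCollections(collectionInfos) {
  return collectionInfos
    .filter((info) => info.options && info.options.clusteredIndex)
    .map((info) => ({ name: info.name, index: info.options.clusteredIndex.name }));
}

// Sample input shaped like the listCollections output for orders;
// "inventory" is a made-up non-clustered collection for contrast.
const sample = [
  {
    name: "orders",
    type: "collection",
    options: {
      clusteredIndex: { v: 2, key: { _id: 1 }, name: "orders clustered key", unique: true },
    },
  },
  { name: "inventory", type: "collection", options: {} },
];

console.log(clusteredCollections(sample)); // one entry: orders, "orders clustered key"
```

In mongosh, you could pass it db.runCommand( { listCollections: 1 } ).cursor.firstBatch, or the array returned by db.getCollectionInfos(). Note that firstBatch holds only the first batch of cursor results.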