MongoDB $substrCP Operator

Last Updated : 16 Apr, 2026

The $substrCP operator in MongoDB extracts substrings based on Unicode code points within the aggregation pipeline, ensuring correct handling of both ASCII and non-ASCII characters for multilingual text processing.

Syntax

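{ $substrCP: [ <string expression>, <code point index>, <code point count> ] }

The first argument is the string to read from, the second is the zero-based code point index at which the substring starts, and the third is the number of code points to extract.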

Importance of $substrCP

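$substrCP matters because it counts Unicode code points rather than bytes, so a substring boundary never falls inside a multibyte character. That makes it a safe choice for slicing text that mixes ASCII with non-ASCII characters, such as multilingual titles.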

Examples of MongoDB $substrCP Operator

The examples below run against an articles collection whose documents include the fields articlename and publishedon.

[Screenshot: sample documents in the articles collection]

Example 1: Using $substrCP operator

Extract publicationmonth (the first four code points of publishedon) and publicationyear (the next four code points, starting at index 4).

db.articles.aggregate([
  {
    $project: {
      articlename: 1,
      publicationmonth: { $substrCP: ["$publishedon", 0, 4] },
      publicationyear: { $substrCP: ["$publishedon", 4, 4] }
    }
  }
])

Output:

[Screenshot: query output]

Example 2: Single-Byte Character Set

Create a new field shortName with only the first 10 characters of each article's name. This is useful for displaying short previews of article titles.

db.articles.aggregate([
  {
    $project: {
      articlename: 1,
      shortName: {
        $substrCP: ["$articlename", 0, 10]
      }
    }
  }
])

Output:

[Screenshot: query output]
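If previews of different lengths are needed, the $project stage can be built by a small function. The sketch below assumes a hypothetical helper named previewStage; only the $project and $substrCP keys come from MongoDB, while the helper and its parameters are illustrative.

```javascript
// previewStage is a hypothetical helper, not part of MongoDB.
// It returns a $project stage that keeps `field` and adds shortName,
// holding the first `length` code points of that field.
function previewStage(field, length) {
  return {
    $project: {
      [field]: 1,
      shortName: { $substrCP: ["$" + field, 0, length] }
    }
  };
}

const stage = previewStage("articlename", 10);
console.log(JSON.stringify(stage, null, 2));
```

In mongosh, the resulting stage could then be passed to the pipeline as db.articles.aggregate([stage]).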

Example 3: Handling Multibyte Character Set

Suppose another document in the articles collection has an articlename written in a multibyte character set. The query below extracts its first 15 code points.

db.articles.aggregate([
  {
    $project: {
      shortName: { $substrCP: ["$articlename", 0, 15] }
    }
  }
])

Output:

[Screenshot: query output]

Because $substrCP counts code points rather than bytes, each multibyte character is extracted whole; a substring never ends partway through a character's byte sequence.

Important Points About MongoDB $substrCP Operator

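$substrCP is used inside aggregation pipeline stages such as $project. Both its start index and its length are measured in code points, and the start index is zero-based. Every code point counts as one unit regardless of how many bytes it occupies in UTF-8, so the same query works on single-byte and multibyte text.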