BSON Types
BSON is a binary serialization format used to store documents and make remote procedure calls in MongoDB. The BSON specification is located at bsonspec.org.
Each BSON type has both integer and string identifiers as listed in the following table:
Type | Number | Alias | Notes |
---|---|---|---|
Double | 1 | "double" | |
String | 2 | "string" | |
Object | 3 | "object" | |
Array | 4 | "array" | |
Binary data | 5 | "binData" | |
Undefined | 6 | "undefined" | Deprecated. |
ObjectId | 7 | "objectId" | |
Boolean | 8 | "bool" | |
Date | 9 | "date" | |
Null | 10 | "null" | |
Regular Expression | 11 | "regex" | |
DBPointer | 12 | "dbPointer" | Deprecated. |
JavaScript | 13 | "javascript" | |
Symbol | 14 | "symbol" | Deprecated. |
32-bit integer | 16 | "int" | |
Timestamp | 17 | "timestamp" | |
64-bit integer | 18 | "long" | |
Decimal128 | 19 | "decimal" | |
Min key | -1 | "minKey" | |
Max key | 127 | "maxKey" | |
- The $type operator supports using these values to query fields by their BSON type. $type also supports the "number" alias, which matches the integer, decimal, double, and long BSON types.
- The $type aggregation operator returns the BSON type of its argument.
- The $isNumber aggregation operator returns true if its argument is a BSON integer, decimal, double, or long.
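As an illustration of how the numeric identifiers, the string aliases, and the "number" alias relate, the following plain JavaScript sketch models the matching rule described above. The field name zipCode and the helper matchesType are hypothetical and exist only for this example; the server's actual implementation is not shown here.

```javascript
// Two equivalent query filters: by string alias and by type number.
// "zipCode" is a hypothetical field name used only for illustration.
const filterByAlias = { zipCode: { $type: "string" } };
const filterByNumber = { zipCode: { $type: 2 } };

// Subset of the type table above: numeric identifier -> string alias.
const BSON_TYPE_ALIASES = new Map([
  [1, "double"],
  [2, "string"],
  [8, "bool"],
  [16, "int"],
  [18, "long"],
  [19, "decimal"],
]);

// The "number" alias matches these four numeric types.
const NUMBER_ALIASES = new Set(["int", "long", "double", "decimal"]);

// Hypothetical helper: does a value with the given BSON type number
// satisfy a $type-style argument (a number, an alias, or "number")?
function matchesType(typeNumber, wanted) {
  const alias = BSON_TYPE_ALIASES.get(typeNumber);
  if (alias === undefined) return false;
  if (typeof wanted === "number") return typeNumber === wanted;
  if (wanted === "number") return NUMBER_ALIASES.has(alias);
  return alias === wanted;
}

console.log(matchesType(16, "number")); // true: int is numeric
console.log(matchesType(2, "number"));  // false: string is not numeric
```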
To determine a field's type, see Type Checking.
If you convert BSON to JSON, see the Extended JSON reference.
The following sections describe special considerations for particular BSON types.
A BSON binary binData value is a byte array. A binData value has a subtype that indicates how to interpret the binary data. The following table shows the subtypes:
Number | Description |
---|---|
0 | Generic binary subtype |
1 | Function data |
2 | Binary (old) |
3 | UUID (old) |
4 | UUID |
5 | MD5 |
6 | Encrypted BSON value |
7 | Compressed time series data. New in version 5.2. |
8 | Sensitive data, such as a key or secret. MongoDB does not log literal values for binary data with subtype 8. Instead, MongoDB logs a placeholder value of ###. |
9 | Vector data, which is densely packed arrays of numbers of the same type. |
128 | Custom data |
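To show where the subtype sits on the wire, here is a minimal Node.js sketch of the binary layout given in the BSON specification: a little-endian int32 length, one subtype byte, then the payload. The helpers encodeBinData and decodeBinData are hypothetical names, the sample bytes are made up, and the sketch ignores the extra nested length that the deprecated subtype 2 carries.

```javascript
// Encode a BSON binary value: int32 length (little-endian), subtype, payload.
function encodeBinData(subtype, payload) {
  const out = Buffer.alloc(4 + 1 + payload.length);
  out.writeInt32LE(payload.length, 0); // the length counts payload bytes only
  out.writeUInt8(subtype, 4);
  payload.copy(out, 5);
  return out;
}

// Decode the same layout back into its parts.
function decodeBinData(buf) {
  const length = buf.readInt32LE(0);
  return { subtype: buf.readUInt8(4), payload: buf.subarray(5, 5 + length) };
}

// A made-up 16-byte value tagged as subtype 4 (UUID).
const uuidBytes = Buffer.from("00112233445566778899aabbccddeeff", "hex");
const encoded = encodeBinData(4, uuidBytes);
const decoded = decodeBinData(encoded);

console.log(encoded.toString("hex"));
console.log(decoded.subtype);
```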
ObjectIds are small, likely unique, fast to generate, and ordered. ObjectId values are 12 bytes in length, consisting of:
- A 4-byte timestamp, representing the ObjectId's creation, measured in seconds since the Unix epoch.
- A 5-byte random value generated once per client-side process. This random value is unique to the machine and process. If the process restarts or the primary node of the process changes, this value is re-generated.
- A 3-byte incrementing counter per client-side process, initialized to a random value. The counter resets when a process restarts.
For timestamp and counter values, the most significant bytes appear first in the byte sequence (big-endian). This is unlike other BSON values, where the least significant bytes appear first (little-endian).
If an integer value is used to create an ObjectId, the integer replaces the timestamp.
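The three-part layout can be made concrete with a short Node.js sketch that splits a 24-character hexadecimal ObjectId string into its fields. The function parseObjectIdHex and the sample value are hypothetical and not part of any driver API; in mongosh you would normally use the ObjectId() methods instead.

```javascript
// Split an ObjectId hex string into timestamp, random value, and counter.
function parseObjectIdHex(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected 24 hexadecimal characters");
  }
  const bytes = Buffer.from(hex, "hex");
  const seconds = bytes.readUInt32BE(0); // 4-byte big-endian timestamp
  return {
    createdAt: new Date(seconds * 1000),          // seconds -> milliseconds
    random: bytes.subarray(4, 9).toString("hex"), // 5-byte per-process value
    counter: bytes.readUIntBE(9, 3),              // 3-byte big-endian counter
  };
}

// Made-up ObjectId whose timestamp bytes are 0x5eb9b216.
const parsed = parseObjectIdHex("5eb9b216a1b2c3d4e5000001");
console.log(parsed.createdAt.toISOString()); // 2020-05-11T20:14:14.000Z
console.log(parsed.random, parsed.counter);
```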
In MongoDB, each document stored in a standard collection requires a unique _id field that acts as a primary key. If an inserted document omits the _id field, the MongoDB driver automatically generates an ObjectId for the _id field.
This also applies to documents inserted through update operations with upsert: true.
MongoDB clients should add an _id field with a unique ObjectId. Using ObjectIds for the _id field provides the following additional benefits:
- You can access ObjectId creation time in mongosh using the ObjectId.getTimestamp() method.
- ObjectIds are approximately ordered by creation time, but are not perfectly ordered. Sorting a collection on an _id field containing ObjectId values is roughly equivalent to sorting by creation time.
Important
While ObjectId values should increase over time, they are not necessarily monotonic. This is because they:
- Only contain one second of temporal resolution, so ObjectId values created within the same second do not have a guaranteed ordering, and
- Are generated by clients, which may have differing system clocks.
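The following sketch illustrates the first point with three made-up ObjectId hex strings. It assumes that sorting fixed-width lowercase hex strings mirrors a byte-wise comparison of the underlying values; the labels and the stated creation order are hypothetical.

```javascript
// Hypothetical ObjectIds: 4-byte timestamp + 5-byte random + 3-byte counter.
const ids = [
  { label: "C", hex: "5eb9b217" + "0102030405" + "000001" }, // one second later
  { label: "A", hex: "5eb9b216" + "ffeeddccbb" + "000001" }, // created first, process X
  { label: "B", hex: "5eb9b216" + "0011223344" + "000009" }, // same second, process Y
];

// Byte-wise ordering of the hex strings.
const order = [...ids]
  .sort((x, y) => (x.hex < y.hex ? -1 : x.hex > y.hex ? 1 : 0))
  .map((entry) => entry.label);

// C sorts last because its timestamp is larger, but A and B share a second,
// so their relative order is decided by the random bytes, not creation time.
console.log(order.join(", ")); // B, A, C
```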
Use the ObjectId() methods to set and retrieve ObjectId values.
Starting in MongoDB 5.0, mongosh replaces the legacy mongo shell. The ObjectId() methods work differently in mongosh than in the legacy mongo shell. For more information on the legacy methods, see Legacy mongo Shell.
BSON strings are UTF-8. In general, drivers for each programming language convert from the language's string format to UTF-8 when serializing and deserializing BSON. This makes it possible to store most international characters in BSON strings with ease. [1] In addition, MongoDB $regex queries support UTF-8 in the regex string.
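As a rough sketch of the conversion a driver performs, the following Node.js code encodes a JavaScript string the way the BSON specification lays out a string value: a little-endian int32 byte count (including the trailing null), the UTF-8 bytes, and a null terminator. The helper encodeBsonString is a hypothetical name, not a driver function.

```javascript
// Encode a string value: int32 length, UTF-8 bytes, 0x00 terminator.
function encodeBsonString(text) {
  const utf8 = Buffer.from(text, "utf8");
  // Buffer.alloc zero-fills, so the final byte is already the terminator.
  const out = Buffer.alloc(4 + utf8.length + 1);
  out.writeInt32LE(utf8.length + 1, 0); // the length includes the terminator
  utf8.copy(out, 4);
  return out;
}

// "\u00e9" (e with acute accent) takes two bytes in UTF-8,
// so the five-character string becomes six bytes.
const encodedString = encodeBsonString("h\u00e9llo");
console.log(encodedString.toString("hex")); // 0700000068c3a96c6c6f00
```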
BSON has a special timestamp type for internal MongoDB use that is not associated with the regular Date type. This internal timestamp type is a 64-bit value where:
- the most significant 32 bits are a time_t value (seconds since the Unix epoch)
- the least significant 32 bits are an incrementing ordinal for operations within a given second.
While the BSON format is little-endian, and therefore stores the least significant bits first, the mongod instance always compares the time_t value before the ordinal value on all platforms, regardless of endianness.
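The relationship between the stored byte order and the comparison order can be sketched with BigInt arithmetic in Node.js. The helper names makeTimestamp, splitTimestamp, and toLittleEndianBytes are hypothetical, and the sample values are made up.

```javascript
// Pack seconds (high 32 bits) and ordinal (low 32 bits) into one 64-bit value.
function makeTimestamp(seconds, ordinal) {
  return (BigInt(seconds) << 32n) | BigInt(ordinal);
}

// Recover both halves from the packed value.
function splitTimestamp(value) {
  return {
    seconds: Number(value >> 32n),
    ordinal: Number(value & 0xffffffffn),
  };
}

// Serialize as BSON would store it: least significant byte first.
function toLittleEndianBytes(value) {
  const buf = Buffer.alloc(8);
  buf.writeBigUInt64LE(value, 0);
  return buf;
}

const earlier = makeTimestamp(1589228054, 7);
const later = makeTimestamp(1589228055, 1);

// The ordinal bytes come first on the wire...
console.log(toLittleEndianBytes(earlier).toString("hex")); // 0700000016b2b95e
// ...but comparing the 64-bit values orders by seconds first, then ordinal.
console.log(earlier < later); // true
```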
In replication, the oplog has a ts field. The values in this field reflect the operation time, which uses a BSON timestamp value.
Within a single mongod instance, timestamp values in the oplog are always unique.
Note
The BSON timestamp type is for internal MongoDB use. For most cases, in application development, you will want to use the BSON date type. See Date for more information.
BSON Date is a 64-bit integer that represents the number of milliseconds since the Unix epoch (Jan 1, 1970). This results in a representable date range of about 290 million years into the past and future.
The official BSON specification refers to the BSON Date type as the UTC datetime.
BSON Date type is signed. [2] Negative values represent dates before 1970.
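The range figure follows from the size of a signed 64-bit integer. The following JavaScript sketch works through the arithmetic, assuming an average year of 365.25 days. Note that JavaScript's own Date object accepts a much smaller range than BSON Date, so the sketch uses BigInt for the limit and only constructs a Date for a small negative value.

```javascript
// Largest value of a signed 64-bit integer, in milliseconds.
const MAX_INT64 = 2n ** 63n - 1n;

// Assumption: an average year of 365.25 days.
const MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000;

// Approximate number of years representable on each side of the epoch.
const yearsEachWay = Number(MAX_INT64) / MS_PER_YEAR;
console.log(`${(yearsEachWay / 1e6).toFixed(0)} million years`); // roughly 292

// A negative millisecond count is a date before 1970.
const beforeEpoch = new Date(-1000);
console.log(beforeEpoch.toISOString()); // 1969-12-31T23:59:59.000Z
```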
To construct a Date in mongosh, you can use the new Date() or ISODate() constructor.
To construct a Date with the new Date() constructor, run the following command:
var mydate1 = new Date()
The mydate1 variable outputs a date and time wrapped as an ISODate:
ISODate("2020-05-11T20:14:14.796Z")
To construct a Date using the ISODate() constructor, run the following command:
var mydate2 = ISODate()
The mydate2 variable stores a date and time wrapped as an ISODate:
ISODate("2020-05-11T20:14:14.796Z")
To print the Date in a string format, use the toString() method:
mydate1.toString()
Mon May 11 2020 13:14:14 GMT-0700 (Pacific Daylight Time)
You can also return the month portion of the Date value. Months are zero-indexed, so January is month 0.
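The Date methods used in this section behave the same way in plain JavaScript, so they can be tried outside mongosh. The sketch below uses the sample instant from the output shown earlier and sticks to UTC accessors, because toString() and getMonth() depend on the local time zone of the machine running the code. ISODate() is specific to mongosh and is not used here.

```javascript
// The sample instant from this section, built from an ISO 8601 string.
const sample = new Date("2020-05-11T20:14:14.796Z");

// Milliseconds since the Unix epoch: the value a BSON Date stores.
console.log(sample.getTime()); // 1589228054796

// Canonical UTC string form.
console.log(sample.toISOString()); // 2020-05-11T20:14:14.796Z

// Months are zero-indexed, so May is 4.
console.log(sample.getUTCMonth()); // 4
```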
decimal128 is a 128-bit decimal representation for storing very large or very precise numbers, whenever rounding decimals is important. It was introduced in August 2008 as part of the IEEE 754-2008 revision of the floating-point standard. When you need high precision when working with BSON data types, use decimal128.
decimal128 supports 34 decimal digits of precision (the significand), along with an exponent range of -6143 to +6144. The significand is not normalized in the decimal128 standard, allowing for multiple possible representations: 10 x 10^-1 = 1 x 10^0 = .1 x 10^1 = .01 x 10^2, etc. Being able to store maximum and minimum values on the order of 10^6144 and 10^-6143, respectively, gives decimal128 a very wide range in addition to its precision.
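The idea that one value has several representations can be sketched with pairs of an integer coefficient and a power-of-ten exponent. This is a simplified model rather than the decimal128 bit layout: the helpers normalize and sameValue are hypothetical, and the sketch uses integer coefficients only, so it writes the value 1 as 10 x 10^-1, 1 x 10^0, and 100 x 10^-2.

```javascript
// Largest coefficient that fits in 34 decimal digits.
const MAX_COEFFICIENT = 10n ** 34n - 1n;

// Strip trailing zeros from the coefficient into the exponent, so that
// equal values end up with the same (coefficient, exponent) pair.
function normalize({ coefficient, exponent }) {
  if (coefficient === 0n) return { coefficient: 0n, exponent: 0 };
  while (coefficient % 10n === 0n) {
    coefficient /= 10n;
    exponent += 1;
  }
  return { coefficient, exponent };
}

// Two representations denote the same value if they normalize identically.
function sameValue(a, b) {
  const x = normalize(a);
  const y = normalize(b);
  return x.coefficient === y.coefficient && x.exponent === y.exponent;
}

const tenTenths = { coefficient: 10n, exponent: -1 };
const one = { coefficient: 1n, exponent: 0 };
const hundredHundredths = { coefficient: 100n, exponent: -2 };

console.log(sameValue(tenTenths, one));         // true
console.log(sameValue(one, hundredHundredths)); // true
console.log(MAX_COEFFICIENT.toString().length); // 34
```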
In MongoDB, you can store data in decimal128 format using the NumberDecimal() constructor. If you pass in the decimal value as a string, MongoDB stores the value in the database as follows:
NumberDecimal("9823.1297")
You can also pass in the decimal value as a double:
NumberDecimal(1234.99999999999)
You should also consider the usage and support your programming language has for decimal128. The following languages don't natively support this feature and require a plugin or additional package to get the functionality:
- Python: The decimal.Decimal class can be used for decimal floating-point arithmetic.
- Java: The Java BigDecimal class provides support for decimal128 numbers.
- Node.js: There are several packages that provide support, such as js-big-decimal or node.js bigdecimal, available on npm.
When you perform mathematical calculations programmatically, you can sometimes receive unexpected results. The following example in Node.js yields incorrect results:
> 0.1
0.1
> 0.2
0.2
> 0.1 * 0.2
0.020000000000000004
> 0.1 * 0.1
0.010000000000000002
Similarly, the following example in Java produces incorrect output:
class Main {
  public static void main(String[] args) {
    System.out.println("0.1 * 0.2:");
    System.out.println(0.1 * 0.2);
  }
}
0.1 * 0.2:
0.020000000000000004
The same computations in Python, Ruby, Rust, and other languages produce the same results. This happens because binary floating-point numbers do not represent base 10 values well.
For example, the 0.1 used in the above examples cannot be represented exactly in binary. It is approximately 0.0001100110011001101, a repeating fraction that must be rounded to fit. Most of the time, this does not cause any significant issues. However, in applications such as finance or banking where precision is important, use decimal128 as your data type.
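To see why a decimal representation avoids the problem, the following Node.js sketch multiplies decimal strings exactly by tracking an integer digit string and a decimal scale with BigInt. It is a simplified illustration of decimal arithmetic, not an implementation of decimal128, and the helpers parseDecimal, multiply, and format are hypothetical names.

```javascript
// Binary floating point: the product picks up a rounding error.
console.log(0.1 * 0.2); // 0.020000000000000004

// Parse a plain decimal string into integer digits plus a scale
// (the number of digits after the decimal point).
function parseDecimal(text) {
  const match = /^(-?)(\d+)(?:\.(\d+))?$/.exec(text);
  if (!match) throw new Error(`not a plain decimal: ${text}`);
  const [, sign, whole, fraction = ""] = match;
  const digits = BigInt(whole + fraction);
  return { digits: sign === "-" ? -digits : digits, scale: fraction.length };
}

// Exact multiplication: multiply the digits and add the scales.
function multiply(a, b) {
  return { digits: a.digits * b.digits, scale: a.scale + b.scale };
}

// Turn digits and scale back into a decimal string.
function format({ digits, scale }) {
  const negative = digits < 0n;
  let text = (negative ? -digits : digits).toString().padStart(scale + 1, "0");
  if (scale > 0) text = text.slice(0, -scale) + "." + text.slice(-scale);
  return (negative ? "-" : "") + text;
}

const product = format(multiply(parseDecimal("0.1"), parseDecimal("0.2")));
console.log(product); // 0.02
```

Keeping the scale alongside the digits is the same idea as the significand and exponent pair that decimal128 stores, which is why no binary rounding step is involved.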