ProtoJSON Format

Describes the spec details of the canonical JSON representation for Protobuf messages.

Protobuf supports a canonical encoding in JSON, making it easier to share data with systems that do not support the standard protobuf binary wire format.

This page specifies the format, but a number of additional edge cases which define a conformant ProtoJSON parser are covered in the Protobuf Conformance Test Suite and are not exhaustively detailed here.

Non-goals of the Format

Cannot Represent Some JSON schemas

The ProtoJSON format is designed to be a JSON representation of schemas which are expressible in the Protobuf schema language.

It may be possible to represent many pre-existing JSON schemas as a Protobuf schema and parse it using ProtoJSON, but it is not designed to be able to represent arbitrary JSON schemas.

For example, the Protobuf schema language has no way to express types that may be common in JSON schemas, such as number[][] or number|string.

It is possible to use google.protobuf.Struct and google.protobuf.Value types to allow arbitrary JSON to be parsed into a Protobuf schema, but these only allow you to capture the values as schemaless unordered key-value maps.

Not as efficient as the binary wire format

ProtoJSON Format is not as efficient as binary wire format and never will be.

The converter uses more CPU to encode and decode messages and (except in rare cases) encoded messages consume more space.

Does not have as good schema-evolution guarantees as binary wire format

ProtoJSON format does not support unknown fields, and it puts field and enum value names into encoded messages which makes it much harder to change those names later. Removing fields is a breaking change that will trigger a parsing error.

See JSON Wire Safety below for more details.

Format Description

Representation of each type

The following table shows how data is represented in JSON files.

Protobuf type | JSON | JSON example | Notes
--- | --- | --- | ---
message | object | {"fooBar": v, "g": null, ...} | Generates JSON objects. Keys are serialized as lowerCamelCase of field name. See Field Names for more special cases regarding mapping of field names to object keys. Well-known types have special representations, as described in the Well-known types table. null is valid for any field and leaves the field unset. See Null Values for clarification about the semantic behavior of null values.
enum | string | "FOO_BAR" | The name of the enum value as specified in proto is used. Parsers accept both enum names and integer values.
map<K,V> | object | {"k": v, ...} | All keys are converted to strings (object keys in JSON can only be strings).
repeated V | array | [v, ...] |
bool | true, false | true, false |
string | string | "Hello World!" |
bytes | base64 string | "YWJjMTIzIT8kKiYoKSctPUB+" | JSON value will be the data encoded as a string using standard base64 encoding with padding. Either standard or URL-safe base64 encoding, with or without padding, is accepted.
int32, fixed32, uint32 | number | 1, -10, 0 | JSON value will be a number. Either numbers or strings are accepted. Empty strings are invalid. Exponent notation (such as 1e2) is accepted in both quoted and unquoted forms.
int64, fixed64, uint64 | string | "1", "-10" | JSON value will be a decimal string. Either numbers or strings are accepted. Empty strings are invalid. Exponent notation (such as 1e2) is accepted in both quoted and unquoted forms. See Strings for int64 for an explanation of why strings are used for int64s.
float, double | number | 1.1, -10.0, 0, "NaN", "Infinity" | JSON value will be a number or one of the special string values "NaN", "Infinity", and "-Infinity". Either numbers or strings are accepted. Empty strings are invalid. Exponent notation is also accepted.
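The bytes row above can be illustrated with a minimal Python sketch that uses only the standard library. The helper names decode_proto_bytes and encode_proto_bytes are hypothetical and not part of any Protobuf API, and the padding handling is deliberately lenient; a conformant parser may be stricter about malformed padding.

```python
import base64
import binascii


def decode_proto_bytes(text: str) -> bytes:
    """Decode a ProtoJSON bytes value (hypothetical helper, not a Protobuf API)."""
    # Map the URL-safe alphabet onto the standard one so both are accepted.
    normalized = text.replace("-", "+").replace("_", "/")
    # Drop any existing padding, then restore exactly what base64 requires.
    normalized = normalized.rstrip("=")
    normalized += "=" * (-len(normalized) % 4)
    try:
        return base64.b64decode(normalized, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"invalid base64 in bytes field: {text!r}") from exc


def encode_proto_bytes(data: bytes) -> str:
    """Encode bytes as standard base64 with padding, as the table describes."""
    return base64.b64encode(data).decode("ascii")
```

In this sketch the serializer always emits one form, while the parser normalizes all four accepted variants (standard or URL-safe, padded or unpadded) before decoding.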

Well-Known Types

Some messages in the google.protobuf package have a special representation when represented in JSON.

No message type outside of the google.protobuf package has special ProtoJSON handling; for example, types in the google.types package are represented with the neutral representation.

Message type | JSON | JSON example | Notes
--- | --- | --- | ---
Any | object | {"@type": "url", "f": v, ... } | See Any
Timestamp | string | "1972-01-01T10:00:20.021Z" | Uses RFC 3339 (see clarification). Generated output will always be Z-normalized with 0, 3, 6 or 9 fractional digits. Offsets other than "Z" are also accepted.
Duration | string | "1.000340012s", "1s" | Generated output always contains 0, 3, 6, or 9 fractional digits, depending on required precision, followed by the suffix "s". Parsers accept any number of fractional digits (including none) as long as they fit into nanosecond precision; the suffix "s" is required. This is not RFC 3339 'duration' format (see Durations for clarification).
Struct | object | { ... } | Any JSON object. See struct.proto.
Wrapper types | various types | 2, "2", "foo", true, "true", null, 0, ... | Wrappers use the same representation in JSON as the wrapped primitive type, except that null is allowed and preserved during data conversion and transfer.
FieldMask | string | "f.fooBar,h" | See field_mask.proto.
ListValue | array | [foo, bar, ...] |
Value | value | | Any JSON value. Check google.protobuf.Value for details.
NullValue | null | | JSON null. Special case of the null parsing behavior.
Empty | object | {} | (not special cased) An empty JSON object
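As a rough illustration of the Timestamp row, the following Python sketch produces Z-normalized output with 0, 3, 6, or 9 fractional digits. It assumes the timestamp is available as whole seconds since the Unix epoch plus a non-negative nanosecond remainder; the function names are hypothetical, and this is not the code any Protobuf runtime uses.

```python
from datetime import datetime, timedelta, timezone


def format_fraction(nanos: int) -> str:
    """Render nanoseconds using 0, 3, 6, or 9 fractional digits."""
    if nanos == 0:
        return ""
    if nanos % 1_000_000 == 0:
        return f".{nanos // 1_000_000:03d}"  # millisecond precision suffices
    if nanos % 1_000 == 0:
        return f".{nanos // 1_000:06d}"  # microsecond precision suffices
    return f".{nanos:09d}"  # full nanosecond precision needed


def format_timestamp(seconds: int, nanos: int) -> str:
    """Format epoch seconds plus nanos as a Z-normalized RFC 3339 string."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    moment = epoch + timedelta(seconds=seconds)
    return moment.strftime("%Y-%m-%dT%H:%M:%S") + format_fraction(nanos) + "Z"
```

For instance, 63108020 seconds and 21000000 nanoseconds correspond to the table's example value "1972-01-01T10:00:20.021Z".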

Field names as JSON keys

Message field names are mapped to lowerCamelCase to be used as JSON object keys. If the json_name field option is specified, the specified value will be used as the key instead.

Parsers accept both the lowerCamelCase name (or the one specified by the json_name option) and the original proto field name. This allows for a serializer option to choose to print using the original field name (see JSON Options) and have the resulting output still be parsed back by all spec parsers.
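The name mapping described above can be sketched in Python as follows. This is a simplified assumption of the conversion (drop each underscore and uppercase the character that follows it); the function names are hypothetical, and the special cases referenced under Field Names are not modeled.

```python
def to_lower_camel_case(field_name: str) -> str:
    """Convert a snake_case proto field name to a lowerCamelCase JSON key."""
    result = []
    capitalize_next = False
    for ch in field_name:
        if ch == "_":
            # Underscores are dropped; the next character is uppercased.
            capitalize_next = True
        elif capitalize_next:
            result.append(ch.upper())
            capitalize_next = False
        else:
            result.append(ch)
    return "".join(result)


def accepted_keys(field_name: str, json_name=None) -> set:
    """Return the JSON keys a parser accepts for one field."""
    # json_name, when set, replaces the derived lowerCamelCase key.
    primary = json_name if json_name is not None else to_lower_camel_case(field_name)
    # The original proto field name is always accepted as well.
    return {primary, field_name}
```

For a field named foo_bar, accepted_keys returns both "fooBar" and "foo_bar", which is what lets output printed with original field names parse back successfully.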

\0 (nul) is not allowed within a json_name value. For more on why, see Stricter validation for json_name. Note that \0 is still considered a legal character within the value of a string field.

Presence and default-values

When generating JSON-encoded output from a protocol buffer, if a field supports presence, serializers must emit the field value if and only if the corresponding hazzer would return true.

If the field doesn’t support field presence and has the default value (for example any empty repeated field) serializers should omit it from the output. An implementation may provide options to include fields with default values in the output.

Null values

Serializers should not emit null values.

Parsers accept null as a legal value for any field. Key-validity checks still apply, but the field otherwise remains unset, as though it were not present in the input at all.

The implication is that a null value for an implicit presence field behaves identically to the default value of that field, since there are no hazzers for those fields. For example, a value of null or [] for a repeated field will cause key-validation checks, but both will otherwise behave the same as if the field was not present in the JSON at all.

null values are not allowed within repeated fields.

google.protobuf.NullValue is a special exception to this behavior: null is handled as a sentinel-present value for this type, and so a field of this type must be handled by serializers and parsers under the standard presence behavior. This behavior correspondingly allows google.protobuf.Struct and google.protobuf.Value to losslessly round trip arbitrary JSON.

Duplicate values

Serializers must never serialize the same field multiple times, nor multiple different cases in the same oneof in the same JSON object.

Parsers should accept the same field being duplicated, and the last value provided should be retained. This also applies to “alternate spellings” of the same field name.

If implementations cannot maintain the necessary information about field order it is preferred to reject inputs with duplicate keys rather than have an arbitrary value win. In some implementations maintaining field order of objects may be impractical or infeasible, so it is strongly recommended that systems avoid relying on specific behavior for duplicate fields in ProtoJSON where possible.
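Both duplicate-key policies can be sketched with the object_pairs_hook parameter of Python's standard json.loads, which hands each object's key/value pairs to a callback in input order. The hook names below are hypothetical, and the sketch does not handle alternate spellings of the same field (such as fooBar and foo_bar), which a real parser would also need to treat as duplicates.

```python
import json


def last_value_wins(pairs):
    """Keep the last value seen for each key (the preferred behavior)."""
    result = {}
    for key, value in pairs:
        result[key] = value
    return result


def reject_duplicates(pairs):
    """Fail on any repeated key (the fallback when order cannot be kept)."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


text = '{"x": 1, "y": "a", "x": 2}'
lenient = json.loads(text, object_pairs_hook=last_value_wins)
```

Here lenient keeps the second value of "x", while passing reject_duplicates as the hook raises ValueError for the same input.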

Out of range numeric values

When parsing a numeric value, if the number that is parsed from the wire doesn’t fit in the corresponding type, the parser should fail to parse.

This includes any negative number for uint32, and numbers less than INT_MIN or larger than INT_MAX for int32.

Values with nonzero fractional portions are not allowed for integer-typed fields. Zero fractional portions are accepted. For example 1.0 is valid for an int32 field, but 1.5 is not.
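The range and fraction rules above might be checked along these lines for an int32 field. This is a sketch under stated assumptions: parse_int32 is a hypothetical helper, quoted values are assumed to follow the JSON number grammar, and exact decimal arithmetic is used so that exponent forms such as 1e2 are evaluated without rounding.

```python
import re
from decimal import Decimal

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1

# Assumption: quoted numbers follow the JSON number grammar.
_JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def parse_int32(value) -> int:
    """Parse a JSON number or string into an int32 (hypothetical helper)."""
    # bool is a subclass of int in Python, so rule it out explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float, str)):
        raise ValueError(f"not a number: {value!r}")
    text = str(value)
    # Rejects empty strings, NaN, infinities, and other non-numeric text.
    if _JSON_NUMBER.fullmatch(text) is None:
        raise ValueError(f"not a valid number: {value!r}")
    number = Decimal(text)
    if not INT32_MIN <= number <= INT32_MAX:
        raise ValueError(f"out of range for int32: {value!r}")
    # 1.0 and 1e2 are integral; 1.5 is not.
    if number != number.to_integral_value():
        raise ValueError(f"nonzero fractional part: {value!r}")
    return int(number)
```

With this sketch, 1.0 and "1e2" parse to 1 and 100, while "", 1.5, and 2147483648 are rejected.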

Strings for int64

Unfortunately, the json.org spec does not speak to the intended precision limits of numbers. Many implementations follow the original JS behavior that JSON was derived from and interpret all numbers as binary64 (double precision) and are silently lossy if a number is an integer larger than 2**53. Other implementations may support unlimited precision bigints, int64s, or even bigfloats with unlimited fractional precision.

This creates a situation where if the JSON contains a number that is not exactly representable by double precision, different parsers will behave differently, including silent precision loss in many languages.

To avoid these problems, ProtoJSON serializers emit int64s as strings to ensure no precision loss will occur on large int64s by any implementation.

When parsing a bare number when expecting an int64, the implementation should coerce that value to double-precision even if the corresponding language’s built-in JSON parser supports parsing of JSON numbers as bigints. This ensures a consistent interpretation of the same data, regardless of language used.
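The precision problem is easy to reproduce in Python, whose float type is a binary64 double. The sketch below uses a hypothetical encode_int64 helper and a made-up field name, count, purely for illustration.

```python
import json


def encode_int64(value: int) -> str:
    """Emit an int64 as a decimal string (hypothetical helper)."""
    if not -(2**63) <= value <= 2**63 - 1:
        raise ValueError(f"out of range for int64: {value}")
    return str(value)


big = 2**53 + 1  # not exactly representable as a binary64 double

# What a double-based JSON parser would see for a bare number.
lossy = int(float(big))

# Emitting the value as a string keeps every digit intact.
payload = json.dumps({"count": encode_int64(big)})
round_tripped = int(json.loads(payload)["count"])
```

Here lossy differs from big by one, while round_tripped equals big because the digits travel as a string and are never interpreted as a double.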

This design follows established best practices in how to handle large numbers in JSON when prioritizing interoperability, including:

Any

Normal messages

For any message that is not a well-known type with a special JSON representation, the message contained inside the Any is turned into a JSON object with an additional "@type" field inserted that contains the type_url that was set on the Any.

For example, if you have this message definition:

package x;
message Child { int32 x = 1; string y = 2; }

When an instance of Child is packed into an Any, the JSON representation is:

{
  "@type": "type.googleapis.com/x.Child",
  "x": 1,
  "y": "hello world"
}

Special-cased well-known types

If the Any contains a well-known type that has a special JSON mapping, the message is converted into the special representation and set as a field with key “value”.

For example, a google.protobuf.Duration that represents 3.1 seconds will be represented by the string "3.1s" in the special case handling. When that Duration is packed into an Any it will be serialized as:

{
  "@type": "type.googleapis.com/google.protobuf.Duration",
  "value": "3.1s"
}

Message types with special JSON encodings include:

Note that google.protobuf.Empty is not considered to have any special JSON mapping; it is simply a normal message that has zero fields. This means the expected representation of an Empty packed into an Any is {"@type": "type.googleapis.com/google.protobuf.Empty"} and not {"@type": "type.googleapis.com/google.protobuf.Empty", "value": {}}.
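The two Any layouts can be summarized in a small Python sketch. The function any_to_json and its has_special_mapping flag are hypothetical; a real implementation would look up the packed type from the type URL rather than take a flag from the caller.

```python
def any_to_json(type_url: str, message_json, has_special_mapping: bool) -> dict:
    """Build the JSON form of an Any from the packed message's own JSON form."""
    if has_special_mapping:
        # Special-cased well-known types are nested under the "value" key.
        return {"@type": type_url, "value": message_json}
    # Normal messages have their fields merged alongside "@type".
    return {"@type": type_url, **message_json}


child = any_to_json(
    "type.googleapis.com/x.Child", {"x": 1, "y": "hello world"}, False
)
duration = any_to_json(
    "type.googleapis.com/google.protobuf.Duration", "3.1s", True
)
empty = any_to_json("type.googleapis.com/google.protobuf.Empty", {}, False)
```

Because Empty is treated as a normal message with zero fields, empty contains only the "@type" key and no "value" key.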

ProtoJSON Wire Safety

When using ProtoJSON, only some schema changes are safe to make in a distributed system. This contrasts with the same concepts applied to the binary wire format.

JSON Wire-unsafe Changes

Wire-unsafe changes are schema changes that will break if you parse data that was serialized using the old schema with a parser that is using the new schema (or vice versa). You should almost never make this kind of schema change.

JSON Wire-safe Changes

Wire-safe changes are ones where it is fully safe to evolve the schema in this way without risk of data loss or new parse failures.

Note that nearly all wire-safe changes may be a breaking change to application code. For example, adding a value to a preexisting enum would be a compilation break for any code with an exhaustive switch on that enum. For that reason, Google may avoid making some of these types of changes on public messages. The AIPs contain guidance for which of these changes are safe to make there.

JSON Wire-compatible Changes (Conditionally safe)

Unlike wire-safe changes, wire-compatible means that the same data can be parsed both before and after a given change. However, a client that reads it will get lossy data under this shape of change. For example, changing an int32 to an int64 is a compatible change, but if a value larger than INT32_MAX is written, a client that reads it as an int32 will discard the high order bits.

You can make compatible changes to your schema only if you manage the roll out to your system carefully. For example, you may change an int32 to an int64 but ensure you continue to only write legal int32 values until the new schema is deployed to all endpoints, and then start writing larger values after that.

Compatible But With Unknown Field Handling Problems

Unlike the binary wire format, ProtoJSON implementations generally do not propagate unknown fields. This means that adding to schemas is generally compatible but will result in parse failures if a client using the old schema observes the new content.

This means you can add to your schema, but you cannot safely start writing the new fields until you know the schema has been deployed to the relevant client or server (or that the relevant clients set an Ignore Unknown Fields flag, discussed below).
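The rollout hazard can be made concrete with a toy parser in Python. The function parse_known_fields and its ignore_unknown parameter are hypothetical stand-ins for a real parser and its Ignore Unknown Fields option; the schema is reduced to a set of accepted keys.

```python
def parse_known_fields(data: dict, known_fields: set, ignore_unknown: bool = False) -> dict:
    """Toy parser: reject or drop keys that the local schema does not define."""
    unknown = sorted(set(data) - set(known_fields))
    if unknown and not ignore_unknown:
        raise ValueError(f"unknown fields: {unknown}")
    # Unknown fields are dropped, not propagated, when they are ignored.
    return {key: value for key, value in data.items() if key in known_fields}


old_schema = {"x"}  # client still on the old schema
new_payload = {"x": 1, "y": "hello"}  # written by a server using the new field

tolerant = parse_known_fields(new_payload, old_schema, ignore_unknown=True)
```

Without the flag, the old client fails on the new payload; with it, parsing succeeds but the value of "y" is silently lost rather than preserved for re-serialization.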

Compatible But Potentially Lossy

RFC 3339 Clarifications

Timestamps

ProtoJSON timestamps use the RFC 3339 timestamp format. Unfortunately, some ambiguity in the RFC 3339 spec has created a few edge cases where various other RFC 3339 implementations do not agree on whether or not the format is legal.

RFC 3339 intends to declare a strict subset of ISO-8601 format, and some additional ambiguity was created since RFC 3339 was published in 2002 and then ISO-8601 was subsequently revised without any corresponding revisions of RFC 3339.

Most notably, ISO-8601-1988 contains this note:

In date and time representations lower case characters may be used when upper case characters are not available.

It is ambiguous whether this note is suggesting that parsers should accept lowercase letters in general, or if it is only suggesting that lowercase letters may be used as a substitute in environments where uppercase cannot be technically used. RFC 3339 contains a note that intends to clarify the interpretation to be that lowercase letters should be accepted in general.

ISO-8601-2019 does not contain the corresponding note and is unambiguous that lowercase letters are not allowed.

This created some confusion for all libraries that declare they support RFC 3339: today RFC 3339 declares it is a profile of ISO-8601 but contains a clarifying note referencing text that is not present in the latest ISO-8601 spec.

The ProtoJSON spec defines the timestamp format as the stricter “RFC 3339 as a profile of ISO-8601-2019”. Some Protobuf implementations may be non-conformant because they rely on a timestamp parser that implements “RFC 3339 as a profile of ISO-8601-1988,” which accepts a few additional edge cases.

For consistent interoperability, parsers should only accept the stricter subset format where possible. When using a non-conformant implementation that accepts the laxer definition, strongly avoid relying on the additional edge cases being accepted.

Durations

RFC 3339 also defines a duration format, but unfortunately the RFC 3339 duration format does not have any way to express sub-second resolution.

The ProtoJSON duration encoding is directly inspired by RFC 3339 dur-seconds representation, but it is able to encode nanosecond precision. For integer number of seconds the two representations may match (like 10s), but the ProtoJSON durations accept fractional values and conformant implementations must precisely represent nanosecond precision (like 10.500000001s).
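A minimal Python sketch of this encoding follows, assuming a duration is held as a (seconds, nanos) pair with matching signs. The function names and the exact accepted grammar (an optional minus sign, integer seconds, up to nine fractional digits, then "s") are assumptions for illustration; integer arithmetic is used throughout so nanoseconds are never rounded through a float.

```python
import re

# Assumed grammar: optional sign, integer seconds, up to 9 fraction digits, "s".
_DURATION = re.compile(r"(-?)([0-9]+)(?:\.([0-9]{1,9}))?s")


def parse_duration(text: str) -> tuple:
    """Parse a ProtoJSON duration string into (seconds, nanos)."""
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    sign = -1 if match.group(1) else 1
    seconds = int(match.group(2))
    # Right-pad the fraction to 9 digits so it reads directly as nanoseconds.
    nanos = int((match.group(3) or "").ljust(9, "0"))
    return sign * seconds, sign * nanos


def format_duration(seconds: int, nanos: int) -> str:
    """Format (seconds, nanos) using 0, 3, 6, or 9 fractional digits."""
    sign = "-" if seconds < 0 or nanos < 0 else ""
    seconds, nanos = abs(seconds), abs(nanos)
    if nanos == 0:
        fraction = ""
    elif nanos % 1_000_000 == 0:
        fraction = f".{nanos // 1_000_000:03d}"
    elif nanos % 1_000 == 0:
        fraction = f".{nanos // 1_000:06d}"
    else:
        fraction = f".{nanos:09d}"
    return f"{sign}{seconds}{fraction}s"
```

Parsing "10.500000001s" yields (10, 500000001) exactly, which a float-based parser could not guarantee.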

JSON Options

A conformant protobuf JSON implementation may provide the following options: