Language Guide (proto 3)

Covers how to use the proto3 revision of the Protocol Buffers language in your project.

This guide describes how to use the protocol buffer language to structure your protocol buffer data, including .proto file syntax and how to generate data access classes from your .proto files. It covers the proto3 revision of the protocol buffers language.

For information on editions syntax, see the Protobuf Editions Language Guide.

For information on the proto2 syntax, see the Proto2 Language Guide.

This is a reference guide – for a step-by-step example that uses many of the features described in this document, see the tutorial for your chosen language.

Defining A Message Type

First let’s look at a very simple example. Let’s say you want to define a search request message format, where each search request has a query string, the particular page of results you are interested in, and a number of results per page. Here’s the .proto file you use to define the message type.

syntax = "proto3";

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 results_per_page = 3;
}

Specifying Field Types

In the earlier example, all the fields are scalar types: two integers (page_number and results_per_page) and a string (query). You can also specify enumerations and composite types like other message types for your field.

Assigning Field Numbers

You must give each field in your message definition a number between 1 and 536,870,911 with the following restrictions:

This number cannot be changed once your message type is in use because it identifies the field in the message wire format. “Changing” a field number is equivalent to deleting that field and creating a new field with the same type but a new number. See Deleting Fields for how to do this properly.

Field numbers should never be reused. Never take a field number out of the reserved list for reuse with a new field definition. See Consequences of Reusing Field Numbers.

You should use the field numbers 1 through 15 for the most-frequently-set fields. Lower field number values take less space in the wire format. For example, field numbers in the range 1 through 15 take one byte to encode. Field numbers in the range 16 through 2047 take two bytes. You can find out more about this in Protocol Buffer Encoding.
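As a rough illustration of these byte counts, the following sketch computes the encoded size of a field's tag. It assumes the tag layout described in the encoding documentation (the field number shifted left by three bits, combined with a wire type, then written as a base-128 varint). The helper names `encode_varint` and `tag_size` are made up for this example and are not part of any protobuf API.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def tag_size(field_number, wire_type=0):
    """Return how many bytes the tag for this field occupies on the wire."""
    return len(encode_varint((field_number << 3) | wire_type))


for number in (1, 15, 16, 2047, 2048):
    print(number, tag_size(number))
```

Field number 16 is the first whose tag no longer fits in seven bits, which is why the one-byte range stops at 15.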

Consequences of Reusing Field Numbers

Reusing a field number makes decoding wire-format messages ambiguous.

The protobuf wire format is lean and doesn’t provide a way to detect fields encoded using one definition and decoded using another.

Encoding a field using one definition and then decoding that same field with a different definition can lead to:

Common causes of field number reuse:

The field number is limited to 29 bits rather than 32 bits because three bits are used to specify the field’s wire format. For more on this, see the Encoding topic.
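The 29-bit limit can be checked with a little arithmetic. The sketch below assumes that a tag is a 32-bit value whose low three bits hold the wire type. `make_tag`, `split_tag`, and `MAX_FIELD_NUMBER` are hypothetical names used only for this illustration.

```python
WIRE_TYPE_BITS = 3
MAX_FIELD_NUMBER = (1 << (32 - WIRE_TYPE_BITS)) - 1  # 2**29 - 1


def make_tag(field_number, wire_type):
    """Combine a field number and a wire type into a single tag value."""
    if not 1 <= field_number <= MAX_FIELD_NUMBER:
        raise ValueError("field number out of range")
    if not 0 <= wire_type < (1 << WIRE_TYPE_BITS):
        raise ValueError("wire type must fit in three bits")
    return (field_number << WIRE_TYPE_BITS) | wire_type


def split_tag(tag):
    """Recover (field_number, wire_type) from a tag value."""
    return tag >> WIRE_TYPE_BITS, tag & 0b111


print(MAX_FIELD_NUMBER)
```

The largest field number computed this way matches the upper bound of 536,870,911 given earlier.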

Specifying Field Cardinality

Message fields can be one of the following: singular (declared either with the optional label or with no cardinality label), repeated, or map.

Repeated Fields are Packed by Default

In proto3, repeated fields of scalar numeric types use packed encoding by default.

You can find out more about packed encoding in Protocol Buffer Encoding.
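To make the difference concrete, here is a minimal sketch that encodes the same three values both ways. It assumes the layout from the encoding documentation: a packed field is a single length-delimited record (wire type 2) holding the concatenated varints, while the unpacked form writes one tag/value pair per element. The function names and the choice of field number 4 are assumptions made for this example.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_packed(field_number, values):
    """Encode non-negative ints as one packed, length-delimited record."""
    payload = b"".join(encode_varint(v) for v in values)
    tag = encode_varint((field_number << 3) | 2)  # wire type 2: length-delimited
    return tag + encode_varint(len(payload)) + payload


def encode_unpacked(field_number, values):
    """Encode the same values as one tag/value pair per element."""
    tag = encode_varint((field_number << 3) | 0)  # wire type 0: varint
    return b"".join(tag + encode_varint(v) for v in values)


values = [3, 270, 86942]
print(encode_packed(4, values).hex())
print(encode_unpacked(4, values).hex())
```

The packed form pays for the tag once, so its advantage grows with the number of elements.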

Message Type Fields Always Have Field Presence

In proto3, message-type fields already have field presence. Because of this, adding the optional modifier doesn’t change the field presence for the field.

The definitions for Message2 and Message3 in the following code sample generate the same code for all languages, and there is no difference in representation in binary, JSON, and TextFormat:

syntax="proto3";

package foo.bar;

message Message1 {}

message Message2 {
  Message1 foo = 1;
}

message Message3 {
  optional Message1 bar = 1;
}

Well-formed Messages

The term “well-formed,” when applied to protobuf messages, refers to the bytes serialized/deserialized. The protoc parser validates that a given proto definition file is parseable.

Singular fields can appear more than once in wire-format bytes. The parser will accept the input, but only the last instance of that field will be accessible through the generated bindings. See Last One Wins for more on this topic.
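The following sketch shows the effect with a toy parser. It is not how any real protobuf runtime is implemented; it only handles varint fields, and `parse_singular_varints` is a hypothetical helper written for this example.

```python
def decode_varint(data, pos):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_singular_varints(data):
    """Parse varint fields; a repeated occurrence overwrites the earlier one."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type != 0:
            raise ValueError("this sketch only understands varint fields")
        value, pos = decode_varint(data, pos)
        fields[field_number] = value  # last one wins
    return fields


# Field 2 appears twice: first with value 1, then with value 5.
wire = bytes([0x10, 0x01, 0x10, 0x05])
print(parse_singular_varints(wire))
```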

Adding More Message Types

Multiple message types can be defined in a single .proto file. This is useful if you are defining multiple related messages – so, for example, if you wanted to define the reply message format that corresponds to your SearchRequest message type, you could add it to the same .proto:

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 results_per_page = 3;
}

message SearchResponse {
 ...
}

Combining Messages leads to bloat: While multiple message types (such as message, enum, and service) can be defined in a single .proto file, doing so can also lead to dependency bloat when large numbers of messages with varying dependencies are defined in a single file. It’s recommended to include as few message types per .proto file as possible.

To add comments to your .proto files:

/**
 * SearchRequest represents a search query, with pagination options to
 * indicate which results to include in the response.
 */
message SearchRequest {
  string query = 1;

  // Which page number do we want?
  int32 page_number = 2;

  // Number of results to return per page.
  int32 results_per_page = 3;
}

Deleting Fields

Deleting fields can cause serious problems if not done properly.

When you no longer need a field and all references have been deleted from client code, you may delete the field definition from the message. However, you must reserve the deleted field number. If you do not reserve the field number, it is possible for a developer to reuse that number in the future.

You should also reserve the field name to allow JSON and TextFormat encodings of your message to continue to parse.

Reserved Field Numbers

If you update a message type by entirely deleting a field, or commenting it out, future developers can reuse the field number when making their own updates to the type. This can cause severe issues, as described in Consequences of Reusing Field Numbers. To make sure this doesn’t happen, add your deleted field number to the reserved list.

The protoc compiler will generate error messages if any future developers try to use these reserved field numbers.

message Foo {
  reserved 2, 15, 9 to 11;
}

Reserved field number ranges are inclusive (9 to 11 is the same as 9, 10, 11).
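The inclusive-range rule can be pictured with a small membership check. This is only a sketch of the rule, not of how protoc stores reservations. `RESERVED` and `is_reserved` are hypothetical names, and the list mirrors the `reserved 2, 15, 9 to 11;` statement from the example.

```python
# Single numbers are ints; ranges are (start, end) tuples, both ends included.
RESERVED = [2, 15, (9, 11)]


def is_reserved(field_number, reserved=RESERVED):
    """Return True if field_number falls in the reserved list."""
    for entry in reserved:
        if isinstance(entry, tuple):
            start, end = entry
            if start <= field_number <= end:
                return True
        elif field_number == entry:
            return True
    return False


print([n for n in range(1, 17) if is_reserved(n)])
```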

Reserved Field Names

Reusing an old field name later is generally safe, except when using TextProto or JSON encodings where the field name is serialized. To avoid this risk, you can add the deleted field name to the reserved list.

Reserved names affect only the protoc compiler behavior and not runtime behavior, with one exception: TextProto implementations may discard unknown fields (without raising an error like with other unknown fields) with reserved names at parse time (only the C++ and Go implementations do so today). Runtime JSON parsing is not affected by reserved names.

message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}

Note that you can’t mix field names and field numbers in the same reserved statement.

What’s Generated from Your .proto?

When you run the protocol buffer compiler on a .proto, the compiler generates the code in your chosen language you’ll need to work with the message types you’ve described in the file, including getting and setting field values, serializing your messages to an output stream, and parsing your messages from an input stream.

You can find out more about using the APIs for each language by following the tutorial for your chosen language. For even more API details, see the relevant API reference.

Scalar Value Types

A scalar message field can have one of the following types – the table shows the type specified in the .proto file, and the corresponding type in the automatically generated class:

Proto Type | Notes
double |
float |
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead.
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead.
uint32 | Uses variable-length encoding.
uint64 | Uses variable-length encoding.
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s.
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s.
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28.
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56.
sfixed32 | Always four bytes.
sfixed64 | Always eight bytes.
bool |
string | A string must always contain UTF-8 encoded or 7-bit ASCII text, and cannot be longer than 2^32.
bytes | May contain any arbitrary sequence of bytes no longer than 2^32.

Proto Type | C++ Type | Java/Kotlin Type[1] | Python Type[3] | Go Type | Ruby Type | C# Type | PHP Type | Dart Type | Rust Type
double | double | double | float | float64 | Float | double | float | double | f64
float | float | float | float | float32 | Float | float | float | double | f32
int32 | int32_t | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int | i32
int64 | int64_t | long | int/long[4] | int64 | Bignum | long | integer/string[6] | Int64 | i64
uint32 | uint32_t | int[2] | int/long[4] | uint32 | Fixnum or Bignum (as required) | uint | integer | int | u32
uint64 | uint64_t | long[2] | int/long[4] | uint64 | Bignum | ulong | integer/string[6] | Int64 | u64
sint32 | int32_t | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int | i32
sint64 | int64_t | long | int/long[4] | int64 | Bignum | long | integer/string[6] | Int64 | i64
fixed32 | uint32_t | int[2] | int/long[4] | uint32 | Fixnum or Bignum (as required) | uint | integer | int | u32
fixed64 | uint64_t | long[2] | int/long[4] | uint64 | Bignum | ulong | integer/string[6] | Int64 | u64
sfixed32 | int32_t | int | int | int32 | Fixnum or Bignum (as required) | int | integer | int | i32
sfixed64 | int64_t | long | int/long[4] | int64 | Bignum | long | integer/string[6] | Int64 | i64
bool | bool | boolean | bool | bool | TrueClass/FalseClass | bool | boolean | bool | bool
string | std::string | String | str/unicode[5] | string | String (UTF-8) | string | string | String | ProtoString
bytes | std::string | ByteString | str (Python 2), bytes (Python 3) | []byte | String (ASCII-8BIT) | ByteString | string | List | ProtoBytes

[1] Kotlin uses the corresponding types from Java, even for unsigned types, to ensure compatibility in mixed Java/Kotlin codebases.

[2] In Java, unsigned 32-bit and 64-bit integers are represented using their signed counterparts, with the top bit simply being stored in the sign bit.

[3] In all cases, setting values to a field will perform type checking to make sure it is valid.

[4] 64-bit or unsigned 32-bit integers are always represented as long when decoded, but can be an int if an int is given when setting the field. In all cases, the value must fit in the type represented when set. See [2].

[5] Python strings are represented as unicode on decode but can be str if an ASCII string is given (this is subject to change).

[6] Integer is used on 64-bit machines and string is used on 32-bit machines.

You can find out more about how these types are encoded when you serialize your message in Protocol Buffer Encoding.
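The notes on int32 versus sint32 can be illustrated with a short sketch. It assumes, following the encoding documentation, that a negative int32 is sign-extended to 64 bits before being written as a varint, and that sint32 applies ZigZag encoding first. The function names are hypothetical and exist only for this example.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_int32(value):
    """int32 payload: negatives are sign-extended to 64 bits."""
    return encode_varint(value & 0xFFFFFFFFFFFFFFFF)


def encode_sint32(value):
    """sint32 payload: ZigZag maps small magnitudes to small varints."""
    zigzag = ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF
    return encode_varint(zigzag)


for v in (1, -1, -2):
    print(v, len(encode_int32(v)), len(encode_sint32(v)))
```

With these assumptions, -1 costs ten payload bytes as an int32 but a single byte as a sint32, while small positive values cost one byte either way.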

Default Field Values

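When a message is parsed, if the encoded message bytes do not contain a particular field, accessing that field in the parsed object returns the default value for that field. The default values are type-specific: the empty string for strings, empty bytes for bytes, false for bools, zero for numeric types, and the first defined enum value (which must be 0) for enums. For message fields, the field is not set, and its exact value is language-dependent.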

The default value for repeated fields is empty (generally an empty list in the appropriate language).

The default value for map fields is empty (generally an empty map in the appropriate language).

Note that for implicit-presence scalar fields, once a message is parsed there’s no way of telling whether that field was explicitly set to the default value (for example whether a boolean was set to false) or just not set at all: you should bear this in mind when defining your message types. For example, don’t have a boolean that switches on some behavior when set to false if you don’t want that behavior to also happen by default. Also note that if a scalar message field is set to its default, the value will not be serialized on the wire. If a float or double value is set to +0 it will not be serialized, but -0 is considered distinct and will be serialized.
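The +0 versus -0 rule is easy to get wrong because the two values compare equal in most languages. The sketch below shows one way such a check could be written, by comparing bit patterns instead of values. `should_serialize_double` is a hypothetical helper, not a protobuf API.

```python
import struct


def should_serialize_double(value):
    """Implicit-presence double: skip only the exact +0.0 bit pattern."""
    return struct.pack("<d", value) != struct.pack("<d", 0.0)


print(should_serialize_double(0.0))   # default value, omitted from the wire
print(should_serialize_double(-0.0))  # distinct bit pattern, written out
print(0.0 == -0.0)                    # a plain equality test cannot tell them apart
```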

See the generated code guide for your chosen language for more details about how defaults work in generated code.

Enumerations

When you’re defining a message type, you might want one of its fields to only have one of a predefined list of values. For example, let’s say you want to add a corpus field for each SearchRequest, where the corpus can be UNIVERSAL, WEB, IMAGES, LOCAL, NEWS, PRODUCTS or VIDEO. You can do this very simply by adding an enum to your message definition with a constant for each possible value.

In the following example we’ve added an enum called Corpus with all the possible values, and a field of type Corpus:

enum Corpus {
  CORPUS_UNSPECIFIED = 0;
  CORPUS_UNIVERSAL = 1;
  CORPUS_WEB = 2;
  CORPUS_IMAGES = 3;
  CORPUS_LOCAL = 4;
  CORPUS_NEWS = 5;
  CORPUS_PRODUCTS = 6;
  CORPUS_VIDEO = 7;
}

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 results_per_page = 3;
  Corpus corpus = 4;
}

Enum Default Value

The default value for the SearchRequest.corpus field is CORPUS_UNSPECIFIED because that is the first value defined in the enum.

In proto3, the first value defined in an enum definition must have the value zero and should have the name ENUM_TYPE_NAME_UNSPECIFIED or ENUM_TYPE_NAME_UNKNOWN. This is because:

It is also recommended that this first, default value have no semantic meaning other than “this value was unspecified”.

Enum Value Aliases

You can define aliases by assigning the same value to different enum constants. To do this you need to set the allow_alias option to true. Otherwise, the protocol buffer compiler generates a warning message when aliases are found. Though all alias values are valid for serialization, only the first value is used when deserializing.

enum EnumAllowingAlias {
  option allow_alias = true;
  EAA_UNSPECIFIED = 0;
  EAA_STARTED = 1;
  EAA_RUNNING = 1;
  EAA_FINISHED = 2;
}

enum EnumNotAllowingAlias {
  ENAA_UNSPECIFIED = 0;
  ENAA_STARTED = 1;
  // ENAA_RUNNING = 1;  // Uncommenting this line will cause a warning message.
  ENAA_FINISHED = 2;
}

Enumerator constants must be in the range of a 32-bit integer. Since enum values use varint encoding on the wire, negative values are inefficient and thus not recommended. You can define enums within a message definition, as in the earlier example, or outside – these enums can be reused in any message definition in your .proto file. You can also use an enum type declared in one message as the type of a field in a different message, using the syntax _MessageType_._EnumType_.

When you run the protocol buffer compiler on a .proto that uses an enum, the generated code will have a corresponding enum for Java, Kotlin, or C++, or a special EnumDescriptor class for Python that’s used to create a set of symbolic constants with integer values in the runtime-generated class.

During deserialization, unrecognized enum values will be preserved in the message, though how this is represented when the message is deserialized is language-dependent. In languages that support open enum types with values outside the range of specified symbols, such as C++ and Go, the unknown enum value is simply stored as its underlying integer representation. In languages with closed enum types such as Java, a case in the enum is used to represent an unrecognized value, and the underlying integer can be accessed with special accessors. In either case, if the message is serialized the unrecognized value will still be serialized with the message.
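The two styles can be contrasted with a small sketch. It does not reproduce any generated API: `read_open` and `read_closed` are hypothetical helpers, the `Corpus` class is a hand-written stand-in for part of the enum shown earlier, and `None` stands in for the special "unrecognized" case a closed-enum language would provide.

```python
import enum


class Corpus(enum.IntEnum):
    CORPUS_UNSPECIFIED = 0
    CORPUS_UNIVERSAL = 1
    CORPUS_WEB = 2


def read_open(raw):
    """Open-enum style: keep the raw integer even if it has no name."""
    return raw


def read_closed(raw):
    """Closed-enum style: return (symbol, raw); symbol is None if unrecognized."""
    try:
        return Corpus(raw), raw
    except ValueError:
        return None, raw


print(read_open(7))
print(read_closed(7))
print(read_closed(2))
```

In both styles the raw integer is kept, which is what allows the value to be written back unchanged when the message is serialized again.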

For more information about how to work with message enums in your applications, see the generated code guide for your chosen language.

Reserved Values

If you update an enum type by entirely removing an enum entry, or commenting it out, future users can reuse the numeric value when making their own updates to the type. This can cause severe issues if they later load old instances of the same .proto, including data corruption, privacy bugs, and so on. One way to make sure this doesn’t happen is to specify that the numeric values (and/or names, which can also cause issues for JSON serialization) of your deleted entries are reserved. The protocol buffer compiler will complain if any future users try to use these identifiers. You can specify that your reserved numeric value range goes up to the maximum possible value using the max keyword.

enum Foo {
  reserved 2, 15, 9 to 11, 40 to max;
  reserved "FOO", "BAR";
}

Note that you can’t mix field names and numeric values in the same reserved statement.

Using Other Message Types

You can use other message types as field types. For example, let’s say you wanted to include Result messages in each SearchResponse message – to do this, you can define a Result message type in the same .proto and then specify a field of type Result in SearchResponse:

message SearchResponse {
  repeated Result results = 1;
}

message Result {
  string url = 1;
  string title = 2;
  repeated string snippets = 3;
}

Importing Definitions

In the earlier example, the Result message type is defined in the same file as SearchResponse – what if the message type you want to use as a field type is already defined in another .proto file?

You can use definitions from other .proto files by importing them. To import another .proto’s definitions, you add an import statement to the top of your file:

import "myproject/other_protos.proto";

By default, you can use definitions only from directly imported .proto files. However, sometimes you may need to move a .proto file to a new location. Instead of moving the .proto file directly and updating all the call sites in a single change, you can put a placeholder .proto file in the old location to forward all the imports to the new location using the import public notion.

Note: The public import functionality available in Java is most effective when moving an entire .proto file or when using java_multiple_files = true. In these cases, generated names remain stable, avoiding the need to update references in your code. While technically functional when moving a subset of a .proto file without java_multiple_files = true, doing so requires simultaneous updates to many references, thus might not significantly ease migration. The functionality is not available in Kotlin, TypeScript, JavaScript, GCL, or with C++ targets that use protobuf static reflection.

import public dependencies can be transitively relied upon by any code importing the proto containing the import public statement. For example:

// new.proto
// All definitions are moved here

// old.proto
// This is the proto that all clients are importing.
import public "new.proto";
import "other.proto";

// client.proto
import "old.proto";
// You use definitions from old.proto and new.proto, but not other.proto

The protocol compiler searches for imported files in a set of directories specified on the protocol compiler command line using the -I/--proto_path flag. If no flag was given, it looks in the directory in which the compiler was invoked. In general you should set the --proto_path flag to the root of your project and use fully qualified names for all imports.

Using proto2 Message Types

It’s possible to import proto2 message types and use them in your proto3 messages, and vice versa. However, proto2 enums cannot be used directly in proto3 syntax (it’s okay if an imported proto2 message uses them).

Nested Types

You can define and use message types inside other message types, as in the following example – here the Result message is defined inside the SearchResponse message:

message SearchResponse {
  message Result {
    string url = 1;
    string title = 2;
    repeated string snippets = 3;
  }
  repeated Result results = 1;
}

If you want to reuse this message type outside its parent message type, you refer to it as _Parent_._Type_:

message SomeOtherMessage {
  SearchResponse.Result result = 1;
}

You can nest messages as deeply as you like. In the example below, note that the two nested types named Inner are entirely independent, since they are defined within different messages:

message Outer {       // Level 0
  message MiddleAA {  // Level 1
    message Inner {   // Level 2
      int64 ival = 1;
      bool  booly = 2;
    }
  }
  message MiddleBB {  // Level 1
    message Inner {   // Level 2
      int32 ival = 1;
      bool  booly = 2;
    }
  }
}

Updating A Message Type

If an existing message type no longer meets all your needs – for example, you’d like the message format to have an extra field – but you’d still like to use code created with the old format, don’t worry! It’s very simple to update message types without breaking any of your existing code when you use the binary wire format.

Check Proto Best Practices and the following rules:

Unknown Fields

Unknown fields are well-formed protocol buffer serialized data representing fields that the parser does not recognize. For example, when an old binary parses data sent by a new binary with new fields, those new fields become unknown fields in the old binary.

Proto3 messages preserve unknown fields and include them during parsing and in the serialized output, which matches proto2 behavior.
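A toy round trip makes the idea of preservation concrete. This is a sketch under simplifying assumptions, not the behavior of a specific runtime: it only handles varint fields, `KNOWN_FIELDS` stands for a hypothetical "old" schema that knows field 1 only, and unknown fields are kept as raw bytes and appended on output.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data, pos):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


KNOWN_FIELDS = {1}  # hypothetical old schema: only field 1 is recognized


def parse(data):
    """Split varint fields into known values and preserved unknown bytes."""
    known, unknown = {}, bytearray()
    pos = 0
    while pos < len(data):
        start = pos
        tag, pos = decode_varint(data, pos)
        if tag & 0x7 != 0:
            raise ValueError("this sketch only understands varint fields")
        value, pos = decode_varint(data, pos)
        if (tag >> 3) in KNOWN_FIELDS:
            known[tag >> 3] = value
        else:
            unknown += data[start:pos]  # keep the raw bytes untouched
    return known, bytes(unknown)


def serialize(known, unknown):
    """Write known fields, then append the preserved unknown bytes."""
    out = bytearray()
    for number in sorted(known):
        out += encode_varint(number << 3) + encode_varint(known[number])
    return bytes(out) + unknown


new_wire = bytes([0x08, 0x2A, 0x18, 0x07])  # field 1 = 42, field 3 = 7
known, unknown = parse(new_wire)
print(known, unknown.hex(), serialize(known, unknown) == new_wire)
```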

Retaining Unknown Fields

Some actions can cause unknown fields to be lost. For example, if you do one of the following, unknown fields are lost:

To avoid losing unknown fields, do the following:

TextFormat is a bit of a special case. Serializing to TextFormat prints unknown fields using their field numbers. But parsing TextFormat data back into a binary proto fails if there are entries that use field numbers.

Any

The Any message type lets you use messages as embedded types without having their .proto definition. An Any contains an arbitrary serialized message as bytes, along with a URL that acts as a globally unique identifier for and resolves to that message’s type. To use the Any type, you need to import google/protobuf/any.proto.

import "google/protobuf/any.proto";

message ErrorStatus {
  string message = 1;
  repeated google.protobuf.Any details = 2;
}

The default type URL for a given message type is type.googleapis.com/_packagename_._messagename_.
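As a small illustration of that pattern, the sketch below builds a default type URL from a fully qualified message name and recovers the name again. The helper names are hypothetical, and `foo.bar.NetworkErrorDetails` is an assumed package and message name chosen only for the example.

```python
DEFAULT_PREFIX = "type.googleapis.com"


def type_url(full_name, prefix=DEFAULT_PREFIX):
    """Build a type URL of the form prefix/package.MessageName."""
    return f"{prefix}/{full_name}"


def full_name_from_url(url):
    """Take the fully qualified type name to be what follows the last '/'."""
    return url.rsplit("/", 1)[-1]


url = type_url("foo.bar.NetworkErrorDetails")
print(url)
print(full_name_from_url(url))
```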

Different language implementations will support runtime library helpers to pack and unpack Any values in a typesafe manner – for example, in Java, the Any type will have special pack() and unpack() accessors, while in C++ there are PackFrom() and UnpackTo() methods:

// Storing an arbitrary message type in Any.
NetworkErrorDetails details = ...;
ErrorStatus status;
status.add_details()->PackFrom(details);

// Reading an arbitrary message from Any.
ErrorStatus status = ...;
for (const google::protobuf::Any& detail : status.details()) {
  if (detail.Is<NetworkErrorDetails>()) {
    NetworkErrorDetails network_error;
    detail.UnpackTo(&network_error);
    // ... process network_error ...
  }
}

Oneof

If you have a message with many singular fields and where at most one field will be set at the same time, you can enforce this behavior and save memory by using the oneof feature.

Oneof fields are like optional fields except all the fields in a oneof share memory, and at most one field can be set at the same time. Setting any member of the oneof automatically clears all the other members. You can check which value in a oneof is set (if any) using a special case() or WhichOneof() method, depending on your chosen language.

Note that if multiple values are set, the last set value as determined by the order in the proto will overwrite all previous ones.

Field numbers for oneof fields must be unique within the enclosing message.

Using Oneof

To define a oneof in your .proto you use the oneof keyword followed by your oneof name, in this case test_oneof:

message SampleMessage {
  oneof test_oneof {
    string name = 4;
    SubMessage sub_message = 9;
  }
}

You then add your oneof fields to the oneof definition. You can add fields of any type, except map fields and repeated fields. If you need to add a repeated field to a oneof, you can use a message containing the repeated field.

In your generated code, oneof fields have the same getters and setters as regular fields. You also get a special method for checking which value (if any) in the oneof is set. You can find out more about the oneof API for your chosen language in the relevant API reference.
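The "set one, clear the others" behavior can be modeled with a single shared slot. The class below is a hand-written stand-in for the SampleMessage example, not generated code. Its `set`, `get`, and `which_oneof` methods are assumptions made for this sketch and do not match any particular language's generated API.

```python
class SampleMessage:
    """Hand-written model of a message with `oneof test_oneof`."""

    _ONEOF_MEMBERS = ("name", "sub_message")

    def __init__(self):
        self._case = None   # which member is set, if any
        self._value = None  # single slot shared by all members

    def set(self, member, value):
        if member not in self._ONEOF_MEMBERS:
            raise AttributeError(member)
        # Storing a new member replaces whatever was there before.
        self._case, self._value = member, value

    def get(self, member, default=None):
        return self._value if self._case == member else default

    def which_oneof(self):
        return self._case


message = SampleMessage()
message.set("name", "name")
message.set("sub_message", {})  # clears name
print(message.which_oneof(), message.get("name"))
```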

Oneof Features

SampleMessage message;
message.set_name("name");
CHECK_EQ(message.name(), "name");
// Calling mutable_sub_message() will clear the name field and will set
// sub_message to a new instance of SubMessage with none of its fields set.
message.mutable_sub_message();
CHECK(message.name().empty());

SampleMessage message;
SubMessage* sub_message = message.mutable_sub_message();
message.set_name("name");      // Will delete sub_message
sub_message->set_...            // Crashes here

SampleMessage msg1;
msg1.set_name("name");
SampleMessage msg2;
msg2.mutable_sub_message();
msg1.swap(&msg2);
CHECK(msg1.has_sub_message());
CHECK_EQ(msg2.name(), "name");

Backwards-compatibility issues

Be careful when adding or removing oneof fields. If checking the value of a oneof returns None/NOT_SET, it could mean that the oneof has not been set or it has been set to a field in a different version of the oneof. There is no way to tell the difference, since there’s no way to know if an unknown field on the wire is a member of the oneof.

Tag Reuse Issues

Maps

If you want to create an associative map as part of your data definition, protocol buffers provides a handy shortcut syntax:

map<key_type, value_type> map_field = N;

…where the key_type can be any integral or string type (so, any scalar type except for floating point types and bytes). Note that neither enum nor proto messages are valid for key_type. The value_type can be any type except another map.

So, for example, if you wanted to create a map of projects where each Project message is associated with a string key, you could define it like this:

map<string, Project> projects = 3;

Maps Features

The generated map API is currently available for all supported languages. You can find out more about the map API for your chosen language in the relevant API reference.

Backwards Compatibility

The map syntax is equivalent to the following on the wire, so protocol buffers implementations that do not support maps can still handle your data:

message MapFieldEntry {
  key_type key = 1;
  value_type value = 2;
}

repeated MapFieldEntry map_field = N;

Any protocol buffers implementation that supports maps must both produce and accept data that can be accepted by the earlier definition.
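The equivalence can be sketched by encoding a map by hand as repeated entry messages. The example assumes a `map<string, int32>` field with non-negative values and always writes both key and value. The function names and the choice of field number 3 are assumptions made for this illustration.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def encode_len_delimited(field_number, payload):
    """Write tag (wire type 2), payload length, then the payload."""
    tag = encode_varint((field_number << 3) | 2)
    return tag + encode_varint(len(payload)) + payload


def encode_map_string_int32(field_number, mapping):
    """Encode each map item as an entry message: key = 1, value = 2."""
    out = bytearray()
    for key, value in mapping.items():
        entry = encode_len_delimited(1, key.encode("utf-8"))
        entry += encode_varint((2 << 3) | 0) + encode_varint(value)
        out += encode_len_delimited(field_number, entry)
    return bytes(out)


print(encode_map_string_int32(3, {"a": 1}).hex())
```

A reader that knows nothing about maps would see these bytes as a repeated message field with two inner fields, which is the layout the MapFieldEntry definition describes.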

Packages

You can add an optional package specifier to a .proto file to prevent name clashes between protocol message types.

package foo.bar;
message Open { ... }

You can then use the package specifier when defining fields of your message type:

message Foo {
  ...
  foo.bar.Open open = 1;
  ...
}

The way a package specifier affects the generated code depends on your chosen language:

Note that even when the package directive does not directly affect the generated code, for example in Python, it is still strongly recommended to specify the package for the .proto file, as otherwise it may lead to naming conflicts in descriptors and make the proto not portable for other languages.

Packages and Name Resolution

Type name resolution in the protocol buffer language works like C++: first the innermost scope is searched, then the next-innermost, and so on, with each package considered to be “inner” to its parent package. A leading ‘.’ (for example, .foo.bar.Baz) means to start from the outermost scope instead.

The protocol buffer compiler resolves all type names by parsing the imported .proto files. The code generator for each language knows how to refer to each type in that language, even if it has different scoping rules.
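The innermost-first search can be pictured with a simplified lookup. This sketch is an approximation of the rule described above, not the compiler's actual algorithm (which has extra subtleties for dotted names). `resolve` and the `symbols` set are hypothetical.

```python
def resolve(name, scope, symbols):
    """Resolve a type name relative to scope, searching innermost scope first."""
    if name.startswith("."):
        # A leading dot means: start from the outermost scope.
        absolute = name[1:]
        return absolute if absolute in symbols else None
    parts = scope.split(".") if scope else []
    while True:
        candidate = ".".join(parts + [name])
        if candidate in symbols:
            return candidate
        if not parts:
            return None
        parts.pop()  # move out to the enclosing scope


symbols = {"foo.bar.Baz", "foo.Baz", "Baz"}
print(resolve("Baz", "foo.bar.Msg", symbols))
print(resolve("Baz", "foo.qux", symbols))
print(resolve(".Baz", "foo.bar.Msg", symbols))
```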

Defining Services

If you want to use your message types with an RPC (Remote Procedure Call) system, you can define an RPC service interface in a .proto file and the protocol buffer compiler will generate service interface code and stubs in your chosen language. So, for example, if you want to define an RPC service with a method that takes your SearchRequest and returns a SearchResponse, you can define it in your .proto file as follows:

service SearchService {
  rpc Search(SearchRequest) returns (SearchResponse);
}

The most straightforward RPC system to use with protocol buffers is gRPC: a language- and platform-neutral open source RPC system developed at Google. gRPC works particularly well with protocol buffers and lets you generate the relevant RPC code directly from your .proto files using a special protocol buffer compiler plugin.

If you don’t want to use gRPC, it’s also possible to use protocol buffers with your own RPC implementation. You can find out more about this in the Proto2 Language Guide.

There are also a number of ongoing third-party projects to develop RPC implementations for Protocol Buffers. For a list of links to projects we know about, see the third-party add-ons wiki page.

JSON Mapping

The standard protobuf binary wire format is the preferred serialization format for communication between two systems that use protobufs. For communicating with systems that use JSON rather than protobuf wire format, Protobuf supports a canonical encoding in JSON.

Options

Individual declarations in a .proto file can be annotated with a number of _options_. Options do not change the overall meaning of a declaration, but may affect the way it is handled in a particular context. The complete list of available options is defined in /google/protobuf/descriptor.proto.

Some options are file-level options, meaning they should be written at the top-level scope, not inside any message, enum, or service definition. Some options are message-level options, meaning they should be written inside message definitions. Some options are field-level options, meaning they should be written inside field definitions. Options can also be written on enum types, enum values, oneof fields, service types, and service methods; however, no useful options currently exist for any of these.

Here are a few of the most commonly used options:

option java_package = "com.example.foo";

option java_outer_classname = "Ponycopter";

option java_multiple_files = true;

option optimize_for = CODE_SIZE;

// This file relies on plugins to generate service code.
option cc_generic_services = false;
option java_generic_services = false;
option py_generic_services = false;

repeated int32 samples = 4 [packed = false];

int32 old_field = 6 [deprecated = true];

Enum Value Options

Enum value options are supported. You can use the deprecated option to indicate that a value shouldn’t be used anymore. You can also create custom options using extensions.

The following example shows the syntax for adding these options:

import "google/protobuf/descriptor.proto";

extend google.protobuf.EnumValueOptions {
  optional string string_name = 123456789;
}

enum Data {
  DATA_UNSPECIFIED = 0;
  DATA_SEARCH = 1 [deprecated = true];
  DATA_DISPLAY = 2 [
    (string_name) = "display_value"
  ];
}

The C++ code to read the string_name option might look something like this:

const absl::string_view foo = google::protobuf::GetEnumDescriptor<Data>()
    ->FindValueByName("DATA_DISPLAY")->options().GetExtension(string_name);

See Custom Options to see how to apply custom options to enum values and to fields.

Custom Options

Protocol Buffers also allows you to define and use your own options. Note that this is an advanced feature which most people don’t need. If you do think you need to create your own options, see the Proto2 Language Guide for details. Note that creating custom options uses extensions, which are permitted only for custom options in proto3.

Option Retention

Options have a notion of retention, which controls whether an option is retained in the generated code. Options have runtime retention by default, meaning that they are retained in the generated code and are thus visible at runtime in the generated descriptor pool. However, you can set retention = RETENTION_SOURCE to specify that an option (or field within an option) must not be retained at runtime. This is called source retention.

Option retention is an advanced feature that most users should not need to worry about, but it can be useful if you would like to use certain options without paying the code size cost of retaining them in your binaries. Options with source retention are still visible to protoc and protoc plugins, so code generators can use them to customize their behavior.

Retention can be set directly on an option, like this:

extend google.protobuf.FileOptions {
  optional int32 source_retention_option = 1234
      [retention = RETENTION_SOURCE];
}

It can also be set on a plain field, in which case it takes effect only when that field appears inside an option:

message OptionsMessage {
  int32 source_retention_field = 1 [retention = RETENTION_SOURCE];
}

You can set retention = RETENTION_RUNTIME if you like, but this has no effect since it is the default behavior. When a message field is marked RETENTION_SOURCE, its entire contents are dropped; fields inside it cannot override that by trying to set RETENTION_RUNTIME.
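
To make the plain-field case concrete, here is a sketch in which a message-typed custom option carries one source-retention field and one default (runtime) field. The names `runtime_field`, `my_options`, and `Annotated`, and the field number 50001, are hypothetical.

```proto
syntax = "proto3";

import "google/protobuf/descriptor.proto";

message OptionsMessage {
  // Visible to protoc and plugins, but not retained at runtime.
  int32 source_retention_field = 1 [retention = RETENTION_SOURCE];
  // Hypothetical second field with the default runtime retention.
  string runtime_field = 2;
}

// Hypothetical custom option whose type is the message above.
extend google.protobuf.MessageOptions {
  optional OptionsMessage my_options = 50001;
}

message Annotated {
  option (my_options) = {
    source_retention_field: 7
    runtime_field: "kept"
  };
}
```

Under the rules described above, a code generator plugin would see both values on `Annotated`, while only `runtime_field` would be expected to remain in the descriptors compiled into the binary.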

Option Targets

Fields have a targets option which controls the types of entities that the field may apply to when used as an option. For example, if a field has targets = TARGET_TYPE_MESSAGE then that field cannot be set in a custom option on an enum (or any other non-message entity). Protoc enforces this and will raise an error if there is a violation of the target constraints.

At first glance, this feature may seem unnecessary given that every custom option is an extension of the options message for a specific entity, which already constrains the option to that one entity. However, option targets are useful in the case where you have a shared options message applied to multiple entity types and you want to control the usage of individual fields in that message. For example:

message MyOptions {
  string file_only_option = 1 [targets = TARGET_TYPE_FILE];
  int32 message_and_enum_option = 2 [targets = TARGET_TYPE_MESSAGE,
                                     targets = TARGET_TYPE_ENUM];
}

extend google.protobuf.FileOptions {
  optional MyOptions file_options = 50000;
}

extend google.protobuf.MessageOptions {
  optional MyOptions message_options = 50000;
}

extend google.protobuf.EnumOptions {
  optional MyOptions enum_options = 50000;
}

// OK: this field is allowed on file options
option (file_options).file_only_option = "abc";

message MyMessage {
  // OK: this field is allowed on both message and enum options
  option (message_options).message_and_enum_option = 42;
}

enum MyEnum {
  MY_ENUM_UNSPECIFIED = 0;
  // Error: file_only_option cannot be set on an enum.
  option (enum_options).file_only_option = "xyz";
}

Generating Your Classes

To generate the Java, Kotlin, Python, C++, Go, Ruby, Objective-C, or C# code that you need to work with the message types defined in a .proto file, you need to run the protocol buffer compiler protoc on the .proto file. If you haven’t installed the compiler, download the package and follow the instructions in the README. For Go, you also need to install a special code generator plugin for the compiler; you can find this and installation instructions in the golang/protobuf repository on GitHub.

The Protocol Compiler is invoked as follows:

protoc --proto_path=IMPORT_PATH --cpp_out=DST_DIR --java_out=DST_DIR --python_out=DST_DIR --go_out=DST_DIR --ruby_out=DST_DIR --objc_out=DST_DIR --csharp_out=DST_DIR path/to/file.proto

Note: File paths relative to their proto_path must be globally unique in a given binary. For example, if you have proto/lib1/data.proto and proto/lib2/data.proto, those two files cannot be used together with -I=proto/lib1 -I=proto/lib2 because it would be ambiguous which file import "data.proto" will mean. Instead -Iproto/ should be used and the global names will be lib1/data.proto and lib2/data.proto.

If you are publishing a library and other users may use your messages directly, you should include a unique library name in the path under which the files are expected to be imported, to avoid file name collisions. If you have multiple directories in one project, it is best practice to set a single -I that points to a top-level directory of the project.
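
As a sketch of the layout from the note above, a file that uses both data.proto files could import them by their globally unique names. The file app/main.proto, the packages `lib1` and `lib2`, and the message name `Data` are assumptions made for illustration; the note does not say what the two files contain.

```proto
// Hypothetical file: proto/app/main.proto
// Compiled with a single import root, for example:
//   protoc -Iproto/ --cpp_out=DST_DIR proto/app/main.proto
syntax = "proto3";

package app;

// Each import is unambiguous because the path is relative to proto/.
import "lib1/data.proto";
import "lib2/data.proto";

message Combined {
  // Assumes each data.proto declares a package (lib1, lib2) with a message Data.
  lib1.Data first = 1;
  lib2.Data second = 2;
}
```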

File location

Prefer not to put .proto files in the same directory as other language sources. Consider creating a subpackage proto for .proto files, under the root package for your project.

Location Should be Language-agnostic

When working with Java code, it’s handy to put related .proto files in the same directory as the Java source. However, if any non-Java code ever uses the same protos, the path prefix will no longer make sense. So in general, put the protos in a related language-agnostic directory such as //myteam/mypackage.

The exception to this rule is when it’s clear that the protos will be used only in a Java context, such as for testing.

Supported Platforms

For information about: