AIP-193: Errors
Effective error communication is an important part of designing simple and intuitive APIs. Services returning standardized error responses enable API clients to construct centralized common error handling logic. This common logic simplifies API client applications and eliminates the need for cumbersome custom error handling code.
Guidance
Services must return a google.rpc.Status message when an API error occurs, and must use the canonical error codes defined in google.rpc.Code. More information about the particular codes is available in the gRPC status code documentation.
Error messages should help a reasonably technical user understand and resolve the issue, and should not assume that the user is an expert in your particular API. Additionally, error messages must not assume that the user will know anything about the API's underlying implementation.
Error messages should be brief but actionable. Any extra information should be provided in the details field. If even more information is necessary, you should provide a link where a reader can get more information or ask questions to help resolve the issue. It is also important to set the right tone when writing messages.
The following sections describe the fields of google.rpc.Status.
Status.message
The message field is a developer-facing, human-readable "debug message" which should be in English. (Localized messages are expressed using a LocalizedMessage within the details field. See LocalizedMessage for more details.) Any dynamic aspects of the message must be included as metadata within the ErrorInfo that appears in details.
The message is considered a problem description. It is intended for developers to understand the problem and is more detailed than ErrorInfo.reason, discussed later.
Messages should use simple descriptive language that is easy to understand (without technical jargon) to clearly state the problem that results in an error, and offer an actionable resolution to it.
For pre-existing (brownfield) APIs which have previously returned errors without machine-readable identifiers, the value of message must remain the same for any given error. For more information, see Changing Error Messages.
Status.code
The code field is the status code, which must be the numeric value of one of the elements of the google.rpc.Code enum. For example, the value 5 is the numeric value of the NOT_FOUND enum element.
Status.details
The details field allows messages with additional error information to be included in the error response, each packed in a google.protobuf.Any message.
Google defines a set of standard detail payloads for error details, which cover most common needs for API errors. Services should use these standard detail payloads when feasible.
Each type of detail payload must be included at most once. For example, there must not be more than one BadRequest message in the details, but there may be a BadRequest and a PreconditionFailure.
All error responses must include an ErrorInfo within details. This provides machine-readable identifiers so that users can write code against specific aspects of the error.
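As an illustration only, both rules above can be checked on the server before a response is sent. This minimal Python sketch assumes the details are already in the HTTP/1.1+JSON form described later in this AIP, where each payload is a JSON object identified by its "@type" URL. The function name check_details is hypothetical.

```python
from collections import Counter

# Type URL of ErrorInfo as it appears in the "@type" field of the JSON form.
ERROR_INFO_TYPE = "type.googleapis.com/google.rpc.ErrorInfo"


def check_details(details: list) -> list:
    """Return human-readable rule violations for a Status.details list.

    Each entry is assumed to be a dict with an "@type" key (hypothetical
    helper, not part of any Google library).
    """
    problems = []
    counts = Counter(detail.get("@type", "") for detail in details)
    # Each type of detail payload may appear at most once.
    for type_url, count in sorted(counts.items()):
        if count > 1:
            problems.append(f"{type_url} appears {count} times")
    # Every error response must carry an ErrorInfo.
    if counts[ERROR_INFO_TYPE] == 0:
        problems.append("missing ErrorInfo")
    return problems


details = [
    {"@type": "type.googleapis.com/google.rpc.BadRequest"},
    {"@type": "type.googleapis.com/google.rpc.PreconditionFailure"},
]
print(check_details(details))  # ['missing ErrorInfo']
```

A BadRequest alongside a PreconditionFailure is allowed. The example list is rejected only because it lacks an ErrorInfo.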
The following sections describe the most common standard detail payloads.
ErrorInfo
The ErrorInfo message is the primary way to send a machine-readable identifier. Contextual information should be included in metadata in ErrorInfo, and must be included if it appears within an error message.
The reason field is a short UPPER_SNAKE_CASE description of the cause of the error. Error reasons are unique within a particular domain of errors. The reason must be at most 63 characters and match the regular expression [A-Z][A-Z0-9_]+[A-Z0-9]. (That is, UPPER_SNAKE_CASE without leading or trailing underscores and without leading digits.)
The reason should be terse, but meaningful enough for a human reader to understand what the reason refers to.
Good examples:
- CPU_AVAILABILITY
- NO_STOCK
- CHECKED_OUT
- AVAILABILITY_ERROR

Bad examples:
- THE_BOOK_YOU_WANT_IS_NOT_AVAILABLE (overly verbose)
- ERROR (too general)
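The length and character rules for reason can be checked mechanically. Whether a reason is terse but meaningful cannot. This is a minimal Python sketch with a hypothetical helper name. Here fullmatch() supplies the anchoring that the bare pattern above leaves implicit.

```python
import re

# Pattern from this AIP; fullmatch() below requires it to cover the whole string.
REASON_PATTERN = re.compile(r"[A-Z][A-Z0-9_]+[A-Z0-9]")
MAX_REASON_LENGTH = 63


def is_valid_reason(reason: str) -> bool:
    """Check only the length and character rules for ErrorInfo.reason."""
    return (
        len(reason) <= MAX_REASON_LENGTH
        and REASON_PATTERN.fullmatch(reason) is not None
    )


for candidate in ["CPU_AVAILABILITY", "ERROR", "no_stock", "NO_STOCK_", "1_STOCK"]:
    print(candidate, is_valid_reason(candidate))
```

Both bad examples above pass this check. Verbosity and over-generality still need human review.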
The domain field is the logical grouping to which the reason belongs. The domain must be a globally unique value, and is typically the name of the service that generated the error, e.g. pubsub.googleapis.com.
The (reason, domain) pair forms a machine-readable way of identifying a particular error. Services must use the same (reason, domain) pair for the same error, and must not use the same (reason, domain) pair for logically different errors. Whether two errors are "the same" is not always clear, but should generally be decided by the action a client might be expected to take to resolve them.
The metadata field is a map of key/value pairs providing additional dynamic information as context. Each key within metadata must be at most 64 characters long, and must conform to the regular expression [a-z][a-zA-Z0-9-_]+.
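The key rules can be enforced the same way. In this minimal Python sketch (hypothetical helper name), the hyphen is moved to the end of the character class so it is unambiguously a literal. The set of accepted characters is the same as in the pattern above.

```python
import re

# Same pattern as above, with the hyphen moved to the end of the character
# class so that it is unambiguously a literal "-".
METADATA_KEY_PATTERN = re.compile(r"[a-z][a-zA-Z0-9_-]+")
MAX_METADATA_KEY_LENGTH = 64


def invalid_metadata_keys(metadata: dict) -> list:
    """Return the metadata keys that violate the length or pattern rules."""
    return [
        key
        for key in metadata
        if len(key) > MAX_METADATA_KEY_LENGTH
        or METADATA_KEY_PATTERN.fullmatch(key) is None
    ]


print(invalid_metadata_keys({"zone": "us-east1-a", "VmType": "e2-medium"}))
# ['VmType']
```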
Any request-specific information which contributes to the Status.message or LocalizedMessage.message text must be represented within metadata. This is critical so that machine actors do not need to parse error messages to extract information.
For example, consider the following message:
An <e2-medium> VM instance with <local-ssd=3,nvidia-t4=2> is currently unavailable in the <us-east1-a> zone. Consider trying your request in the <us-central1-f,us-central1-c> zone(s), which currently has/have capacity to accommodate your request. Alternatively, you can try your request again with a different VM hardware configuration or at a later time. For more information, see the troubleshooting documentation.
The ErrorInfo.metadata map for the same error could be:
"zone": "us-east1-a"
"vmType": "e2-medium"
"attachment": "local-ssd=3,nvidia-t4=2"
"zonesWithCapacity": "us-central1-f,us-central1-c"
Additional contextual information that does not appear in an error message may also be included in metadata to allow programmatic use by the client.
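One way to keep the message and metadata consistent is to render the message from the metadata map itself. This minimal Python sketch assumes a hypothetical per-error template whose placeholders are metadata keys. The template text is abbreviated from the example message above.

```python
# Hypothetical message template for this error. Every placeholder is a
# metadata key, so each dynamic value in the message is also in metadata.
VM_UNAVAILABLE_TEMPLATE = (
    "An <{vmType}> VM instance with <{attachment}> is currently unavailable "
    "in the <{zone}> zone. Consider trying your request in the "
    "<{zonesWithCapacity}> zone(s), which currently has/have capacity to "
    "accommodate your request."
)


def render_message(template: str, metadata: dict) -> str:
    """Fill a message template using only ErrorInfo.metadata values.

    str.format_map raises KeyError if the template names a key that is
    absent from metadata, so no placeholder can lack a metadata entry.
    """
    return template.format_map(metadata)


metadata = {
    "zone": "us-east1-a",
    "vmType": "e2-medium",
    "attachment": "local-ssd=3,nvidia-t4=2",
    "zonesWithCapacity": "us-central1-f,us-central1-c",
}
print(render_message(VM_UNAVAILABLE_TEMPLATE, metadata))
```

Extra metadata keys that the template does not use are ignored. This matches the allowance for context that does not appear in the message.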
The metadata included for any given (reason, domain) pair can evolve over time:
- New keys may be included
- All keys that have been included must continue to be included (but may have empty values)
In other words, once a user has observed a given key for a (reason, domain) pair, the service must allow them to rely on it continuing to be present in the future.
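A service could guard against accidentally dropping a key by recording the keys it has emitted for each (reason, domain) pair. This minimal Python sketch keeps a hypothetical in-memory registry. A real service might instead derive the expected keys from its error definitions or check them in tests.

```python
from collections import defaultdict

# Hypothetical registry: every metadata key emitted so far for each
# (reason, domain) pair.
_observed_keys = defaultdict(set)


def check_metadata_evolution(reason: str, domain: str, metadata: dict) -> set:
    """Record this response's keys and return previously emitted keys that
    are now missing. New keys and empty values are both allowed."""
    pair = (reason, domain)
    keys = set(metadata)
    missing = _observed_keys[pair] - keys
    _observed_keys[pair] |= keys
    return missing


check_metadata_evolution("RESOURCE_AVAILABILITY", "compute.googleapis.com",
                         {"zone": "us-east1-a", "vmType": "e2-medium"})
# Dropping "vmType" for the same pair is reported as a breaking change.
print(check_metadata_evolution("RESOURCE_AVAILABILITY", "compute.googleapis.com",
                               {"zone": "us-east1-a"}))  # {'vmType'}
```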
The set of keys provided in each (reason, domain) pair is independent of other pairs, but services should aim for consistent key naming. For example, two error reasons within the same domain should not use the metadata keys vmType and virtualMachineType for the same concept.
LocalizedMessage
google.rpc.LocalizedMessage is used to provide an error message which should be localized to a user-specified locale where possible.
If the Status.message field has a sub-optimal value which cannot be changed due to the constraints in the Changing Error Messages section, LocalizedMessage may be used to provide a better error message even when no user-specified locale is available.
Regardless of how the locale for the message was determined, both the locale and message fields must be populated.
The locale field specifies the locale of the message, following IETF BCP 47 (Tags for Identifying Languages). Example values: "en-US", "fr-CH", "es-MX".
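This minimal Python sketch builds the JSON form of a LocalizedMessage with both fields always populated. When no locale is supplied, it falls back to "en-US", which the Rationale section recommends. The helper name is hypothetical, and it does not validate the BCP 47 syntax of the locale.

```python
from typing import Optional

LOCALIZED_MESSAGE_TYPE = "type.googleapis.com/google.rpc.LocalizedMessage"
# Fallback when the user has not specified a locale (see the Rationale section).
DEFAULT_LOCALE = "en-US"


def localized_message(message: str, locale: Optional[str] = None) -> dict:
    """Build a LocalizedMessage payload (JSON form) with both fields set."""
    if not message:
        raise ValueError("LocalizedMessage.message must be populated")
    return {
        "@type": LOCALIZED_MESSAGE_TYPE,
        "locale": locale or DEFAULT_LOCALE,
        "message": message,
    }


print(localized_message("Try again later.")["locale"])  # en-US
```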
The message field contains the localized text itself. This should include a brief description of the error and a call to action to resolve the error. The message should include contextual information to make the message as specific as possible. Any contextual information in the message must be included in ErrorInfo.metadata. See ErrorInfo for more details of how contextual information may be included in a message and the corresponding metadata.
The LocalizedMessage payload should contain the complete resolution to the error. If more information is needed than can reasonably fit in this payload, then additional resolution information must be provided in a Help payload. See the Help section for guidance.
Help
When other textual error messages (in Status.message or LocalizedMessage.message) don't provide the user sufficient context or actionable next steps, or if there are multiple points of failure that need to be considered in troubleshooting, a link to supplemental troubleshooting documentation must be provided in the Help payload.
Provide this information in addition to a clear problem definition and actionable resolution, not as an alternative to them. The linked documentation must clearly relate to the error. If a single page contains information about multiple errors, the ErrorInfo.reason value must be used to narrow down the relevant information.
The description field is a textual description of the linked information. This must be suitable to display to a user as text for a hyperlink. This must be plain text (not HTML, Markdown, etc.).
Example description value: "Troubleshooting documentation for STOCKOUT errors"
The url field is the URL to link to. This must be an absolute URL, including scheme.
Example url value: "https://cloud.google.com/compute/docs/resource-error"
For publicly-documented services, even those with access controls on actual usage, the linked content must be accessible without authentication.
For privately-documented services, the linked content may require authentication.
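Some of these rules lend themselves to a lint check. This minimal Python sketch checks one entry of Help.links in its JSON form. The plain-text test is only a rough heuristic, the helper name is hypothetical, and whether a link needs authentication cannot be checked this way.

```python
from urllib.parse import urlsplit


def help_link_problems(link: dict) -> list:
    """Return rule violations for one entry of Help.links (JSON form)."""
    problems = []
    description = link.get("description", "")
    if not description.strip():
        problems.append("description is empty")
    # Rough heuristic only: tags or Markdown link syntax suggest the
    # description is not plain text.
    elif "<" in description or "](" in description:
        problems.append("description is not plain text")
    # An absolute URL needs both a scheme and a network location.
    parts = urlsplit(link.get("url", ""))
    if not parts.scheme or not parts.netloc:
        problems.append("url is not absolute")
    return problems


print(help_link_problems({
    "description": "Troubleshooting documentation for STOCKOUT errors",
    "url": "https://cloud.google.com/compute/docs/resource-error",
}))  # []
```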
Error messages
Textual error messages can be present in both the Status.message and LocalizedMessage.message fields. Messages should be succinct but actionable, with request-specific information (such as a resource name or region) providing precise details where appropriate. Any request-specific details must be present in ErrorInfo.metadata.
Changing error messages
Changing the content of Status.message over time must be done carefully, to avoid breaking clients who have previously had to rely on the message for all information. See the rationale section for more details.
For a given RPC:
- If the RPC has always returned ErrorInfo with machine-readable information, the content of Status.message may change over time. (For example, the API producer may provide a clearer explanation, or more request-specific information.)
- Otherwise, the content of Status.message must be stable, providing the same text with the same request-specific information. Instead of changing Status.message, the API should include a LocalizedMessage within Status.details.
Even if an RPC has always returned ErrorInfo, the API may keep the existing Status.message stable and add a LocalizedMessage within Status.details.
The content of LocalizedMessage.message may change over time.
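The rules above reduce to a small decision when a producer wants to ship clearer error text. This minimal Python sketch returns the Status.message and Status.details to send. The helper name and the "en-US" locale are assumptions. For RPCs that have always returned ErrorInfo, it changes Status.message directly, although keeping it stable and adding a LocalizedMessage would also be allowed.

```python
ERROR_INFO_TYPE = "type.googleapis.com/google.rpc.ErrorInfo"
LOCALIZED_MESSAGE_TYPE = "type.googleapis.com/google.rpc.LocalizedMessage"


def apply_message_update(stable_message: str, improved_message: str,
                         error_info: dict, always_returned_error_info: bool) -> tuple:
    """Return (Status.message, Status.details) for an improved error text."""
    details = [error_info]
    if always_returned_error_info:
        # Clients have always had ErrorInfo, so Status.message itself may change.
        return improved_message, details
    # Otherwise keep Status.message exactly as before and carry the better
    # text in a LocalizedMessage (locale assumed to be en-US here).
    details.append({
        "@type": LOCALIZED_MESSAGE_TYPE,
        "locale": "en-US",
        "message": improved_message,
    })
    return stable_message, details
```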
Partial errors
APIs should not support partial errors. Partial errors add significant complexity for users, because they usually sidestep the use of error codes, or move those error codes into the response message, where the user must write specialized error handling logic to address the problem.
However, occasionally partial errors are necessary, particularly in bulk operations where it would be hostile to users to fail an entire large request because of a problem with a single entry.
Methods that require partial errors should use long-running operations, and the method should put partial failure information in the metadata message. The errors themselves must still be represented with a google.rpc.Status object.
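As a rough illustration, a bulk method might record per-entry failures in its long-running operation metadata, with each failure still expressed as a google.rpc.Status. This minimal Python sketch uses JSON-shaped dicts. The field names processedCount and failedEntries, and the process callback, are hypothetical; this AIP does not define them.

```python
from typing import Callable, Optional


def run_bulk(entries: list, process: Callable[[object], Optional[dict]]) -> dict:
    """Process every entry and return hypothetical LRO metadata.

    `process` returns None on success, or a google.rpc.Status-shaped dict
    (code, message, details) describing why that entry failed.
    """
    failures = {}
    for index, entry in enumerate(entries):
        status = process(entry)
        if status is not None:
            # JSON map keys are strings, so the entry index is stringified.
            failures[str(index)] = status
    return {"processedCount": len(entries), "failedEntries": failures}
```

A single bad entry then shows up as one Status in the metadata instead of failing the whole request.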
Permission Denied
If the user does not have permission to access the resource or parent, regardless of whether or not it exists, the service must error with PERMISSION_DENIED (HTTP 403). Permission must be checked prior to checking if the resource or parent exists.
If the user does have proper permission, but the requested resource or parent does not exist, the service must error with NOT_FOUND (HTTP 404).
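The ordering matters because a caller without permission should get the same answer whether or not the resource exists. This minimal Python sketch uses two hypothetical callables that stand in for a service's permission check and storage lookup. The existence lookup never runs when permission is denied.

```python
from typing import Callable

# Numeric values from the google.rpc.Code enum.
OK = 0
NOT_FOUND = 5
PERMISSION_DENIED = 7


def check_access(has_permission: Callable[[], bool],
                 resource_exists: Callable[[], bool]) -> int:
    """Return the google.rpc.Code value for an access attempt.

    Permission is evaluated first, and the existence lookup is skipped
    entirely when it fails, so the response is the same whether or not
    the resource exists.
    """
    if not has_permission():
        return PERMISSION_DENIED
    if not resource_exists():
        return NOT_FOUND
    return OK
```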
HTTP/1.1+JSON representation
When clients use HTTP/1.1 as per AIP-127, the error information is returned in the body of the response, as a JSON object. For backward compatibility reasons, this does not map precisely to google.rpc.Status, but contains the same core information. The schema is defined in the following proto:
```proto
syntax = "proto3";

import "google/protobuf/any.proto";
import "google/rpc/code.proto";

message Error {
  message Status {
    // The HTTP status code that corresponds to google.rpc.Status.code.
    int32 code = 1;
    // This corresponds to google.rpc.Status.message.
    string message = 2;
    // This is the enum version for google.rpc.Status.code.
    google.rpc.Code status = 4;
    // This corresponds to google.rpc.Status.details.
    repeated google.protobuf.Any details = 5;
  }
  Status error = 1;
}
```
The most important difference is that the code field in the JSON is an HTTP status code, not the direct value of google.rpc.Status.code. For example, a google.rpc.Status message with a code value of 5 would be mapped to an object including the following code-related fields (as well as the message, details, etc.):
```json
{
  "error": {
    "code": 404,  // The HTTP status code for "not found"
    "status": "NOT_FOUND"  // The name in google.rpc.Code for value 5
  }
}
```
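This minimal Python sketch shows how a server might produce this representation from a google.rpc.Status. The HTTP status for each code follows the "HTTP Mapping" comments in google/rpc/code.proto. The function name to_http_json is hypothetical, and details are passed through unchanged, assumed to be in JSON form already.

```python
# google.rpc.Code value -> (enum name, HTTP status), per google/rpc/code.proto.
CODES = {
    0: ("OK", 200),
    1: ("CANCELLED", 499),
    2: ("UNKNOWN", 500),
    3: ("INVALID_ARGUMENT", 400),
    4: ("DEADLINE_EXCEEDED", 504),
    5: ("NOT_FOUND", 404),
    6: ("ALREADY_EXISTS", 409),
    7: ("PERMISSION_DENIED", 403),
    8: ("RESOURCE_EXHAUSTED", 429),
    9: ("FAILED_PRECONDITION", 400),
    10: ("ABORTED", 409),
    11: ("OUT_OF_RANGE", 400),
    12: ("UNIMPLEMENTED", 501),
    13: ("INTERNAL", 500),
    14: ("UNAVAILABLE", 503),
    15: ("DATA_LOSS", 500),
    16: ("UNAUTHENTICATED", 401),
}


def to_http_json(code: int, message: str, details: list) -> dict:
    """Map google.rpc.Status fields to the HTTP/1.1+JSON error object."""
    name, http_status = CODES[code]
    return {
        "error": {
            "code": http_status,  # HTTP status, not the google.rpc.Code value
            "message": message,
            "status": name,       # enum name for google.rpc.Status.code
            "details": details,
        }
    }
```

Several codes share an HTTP status (for example 400), so the "status" field is what preserves the original google.rpc.Code.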
The following JSON shows a fully populated HTTP/1.1+JSON representation of an error response.
```json
{
  "error": {
    "code": 429,
    "message": "The zone 'us-east1-a' does not have enough resources available to fulfill the request. Try a different zone, or try again later.",
    "status": "RESOURCE_EXHAUSTED",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.ErrorInfo",
        "reason": "RESOURCE_AVAILABILITY",
        "domain": "compute.googleapis.com",
        "metadata": {
          "zone": "us-east1-a",
          "vmType": "e2-medium",
          "attachment": "local-ssd=3,nvidia-t4=2",
          "zonesWithCapacity": "us-central1-f,us-central1-c"
        }
      },
      {
        "@type": "type.googleapis.com/google.rpc.LocalizedMessage",
        "locale": "en-US",
        "message": "An <e2-medium> VM instance with <local-ssd=3,nvidia-t4=2> is currently unavailable in the <us-east1-a> zone. Consider trying your request in the <us-central1-f,us-central1-c> zone(s), which currently has/have capacity to accommodate your request. Alternatively, you can try your request again with a different VM hardware configuration or at a later time. For more information, see the troubleshooting documentation."
      },
      {
        "@type": "type.googleapis.com/google.rpc.Help",
        "links": [
          {
            "description": "Additional information on this error",
            "url": "https://cloud.google.com/compute/docs/resource-error"
          }
        ]
      }
    ]
  }
}
```
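On the client side, this structure lets common error handling key off the (reason, domain) pair rather than parse the message. This minimal Python sketch extracts the ErrorInfo from an HTTP/1.1+JSON body and reads the suggested zones for the example error above. The helper names are hypothetical. Treating zonesWithCapacity as a comma-separated list is an assumption based on the example value.

```python
import json
from typing import Optional

ERROR_INFO_TYPE = "type.googleapis.com/google.rpc.ErrorInfo"


def find_error_info(body: str) -> Optional[dict]:
    """Return the ErrorInfo payload from an HTTP/1.1+JSON error body, if any."""
    error = json.loads(body).get("error", {})
    for detail in error.get("details", []):
        if detail.get("@type") == ERROR_INFO_TYPE:
            return detail
    return None


def zones_with_capacity(body: str) -> list:
    """Return suggested zones for the compute example error, else []."""
    info = find_error_info(body)
    if info is None:
        return []
    if (info.get("reason"), info.get("domain")) != (
        "RESOURCE_AVAILABILITY",
        "compute.googleapis.com",
    ):
        return []
    # Assumption: the value is a comma-separated list, as in the example.
    value = info.get("metadata", {}).get("zonesWithCapacity", "")
    return [zone for zone in value.split(",") if zone]
```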
Rationale
Requiring ErrorInfo
ErrorInfo is required because it further identifies an error. With only approximately twenty available values for Status.code, it is difficult to disambiguate one error from another across an entire API service.
Also, error messages often contain dynamic segments that express variable information, so there needs to be a machine-readable component of every error response that enables clients to use such information programmatically.
Including LocalizedMessage
LocalizedMessage was selected as the location to present alternate error messages. While LocalizedMessage may use a locale specified in the request, a service may provide a LocalizedMessage even without a user-specified locale, typically to provide a better error message in situations where Status.message cannot be changed. Where the locale is not specified by the user, it should be en-US (US English).
A service may include LocalizedMessage even when the same message is provided in Status.message and when localization into a user-specified locale is not supported. Reasons for this include:
- An intention to support user-specified localization in the near future, allowing clients to consistently use LocalizedMessage and not change their error-reporting code when the functionality is introduced.
- Consistency across all RPCs within a service: if some RPCs include LocalizedMessage and some only use Status.message for error messages, clients have to be aware of which RPCs will do what, or implement a fall-back mechanism. Providing LocalizedMessage on all RPCs allows simple and consistent client code to be written.
Updating Status.message
If a client has ever observed an error with Status.message populated (which it always will be) but without ErrorInfo, the developer of that client may well have had to resort to parsing Status.message in order to find out information beyond just what Status.code conveys. That information may be found by matching specific text (e.g. "Connection closed with unknown cause") or by parsing the message to find out metadata values (e.g. a region with insufficient resources). At that point, Status.message is implicitly part of the API contract, so it must not be updated; that would be a breaking change. This is one reason for introducing LocalizedMessage into Status.details.
RPCs which have always included ErrorInfo are in a better position: the contract is then more about the stability of ErrorInfo for any given error. The reason and domain need to be consistent over time, and the metadata provided for any given (reason, domain) pair can only be expanded. It's still possible that clients could be parsing Status.message instead of using ErrorInfo, but they will always have had a more robust option available to them.
Further reading
- For which error codes to retry, see AIP-194.
- For how to retry errors in client libraries, see AIP-4221.
Changelog
- 2024-10-18: Rewrite/restructure for clarity.
- 2024-01-10: Incorporate guidance for writing effective messages.
- 2023-05-17: Change the recommended language for Status.message to be the service's native language rather than English.
- 2023-05-17: Specify requirements for changing error messages.
- 2023-05-10: Require ErrorInfo for all error responses.
- 2023-05-04: Require uniqueness by message type for error details.
- 2022-11-04: Added guidance around PERMISSION_DENIED errors previously found in other AIPs.
- 2022-08-12: Reworded/Simplified intro to add clarity to the intent.
- 2020-01-22: Added a reference to the ErrorInfo message.
- 2019-10-14: Added guidance restricting error message mutability to if there is a machine-readable identifier present.
- 2019-09-23: Added guidance about error message strings being able to change.