tf.raw_ops.DecodeProtoV2  |  TensorFlow v2.16.1

The op extracts fields from a serialized protocol buffers message into tensors.

Compat aliases for migration

See the Migration guide for more details.

tf.compat.v1.raw_ops.DecodeProtoV2

tf.raw_ops.DecodeProtoV2(
    bytes,
    message_type,
    field_names,
    output_types,
    descriptor_source='local://',
    message_format='binary',
    sanitize=False,
    name=None
)

The decode_proto op extracts fields from a serialized protocol buffers message into tensors. The fields in field_names are decoded and converted to the corresponding output_types if possible.

A message_type name must be provided to give context for the field names. The actual message descriptor can be looked up either in the linked-in descriptor pool or in a file provided by the caller using the descriptor_source attribute.

Each output tensor is a dense tensor. This means that it is padded to hold the largest number of repeated elements seen in the input minibatch. (The shape is also padded by one to prevent zero-sized dimensions.) The actual repeat counts for each example in the minibatch can be found in the sizes output. In many cases the output of decode_proto is fed immediately into tf.squeeze if missing values are not a concern. When using tf.squeeze, always pass the squeeze dimension explicitly to avoid surprises.
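To illustrate why the squeeze dimension should be explicit, here is a minimal sketch. It uses a stand-in tensor of shape (1, 1) in place of a real decode_proto output for a batch of one message with a single-valued field; the name decoded is hypothetical.

```python
import tensorflow as tf

# Stand-in for a decoded field: a batch of 1 message, 1 value per message.
decoded = tf.constant([[2.2]], dtype=tf.float32)  # shape (1, 1)

# Without an axis, tf.squeeze removes every size-1 dimension,
# including the batch dimension.
implicit = tf.squeeze(decoded)

# With an explicit axis, only the repeated-field dimension is removed.
explicit = tf.squeeze(decoded, axis=-1)

print(implicit.shape)  # ()
print(explicit.shape)  # (1,)
```

With a batch of one, the implicit form silently drops the batch axis, which is the kind of surprise the explicit axis avoids.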

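For the most part, the mapping between Proto field types and TensorFlow dtypes is straightforward. However, there are a few special cases. For example, a field that contains a nested message can only be extracted as tf.string, holding the serialized submessage.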

Both binary and text proto serializations are supported, and can be chosen using the message_format attribute.
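As a rough sketch of the two formats, the snippet below decodes the same Summary.Value message once from its binary serialization and once from its text serialization. It assumes the tf.io.decode_proto wrapper and the Summary.Value proto used in the example further down; the variable names are illustrative only.

```python
import tensorflow as tf

message_type = tf.compat.v1.Summary.Value.DESCRIPTOR.full_name

# The same message in both serializations.
binary_input = [
    tf.compat.v1.Summary.Value(simple_value=2.5).SerializeToString()
]
text_input = ["simple_value: 2.5"]

# Binary is the default message_format.
_, [from_binary] = tf.io.decode_proto(
    binary_input,
    message_type,
    field_names=["simple_value"],
    output_types=[tf.float32])

# Text-format input needs message_format="text".
_, [from_text] = tf.io.decode_proto(
    text_input,
    message_type,
    field_names=["simple_value"],
    output_types=[tf.float32],
    message_format="text")
```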

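The descriptor_source attribute selects the source of protocol descriptors to consult when looking up message_type. This may be the special value local:// (the default), which uses the descriptor pool linked into the binary, or a path to a file containing a serialized FileDescriptorSet.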
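When descriptor_source is a file path, the file is expected to contain a serialized FileDescriptorSet. One possible way to produce such a file from Python is sketched below using the protobuf descriptor_pb2 module. The package example, the message Point, its field x, and the file name are all hypothetical and chosen only for illustration.

```python
import os
import tempfile

from google.protobuf import descriptor_pb2

# Describe a hypothetical message: package "example", message "Point",
# with a single optional float field "x".
descriptor_set = descriptor_pb2.FileDescriptorSet()
file_proto = descriptor_set.file.add()
file_proto.name = "example.proto"
file_proto.package = "example"

message = file_proto.message_type.add()
message.name = "Point"

field = message.field.add()
field.name = "x"
field.number = 1
field.type = descriptor_pb2.FieldDescriptorProto.TYPE_FLOAT
field.label = descriptor_pb2.FieldDescriptorProto.LABEL_OPTIONAL

# Write the serialized FileDescriptorSet to a file.
descriptor_path = os.path.join(tempfile.mkdtemp(), "example_descriptors.pb")
with open(descriptor_path, "wb") as f:
    f.write(descriptor_set.SerializeToString())
```

Under these assumptions, descriptor_path would be passed as descriptor_source and "example.Point" as message_type.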

Here is an example:

The internal Summary.Value proto contains a oneof {float simple_value; Image image; ...}

import tensorflow as tf
from google.protobuf import text_format

# A Summary.Value contains: oneof {float simple_value; Image image}
values = [
    "simple_value: 2.2",
    "simple_value: 1.2",
    "image { height: 128 width: 512 }",
    "image { height: 256 width: 256 }",
]
# Parse each text-format string and re-serialize it to binary.
values = [
    text_format.Parse(v, tf.compat.v1.Summary.Value()).SerializeToString()
    for v in values
]

The following can decode both fields from the serialized strings:

sizes, [simple_value, image] = tf.io.decode_proto(
    values,
    tf.compat.v1.Summary.Value.DESCRIPTOR.full_name,
    field_names=['simple_value', 'image'],
    output_types=[tf.float32, tf.string])

The sizes tensor has the same shape as the input, with an additional axis across the fields that were decoded. Here the first column of sizes is the size of the decoded simple_value field:

print(sizes)
tf.Tensor(
[[1 0]
 [1 0]
 [0 1]
 [0 1]], shape=(4, 2), dtype=int32)

The result tensors each have one more index than the input byte-strings. The valid elements of each result tensor are indicated by the appropriate column of sizes. The invalid elements are padded with a default value:

print(simple_value)
tf.Tensor(
[[2.2]
 [1.2]
 [0. ]
 [0. ]], shape=(4, 1), dtype=float32)

Nested protos are extracted as string tensors:

print(image.dtype)
<dtype: 'string'>
print(image.shape.as_list())
[4, 1]

To convert to a tf.RaggedTensor representation use:

tf.RaggedTensor.from_tensor(simple_value, lengths=sizes[:, 0]).to_list()
[[2.2], [1.2], [], []]

Args
bytes: A Tensor of type string. Tensor of serialized protos with shape batch_shape.
message_type: A string. Name of the proto message type to decode.
field_names: A list of strings. List of strings containing proto field names. An extension field can be decoded by using its full name, e.g. EXT_PACKAGE.EXT_FIELD_NAME.
output_types: A list of tf.DTypes. List of TF types to use for the respective field in field_names.
descriptor_source: An optional string. Defaults to "local://". Either the special value local:// or a path to a file containing a serialized FileDescriptorSet.
message_format: An optional string. Defaults to "binary". Either binary or text.
sanitize: An optional bool. Defaults to False. Whether to sanitize the result or not.
name: A name for the operation (optional).

Returns
A tuple of Tensor objects (sizes, values).
sizes: A Tensor of type int32.
values: A list of Tensor objects of type output_types.
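
The example earlier on this page goes through tf.io.decode_proto. As a minimal sketch of calling the raw op directly, the snippet below passes every argument by keyword and reads the two outputs by name. It assumes the same Summary.Value proto as that example; the variable names are illustrative.

```python
import tensorflow as tf

# Two serialized Summary.Value messages, each with simple_value set.
serialized = tf.constant([
    tf.compat.v1.Summary.Value(simple_value=2.5).SerializeToString(),
    tf.compat.v1.Summary.Value(simple_value=1.5).SerializeToString(),
])

# Raw ops take keyword arguments only.
result = tf.raw_ops.DecodeProtoV2(
    bytes=serialized,
    message_type=tf.compat.v1.Summary.Value.DESCRIPTOR.full_name,
    field_names=["simple_value"],
    output_types=[tf.float32],
)

sizes = result.sizes             # int32, one column per requested field
simple_value = result.values[0]  # float32, one tensor per requested field

print(sizes.shape, simple_value.shape)
```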