tf.io.parse_example | TensorFlow v2.16.1 (original) (raw)

Parses Example protos into a dict of tensors.

tf.io.parse_example(
    serialized, features, example_names=None, name=None
)

Used in the notebooks

Used in the guide	Used in the tutorials
tf.data: Build TensorFlow input pipelines Ragged tensors	TFX Keras Component Tutorial Preprocessing data with TensorFlow Transform Introduction to Fairness Indicators TFX Estimator Component Tutorial Feature Engineering using TFX Pipeline and TensorFlow Transform

Parses a number of serialized Exampleprotos given in serialized. We refer to serialized as a batch withbatch_size many entries of individual Example protos.

example_names may contain descriptive names for the corresponding serialized protos. These may be useful for debugging purposes, but they have no effect on the output. If not None, example_names must be the same length asserialized.

This op parses serialized examples into a dictionary mapping keys to Tensor SparseTensor, and RaggedTensor objects. features is a Mapping from keys to VarLenFeature, SparseFeature, RaggedFeature, and FixedLenFeatureobjects. Each VarLenFeature and SparseFeature is mapped to aSparseTensor; each FixedLenFeature is mapped to a Tensor; and eachRaggedFeature is mapped to a RaggedTensor.

Each VarLenFeature maps to a SparseTensor of the specified type representing a ragged matrix. Its indices are [batch, index] where batchidentifies the example in serialized, and index is the value's index in the list of values associated with that feature and example.

Each SparseFeature maps to a SparseTensor of the specified type representing a Tensor of dense_shape [batch_size] + SparseFeature.size. Its values come from the feature in the examples with key value_key. A values[i] comes from a position k in the feature of an example at batch entry batch. This positional information is recorded in indices[i] as[batch, index_0, index_1, ...] where index_j is the k-th value of the feature in the example at with key SparseFeature.index_key[j]. In other words, we split the indices (except the first index indicating the batch entry) of a SparseTensor by dimension into different features of theExample. Due to its complexity a VarLenFeature should be preferred over aSparseFeature whenever possible.

Each FixedLenFeature df maps to a Tensor of the specified type (ortf.float32 if not specified) and shape (serialized.size(),) + df.shape.

FixedLenFeature entries with a default_value are optional. With no default value, we will fail if that Feature is missing from any example inserialized.

Each FixedLenSequenceFeature df maps to a Tensor of the specified type (or tf.float32 if not specified) and shape(serialized.size(), None) + df.shape. All examples in serialized will be padded with default_value along the second dimension.

Each RaggedFeature maps to a RaggedTensor of the specified type. It is formed by stacking the RaggedTensor for each example, where theRaggedTensor for each individual example is constructed using the tensors specified by RaggedTensor.values_key and RaggedTensor.partition. See the tf.io.RaggedFeature documentation for details and examples.

Examples:

For example, if one expects a tf.float32 VarLenFeature ft and three serialized Examples are provided:

serialized = [
  features
    { feature { key: "ft" value { float_list { value: [1.0, 2.0] } } } },
  features
    { feature []},
  features
    { feature { key: "ft" value { float_list { value: [3.0] } } }
]

then the output will look like:

{"ft": SparseTensor(indices=[[0, 0], [0, 1], [2, 0]],
                    values=[1.0, 2.0, 3.0],
                    dense_shape=(3, 2)) }

If instead a FixedLenSequenceFeature with default_value = -1.0 andshape=[] is used then the output will look like:

{"ft": [[1.0, 2.0], [3.0, -1.0]]}

Given two Example input protos in serialized:

[
  features {
    feature { key: "kw" value { bytes_list { value: [ "knit", "big" ] } } }
    feature { key: "gps" value { float_list { value: [] } } }
  },
  features {
    feature { key: "kw" value { bytes_list { value: [ "emmy" ] } } }
    feature { key: "dank" value { int64_list { value: [ 42 ] } } }
    feature { key: "gps" value { } }
  }
]

And arguments

example_names: ["input0", "input1"],
features: {
    "kw": VarLenFeature(tf.string),
    "dank": VarLenFeature(tf.int64),
    "gps": VarLenFeature(tf.float32),
}

Then the output is a dictionary:

{
  "kw": SparseTensor(
      indices=[[0, 0], [0, 1], [1, 0]],
      values=["knit", "big", "emmy"]
      dense_shape=[2, 2]),
  "dank": SparseTensor(
      indices=[[1, 0]],
      values=[42],
      dense_shape=[2, 1]),
  "gps": SparseTensor(
      indices=[],
      values=[],
      dense_shape=[2, 0]),
}

For dense results in two serialized Examples:

[
  features {
    feature { key: "age" value { int64_list { value: [ 0 ] } } }
    feature { key: "gender" value { bytes_list { value: [ "f" ] } } }
   },
   features {
    feature { key: "age" value { int64_list { value: [] } } }
    feature { key: "gender" value { bytes_list { value: [ "f" ] } } }
  }
]

We can use arguments:

example_names: ["input0", "input1"],
features: {
    "age": FixedLenFeature([], dtype=tf.int64, default_value=-1),
    "gender": FixedLenFeature([], dtype=tf.string),
}

And the expected output is:

{
  "age": [[0], [-1]],
  "gender": [["f"], ["f"]],
}

An alternative to VarLenFeature to obtain a SparseTensor isSparseFeature. For example, given two Example input protos inserialized:

[
  features {
    feature { key: "val" value { float_list { value: [ 0.5, -1.0 ] } } }
    feature { key: "ix" value { int64_list { value: [ 3, 20 ] } } }
  },
  features {
    feature { key: "val" value { float_list { value: [ 0.0 ] } } }
    feature { key: "ix" value { int64_list { value: [ 42 ] } } }
  }
]

And arguments

example_names: ["input0", "input1"],
features: {
    "sparse": SparseFeature(
        index_key="ix", value_key="val", dtype=tf.float32, size=100),
}

Then the output is a dictionary:

{
  "sparse": SparseTensor(
      indices=[[0, 3], [0, 20], [1, 42]],
      values=[0.5, -1.0, 0.0]
      dense_shape=[2, 100]),
}

See the tf.io.RaggedFeature documentation for examples showing howRaggedFeature can be used to obtain RaggedTensors.

Args
serialized	A vector (1-D Tensor) of strings, a batch of binary serialized Example protos.
features	A mapping of feature keys to FixedLenFeature,VarLenFeature, SparseFeature, and RaggedFeature values.
example_names	A vector (1-D Tensor) of strings (optional), the names of the serialized protos in the batch.
name	A name for this operation (optional).

Returns
A dict mapping feature keys to Tensor, SparseTensor, andRaggedTensor values.

Raises
ValueError	if any feature is invalid.