tfds.features.FeaturesDict  |  TensorFlow Datasets

Composite FeatureConnector; each feature in the dict has its own connector.

Inherits From: FeatureConnector

tfds.features.FeaturesDict(
    feature_dict: Dict[str, feature_lib.FeatureConnectorArg],
    *,
    doc: feature_lib.DocArg = None
)

The encode/decode methods of this feature recursively encode/decode every sub-connector passed to the constructor. Other features can inherit from this class and call super() to get a nested container.

Example:

For DatasetInfo:

features = tfds.features.FeaturesDict({
    'input': tfds.features.Image(),
    'output': np.int32,
})

At generation time:

for image, label in generate_examples:
  yield {
      'input': image,
      'output': label
  }

At tf.data.Dataset() time:

for example in tfds.load(...):
  tf_input = example['input']
  tf_output = example['output']

For nested features, the FeaturesDict internally flattens the keys, both for the features and for the conversion to tf.train.Example. This is needed because the tf.train.Example proto does not support nested features, while tf.data.Dataset does. This internal transformation should be invisible to the user.

Example:

tfds.features.FeaturesDict({
    'input': np.int32,
    'target': {
        'height': np.int32,
        'width': np.int32,
    },
})

Will internally store the data as:

{
    'input': tf.io.FixedLenFeature(shape=(), dtype=tf.int32),
    'target/height': tf.io.FixedLenFeature(shape=(), dtype=tf.int32),
    'target/width': tf.io.FixedLenFeature(shape=(), dtype=tf.int32),
}
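
The flattening described above can be illustrated with a short pure-Python sketch. This is not the TFDS implementation: `flatten_keys` and `unflatten_keys` are hypothetical helpers, and plain strings stand in for the feature specs. The only assumption taken from the example above is that nested keys are joined with `/`.

```python
from typing import Any, Dict


def flatten_keys(nested: Dict[str, Any], sep: str = "/") -> Dict[str, Any]:
    """Flattens a nested dict into a single-level dict with joined keys."""
    flat = {}
    for key, value in nested.items():
        if isinstance(value, dict):
            # Recurse into the sub-dict and prefix its keys with the parent key.
            for sub_key, sub_value in flatten_keys(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


def unflatten_keys(flat: Dict[str, Any], sep: str = "/") -> Dict[str, Any]:
    """Rebuilds the nested dict from the flattened keys."""
    nested: Dict[str, Any] = {}
    for key, value in flat.items():
        *parents, leaf = key.split(sep)
        node = nested
        for parent in parents:
            node = node.setdefault(parent, {})
        node[leaf] = value
    return nested


# Strings are placeholders for the real feature specs.
spec = {"input": "int32", "target": {"height": "int32", "width": "int32"}}
flat_spec = flatten_keys(spec)
print(flat_spec)
```

Round-tripping through `unflatten_keys` recovers the nested structure, which is why the flattening can stay invisible to the user.
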
Args
feature_dict dict Dictionary containing the feature connectors of an example. The keys should correspond to the data dict as returned by tf.data.Dataset(). Types (np.int32, ...) and dicts are automatically converted into FeatureConnector.
doc Documentation of this feature (e.g. description).
Raises
ValueError If one of the given features is not recognized
Attributes
doc
dtype Return the dtype (or dict of dtype) of this FeatureConnector.
flat_features
flat_sequence_ranks
flat_serialized_info
np_dtype
numpy_dtype
shape Return the shape (or dict of shape) of this FeatureConnector.
tf_dtype
tf_example_spec Returns the tf.Example proto structure.

Methods

catalog_documentation

catalog_documentation() -> List[feature_lib.CatalogFeatureDocumentation]

Returns the feature documentation to be shown in the catalog.

cls_from_name

@classmethod
cls_from_name(
    python_class_name: str
) -> Type['FeatureConnector']

Returns the feature class for the given Python class.

decode_batch_example

decode_batch_example(
    tfexample_data
)

Decode multiple features batched in a single tf.Tensor.

This function is used to decode features wrapped in tfds.features.Sequence(). By default, this function applies decode_example to each individual element using tf.map_fn. However, for optimization, features can override this method to apply a custom batch decoding.

Args
tfexample_data Same tf.Tensor inputs as decode_example, but with an additional first dimension for the sequence length.
Returns
tensor_data Tensor or dictionary of tensors, output of the tf.data.Dataset object
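
The relationship between the default per-element decoding and a custom batch override can be sketched in pure Python. This is an illustration only: `PerElementDecoder` and `BatchedDecoder` are hypothetical toy classes, lists of bytes stand in for tensors, and a list comprehension stands in for tf.map_fn.

```python
from typing import List


class PerElementDecoder:
    """Toy feature whose batch decoding falls back to per-element decoding."""

    def decode_example(self, value: bytes) -> str:
        return value.decode("utf-8")

    def decode_batch_example(self, values: List[bytes]) -> List[str]:
        # Default strategy: decode each element of the sequence separately.
        return [self.decode_example(v) for v in values]


class BatchedDecoder(PerElementDecoder):
    """Toy feature that overrides batch decoding with a single-pass variant."""

    def decode_batch_example(self, values: List[bytes]) -> List[str]:
        if not values:
            return []
        # Assumes no element contains a NUL byte, so it can act as a separator.
        return b"\x00".join(values).decode("utf-8").split("\x00")


batch = [b"cat", b"dog", b"bird"]
per_element = PerElementDecoder().decode_batch_example(batch)
batched = BatchedDecoder().decode_batch_example(batch)
print(per_element == batched)
```

Both strategies are expected to return the same values; the override only changes how the work is done.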

decode_example

decode_example(
    serialized_example, *, decoders=None
)

Decode the feature dict to TF compatible input.

Args
tfexample_data Data or dictionary of data, as read by the tf-example reader. It corresponds to the tf.Tensor() (or dict of tf.Tensor()) extracted from the tf.train.Example, matching the info defined in get_serialized_info().
Returns
tensor_data Tensor or dictionary of tensors, output of the tf.data.Dataset object

decode_example_np

decode_example_np(
    serialized_example, *, decoders=None
)

Decode the feature dict into NumPy-compatible input.

Args
example_data Value to convert to NumPy.
Returns
np_data Data as NumPy-compatible type: either a Python primitive (bytes, int, etc) or a NumPy array.

decode_ragged_example

decode_ragged_example(
    tfexample_data
)

Decode nested features from a tf.RaggedTensor.

This function is used to decode features wrapped in nested tfds.features.Sequence(). By default, this function applies decode_batch_example to the flat values of the ragged tensor. For optimization, features can override this method to apply a custom batch decoding.

Args
tfexample_data tf.RaggedTensor inputs containing the nested encoded examples.
Returns
tensor_data The decoded tf.RaggedTensor or dictionary of tensors, output of the tf.data.Dataset object
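
The default strategy described above (decode the flat values as one batch, then restore the row structure) can be sketched in pure Python. This is only an illustration: `decode_batch` and `decode_ragged` are hypothetical helpers, and a flat list plus a list of row lengths stands in for a tf.RaggedTensor.

```python
from typing import List


def decode_batch(values: List[bytes]) -> List[str]:
    """Stand-in for decode_batch_example: decodes a flat batch of values."""
    return [v.decode("utf-8") for v in values]


def decode_ragged(flat_values: List[bytes], row_lengths: List[int]) -> List[List[str]]:
    """Decodes the flat values once, then splits them back into rows."""
    decoded = decode_batch(flat_values)
    rows, start = [], 0
    for length in row_lengths:
        rows.append(decoded[start:start + length])
        start += length
    return rows


# Three values spread over three rows; the middle row is empty.
ragged_rows = decode_ragged([b"a", b"b", b"c"], row_lengths=[2, 0, 1])
print(ragged_rows)
```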

deserialize_example

deserialize_example(
    serialized_example: Union[tf.Tensor, bytes], *, decoders=None
) -> utils.TensorDict

Decodes the tf.train.Example data into tf.Tensor.

See serialize_example to encode the data into proto.

Args
serialized_example The tensor-like object containing the serialized tf.train.Example proto.
decoders Optional decoders to apply (see documentation)
Returns
The decoded features tensors.

deserialize_example_np

deserialize_example_np(
    serialized_example: Union[tf.Tensor, bytes], *, decoders=None
) -> utils.NpArrayOrScalarDict

encode_example

encode_example(
    example_dict
)

See base class for details.

from_config

@classmethod
from_config(
    root_dir: str
) -> FeatureConnector

Reconstructs the FeatureConnector from the config file.

Usage:

features = FeatureConnector.from_config('path/to/dir')
Args
root_dir Directory containing the features.json file.
Returns
The reconstructed feature instance.

from_json

@classmethod
from_json(
    value: Json
) -> FeatureConnector

FeatureConnector factory.

This function should be called from the tfds.features.FeatureConnector base class. Subclasses should implement from_json_content.

Example:

feature = tfds.features.FeatureConnector.from_json(
    {'type': 'Image', 'content': {'shape': [32, 32, 3], 'dtype': 'uint8'} }
)
assert isinstance(feature, tfds.features.Image)
Args
value dict(type=, content=) containing the feature to restore. Matches the dict returned by to_json.
Returns
The reconstructed FeatureConnector.
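
The factory pattern described here, where the base class dispatches on `type` and the subclass restores itself from `content`, can be sketched in pure Python. This is not the TFDS implementation: `ToyFeature`, `ToyImage`, and `_REGISTRY` are hypothetical names, and the sketch registers classes by their short name rather than by a full module path.

```python
from typing import Any, Dict, Type

# Hypothetical registry mapping a type name to its feature class.
_REGISTRY: Dict[str, Type["ToyFeature"]] = {}


class ToyFeature:
    """Toy base class: owns the type dispatch, delegates content handling."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        _REGISTRY[cls.__name__] = cls

    @classmethod
    def from_json(cls, value: Dict[str, Any]) -> "ToyFeature":
        # Look up the subclass from "type", then let it parse "content".
        subclass = _REGISTRY[value["type"]]
        return subclass.from_json_content(value["content"])

    @classmethod
    def from_json_content(cls, content: Dict[str, Any]) -> "ToyFeature":
        raise NotImplementedError

    def to_json(self) -> Dict[str, Any]:
        return {"type": type(self).__name__, "content": self.to_json_content()}

    def to_json_content(self) -> Dict[str, Any]:
        raise NotImplementedError


class ToyImage(ToyFeature):
    """Toy subclass: only knows how to read and write its own content."""

    def __init__(self, shape, dtype):
        self.shape = tuple(shape)
        self.dtype = dtype

    @classmethod
    def from_json_content(cls, content):
        return cls(shape=content["shape"], dtype=content["dtype"])

    def to_json_content(self):
        return {"shape": list(self.shape), "dtype": self.dtype}


restored = ToyFeature.from_json(
    {"type": "ToyImage", "content": {"shape": [32, 32, 3], "dtype": "uint8"}}
)
print(type(restored).__name__, restored.shape)
```

In this sketch, `to_json` and `from_json` are inverses of each other, which mirrors the requirement that the value passed to from_json matches what to_json returns.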

from_json_content

@classmethod
from_json_content(
    value: Union[Json, feature_pb2.FeaturesDict]
) -> 'FeaturesDict'

FeatureConnector factory (to overwrite).

Subclasses should overwrite this method. This method is used when importing the feature connector from the config.

This function should not be called directly. FeatureConnector.from_json should be called instead.

See existing FeatureConnectors for implementation examples.

Args
value FeatureConnector information represented as either Json or a Feature proto. The content must match what is returned by to_json_content.
doc Documentation of this feature (e.g. description).
Returns
The reconstructed FeatureConnector.

from_proto

@classmethod
from_proto(
    feature_proto: feature_pb2.Feature
) -> T

Instantiates a feature from its proto representation.

get_serialized_info

get_serialized_info()

See base class for details.

get_tensor_info

get_tensor_info()

See base class for details.

get_tensor_spec

get_tensor_spec() -> TreeDict[tf.TensorSpec]

Returns the tf.TensorSpec of this feature (not the element spec!).

Note that the output of this method may not correspond to the element spec of the dataset. For example, currently this method does not support RaggedTensorSpec.

items

items()

keys

keys()

load_metadata

load_metadata(
    data_dir, feature_name=None
)

See base class for details.

repr_html

repr_html(
    ex: np.ndarray
) -> str

Returns the HTML str representation of the object.

repr_html_batch

repr_html_batch(
    ex: np.ndarray
) -> str

Returns the HTML str representation of the object (Sequence).

repr_html_ragged

repr_html_ragged(
    ex: np.ndarray
) -> str

Returns the HTML str representation of the object (Nested sequence).

save_config

save_config(
    root_dir: str
) -> None

Exports the FeatureConnector to a file.

Args
root_dir path/to/dir containing the features.json

save_metadata

save_metadata(
    data_dir, feature_name=None
)

See base class for details.

serialize_example

serialize_example(
    example_data
) -> bytes

Encodes nested data values into tf.train.Example bytes.

See deserialize_example to decode the proto into tf.Tensor.

Args
example_data Example data to encode (numpy-like nested dict)
Returns
The serialized tf.train.Example.

to_json

to_json() -> Json

Exports the FeatureConnector to Json.

Each feature is serialized as a dict(type=..., content=...).

For example:

tfds.features.FeaturesDict({
    'input': tfds.features.Image(),
    'target': tfds.features.ClassLabel(num_classes=10),
})

Is serialized as:

{
    "type": "tensorflow_datasets.core.features.features_dict.FeaturesDict",
    "content": {
        "input": {
            "type": "tensorflow_datasets.core.features.image_feature.Image",
            "content": {
                "shape": [null, null, 3],
                "dtype": "uint8",
                "encoding_format": "png"
            }
        },
        "target": {
            "type":
            "tensorflow_datasets.core.features.class_label_feature.ClassLabel",
            "content": {
              "num_classes": 10
            }
        }
    }
}
Returns
A dict(type=, content=). Will be forwarded to from_json when reconstructing the feature.

to_json_content

to_json_content() -> feature_pb2.FeaturesDict

FeatureConnector factory (to overwrite).

This function should be overridden by the subclass to allow re-importing the feature connector from the config. See existing FeatureConnectors for implementation examples.

Returns
The FeatureConnector metadata in either a dict, or a Feature proto. This output is used in from_json_content when reconstructing the feature.

to_proto

to_proto() -> feature_pb2.Feature

Exports the FeatureConnector to the Feature proto.

For features that have a specific schema defined in a proto, this function needs to be overridden. If there is no specific proto schema, the feature is represented using JSON.

Returns
The feature proto describing this feature.

values

values()

__contains__

__contains__(
    k
)

__getitem__

__getitem__(
    key
)

Return the feature associated with the key.

__iter__

__iter__()

__len__

__len__()
Class Variables
ALIASES []