tfds.core.DatasetInfo | TensorFlow Datasets
Information about a dataset.
tfds.core.DatasetInfo(
*,
builder: Union[DatasetIdentity, Any],
description: Optional[str] = None,
features: Optional[feature_lib.FeatureConnector] = None,
supervised_keys: Optional[SupervisedKeysType] = None,
disable_shuffling: bool = False,
homepage: Optional[str] = None,
citation: Optional[str] = None,
metadata: Optional[Metadata] = None,
license: Optional[str] = None,
redistribution_info: Optional[Dict[str, str]] = None,
split_dict: Optional[splits_lib.SplitDict] = None
)
DatasetInfo documents a dataset, including its name, version, and features. See the constructor arguments and properties for a full list.
Args | |
---|---|
builder | DatasetBuilder or DatasetIdentity. The dataset builder or identity will be used to populate this info. |
description | str, description of this dataset. |
features | tfds.features.FeaturesDict, Information on the feature dict of the tf.data.Dataset() object from the builder.as_dataset() method. |
supervised_keys | Specifies the input structure for supervised learning, if applicable for the dataset, used with "as_supervised". The keys correspond to the feature names to select in info.features. When calling tfds.core.DatasetBuilder.as_dataset() with as_supervised=True, the tf.data.Dataset object will yield the structure defined by the keys passed here, instead of that defined by the features argument. Typically this is a (input_key, target_key) tuple, and the dataset yields a tuple of (input, target) tensors. To yield a more complex structure, pass a tuple of tf.nest-compatible structures of feature keys. The resulting Dataset will yield structures with each key replaced by the corresponding tensor. For example, passing a triple of keys would return a dataset that yields (feature, target, sample_weights) triples for Keras. Using supervised_keys=({'a':'a','b':'b'}, 'c') would create a dataset yielding a tuple with a dictionary of features in the features position. Note that selecting features in nested tfds.features.FeaturesDict objects is not supported. |
disable_shuffling | bool, specify whether to disable shuffling of the examples. |
homepage | str, optional, the homepage for this dataset. |
citation | str, optional, the citation to use for this dataset. |
metadata | tfds.core.Metadata, additional object which will be stored/restored with the dataset. This allows for storing additional information with the dataset. |
license | license of the dataset. |
redistribution_info | information needed for redistribution, as specified in dataset_info_pb2.RedistributionInfo. The content of the license subfield will automatically be written to a LICENSE file stored with the dataset. |
split_dict | information about the splits in this dataset. |
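To make the supervised_keys behavior concrete, here is a minimal, self-contained sketch (not TFDS internals; the helper name select_supervised is hypothetical) of how a structure of feature names is mapped onto the example dict yielded by as_dataset(as_supervised=True):

```python
# Illustrative sketch: replace each feature name in a supervised_keys
# structure with the corresponding value from an example dict.

def select_supervised(example, supervised_keys):
    """Resolve a structure of feature names against an example dict."""
    if isinstance(supervised_keys, str):
        return example[supervised_keys]
    if isinstance(supervised_keys, tuple):
        return tuple(select_supervised(example, k) for k in supervised_keys)
    if isinstance(supervised_keys, dict):
        return {name: select_supervised(example, k)
                for name, k in supervised_keys.items()}
    raise TypeError(f"Unsupported structure: {type(supervised_keys)}")

example = {"image": "img_tensor", "label": "label_tensor", "weight": "w_tensor"}

# Classic (input, target) pair:
pair = select_supervised(example, ("image", "label"))

# Triple of keys, yielding (feature, target, sample_weights) for Keras:
triple = select_supervised(example, ("image", "label", "weight"))

# Dictionary of features in the input position:
nested = select_supervised(example, ({"a": "image", "b": "label"}, "label"))
```

The real pipeline applies this selection to tensors inside tf.data; the sketch uses plain strings only to show the structural mapping.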
Attributes | |
---|---|
as_json | |
as_proto | |
as_proto_with_features | |
citation | |
config_description | |
config_name | |
config_tags | |
data_dir | |
dataset_size | Generated dataset files size, in bytes. |
description | |
disable_shuffling | |
download_size | Downloaded files size, in bytes. |
features | |
file_format | |
full_name | Full canonical name: (<dataset_name>/<config_name>/<version>). |
homepage | |
initialized | Whether DatasetInfo has been fully initialized. |
metadata | |
module_name | |
name | |
redistribution_info | |
release_notes | |
splits | |
supervised_keys | |
version |
Methods
add_file_data_source_access
add_file_data_source_access(
path: Union[epath.PathLike, Iterable[epath.PathLike]],
url: Optional[str] = None
) -> None
Records that the given file(s) were read to generate this dataset.
Arguments | |
---|---|
path | path or paths of files that were read. Can be a file pattern. Multiple paths or patterns can be specified as a comma-separated string or a list. |
url | URL referring to the data being used. |
add_sql_data_source_access
add_sql_data_source_access(
sql_query: str
) -> None
Records that the given query was used to generate this dataset.
add_tfds_data_source_access
add_tfds_data_source_access(
dataset_reference: naming.DatasetReference, url: Optional[str] = None
) -> None
Records that the given TFDS dataset was used to generate this dataset.
Args | |
---|---|
dataset_reference | |
url | a URL referring to the TFDS dataset. |
add_url_access
add_url_access(
url: str, checksum: Optional[str] = None
) -> None
Records the URL used to generate this dataset.
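The add_*_access methods above share one pattern: each call appends a provenance record to the dataset's metadata. A hedged sketch of that record-keeping pattern (class and field names here are illustrative, not TFDS internals):

```python
# Illustrative sketch of the provenance-logging pattern behind the
# add_url_access / add_sql_data_source_access helpers: each call
# appends one record describing a data source that was used.

class ProvenanceLog:
    def __init__(self):
        self.records = []

    def add_url_access(self, url, checksum=None):
        self.records.append({"type": "url", "url": url, "checksum": checksum})

    def add_sql_data_source_access(self, sql_query):
        self.records.append({"type": "sql", "query": sql_query})

log = ProvenanceLog()
log.add_url_access("https://example.com/data.zip", checksum="abc123")
log.add_sql_data_source_access("SELECT * FROM examples")
```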
from_proto
@classmethod
from_proto(builder, proto: dataset_info_pb2.DatasetInfo) -> 'DatasetInfo'
Instantiates DatasetInfo from the given builder and proto.
initialize_from_bucket
initialize_from_bucket() -> None
Initialize DatasetInfo from GCS bucket info files.
read_from_directory
read_from_directory(
dataset_info_dir: epath.PathLike
) -> None
Update DatasetInfo from the metadata files in dataset_info_dir.
This function updates all the dynamically generated fields (num_examples, hash, time of creation,...) of the DatasetInfo.
This will overwrite all previous metadata.
Args | |
---|---|
dataset_info_dir | The directory containing the metadata file. This should be the root directory of a specific dataset version. |
Raises | |
---|---|
FileNotFoundError | If the dataset_info.json can't be found. |
set_file_format
set_file_format(
file_format: Union[None, str, file_adapters.FileFormat],
override: bool = False
) -> None
Internal function to define the file format.
The file format is set during FileReaderBuilder.__init__, not DatasetInfo.__init__.
Args | |
---|---|
file_format | The file format. |
override | Whether the file format should be overridden if it is already set. |
Raises | |
---|---|
ValueError | if the file format was already set and the override parameter was False. |
RuntimeError | if an incorrect combination of options is given, e.g. override=True when the DatasetInfo is already fully initialized. |
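The override semantics described above can be sketched in a few lines (illustrative only, not the TFDS source): a new format only replaces an existing one when override=True, otherwise a ValueError is raised.

```python
# Illustrative sketch of set_file_format's override logic: keep the
# current format unless override=True, raising on a silent conflict.

def set_file_format(current, new, override=False):
    if current is not None and current != new and not override:
        raise ValueError(f"File format is already set to {current}.")
    return new

fmt = set_file_format(None, "tfrecord")                    # first assignment
fmt = set_file_format(fmt, "array_record", override=True)  # explicit override
```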
set_splits
set_splits(
split_dict: splits_lib.SplitDict
) -> None
Split setter (private method).
update_data_dir
update_data_dir(
data_dir: str
) -> None
Updates the data dir for each split.
write_to_directory
write_to_directory(
dataset_info_dir: epath.PathLike, all_metadata=True
) -> None
Write DatasetInfo as JSON to dataset_info_dir + labels & features.
Args | |
---|---|
dataset_info_dir | path to the directory in which to save the dataset_info.json file, as well as features.json and *.labels.txt if applicable. |
all_metadata | defaults to True. If False, will not write metadata which may have an impact on how the data is read (features.json). Should be set to True whenever write_to_directory is called for the first time for a new dataset. |
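A minimal sketch of the write_to_directory / read_from_directory round trip (assumed layout only; the real dataset_info.json schema is richer than this plain dict):

```python
# Illustrative round trip: write a metadata dict as dataset_info.json
# into a directory, then read it back, raising FileNotFoundError when
# the file is missing (mirroring read_from_directory's behavior).
import json
import os
import tempfile

def write_info(info_dir, info):
    os.makedirs(info_dir, exist_ok=True)
    with open(os.path.join(info_dir, "dataset_info.json"), "w") as f:
        json.dump(info, f)

def read_info(info_dir):
    path = os.path.join(info_dir, "dataset_info.json")
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    with open(path) as f:
        return json.load(f)

with tempfile.TemporaryDirectory() as d:
    write_info(d, {"name": "mnist", "version": "3.0.1"})
    restored = read_info(d)
```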