tfds.load
Loads the named dataset into a tf.data.Dataset.
```python
tfds.load(
    name: str,
    *,
    split: Optional[Tree[splits_lib.SplitArg]] = None,
    data_dir: Union[None, str, os.PathLike] = None,
    batch_size: Optional[int] = None,
    shuffle_files: bool = False,
    download: bool = True,
    as_supervised: bool = False,
    decoders: Optional[TreeDict[decode.partial_decode.DecoderArg]] = None,
    read_config: Optional[read_config_lib.ReadConfig] = None,
    with_info: bool = False,
    builder_kwargs: Optional[Dict[str, Any]] = None,
    download_and_prepare_kwargs: Optional[Dict[str, Any]] = None,
    as_dataset_kwargs: Optional[Dict[str, Any]] = None,
    try_gcs: bool = False
)
```
tfds.load is a convenience method that:

- Fetches the tfds.core.DatasetBuilder by name:

  ```python
  builder = tfds.builder(name, data_dir=data_dir, **builder_kwargs)
  ```

- Generates the data (when download=True):

  ```python
  builder.download_and_prepare(**download_and_prepare_kwargs)
  ```

- Loads the tf.data.Dataset object:

  ```python
  ds = builder.as_dataset(
      split=split,
      as_supervised=as_supervised,
      shuffle_files=shuffle_files,
      read_config=read_config,
      decoders=decoders,
      **as_dataset_kwargs,
  )
  ```
See https://www.tensorflow.org/datasets/overview#load_a_dataset for more examples.
If you'd like NumPy arrays instead of tf.data.Datasets or tf.Tensors, you can pass the return value to tfds.as_numpy.
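For instance, here is a minimal end-to-end sketch, assuming the public 'mnist' dataset and an environment where downloading is permitted:

```python
import tensorflow_datasets as tfds

# Load the train split of MNIST as (image, label) tuples, plus its metadata.
ds, ds_info = tfds.load(
    'mnist',
    split='train',
    as_supervised=True,  # yields (input, label) tuples instead of feature dicts
    shuffle_files=True,
    with_info=True,
)

# Iterate as NumPy arrays instead of tf.Tensors.
for image, label in tfds.as_numpy(ds.take(2)):
    print(image.shape, label)  # (28, 28, 1) and a scalar class id
```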
Args | Description |
---|---|
name | str, the registered name of the DatasetBuilder (the snake case version of the class name). The config and version can also be specified in the name as follows: 'dataset_name[/config_name][:version]'. For example, 'movielens/25m-ratings' (for the latest version of '25m-ratings'), 'movielens:0.1.0' (for the default config and version 0.1.0), or 'movielens/25m-ratings:0.1.0'. Note that only the latest version can be generated, but old versions can be read if they are present on disk. For convenience, the name parameter can contain comma-separated keyword arguments for the builder. For example, 'foo_bar/a=True,b=3' would use the FooBar dataset passing the keyword arguments a=True and b=3 (for builders with configs, it would be 'foo_bar/zoo/a=True,b=3' to use the 'zoo' config and pass to the builder the keyword arguments a=True and b=3). |
split | Which split of the data to load (e.g. 'train', 'test', ['train', 'test'], 'train[80%:]', ...). See our split API guide. If None, will return all splits in a Dict[Split, tf.data.Dataset]. |
data_dir | Directory to read/write data. Defaults to the value of the environment variable TFDS_DATA_DIR, if set; otherwise falls back to '~/tensorflow_datasets', where datasets are stored by default. |
batch_size | int, if set, adds a batch dimension to examples. Note that variable-length features will be 0-padded. If batch_size=-1, will return the full dataset as tf.Tensors. |
shuffle_files | bool, whether to shuffle the input files. Defaults to False. |
download | bool (optional), whether to call tfds.core.DatasetBuilder.download_and_prepare before calling tfds.core.DatasetBuilder.as_dataset. If False, data is expected to be in data_dir. If True and the data is already in data_dir, download_and_prepare is a no-op. |
as_supervised | bool, if True, the returned tf.data.Dataset will have a 2-tuple structure (input, label) according to builder.info.supervised_keys. If False, the default, the returned tf.data.Dataset will have a dictionary with all the features. |
decoders | Nested dict of Decoder objects which allow customizing the decoding. The structure should match the feature structure, but only customized feature keys need to be present. See the guide for more info. |
read_config | tfds.ReadConfig, additional options to configure the input pipeline (e.g. seed, num parallel reads, ...). |
with_info | bool, if True, tfds.load will return the tuple (tf.data.Dataset, tfds.core.DatasetInfo), the latter containing the info associated with the builder. |
builder_kwargs | dict (optional), keyword arguments to be passed to the tfds.core.DatasetBuilder constructor. data_dir will be passed through by default. |
download_and_prepare_kwargs | dict (optional), keyword arguments passed to tfds.core.DatasetBuilder.download_and_prepare if download=True. Allows control over where to download and extract the cached data. If not set, cache_dir and manual_dir will automatically be deduced from data_dir. |
as_dataset_kwargs | dict (optional), keyword arguments passed to tfds.core.DatasetBuilder.as_dataset. |
try_gcs | bool, if True, tfds.load will see if the dataset exists on the public GCS bucket before building it locally. This is equivalent to passing data_dir='gs://tfds-data/datasets'. Warning: try_gcs is different from builder_kwargs.download_config.try_download_gcs. try_gcs (default: False) overrides data_dir to be the public GCS bucket. try_download_gcs (default: True) allows downloading from GCS while keeping a data_dir different from the public GCS bucket. So, to fully bypass GCS, please use try_gcs=False and download_and_prepare_kwargs={'download_config': tfds.core.download.DownloadConfig(try_download_gcs=False)}. |
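To illustrate the split and GCS-related arguments above, here is a hedged sketch, again assuming the 'mnist' dataset; tfds.download.DownloadConfig is the public alias of the DownloadConfig mentioned for try_gcs:

```python
import tensorflow_datasets as tfds

# Slicing syntax: last 20% of train and the full test split as two datasets.
train_ds, test_ds = tfds.load('mnist', split=['train[80%:]', 'test'])

# Fully bypass GCS (both the public data bucket and GCS downloads), per the
# try_gcs description above.
ds = tfds.load(
    'mnist',
    split='train',
    try_gcs=False,
    download_and_prepare_kwargs={
        'download_config': tfds.download.DownloadConfig(try_download_gcs=False),
    },
)
```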
Returns | Description |
---|---|
ds | tf.data.Dataset, the dataset requested, or if split is None, a dict<key: tfds.Split, value: tf.data.Dataset>. If batch_size=-1, these will be full datasets as tf.Tensors. |
ds_info | tfds.core.DatasetInfo, if with_info is True, then tfds.load will return a tuple (ds, ds_info) containing dataset information (version, features, splits, num_examples, ...). Note that the ds_info object documents the entire dataset, regardless of the split requested. Split-specific information is available in ds_info.splits. |
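As a usage note, here is a short sketch of unpacking (ds, ds_info) and reading split-specific information, assuming 'mnist':

```python
import tensorflow_datasets as tfds

ds, ds_info = tfds.load('mnist', split='train', with_info=True)

# ds_info documents the whole dataset, regardless of the requested split.
print(ds_info.version)                       # e.g. 3.0.1
print(ds_info.features)                      # FeaturesDict with image and label
print(ds_info.splits['train'].num_examples)  # split-specific example count
```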