tfds.download.DownloadConfig

Configuration for tfds.core.DatasetBuilder.download_and_prepare.

tfds.download.DownloadConfig(
    extract_dir: Optional[epath.PathLike] = None,
    manual_dir: Optional[epath.PathLike] = None,
    download_mode: util.GenerateMode = tfds.download.GenerateMode.REUSE_DATASET_IF_EXISTS,
    compute_stats: util.ComputeStatsMode = tfds.download.ComputeStatsMode.SKIP,
    max_examples_per_split: Optional[int] = None,
    register_checksums: bool = False,
    force_checksums_validation: bool = False,
    beam_runner: Optional[Any] = None,
    beam_options: Optional[Any] = None,
    try_download_gcs: bool = True,
    verify_ssl: bool = True,
    override_max_simultaneous_downloads: Optional[int] = None,
    num_shards: Optional[int] = None,
    min_shard_size: int = shard_utils.DEFAULT_MIN_SHARD_SIZE,
    max_shard_size: int = shard_utils.DEFAULT_MAX_SHARD_SIZE
)

Attributes
extract_dir	str, directory where extracted files are stored. Defaults to "<download_dir>/extracted".
manual_dir	str, read-only directory where manually downloaded/extracted data is stored. Defaults to "<download_dir>/manual".
download_mode	tfds.GenerateMode, how to handle downloads or generated data that already exist. Defaults to REUSE_DATASET_IF_EXISTS, which reuses both the downloads and the generated data when they already exist.
compute_stats	tfds.download.ComputeStatsMode, whether to compute statistics over the generated data. Defaults to SKIP.
max_examples_per_split	int, optional maximum number of examples to write into each split (used for testing). If set to 0, only _split_generators is executed (downloading the original data) and _generate_examples is skipped.
register_checksums	bool, defaults to False. If True, the checksums of downloaded files are recorded.
force_checksums_validation	bool, defaults to False. If True, an error is raised if a URL does not have a checksum.
beam_runner	Runner to pass to beam.Pipeline; only used for datasets whose generation is based on Beam.
beam_options	PipelineOptions to pass to beam.Pipeline; only used for datasets whose generation is based on Beam.
try_download_gcs	bool, defaults to True. If True, the prepared dataset is downloaded from GCS when available. If False, the dataset is downloaded and prepared from scratch.
verify_ssl	bool, defaults to True. If True, SSL certificates are verified when downloading the dataset.
override_max_simultaneous_downloads	int, optional maximum number of simultaneous downloads. If set, it overrides the default values of the dataset builder and the downloader.
num_shards	int, optional number of shards to create. If None, the number of shards is computed from the total size of the dataset and the minimum and maximum shard sizes.
min_shard_size	int, minimum shard size in bytes. Defaults to 64 MiB (67108864 bytes).
max_shard_size	int, maximum shard size in bytes. Defaults to 1 GiB (1073741824 bytes).
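The three sharding attributes interact as follows: an explicit num_shards is used as given; otherwise the count is derived from the dataset size so that shards stay between min_shard_size and max_shard_size. The sketch below is a simplified illustration only, not the TFDS sharding algorithm. The helpers shard_count_bounds and choose_num_shards, and the "smallest valid count" policy, are hypothetical.

```python
import math
from typing import Optional, Tuple

# Defaults matching the class-variable values listed on this page.
DEFAULT_MIN_SHARD_SIZE = 64 * 2**20  # 67108864 bytes (64 MiB)
DEFAULT_MAX_SHARD_SIZE = 2**30       # 1073741824 bytes (1 GiB)


def shard_count_bounds(
    total_size: int,
    min_shard_size: int = DEFAULT_MIN_SHARD_SIZE,
    max_shard_size: int = DEFAULT_MAX_SHARD_SIZE,
) -> Tuple[int, int]:
    """Hypothetical helper: range of shard counts whose average shard size
    stays within [min_shard_size, max_shard_size]."""
    # At least this many shards, so the average shard is <= max_shard_size.
    lower = max(1, math.ceil(total_size / max_shard_size))
    # At most this many shards, so the average shard is >= min_shard_size.
    upper = max(1, total_size // min_shard_size)
    # If the bounds conflict, favour staying under max_shard_size.
    return lower, max(lower, upper)


def choose_num_shards(
    total_size: int,
    num_shards: Optional[int] = None,
    min_shard_size: int = DEFAULT_MIN_SHARD_SIZE,
    max_shard_size: int = DEFAULT_MAX_SHARD_SIZE,
) -> int:
    """Hypothetical policy: explicit num_shards wins, else the smallest valid count."""
    if num_shards is not None:
        return num_shards
    lower, _ = shard_count_bounds(total_size, min_shard_size, max_shard_size)
    return lower


# A 10 GiB dataset can use between 10 shards (1 GiB each) and 160 shards (64 MiB each).
bounds = shard_count_bounds(10 * 2**30)
```

Under these assumptions, a 1 MiB dataset collapses to a single shard, because even one shard is already smaller than min_shard_size.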

Methods

`get_shard_config`

get_shard_config() -> shard_utils.ShardConfig

`replace`

replace(
    **kwargs
) -> DownloadConfig

Returns a copy with updated attributes.
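replace is useful for deriving a variant of an existing configuration, for example a quick test run that caps max_examples_per_split. The sketch below assumes replace behaves like Python's dataclasses.replace: it returns a new object with the named fields updated and leaves the original untouched. MiniDownloadConfig is a hypothetical stand-in with only three of the real fields, so the example runs without TFDS installed.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass(frozen=True)
class MiniDownloadConfig:
    """Hypothetical stand-in holding a few DownloadConfig fields."""

    manual_dir: Optional[str] = None
    max_examples_per_split: Optional[int] = None
    verify_ssl: bool = True

    def replace(self, **kwargs) -> "MiniDownloadConfig":
        # Build a new instance; fields not named in kwargs are copied over.
        return dataclasses.replace(self, **kwargs)


base = MiniDownloadConfig(manual_dir="/data/manual")
# Derive a small test configuration without mutating the base config.
test_config = base.replace(max_examples_per_split=10)
```

With the real class, the equivalent call would be config.replace(max_examples_per_split=10) on an existing tfds.download.DownloadConfig instance.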

Class Variables
beam_options	None
beam_runner	None
compute_stats	<ComputeStatsMode.SKIP: 'skip'>
download_mode	<GenerateMode.REUSE_DATASET_IF_EXISTS: 'reuse_dataset_if_exists'>
extract_dir	None
force_checksums_validation	False
manual_dir	None
max_examples_per_split	None
max_shard_size	1073741824
min_shard_size	67108864
num_shards	None
override_max_simultaneous_downloads	None
register_checksums	False
try_download_gcs	True
verify_ssl	True
