tfds.download.DownloadConfig | TensorFlow Datasets
tfds.download.DownloadConfig
Configuration for tfds.core.DatasetBuilder.download_and_prepare.
tfds.download.DownloadConfig(
extract_dir: Optional[epath.PathLike] = None,
manual_dir: Optional[epath.PathLike] = None,
download_mode: util.GenerateMode = tfds.download.GenerateMode.REUSE_DATASET_IF_EXISTS,
compute_stats: util.ComputeStatsMode = tfds.download.ComputeStatsMode.SKIP,
max_examples_per_split: Optional[int] = None,
register_checksums: bool = False,
force_checksums_validation: bool = False,
beam_runner: Optional[Any] = None,
beam_options: Optional[Any] = None,
try_download_gcs: bool = True,
verify_ssl: bool = True,
override_max_simultaneous_downloads: Optional[int] = None,
num_shards: Optional[int] = None,
min_shard_size: int = shard_utils.DEFAULT_MIN_SHARD_SIZE,
max_shard_size: int = shard_utils.DEFAULT_MAX_SHARD_SIZE
)
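The constructor above is typically used to build a config that is then handed to `download_and_prepare`. The following is a minimal sketch, not an official recipe: it assumes `tensorflow_datasets` is installed, and the helper names `debug_config_kwargs` and `prepare`, the dataset name `"mnist"`, and the path `/tmp/tfds_manual` are placeholders chosen for illustration. The import is deferred so the keyword arguments can be inspected without the library.

```python
from typing import Any, Dict


def debug_config_kwargs() -> Dict[str, Any]:
    """Keyword arguments for a small, test-sized preparation run."""
    return {
        "manual_dir": "/tmp/tfds_manual",  # hypothetical path
        "max_examples_per_split": 100,     # write at most 100 examples per split
        "try_download_gcs": False,         # prepare from scratch, skip GCS copies
        "verify_ssl": True,
    }


def prepare(dataset_name: str = "mnist") -> None:
    """Prepares a dataset with the config above (needs tensorflow_datasets)."""
    # Deferred import: the sketch stays importable without the library.
    import tensorflow_datasets as tfds

    config = tfds.download.DownloadConfig(**debug_config_kwargs())
    builder = tfds.builder(dataset_name)
    builder.download_and_prepare(download_config=config)
```

Calling `prepare()` downloads data, so it is best run in an environment with network access and enough disk space.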
Attributes | |
---|---|
extract_dir | str, directory where extracted files are stored. Defaults to "<download_dir>/extracted". |
manual_dir | str, read-only directory where manually downloaded/extracted data is stored. Defaults to <download_dir>/manual. |
download_mode | tfds.GenerateMode, how to deal with downloads or data that already exist. Defaults to REUSE_DATASET_IF_EXISTS, which reuses both downloads and data if they already exist. |
compute_stats | tfds.download.ComputeStatsMode, whether to compute statistics over the generated data. Defaults to SKIP. |
max_examples_per_split | int, optional max number of examples to write into each split (used for testing). If set to 0, only _split_generators is executed (downloading the original data) and _generate_examples is skipped. |
register_checksums | bool, defaults to False. If True, checksums of downloaded files are recorded. |
force_checksums_validation | bool, defaults to False. If True, raises an error if a URL does not have checksums. |
beam_runner | Runner to pass to beam.Pipeline, only used for datasets based on Beam for the generation. |
beam_options | PipelineOptions to pass to beam.Pipeline, only used for datasets based on Beam for the generation. |
try_download_gcs | bool, defaults to True. If True, prepared dataset will be downloaded from GCS, when available. If False, dataset will be downloaded and prepared from scratch. |
verify_ssl | bool, defaults to True. If True, will verify certificate when downloading dataset. |
override_max_simultaneous_downloads | int, optional max number of simultaneous downloads. If set, it will override dataset builder and downloader default values. |
num_shards | optional number of shards that should be created. If None, then the number of shards is computed based on the total size of the dataset and the min and max shard size. |
min_shard_size | minimum shard size in bytes. Defaults to 64 MiB (67108864 bytes). |
max_shard_size | maximum shard size in bytes. Defaults to 1 GiB (1073741824 bytes). |
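
To make the relationship between dataset size and the shard-size bounds concrete, the sketch below computes the range of shard counts that keeps every shard between `min_shard_size` and `max_shard_size`. This is an illustration of the constraint only, not the algorithm TFDS uses internally; `shard_count_bounds` and the constant names are hypothetical.

```python
import math
from typing import Tuple

MIN_SHARD_SIZE = 64 << 20  # 67108864 bytes, the listed min_shard_size default
MAX_SHARD_SIZE = 1 << 30   # 1073741824 bytes, the listed max_shard_size default


def shard_count_bounds(
    total_size: int,
    min_shard_size: int = MIN_SHARD_SIZE,
    max_shard_size: int = MAX_SHARD_SIZE,
) -> Tuple[int, int]:
    """Returns (fewest, most) shard counts for a dataset of total_size bytes.

    Hypothetical helper: assumes equally sized shards and always allows
    at least one shard, even when the dataset is below min_shard_size.
    """
    # Fewer shards than this would push a shard above max_shard_size.
    fewest = max(1, math.ceil(total_size / max_shard_size))
    # More shards than this would push a shard below min_shard_size.
    most = max(fewest, total_size // min_shard_size)
    return fewest, most


low, high = shard_count_bounds(10 * (1 << 30))  # a 10 GiB dataset
print(low, high)
```

Setting `num_shards` explicitly bypasses this kind of size-based choice, as described in the table above.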
Methods
get_shard_config
get_shard_config() -> shard_utils.ShardConfig
replace
replace(
**kwargs
) -> DownloadConfig
Returns a copy with updated attributes.
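The copy semantics of `replace` can be sketched with a small stand-in dataclass. `ConfigSketch` is a hypothetical class carrying two of the fields listed above; it is not the TFDS implementation, and using `dataclasses.replace` here is an assumption made for illustration.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class ConfigSketch:
    """Hypothetical stand-in with two of the DownloadConfig fields."""

    max_examples_per_split: Optional[int] = None
    verify_ssl: bool = True

    def replace(self, **kwargs) -> "ConfigSketch":
        # Builds a new instance; the original object is left unchanged.
        return dataclasses.replace(self, **kwargs)


base = ConfigSketch()
small = base.replace(max_examples_per_split=10)
```

Here `small` carries the updated value while `base` keeps its defaults, which is the behavior the description "returns a copy with updated attributes" implies.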
Class Variables | |
---|---|
beam_options | None |
beam_runner | None |
compute_stats | <ComputeStatsMode.SKIP: 'skip'> |
download_mode | <GenerateMode.REUSE_DATASET_IF_EXISTS: 'reuse_dataset_if_exists'> |
extract_dir | None |
force_checksums_validation | False |
manual_dir | None |
max_examples_per_split | None |
max_shard_size | 1073741824 |
min_shard_size | 67108864 |
num_shards | None |
override_max_simultaneous_downloads | None |
register_checksums | False |
try_download_gcs | True |
verify_ssl | True |