Training Arguments — Sentence Transformers documentation

CrossEncoderTrainingArguments

class sentence_transformers.cross_encoder.training_args.CrossEncoderTrainingArguments(output_dir: str | None = None, overwrite_output_dir: bool = False, do_train: bool = False, do_eval: bool = False, do_predict: bool = False, eval_strategy: ~transformers.trainer_utils.IntervalStrategy | str = 'no', prediction_loss_only: bool = False, per_device_train_batch_size: int = 8, per_device_eval_batch_size: int = 8, per_gpu_train_batch_size: int | None = None, per_gpu_eval_batch_size: int | None = None, gradient_accumulation_steps: int = 1, eval_accumulation_steps: int | None = None, eval_delay: float | None = 0, torch_empty_cache_steps: int | None = None, learning_rate: float = 5e-05, weight_decay: float = 0.0, adam_beta1: float = 0.9, adam_beta2: float = 0.999, adam_epsilon: float = 1e-08, max_grad_norm: float = 1.0, num_train_epochs: float = 3.0, max_steps: int = -1, lr_scheduler_type: ~transformers.trainer_utils.SchedulerType | str = 'linear', lr_scheduler_kwargs: dict | str | None = <factory>, warmup_ratio: float | None = <factory>, warmup_steps: int = 0, log_level: str | None = 'passive', log_level_replica: str | None = 'warning', log_on_each_node: bool = True, logging_dir: str | None = None, logging_strategy: ~transformers.trainer_utils.IntervalStrategy | str = 'steps', logging_first_step: bool = False, logging_steps: float = 500, logging_nan_inf_filter: bool = True, save_strategy: ~transformers.trainer_utils.SaveStrategy | str = 'steps', save_steps: float = 500, save_total_limit: int | None = None, save_safetensors: bool | None = True, save_on_each_node: bool = False, save_only_model: bool = False, restore_callback_states_from_checkpoint: bool = False, no_cuda: bool = False, use_cpu: bool = False, use_mps_device: bool = False, seed: int = 42, data_seed: int | None = None, jit_mode_eval: bool = False, use_ipex: bool = False, bf16: bool = False, fp16: bool = False, fp16_opt_level: str = 'O1', half_precision_backend: str = 'auto', bf16_full_eval: bool = False, fp16_full_eval: bool = False, tf32: bool | None = None, local_rank: int = -1, ddp_backend: str | None = None, tpu_num_cores: int | None = None, tpu_metrics_debug: bool = False, debug: str | list[~transformers.debug_utils.DebugOption] = '', dataloader_drop_last: bool = False, eval_steps: float | None = None, dataloader_num_workers: int = 0, dataloader_prefetch_factor: int | None = None, past_index: int = -1, run_name: str | None = None, disable_tqdm: bool | None = None, remove_unused_columns: bool | None = True, label_names: list[str] | None = None, load_best_model_at_end: bool | None = False, metric_for_best_model: str | None = None, greater_is_better: bool | None = None, ignore_data_skip: bool = False, fsdp: list[~transformers.trainer_utils.FSDPOption] | str | None = '', fsdp_min_num_params: int = 0, fsdp_config: dict | str | None = None, tp_size: int | None = 0, fsdp_transformer_layer_cls_to_wrap: str | None = None, accelerator_config: dict | str | None = None, deepspeed: dict | str | None = None, label_smoothing_factor: float = 0.0, optim: ~transformers.training_args.OptimizerNames | str = 'adamw_torch', optim_args: str | None = None, adafactor: bool = False, group_by_length: bool = False, length_column_name: str | None = 'length', report_to: None | str | list[str] = None, ddp_find_unused_parameters: bool | None = None, ddp_bucket_cap_mb: int | None = None, ddp_broadcast_buffers: bool | None = None, dataloader_pin_memory: bool = True, dataloader_persistent_workers: bool = False, skip_memory_metrics: bool = True, use_legacy_prediction_loop: bool = False, push_to_hub: bool = False, resume_from_checkpoint: str | None = None, hub_model_id: str | None = None, hub_strategy: ~transformers.trainer_utils.HubStrategy | str = 'every_save', hub_token: str | None = None, hub_private_repo: bool | None = None, hub_always_push: bool = False, gradient_checkpointing: bool = False, gradient_checkpointing_kwargs: dict | str | None = None, include_inputs_for_metrics: bool = False, include_for_metrics: list[str] = <factory>, eval_do_concat_batches: bool = True, fp16_backend: str = 'auto', push_to_hub_model_id: str | None = None, push_to_hub_organization: str | None = None, push_to_hub_token: str | None = None, mp_parameters: str = '', auto_find_batch_size: bool = False, full_determinism: bool = False, torchdynamo: str | None = None, ray_scope: str | None = 'last', ddp_timeout: int | None = 1800, torch_compile: bool = False, torch_compile_backend: str | None = None, torch_compile_mode: str | None = None, include_tokens_per_second: bool | None = False, include_num_input_tokens_seen: bool | None = False, neftune_noise_alpha: float | None = None, optim_target_modules: None | str | list[str] = None, batch_eval_metrics: bool = False, eval_on_start: bool = False, use_liger_kernel: bool | None = False, eval_use_gather_object: bool | None = False, average_tokens_across_devices: bool | None = False, prompts: Union[str, None, dict[str, str], dict[str, dict[str, str]]] = None, batch_sampler: Union[BatchSamplers, str, DefaultBatchSampler, Callable[..., DefaultBatchSampler]] = BatchSamplers.BATCH_SAMPLER, multi_dataset_batch_sampler: Union[MultiDatasetBatchSamplers, str, MultiDatasetDefaultBatchSampler, Callable[..., MultiDatasetDefaultBatchSampler]] = MultiDatasetBatchSamplers.PROPORTIONAL, router_mapping: Union[str, None, dict[str, str], dict[str, dict[str, str]]] = <factory>, learning_rate_mapping: Union[str, None, dict[str, float]] = <factory>)[source]

CrossEncoderTrainingArguments extends BaseTrainingArguments with additional arguments specific to Sentence Transformers. See TrainingArguments for the complete list of available arguments.

Parameters:

property ddp_timeout_delta: timedelta

The actual timeout for torch.distributed.init_process_group since it expects a timedelta variable.

property device: torch.device

The device used by this process.

property eval_batch_size: int

The actual batch size for evaluation (may differ from per_gpu_eval_batch_size in distributed training).
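The relationship can be sketched as the per-device value scaled by the number of GPUs handled by this process (a simplified reimplementation for illustration, not the library code):

```python
def effective_batch_size(per_device_batch_size: int, n_gpu: int) -> int:
    """Per-process batch size, as derived from the per-device value.

    With DataParallel (n_gpu > 1, no distributed training) one process feeds
    several GPUs, so the batch it prepares is per_device * n_gpu. Under
    distributed training each process handles one GPU, so n_gpu is 1 and the
    two values coincide.
    """
    return per_device_batch_size * max(1, n_gpu)

print(effective_batch_size(8, 0))  # CPU-only process: 8
print(effective_batch_size(8, 4))  # one process driving 4 GPUs: 32
```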

get_process_log_level()

Returns the log level to be used depending on whether this process is the main process of node 0, main process of node non-0, or a non-main process.

For the main process the log level defaults to the logging level set (logging.WARNING if you didn't do anything) unless overridden by the log_level argument.

For the replica processes the log level defaults to logging.WARNING unless overridden by the log_level_replica argument.

The choice between the main and replica process settings is made according to the return value of should_log.
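The selection logic can be sketched as follows (a simplified reimplementation for illustration; "passive" means defer to whatever level is already configured):

```python
import logging

def process_log_level(is_main: bool, log_level: str = "passive",
                      log_level_replica: str = "warning") -> int:
    """Pick the numeric log level for this process (simplified sketch)."""
    chosen = log_level if is_main else log_level_replica
    if chosen == "passive":
        # "passive" defers to the current logging configuration
        # (logging.WARNING if nothing was configured).
        return logging.getLogger().getEffectiveLevel()
    return getattr(logging, chosen.upper())

print(process_log_level(is_main=False))                    # WARNING for replicas
print(process_log_level(is_main=True, log_level="debug"))  # DEBUG on the main process
```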

get_warmup_steps(num_training_steps: int)

Get number of steps used for a linear warmup.
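The rule can be sketched as: an explicit warmup_steps takes precedence; otherwise warmup_ratio is applied to the total step count (simplified reimplementation for illustration):

```python
import math

def get_warmup_steps(num_training_steps: int, warmup_steps: int = 0,
                     warmup_ratio: float = 0.0) -> int:
    """Number of linear-warmup steps: explicit warmup_steps wins;
    otherwise warmup_ratio is applied to the total number of steps."""
    if warmup_steps > 0:
        return warmup_steps
    return math.ceil(num_training_steps * warmup_ratio)

print(get_warmup_steps(10_000, warmup_steps=500))   # 500
print(get_warmup_steps(10_000, warmup_ratio=0.06))  # 600
```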

property local_process_index

The index of the local process used.

main_process_first(local=True, desc='work')

A context manager for torch distributed environments where one needs to do something on the main process while blocking the replicas, releasing the replicas once the main process is finished.

One such use is the map feature of the datasets library, which, to be efficient, should be run once on the main process; upon completion it saves a cached version of the results, which the replicas then load automatically.
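The coordination pattern can be illustrated with a simplified stand-in (using a threading.Barrier in place of the real torch.distributed barrier; this is an illustration of the pattern, not the library implementation):

```python
import threading
from contextlib import contextmanager

@contextmanager
def main_process_first(rank: int, barrier: threading.Barrier):
    """Simplified stand-in: replicas block at a barrier until the
    main process (rank 0) has finished the body and releases them."""
    if rank != 0:
        barrier.wait()   # replicas wait for the main process to finish
    try:
        yield
    finally:
        if rank == 0:
            barrier.wait()  # main process done: release the replicas

# Demo with two threads standing in for two processes:
order = []
barrier = threading.Barrier(2)

def worker(rank: int) -> None:
    with main_process_first(rank, barrier):
        order.append("rank 0 maps the dataset" if rank == 0
                     else "rank 1 loads the cached result")

threads = [threading.Thread(target=worker, args=(r,)) for r in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(order)  # rank 0's work always comes first
```

In the real API the equivalent usage is `with args.main_process_first(desc="dataset map pre-processing"):` wrapped around the map call.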

Parameters:

property n_gpu

The number of GPUs used by this process.

Note

This will only be greater than one when you have multiple GPUs available but are not using distributed training. For distributed training, it will always be 1.

property parallel_mode

The current mode used for parallelism if multiple GPUs/TPU cores are available. One of:

property place_model_on_device

Can be subclassed and overridden for some specific integrations.

property process_index

The index of the current process used.

set_dataloader(train_batch_size: int = 8, eval_batch_size: int = 8, drop_last: bool = False, num_workers: int = 0, pin_memory: bool = True, persistent_workers: bool = False, prefetch_factor: int | None = None, auto_find_batch_size: bool = False, ignore_data_skip: bool = False, sampler_seed: int | None = None)

A method that regroups all arguments linked to the dataloaders creation.

Parameters:

Example:

from transformers import TrainingArguments

args = TrainingArguments("working_dir")
args = args.set_dataloader(train_batch_size=16, eval_batch_size=64)
args.per_device_train_batch_size
16

set_evaluate(strategy: str | IntervalStrategy = 'no', steps: int = 500, batch_size: int = 8, accumulation_steps: int | None = None, delay: float | None = None, loss_only: bool = False, jit_mode: bool = False)

A method that regroups all arguments linked to evaluation.

Parameters:

Example:

from transformers import TrainingArguments

args = TrainingArguments("working_dir")
args = args.set_evaluate(strategy="steps", steps=100)
args.eval_steps
100

set_logging(strategy: str | IntervalStrategy = 'steps', steps: int = 500, report_to: str | list[str] = 'none', level: str = 'passive', first_step: bool = False, nan_inf_filter: bool = False, on_each_node: bool = False, replica_level: str = 'passive')

A method that regroups all arguments linked to logging.

Parameters:

Example:

from transformers import TrainingArguments

args = TrainingArguments("working_dir")
args = args.set_logging(strategy="steps", steps=100)
args.logging_steps
100

set_lr_scheduler(name: str | SchedulerType = 'linear', num_epochs: float = 3.0, max_steps: int = -1, warmup_ratio: float = 0, warmup_steps: int = 0)

A method that regroups all arguments linked to the learning rate scheduler and its hyperparameters.

Parameters:

Example:

from transformers import TrainingArguments

args = TrainingArguments("working_dir")
args = args.set_lr_scheduler(name="cosine", warmup_ratio=0.05)
args.warmup_ratio
0.05

set_optimizer(name: str | OptimizerNames = 'adamw_torch', learning_rate: float = 5e-05, weight_decay: float = 0, beta1: float = 0.9, beta2: float = 0.999, epsilon: float = 1e-08, args: str | None = None)

A method that regroups all arguments linked to the optimizer and its hyperparameters.

Parameters:

Example:

from transformers import TrainingArguments

args = TrainingArguments("working_dir")
args = args.set_optimizer(name="adamw_torch", beta1=0.8)
args.optim
'adamw_torch'

set_push_to_hub(model_id: str, strategy: str | HubStrategy = 'every_save', token: str | None = None, private_repo: bool | None = None, always_push: bool = False)

A method that regroups all arguments linked to synchronizing checkpoints with the Hub.

Tip

Calling this method will set self.push_to_hub to True, which means the output_dir will become a git directory synced with the repo (determined by model_id), and the content will be pushed each time a save is triggered (depending on your self.save_strategy). Calling Trainer.save_model() will also trigger a push.

Parameters:

Example:

from transformers import TrainingArguments

args = TrainingArguments("working_dir")
args = args.set_push_to_hub("me/awesome-model")
args.hub_model_id
'me/awesome-model'

set_save(strategy: str | IntervalStrategy = 'steps', steps: int = 500, total_limit: int | None = None, on_each_node: bool = False)

A method that regroups all arguments linked to checkpoint saving.

Parameters:

Example:

from transformers import TrainingArguments

args = TrainingArguments("working_dir")
args = args.set_save(strategy="steps", steps=100)
args.save_steps
100

set_testing(batch_size: int = 8, loss_only: bool = False, jit_mode: bool = False)

A method that regroups all basic arguments linked to testing on a held-out dataset.

Tip

Calling this method will automatically set self.do_predict to True.

Parameters:

Example:

from transformers import TrainingArguments

args = TrainingArguments("working_dir")
args = args.set_testing(batch_size=32)
args.per_device_eval_batch_size
32

set_training(learning_rate: float = 5e-05, batch_size: int = 8, weight_decay: float = 0, num_epochs: float = 3, max_steps: int = -1, gradient_accumulation_steps: int = 1, seed: int = 42, gradient_checkpointing: bool = False)

A method that regroups all basic arguments linked to the training.

Tip

Calling this method will automatically set self.do_train to True.

Parameters:

Example:

from transformers import TrainingArguments

args = TrainingArguments("working_dir")
args = args.set_training(learning_rate=1e-4, batch_size=32)
args.learning_rate
1e-4

property should_log

Whether or not the current process should produce log output.

property should_save

Whether or not the current process should write to disk, e.g., to save models and checkpoints.

to_dict()[source]

Serializes this instance while replacing Enum members by their values (for JSON serialization support). It obfuscates token values (e.g. hub_token) so secrets are not serialized in plain text.
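The two transformations can be sketched as follows (a simplified reimplementation for illustration: Enum members become their values, and any attribute whose name ends in `_token` is masked rather than serialized):

```python
import enum

class IntervalStrategy(str, enum.Enum):  # stand-in for the real enum
    STEPS = "steps"

def to_dict(args: dict) -> dict:
    """Serialize: replace Enum members by their values and mask token fields."""
    out = {}
    for k, v in args.items():
        if isinstance(v, enum.Enum):
            v = v.value                # JSON-friendly plain value
        if k.endswith("_token") and v is not None:
            v = f"<{k.upper()}>"       # obfuscate secrets such as hub_token
        out[k] = v
    return out

d = to_dict({"eval_strategy": IntervalStrategy.STEPS, "hub_token": "hf_abc123"})
print(d)  # {'eval_strategy': 'steps', 'hub_token': '<HUB_TOKEN>'}
```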

to_json_string()

Serializes this instance to a JSON string.

to_sanitized_dict() → dict[str, Any]

Sanitized serialization to use with TensorBoard’s hparams

property train_batch_size: int

The actual batch size for training (may differ from per_gpu_train_batch_size in distributed training).

property world_size

The number of processes used in parallel.