Trainer — Sentence Transformers documentation

SparseEncoderTrainer

class sentence_transformers.sparse_encoder.trainer.SparseEncoderTrainer(model: SparseEncoder | None = None, args: SparseEncoderTrainingArguments | None = None, train_dataset: Dataset | DatasetDict | IterableDataset | dict[str, Dataset] | None = None, eval_dataset: Dataset | DatasetDict | IterableDataset | dict[str, Dataset] | None = None, loss: Module | dict[str, Module] | Callable[[SparseEncoder], Module] | dict[str, Callable[[SparseEncoder], Module]] | None = None, evaluator: BaseEvaluator | list[BaseEvaluator] | None = None, data_collator: SparseEncoderDataCollator | None = None, processing_class: PreTrainedTokenizerBase | BaseImageProcessor | FeatureExtractionMixin | ProcessorMixin | None = None, model_init: Callable[[], SparseEncoder] | None = None, compute_metrics: Callable[[EvalPrediction], dict] | None = None, callbacks: list[TrainerCallback] | None = None, optimizers: tuple[Optimizer, LambdaLR] = (None, None), optimizer_cls_and_kwargs: tuple[type[Optimizer], dict[str, Any]] | None = None, preprocess_logits_for_metrics: Callable[[Tensor, Tensor], Tensor] | None = None)[source]

SparseEncoderTrainer is a simple but feature-complete training and evaluation loop for PyTorch, based on the SentenceTransformerTrainer, which in turn builds on the 🤗 Transformers Trainer.

This trainer integrates support for various transformers.TrainerCallback subclasses, such as:

See the Transformers Callbacks documentation for more information on the integrated callbacks and how to write your own callbacks.

Parameters:

Important attributes:

add_callback(callback)

Add a callback to the current list of transformers.TrainerCallback.

Parameters:

callback (type or transformers.TrainerCallback) – A transformers.TrainerCallback class or an instance of a transformers.TrainerCallback. In the first case, will instantiate a member of that class.

static add_dataset_name_transform(batch: dict[str, list[Any]], dataset_name: str | None = None, transform: Callable[[dict[str, list[Any]]], dict[str, list[Any]]] | None = None, **kwargs) → dict[str, list[Any]][source]

A transform/map function that adds the dataset name to the batch.

Parameters:

Returns:

The “just-in-time” transformed batch with the dataset name added.

Return type:

dict[str, list[Any]]
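The effect of this transform can be sketched in plain Python. The helper below (`add_dataset_name` is a hypothetical stand-in for the real `add_dataset_name_transform` static method, simplified for illustration) applies an optional inner transform and then tags every row of the batch with its dataset name:

```python
from typing import Any, Callable, Optional

def add_dataset_name(
    batch: dict[str, list[Any]],
    dataset_name: Optional[str] = None,
    transform: Optional[Callable[[dict[str, list[Any]]], dict[str, list[Any]]]] = None,
) -> dict[str, list[Any]]:
    """Simplified sketch: apply an optional inner transform, then add a
    'dataset_name' column so per-dataset losses can route each row."""
    if transform is not None:
        batch = transform(batch)
    if dataset_name is not None:
        # All columns share the same number of rows; repeat the name per row.
        num_rows = len(next(iter(batch.values())))
        batch["dataset_name"] = [dataset_name] * num_rows
    return batch

batch = add_dataset_name(
    {"anchor": ["q1", "q2"], "positive": ["d1", "d2"]},
    dataset_name="msmarco",
)
```

Because this runs as a "just-in-time" `set_transform` rather than a `map`, the column is only materialized when a batch is actually fetched.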

add_model_card_callback(default_args_dict: dict[str, Any]) → None[source]

Add a callback responsible for automatically tracking the data required for automatic model card generation.

This method is called in the __init__ method of the trainer subclass.

Parameters:

default_args_dict (Dict [ str , Any ]) – A dictionary of the default training arguments, so we can determine which arguments have been changed for the model card.

Note

This method can be overridden by subclassing the trainer to remove or customize this callback in custom use cases.

compute_loss(model: BaseModel, inputs: dict[str, Tensor | Any], return_outputs: bool = False, num_items_in_batch=None) → Tensor | tuple[Tensor, dict[str, Any]][source]

Computes the loss for the BaseModel model.

It uses self.loss to compute the loss, which can be a single loss function or a dictionary of loss functions for different datasets. If the loss is a dictionary, the dataset name is expected to be passed in the inputs under the key "dataset_name". This is done automatically via the add_dataset_name_transform method. Note that even if return_outputs=True, the outputs will be empty, as the BaseModel losses do not return outputs.

Parameters:

Returns:

The computed loss. If return_outputs is True, returns a tuple of loss and outputs. Otherwise, returns only the loss.

Return type:

Union[torch.Tensor, Tuple[torch.Tensor, Dict[str, Any]]]
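The dict-of-losses dispatch described above can be sketched in plain Python. This is a simplified stand-in, not the library implementation: the lambdas stand in for real loss modules, and `compute_loss_sketch` is a hypothetical name.

```python
# Hypothetical per-dataset losses; in practice these are loss modules
# such as SpladeLoss instances keyed by training-dataset name.
losses = {
    "msmarco": lambda inputs: 0.5,
    "nq": lambda inputs: 0.2,
}

def compute_loss_sketch(loss, inputs, return_outputs=False):
    """Simplified dispatch mirroring compute_loss: when loss is a dict,
    the per-dataset loss is selected via the 'dataset_name' input."""
    if isinstance(loss, dict):
        loss_fn = loss[inputs.pop("dataset_name")]
    else:
        loss_fn = loss
    value = loss_fn(inputs)
    # Even with return_outputs=True the outputs are empty, as noted above.
    return (value, {}) if return_outputs else value

result = compute_loss_sketch(
    losses, {"dataset_name": "nq", "anchor": ["q"]}, return_outputs=True
)
```

This is why multi-dataset training with a loss dictionary requires the "dataset_name" column: without it, the trainer cannot tell which loss module a batch belongs to.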

create_model_card(language: str | None = None, license: str | None = None, tags: str | list[str] | None = None, model_name: str | None = None, finetuned_from: str | None = None, tasks: str | list[str] | None = None, dataset_tags: str | list[str] | None = None, dataset: str | list[str] | None = None, dataset_args: str | list[str] | None = None, **kwargs) → None[source]

Creates a draft of a model card using the information available to the Trainer.

Parameters:

create_optimizer()

Setup the optimizer.

We provide a reasonable default that works well. If you want to use something else, you can pass a tuple in the Trainer’s init through optimizers, or subclass and override this method in a subclass.

create_optimizer_and_scheduler(num_training_steps: int)

Setup the optimizer and the learning rate scheduler.

We provide a reasonable default that works well. If you want to use something else, you can pass a tuple in the Trainer’s init through optimizers, or subclass and override this method (or create_optimizer and/or create_scheduler) in a subclass.

create_scheduler(num_training_steps: int, optimizer: Optimizer | None = None)

Setup the scheduler. The optimizer of the trainer must have been set up either before this method is called or passed as an argument.

Parameters:

num_training_steps (int) – The number of training steps to do.

data_collator_class[source]

alias of SparseEncoderDataCollator

evaluate(eval_dataset: Dataset | dict[str, Dataset] | None = None, ignore_keys: list[str] | None = None, metric_key_prefix: str = 'eval') → dict[str, float][source]

Run evaluation and return metrics.

The calling script will be responsible for providing a method to compute metrics, as they are task-dependent (pass it to the init compute_metrics argument).

You can also subclass and override this method to inject custom behavior.

Parameters:

Returns:

A dictionary containing the evaluation loss and the potential metrics computed from the predictions. The dictionary also contains the epoch number which comes from the training state.

get_batch_sampler(dataset: Dataset, batch_size: int, drop_last: bool, valid_label_columns: list[str] | None = None, generator: Generator | None = None, seed: int = 0) → BatchSampler | None[source]

Returns the appropriate batch sampler based on the batch_sampler argument in self.args. This batch sampler class supports __len__ and __iter__ methods, and is used as the batch_sampler to create the torch.utils.data.DataLoader.

Note

Override this method to provide a custom batch sampler.
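The protocol such an override must satisfy can be sketched without torch: `__len__` returns the number of batches and `__iter__` yields lists of dataset indices. `SimpleBatchSampler` below is a hypothetical, simplified class for illustration; a real override would typically return a torch.utils.data.BatchSampler subclass.

```python
import random
from typing import Iterator

class SimpleBatchSampler:
    """Minimal sketch of the batch-sampler protocol: yields shuffled
    index batches, optionally dropping the last incomplete batch."""

    def __init__(self, dataset_size: int, batch_size: int, drop_last: bool, seed: int = 0):
        self.dataset_size = dataset_size
        self.batch_size = batch_size
        self.drop_last = drop_last
        self.seed = seed

    def __len__(self) -> int:
        if self.drop_last:
            return self.dataset_size // self.batch_size
        return (self.dataset_size + self.batch_size - 1) // self.batch_size

    def __iter__(self) -> Iterator[list[int]]:
        indices = list(range(self.dataset_size))
        random.Random(self.seed).shuffle(indices)  # seeded for reproducibility
        for start in range(0, self.dataset_size, self.batch_size):
            batch = indices[start:start + self.batch_size]
            if self.drop_last and len(batch) < self.batch_size:
                continue
            yield batch

sampler = SimpleBatchSampler(dataset_size=10, batch_size=4, drop_last=True)
batches = list(sampler)
```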

Parameters:

get_data_collator(model: BaseModel, args: BaseTrainingArguments, processing_class: PreTrainedTokenizerBase | BaseImageProcessor | FeatureExtractionMixin | ProcessorMixin | None = None) → BaseDataCollator[source]

Load the data collator for the trainer.

Parameters:

Returns:

The data collator to use for the trainer

Return type:

BaseDataCollator

Note

This method can be overridden by subclassing the trainer to use a custom data collator.

get_eval_dataloader(eval_dataset: Dataset | DatasetDict | IterableDataset | None = None) → DataLoader[source]

Returns the evaluation torch.utils.data.DataLoader.

Subclass and override this method if you want to inject some custom behavior.

Parameters:

eval_dataset (torch.utils.data.Dataset, optional) – If provided, will override self.eval_dataset. If it is a datasets.Dataset, columns not accepted by the model.forward() method are automatically removed. It must implement __len__.

get_learning_rates()

Returns the learning rate of each parameter from self.optimizer.

get_multi_dataset_batch_sampler(dataset: ConcatDataset, batch_samplers: list[BatchSampler], generator: Generator | None = None, seed: int | None = 0) → BatchSampler[source]

Returns the appropriate multi-dataset batch sampler based on the multi_dataset_batch_sampler argument in self.args. This batch sampler class supports __len__ and __iter__ methods, and is used as the batch_sampler to create the torch.utils.data.DataLoader.

Note

Override this method to provide a custom multi-dataset batch sampler.

Parameters:

get_num_trainable_parameters()

Get the number of trainable parameters.

get_optimizer_group(param: str | Parameter | None = None)

Returns optimizer group for a parameter if given, else returns all optimizer groups for params.

Parameters:

param (str or torch.nn.parameter.Parameter, optional) – The parameter for which optimizer group needs to be returned.

get_test_dataloader(test_dataset: Dataset | DatasetDict | IterableDataset) → DataLoader[source]

Returns the test torch.utils.data.DataLoader.

Subclass and override this method if you want to inject some custom behavior.

Parameters:

test_dataset (torch.utils.data.Dataset, optional) – The test dataset to use. If it is a datasets.Dataset, columns not accepted by the model.forward() method are automatically removed. It must implement __len__.

get_train_dataloader() → DataLoader[source]

Returns the training torch.utils.data.DataLoader.

Will use no sampler if train_dataset does not implement __len__, a random sampler (adapted to distributed training if necessary) otherwise.

Subclass and override this method if you want to inject some custom behavior.

hyperparameter_search(hp_space: Callable[[optuna.Trial], dict[str, float]] | None = None, compute_objective: Callable[[dict[str, float]], float] | None = None, n_trials: int = 20, direction: str | list[str] = 'minimize', backend: str | HPSearchBackend | None = None, hp_name: Callable[[optuna.Trial], str] | None = None, **kwargs) → BestRun | list[BestRun]

Launch a hyperparameter search using optuna, Ray Tune, or SigOpt. The optimized quantity is determined by compute_objective, which defaults to a function returning the evaluation loss when no metric is provided, and the sum of all metrics otherwise.

To use this method, you need to have provided a model_init when initializing your Trainer: we need to reinitialize the model at each new run. This is incompatible with the optimizers argument, so you need to subclass Trainer and override the Trainer.create_optimizer_and_scheduler method for a custom optimizer/scheduler.

Parameters:

Returns:

All the information about the best run or best runs for multi-objective optimization. Experiment summary can be found in run_summary attribute for Ray backend.

Return type:

trainer_utils.BestRun or List[trainer_utils.BestRun]
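A search space for the optuna backend can be sketched as follows. The `_FakeTrial` class is a stand-in so the sketch runs without optuna installed; a real hp_space receives an optuna.Trial, whose suggest_float/suggest_int methods have the signatures used here. The hyperparameter names are examples and must match TrainingArguments field names.

```python
class _FakeTrial:
    """Stand-in for optuna.Trial so the sketch runs without optuna.
    A real trial samples values; this one just returns the lower bound."""

    def suggest_float(self, name, low, high, log=False):
        return low

    def suggest_int(self, name, low, high):
        return low

def hp_space(trial):
    # Keys must be names of TrainingArguments fields.
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        "per_device_train_batch_size": trial.suggest_int("per_device_train_batch_size", 16, 64),
    }

space = hp_space(_FakeTrial())
```

In actual use, this function is passed as the hp_space argument together with a model_init, e.g. `trainer.hyperparameter_search(hp_space=hp_space, n_trials=20, backend="optuna")`.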

is_local_process_zero() → bool

Whether or not this process is the local (e.g., on one machine if training in a distributed fashion on several machines) main process.

is_world_process_zero() → bool

Whether or not this process is the global main process (when training in a distributed fashion on several machines, this is only going to be True for one process).

log(logs: dict[str, float], start_time: float | None = None) → None[source]

Log logs on the various objects watching training.

Subclass and override this method to inject custom behavior.

Parameters:

model_card_callback_class[source]

alias of SparseEncoderModelCardCallback

model_card_data_class[source]

alias of SparseEncoderModelCardData

model_class[source]

alias of SparseEncoder

pop_callback(callback)

Remove a callback from the current list of transformers.TrainerCallback and returns it.

If the callback is not found, returns None (and no error is raised).

Parameters:

callback (type or transformers.TrainerCallback) – A transformers.TrainerCallback class or an instance of a transformers.TrainerCallback. In the first case, will pop the first member of that class found in the list of callbacks.

Returns:

The callback removed, if found.

Return type:

transformers.TrainerCallback

preprocess_dataset(dataset: DatasetDict | Dataset | None = None, dataset_name: str | None = None) → DatasetDict | Dataset | None[source]

Preprocess the dataset by optionally lazily adding a dataset name column, required for multi-dataset training with multiple losses or for dataset-specific router mappings.

Parameters:

Returns:

The preprocessed dataset, perhaps with dataset names added as a lazy column.

Return type:

DatasetDict | Dataset | None

propagate_args_to_deepspeed(auto_find_batch_size=False)

Note

This docstring is inherited from transformers.Trainer.propagate_args_to_deepspeed.

Sets values in the DeepSpeed plugin based on the Trainer args.

push_to_hub(commit_message: str | None = 'End of training', blocking: bool = True, token: str | None = None, revision: str | None = None, **kwargs) → str

Upload self.model and self.processing_class to the 🤗 model hub on the repo self.args.hub_model_id.

Parameters:

Returns:

The URL of the repository where the model was pushed if blocking=True, or a Future object tracking the progress of the commit if blocking=False.

remove_callback(callback)

Remove a callback from the current list of transformers.TrainerCallback.

Parameters:

callback (type or transformers.TrainerCallback) – A transformers.TrainerCallback class or an instance of a transformers.TrainerCallback. In the first case, will remove the first member of that class found in the list of callbacks.

save_model(output_dir: str | None = None, _internal_call: bool = False)

Will save the model, so you can reload it using from_pretrained().

Will only save from the main process.

set_initial_training_values(args: TrainingArguments, dataloader: DataLoader, total_train_batch_size: int)

Calculates and returns the following values:

  - num_train_epochs
  - num_update_steps_per_epoch
  - num_examples
  - num_train_samples
  - epoch_based
  - len_dataloader
  - max_steps

should_dataset_name_column_be_added(dataset: DatasetDict | Dataset | None, args: BaseTrainingArguments, loss: Module | dict[str, Module]) → bool[source]

A dataset name column should be added if the dataset is a DatasetDict and one of the following holds:

  1. The loss is a dictionary, or
  2. The prompts contain a mapping of dataset names, or
  3. The router_mapping contains a mapping of dataset names.
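The decision above can be sketched in plain Python. `should_add_dataset_name` is a hypothetical, simplified helper: a plain dict stands in for DatasetDict, and prompts/router_mapping are treated as "keyed by dataset name" whenever they are dicts.

```python
def should_add_dataset_name(dataset, loss, prompts=None, router_mapping=None) -> bool:
    """Sketch of the check: a dataset name column is only needed for a
    DatasetDict whose losses, prompts, or router mapping are keyed by
    dataset name (simplified stand-in for the trainer method)."""
    if not isinstance(dataset, dict):  # stand-in for DatasetDict
        return False
    return (
        isinstance(loss, dict)
        or isinstance(prompts, dict)
        or isinstance(router_mapping, dict)
    )

# A DatasetDict-like input with one loss per dataset -> column is needed.
needed = should_add_dataset_name(
    {"msmarco": [], "nq": []}, loss={"msmarco": None, "nq": None}
)
# A single dataset with a single loss -> no column needed.
not_needed = should_add_dataset_name([], loss=None)
```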

train(resume_from_checkpoint: bool | str | None = None, trial: optuna.Trial | dict[str, Any] | None = None, ignore_keys_for_eval: list[str] | None = None, **kwargs)

Main training entry point.

Parameters:

training_args_class[source]

alias of SparseEncoderTrainingArguments