ray.tune.Tuner.restore (Ray 2.45.0)

classmethod Tuner.restore(path: str, trainable: str | Callable | Type[Trainable] | BaseTrainer, resume_unfinished: bool = True, resume_errored: bool = False, restart_errored: bool = False, param_space: Dict[str, Any] | None = None, storage_filesystem: pyarrow.fs.FileSystem | None = None, _resume_config: ResumeConfig | None = None) → Tuner

Restores Tuner after a previously failed run.

All trials from the existing run will be added to the result table. The argument flags control how existing but unfinished or errored trials are resumed.

Finished trials are always added to the overview table. They will not be resumed.

Unfinished trials can be controlled with the resume_unfinished flag. If True (default), they will be continued. If False, they will be added as terminated trials (even if they were only created and never trained).

Errored trials can be controlled with the resume_errored and restart_errored flags. The former will resume errored trials from their latest checkpoints. The latter will restart errored trials from scratch and prevent loading their last checkpoints.
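The flag behavior described above can be summarized as a small decision function. This is an illustrative model only, not Ray's implementation: the function name plan_trial, the status labels, and the returned action strings are hypothetical, and the order in which the two errored-trial flags are checked when both are set is an assumption of this sketch.

```python
def plan_trial(status, resume_unfinished=True, resume_errored=False,
               restart_errored=False):
    """Return the action a restored run takes for one trial (model only).

    `status` is one of "finished", "unfinished", "errored". These labels
    follow the prose above; they are not Ray's internal status names.
    """
    if status == "finished":
        # Finished trials are always listed and never resumed.
        return "keep"
    if status == "unfinished":
        # Either continue the trial or list it as terminated without running it.
        return "continue" if resume_unfinished else "mark_terminated"
    if status == "errored":
        if resume_errored:
            # Assumption of this sketch: checked first if both flags are set.
            return "resume_from_checkpoint"
        if restart_errored:
            return "restart_from_scratch"
        # Neither flag set: the trial is listed as errored and not re-run.
        return "keep"
    raise ValueError(f"unknown status: {status!r}")


if __name__ == "__main__":
    for status in ("finished", "unfinished", "errored"):
        print(status, "->", plan_trial(status, resume_errored=True))
```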

Note

Restoring an experiment from a path that points to a different location than the original experiment path is supported. However, Ray Tune assumes that the full experiment directory (including checkpoints) is available, so that trials can be resumed from their latest state.

For example, if the original experiment was run locally and its results were then uploaded to cloud storage, Ray Tune expects the full contents to be available in cloud storage when resuming via Tuner.restore("s3://..."). The restored run will continue writing results to the same cloud storage location.

Parameters:

path – The local or remote path of the experiment directory for an interrupted or failed run. Note that an experiment where all trials finished will not be resumed. The path can be found near the end of the console output of the previous run.
trainable – The trainable to use upon resuming the experiment. This should be the same trainable that was used to initialize the original Tuner.
param_space – The same param_space that was passed to the original Tuner. It can optionally be re-specified because the param_space may contain Ray object references (for example, when tuning over Datasets or over several ray.put object references). Tune expects the param_space to be unmodified; the only parts used during restore are the updated object references. Changing the hyperparameter search space and then resuming is NOT supported by this API.
resume_unfinished – If True, will continue to run unfinished trials.
resume_errored – If True, will re-schedule errored trials and try to restore from their latest checkpoints.
restart_errored – If True, will re-schedule errored trials but force restarting them from scratch (no checkpoint will be loaded).
storage_filesystem – Custom pyarrow.fs.FileSystem corresponding to the path. This may be necessary if the original experiment passed in a custom filesystem.
_resume_config – [Experimental] Config object that controls how to resume trials of different statuses. Can be used as a substitute for the resume_* and restart_* flags above.
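As a rough sketch of how these parameters fit together, the helper below restores an experiment and runs it to completion. The helper name restore_and_fit and its tuner_cls argument are hypothetical: tuner_cls exists only so the sketch can be exercised without Ray installed, and it defaults to ray.tune.Tuner. The flag values are just one plausible choice (continue unfinished trials, resume errored trials from their latest checkpoints).

```python
def restore_and_fit(path, trainable, param_space=None, tuner_cls=None):
    """Restore an interrupted experiment and run the remaining trials.

    `tuner_cls` is a hypothetical hook for this sketch; when omitted,
    `ray.tune.Tuner` is used, which requires Ray to be installed.
    """
    if tuner_cls is None:
        from ray import tune  # imported lazily so the sketch loads without Ray
        tuner_cls = tune.Tuner

    tuner = tuner_cls.restore(
        path,                      # experiment directory of the previous run
        trainable=trainable,       # same trainable as the original Tuner
        resume_unfinished=True,    # continue trials that did not finish
        resume_errored=True,       # retry errored trials from checkpoints
        restart_errored=False,     # do not discard their checkpoints
        param_space=param_space,   # only needed to refresh object references
    )
    # fit() runs the restored trials and returns the experiment results.
    return tuner.fit()
```

With Ray installed, a call would look like restore_and_fit(experiment_path, my_trainable), where experiment_path and my_trainable stand in for the path and trainable of the original run.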