ComposerScheduler
class composer.optim.ComposerScheduler
Specification for a stateless scheduler function.
While this specification is provided as a Python class, an ordinary function can implement this interface as long as it matches the signature of this interface’s __call__() method.
For example, a scheduler that halves the learning rate after 10 epochs could be written as:
from composer import Trainer
from composer.core import State

def ten_epoch_decay_scheduler(state: State) -> float:
    if state.timestamp.epoch < 10:
        return 1.0
    return 0.5

# ten_epoch_decay_scheduler is a valid ComposerScheduler
trainer = Trainer(
    schedulers=[ten_epoch_decay_scheduler],
    ...
)
In order to allow schedulers to be configured, schedulers may also be written as callable classes:
class VariableEpochDecayScheduler(ComposerScheduler):

    def __init__(self, num_epochs: int):
        self.num_epochs = num_epochs

    def __call__(self, state: State) -> float:
        if state.timestamp.epoch < self.num_epochs:
            return 1.0
        return 0.5
ten_epoch_decay_scheduler = VariableEpochDecayScheduler(num_epochs=10)
# ten_epoch_decay_scheduler is also a valid ComposerScheduler
trainer = Trainer(
    schedulers=[ten_epoch_decay_scheduler],
    ...
)
The constructions of ten_epoch_decay_scheduler in each of the examples above are equivalent. Note that neither scheduler uses the scale_schedule_ratio parameter. As long as this parameter is not used when initializing the Trainer, schedulers are not required to implement it.
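A scheduler that does honor the scale schedule ratio would stretch its own milestones by ssr. The following is a minimal, hypothetical sketch (not part of the Composer library) that reuses the ten-epoch decay above, delaying the drop to epoch 10 * ssr:

def ten_epoch_decay_scheduler_with_ssr(state: State, ssr: float = 1.0) -> float:
    # With ssr=2.0, the multiplier drops at epoch 20 instead of epoch 10.
    if state.timestamp.epoch.value < 10 * ssr:
        return 1.0
    return 0.5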
__call__(state, ssr=1.0)
Calculate the current learning rate multiplier \(\alpha\).
A scheduler function should be a pure function that returns a multiplier to apply to the optimizer’s provided learning rate, given the current trainer state, and optionally a “scale schedule ratio” (SSR). A typical implementation will read state.timestamp, and possibly other fields like state.max_duration, to determine the trainer’s latest temporal progress.
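As an illustration (a minimal sketch, not taken from the Composer documentation), a linear-decay scheduler might derive its progress from state.timestamp and state.max_duration, assuming max_duration is specified in epochs:

def linear_decay_scheduler(state: State) -> float:
    # Fraction of training completed, measured in epochs
    # (assumes max_duration is expressed in epochs).
    frac = state.timestamp.epoch.value / state.max_duration.value
    # Multiplier decays linearly from 1.0 toward 0.0 over the run.
    return max(0.0, 1.0 - frac)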
Note
All instances of ComposerScheduler output a multiplier for the learning rate, rather than the learning rate directly. By convention, we use the symbol \(\alpha\) to refer to this multiplier. This means that the learning rate \(\eta\) at time \(t\) can be represented as \(\eta(t) = \eta_i \times \alpha(t)\), where \(\eta_i\) represents the learning rate used to initialize the optimizer.
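For example (with illustrative values), if the optimizer is initialized with \(\eta_i = 0.1\) and the scheduler returns \(\alpha(t) = 0.5\), the effective learning rate at time \(t\) is \(0.1 \times 0.5 = 0.05\).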
Note
It is possible to use multiple schedulers, in which case their effects will stack multiplicatively.
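For example, if one scheduler returns \(\alpha_1(t) = 0.5\) and another returns \(\alpha_2(t) = 0.1\) at time \(t\), the combined multiplier applied to the optimizer’s learning rate is \(\alpha_1(t) \times \alpha_2(t) = 0.05\).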
The ssr param indicates that the schedule should be “stretched” accordingly. In symbolic terms, where \(\alpha_\sigma(t)\) represents the scheduler output at time \(t\) using scale schedule ratio \(\sigma\):
\[\alpha_{\sigma}(t) = \alpha(t / \sigma) \]
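For example, applying this to the ten-epoch decay scheduler above with \(\sigma = 2.0\), the multiplier drops to 0.5 only once \(t / 2 \geq 10\) epochs, i.e. at epoch 20, so the schedule is stretched to twice its original length.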
Parameters
- state (State) – The current Composer Trainer state.
- ssr (float) – The scale schedule ratio. In general, the learning rate computed by this scheduler at time \(t\) with an SSR of 1.0 should be the same as that computed by this scheduler at time \(t \times s\) with an SSR of \(s\). Default = 1.0.
Returns
alpha (float) – A multiplier to apply to the optimizer’s provided learning rate.