ComposerScheduler


class composer.optim.ComposerScheduler[source]#

Specification for a stateless scheduler function.

While this specification is provided as a Python class, an ordinary function can implement this interface as long as it matches the signature of this interface’s __call__() method.

For example, a scheduler that halves the learning rate after 10 epochs could be written as:

def ten_epoch_decay_scheduler(state: State) -> float:
    if state.timestamp.epoch < 10:
        return 1.0
    return 0.5

Here, ten_epoch_decay_scheduler is a valid ComposerScheduler and can be passed directly to the trainer:

trainer = Trainer(
    schedulers=[ten_epoch_decay_scheduler],
    ...
)

To allow schedulers to be configured, they may also be written as callable classes:

class VariableEpochDecayScheduler(ComposerScheduler):

    def __init__(self, num_epochs: int):
        self.num_epochs = num_epochs

    def __call__(self, state: State) -> float:
        if state.timestamp.epoch < self.num_epochs:
            return 1.0
        return 0.5

ten_epoch_decay_scheduler = VariableEpochDecayScheduler(num_epochs=10)

Here, ten_epoch_decay_scheduler is also a valid ComposerScheduler:

trainer = Trainer(
    schedulers=[ten_epoch_decay_scheduler],
    ...
)

The constructions of ten_epoch_decay_scheduler in the two examples above are equivalent. Note that neither scheduler uses the scale_schedule_ratio parameter; as long as that parameter is not passed when initializing Trainer, schedulers are not required to support it. A sketch of an ssr-aware variant is shown below.
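For reference, a minimal sketch of how the same scheduler could honor the scale schedule ratio (the stretching formula is defined under __call__() below); the factor-of-ssr threshold is an illustration, not library-provided behavior:

def ten_epoch_decay_scheduler(state: State, ssr: float = 1.0) -> float:
    # With ssr=2.0, for example, the decay point stretches from epoch 10 to epoch 20.
    if state.timestamp.epoch < int(10 * ssr):
        return 1.0
    return 0.5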

__call__(state, ssr=1.0)[source]#

Calculate the current learning rate multiplier \(\alpha\).

A scheduler function should be a pure function that returns a multiplier to apply to the optimizer’s provided learning rate, given the current trainer state, and optionally a “scale schedule ratio” (SSR). A typical implementation will read state.timestamp, and possibly other fields like state.max_duration, to determine the trainer’s latest temporal progress.

Note

All instances of ComposerScheduler output a multiplier for the learning rate, rather than the learning rate directly. By convention, we use the symbol \(\alpha\) to refer to this multiplier. This means that the learning rate \(\eta\) at time \(t\) can be represented as \(\eta(t) = \eta_i \times \alpha(t)\), where \(\eta_i\) represents the learning rate used to initialize the optimizer. For example, if the optimizer is initialized with \(\eta_i = 0.1\) and the scheduler returns \(\alpha(t) = 0.5\), the effective learning rate at time \(t\) is \(0.05\).

Note

It is possible to use multiple schedulers, in which case their effects will stack multiplicatively.
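As an illustration of this stacking, consider passing two schedulers together; warmup_scheduler below is a made-up example for this sketch, not part of the library:

def warmup_scheduler(state: State) -> float:
    # Hypothetical example: hold the multiplier at 0.5 for the first 5 epochs.
    if state.timestamp.epoch < 5:
        return 0.5
    return 1.0

trainer = Trainer(
    schedulers=[warmup_scheduler, ten_epoch_decay_scheduler],
    ...
)

# At epoch 12, the optimizer's learning rate is scaled by the product
# warmup_scheduler(state) * ten_epoch_decay_scheduler(state) = 1.0 * 0.5 = 0.5.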

The ssr parameter indicates that the schedule should be "stretched" in time by that factor. In symbolic terms, where \(\alpha_\sigma(t)\) represents the scheduler output at time \(t\) using scale schedule ratio \(\sigma\):

\[\alpha_{\sigma}(t) = \alpha(t / \sigma) \]
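As a concrete illustration of this formula, a linear-decay scheduler that respects ssr might look like the following. This is a minimal sketch: it assumes state.get_elapsed_duration() returns the elapsed fraction of max_duration as a Time whose .value is a float, and the scheduler name is made up for this example.

def linear_decay_scheduler(state: State, ssr: float = 1.0) -> float:
    # Assumed accessor: fraction of max_duration completed so far, in [0, 1].
    t = state.get_elapsed_duration().value
    # alpha_sigma(t) = alpha(t / sigma); with ssr=2.0 the decay takes twice as long to finish.
    return max(1.0 - t / ssr, 0.0)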

Parameters

state (State) – The current trainer state.

ssr (float, optional) – The scale schedule ratio. Default: 1.0.

Returns

alpha (float) – A multiplier to apply to the optimizer’s provided learning rate.