Attention Layers

Training with attention

By default DALLE will use full attention for all layers, but you can specify the attention type per layer as follows.

    dalle = DALLE(
        # ...
        attn_types = ('full', 'axial_row', 'axial_col', 'conv_like')  # cycles between these four types of attention
    )
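To make "cycles between" concrete, here is a minimal sketch of how a tuple of attention types could be spread across the layers of a transformer. It is an illustration only, not the library's actual code: the helper name `assign_attn_types` and the `depth` value of 6 are hypothetical.

```python
from itertools import cycle, islice


def assign_attn_types(attn_types, depth):
    """Return one attention type per layer by cycling through attn_types.

    Hypothetical helper for illustration; not part of the DALLE API.
    """
    if not attn_types:
        raise ValueError("attn_types must contain at least one entry")
    # cycle() repeats the tuple forever; islice() stops after `depth` layers.
    return list(islice(cycle(attn_types), depth))


# Assumed depth of 6 layers, chosen only to show the wrap-around.
layer_types = assign_attn_types(('full', 'axial_row', 'axial_col', 'conv_like'), depth=6)
for index, attn_type in enumerate(layer_types):
    print(f"layer {index}: {attn_type}")
```

Under this reading, a model deeper than the tuple is long simply wraps around, so layers 4 and 5 above reuse `full` and `axial_row`.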

Each type is an attempt to replicate the attention patterns OpenAI used, based on the scant details they have published.

When in doubt, and if you don't need the VRAM/runtime savings, train with full attention (the default).

If you can meet these requirements, this is worth the install.

    dalle = DALLE(
        # ...
        attn_types = ('full', 'sparse')  # cycles between full and sparse attention
    )