[speculator training] Speculator training by daviswer · Pull Request #35 · foundation-model-stack/fms-fsdp

Add support for speculator training, piggybacking off the existing training utilities.

Training script and speculator-specific utilities are inside the new speculator subfolder.

Uses distributed setup, checkpointing, and dataloaders from this repo. Adds speculator-specific fields to the training config file (these are ignored during non-speculator training). It might make more sense to pull these new fields out into a separate config subclass under the speculator utilities; open to suggestions.
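For discussion, here is a rough sketch of what the separate config subclass could look like. Every name in it (`TrainConfig`, `SpeculatorTrainConfig`, and all field names and defaults) is a hypothetical stand-in, not the actual config in this repo; it only illustrates how dataclass inheritance would keep the speculator fields out of the base config.

```python
from dataclasses import dataclass, fields


@dataclass
class TrainConfig:
    # Hypothetical stand-in for the repo's existing training config.
    model_variant: str = "7b"
    seq_length: int = 4096
    learning_rate: float = 3e-4


@dataclass
class SpeculatorTrainConfig(TrainConfig):
    # Hypothetical speculator-only fields; names and defaults are illustrative.
    n_speculator_heads: int = 3
    speculator_width: int = 4096


# Fields that exist only on the subclass, so the base config stays untouched.
base_names = {f.name for f in fields(TrainConfig)}
spec_only = [
    f.name for f in fields(SpeculatorTrainConfig) if f.name not in base_names
]

cfg = SpeculatorTrainConfig(n_speculator_heads=4)
```

With this layout, non-speculator training would keep constructing the base config and never see the extra fields, while the speculator script would construct the subclass and still pass it anywhere the base config is accepted.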

Uses speculator architecture from fms-extras.

Uses altered Llama-7b and generate() function from base fms, allowing the speculator to access embedding vectors, not just logits/token predictions. ~~Do not merge this until that issue can be resolved.~~