Hyena — NVIDIA NeMo Framework User Guide
Introduction to Hyena and Evo 2#
Introduction#
The Hyena architecture represents a significant advancement in neural network design, specifically in the form of convolutional multi-hybrid architectures. As described in the Hyena paper, these architectures provide substantial efficiency gains through co-designed convolution operators and hardware-aware algorithms, enabling faster training and inference compared to traditional Transformers. At the 40 billion parameter scale, Hyena-based models train 1.2 to 2.9 times faster than optimized Transformers, with the StripedHyena 2 architecture achieving two-fold throughput improvement over linear attention and state-space models on H100 GPUs.
Evo 2 is a powerful transformer-hyena hybrid architecture designed for biological sequence modeling. Trained on 9.3 trillion DNA base pairs spanning all domains of life, Evo 2 features an unprecedented 1 million token context window with single-nucleotide resolution. Available in 1B, 7B, and 40B parameter versions, it can accurately predict functional impacts of genetic variation without task-specific fine-tuning, autonomously learning biological features including exon-intron boundaries, transcription factor binding sites, and protein structural elements. The model also enables controllable generation of genomic sequences and epigenomic structure through inference-time search.
Hyena-Based Models#
Available Models#
The Hyena architecture is used in a variety of models, with Evo 2 being a prominent example. Evo 2 is available in the following configurations:
- Evo 2 1B (`hyena_1b`)
- Evo 2 7B (`hyena_7b`)
- Evo 2 40B (`hyena_40b`)
Training Recipes#
We provide pre-defined recipes for pre-training and fine-tuning Hyena-based models using NeMo 2.0 and NeMo-Run. These recipes configure a `run.Partial` for one of the `nemo.collections.llm` API functions introduced in NeMo 2.0. The recipes are hosted in the recipes folder (for example, `hyena_1b.py`).
Pre-Training:
```python
from nemo.collections import llm

# For 1B model
pretrain_1b = llm.hyena_1b.pretrain_recipe(
    name="hyena_1b_pretraining",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallel_size=1,
    global_batch_size=8,
    micro_batch_size=1,
    vocab_file="/path/to/vocab.json",
)

# For 7B model
pretrain_7b = llm.hyena_7b.pretrain_recipe(
    name="hyena_7b_pretraining",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallelism=8,
    vocab_file="/path/to/vocab.json",
)

# For 40B model
pretrain_40b = llm.hyena_40b.pretrain_recipe(
    name="hyena_40b_pretraining",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallelism=8,
    vocab_file="/path/to/vocab.json",
)

# Configure and assign your dataloader
dataloader = a_function_that_configures_your_custom_dataset(
    gbs=8,  # Adjust as needed for your model
    mbs=1,  # Adjust as needed for your model
    seq_length=pretrain_1b.model.config.seq_length,  # Use appropriate model
)
pretrain_1b.data = dataloader  # Assign to whichever model you're using
```
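Because each recipe is a `run.Partial`, individual fields can also be overridden attribute-style before launch, just as the dataloader is assigned above. The sketch below is a minimal example only; the attribute paths shown (`trainer.max_steps`, `optim.config.lr`) are assumptions about how typical NeMo 2.0 recipes are structured, so inspect the recipe source (for example, `hyena_1b.py`) for the exact fields it defines.

```python
# A minimal sketch of overriding recipe fields before launch. The attribute
# paths below (trainer.max_steps, optim.config.lr) are assumptions based on
# typical NeMo 2.0 recipes -- check the recipe source for the real structure.
pretrain_1b.trainer.max_steps = 100          # shorten the run for a smoke test
pretrain_1b.trainer.val_check_interval = 50  # validate more frequently
pretrain_1b.optim.config.lr = 3e-4           # adjust the peak learning rate
```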
Fine-Tuning:
```python
from nemo.collections import llm

# For 1B model
finetune_1b = llm.hyena_1b.finetune_recipe(
    resume_path="/path/to/nemo/checkpoint",
    name="hyena_1b_finetuning",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallel_size=1,
    global_batch_size=8,
    micro_batch_size=1,
    vocab_file="/path/to/vocab.json",
)

# For 7B model
finetune_7b = llm.hyena_7b.finetune_recipe(
    resume_path="/path/to/nemo/checkpoint",
    name="hyena_7b_finetuning",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallelism=8,
    vocab_file="/path/to/vocab.json",
)

# For 40B model
finetune_40b = llm.hyena_40b.finetune_recipe(
    resume_path="/path/to/nemo/checkpoint",
    name="hyena_40b_finetuning",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallelism=8,
    vocab_file="/path/to/vocab.json",
)

# Configure and assign your dataloader
dataloader = a_function_that_configures_your_custom_dataset(
    gbs=8,  # Adjust as needed for your model
    mbs=1,  # Adjust as needed for your model
    seq_length=finetune_1b.model.config.seq_length,  # Use appropriate model
)
finetune_1b.data = dataloader  # Assign to whichever model you're using
```
Note

For pre-training and fine-tuning, the recipes use placeholder datamodules for the `data` argument. You are expected to replace these with your custom dataset.
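As a rough illustration of what `a_function_that_configures_your_custom_dataset` could look like, the sketch below wraps a NeMo 2.0 pre-training datamodule in a `run.Config`. The datamodule class and the argument names used here (`paths`, `seq_length`, `global_batch_size`, `micro_batch_size`) are assumptions about a generic NeMo 2.0 setup rather than an Evo 2-specific datamodule, so adapt them to however your genomic data is preprocessed.

```python
import nemo_run as run
from nemo.collections import llm

def a_function_that_configures_your_custom_dataset(gbs, mbs, seq_length):
    # A minimal sketch only: llm.PreTrainingDataModule and the argument names
    # used here (paths, seq_length, global_batch_size, micro_batch_size) are
    # assumptions about a generic NeMo 2.0 pre-training setup, not an
    # Evo 2-specific datamodule. Replace with the datamodule that matches
    # your preprocessed data.
    return run.Config(
        llm.PreTrainingDataModule,
        paths=["/path/to/preprocessed/data"],  # placeholder path
        seq_length=seq_length,
        global_batch_size=gbs,
        micro_batch_size=mbs,
    )
```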
Note

The configuration in the recipes is done using the NeMo-Run `run.Config` and `run.Partial` configuration objects. Please review the NeMo-Run documentation to learn more about its configuration and execution system.
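As a toy illustration (not NeMo-specific code): `run.Partial` records a function call without executing it, while `run.Config` plays the analogous role for object construction (as in the datamodule sketch above). A minimal sketch:

```python
import nemo_run as run

def greet(name: str, excited: bool = False) -> None:
    # A toy task used only to illustrate NeMo-Run configuration objects.
    print(f"Hello, {name}{'!' if excited else '.'}")

# run.Partial records the call without executing it; fields can still be
# overridden attribute-style afterwards, exactly like the recipes above.
task = run.Partial(greet, name="Hyena")
task.excited = True

# Nothing is built or executed until the configured task is passed to run.run.
run.run(task, direct=True)
```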
Running the Training:
Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors:
```python
import nemo_run as run

# For pre-training - choose the appropriate model
run.run(pretrain_1b, executor=run.LocalExecutor())   # For 1B model
# or
run.run(pretrain_7b, executor=run.LocalExecutor())   # For 7B model
# or
run.run(pretrain_40b, executor=run.LocalExecutor())  # For 40B model

# For fine-tuning - choose the appropriate model
run.run(finetune_1b, executor=run.LocalExecutor())   # For 1B model
# or
run.run(finetune_7b, executor=run.LocalExecutor())   # For 7B model
# or
run.run(finetune_40b, executor=run.LocalExecutor())  # For 40B model
```
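For multi-node training, the same tasks can be handed to a cluster executor instead of `run.LocalExecutor()`. The sketch below targets Slurm; `run.SlurmExecutor` is part of NeMo-Run, but the parameter names shown (`account`, `partition`, `nodes`, `ntasks_per_node`, `time`) are assumptions, so consult the NeMo-Run executor documentation for the exact interface.

```python
import nemo_run as run

# A hedged sketch of launching on a Slurm cluster instead of locally.
# The parameter names below are assumptions -- see the NeMo-Run executor
# documentation for the exact SlurmExecutor interface.
slurm_executor = run.SlurmExecutor(
    account="my_account",   # placeholder Slurm account
    partition="gpu",        # placeholder partition name
    nodes=1,
    ntasks_per_node=8,      # one task per GPU
    time="04:00:00",
)

run.run(pretrain_7b, executor=slurm_executor)
```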
Alternatively, you can run it directly in the same Python process:
```python
# Choose the appropriate model
run.run(pretrain_1b, direct=True)  # For 1B pre-training
# or
run.run(finetune_7b, direct=True)  # For 7B fine-tuning
```
BioNeMo Integration with Evo 2#
NVIDIA’s BioNeMo Framework provides specialized support for Evo 2 models in genomics and biological applications. BioNeMo adapts the Hyena architecture specifically for biological sequence modeling tasks.
The BioNeMo Evo 2 documentation provides comprehensive details about:
- Model architecture and capabilities
- Available model variants (1B, 7B, and 40B)
- Training diagnostics and benchmarks
- Performance characteristics across different context lengths and cluster sizes
- Zero-shot BRCA1 variant effect prediction
For users interested in applying Evo 2 to their biological data, BioNeMo provides a fine-tuning tutorial that walks through:
- Data preparation for genomic sequences
- Fine-tuning process with biological datasets
- Evaluation of model performance on biological tasks
- Best practices for biological sequence modeling
The BioNeMo implementation achieves comparable or better accuracy than the original models, with the BioNeMo Evo 2 7B model reaching an AUROC of 0.87 on BRCA1 variant effect prediction tasks.
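For context on how such a number is typically produced, zero-shot variant effect prediction scores each reference/variant sequence pair by the change in model log-likelihood and then measures how well that score separates functional from loss-of-function variants via AUROC. The sketch below is a generic illustration of that recipe, not BioNeMo evaluation code; `score_sequence` is a hypothetical stand-in for whatever likelihood-scoring utility your setup provides, and the arrays are placeholders.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def zero_shot_variant_scores(score_sequence, ref_seqs, alt_seqs):
    """Score each variant as the drop in sequence log-likelihood when the
    variant is introduced. `score_sequence` is a hypothetical callable that
    returns a log-likelihood under the language model."""
    return np.array(
        [score_sequence(ref) - score_sequence(alt) for ref, alt in zip(ref_seqs, alt_seqs)]
    )

# Given per-variant scores and binary labels (1 = loss-of-function), AUROC
# summarizes how well the zero-shot scores rank the damaging variants.
# Placeholder values are used here purely to show the metric call.
scores = np.array([1.8, 0.2, 2.5, -0.1, 0.9])
labels = np.array([1, 0, 1, 0, 1])
print("AUROC:", roc_auc_score(labels, scores))
```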