Hyena — NVIDIA NeMo Framework User Guide

Introduction to Hyena and Evo 2

Introduction

The Hyena architecture is a significant advance in neural network design, realized as a family of convolutional multi-hybrid architectures. As described in the Hyena paper, these architectures deliver substantial efficiency gains through co-designed convolution operators and hardware-aware algorithms, enabling faster training and inference than traditional Transformers. At the 40-billion-parameter scale, Hyena-based models train 1.2 to 2.9 times faster than optimized Transformers, and the StripedHyena 2 architecture achieves a two-fold throughput improvement over linear attention and state-space models on H100 GPUs.

Evo 2 is a powerful transformer-hyena hybrid architecture designed for biological sequence modeling. Trained on 9.3 trillion DNA base pairs spanning all domains of life, Evo 2 features an unprecedented 1 million token context window with single-nucleotide resolution. Available in 1B, 7B, and 40B parameter versions, it can accurately predict functional impacts of genetic variation without task-specific fine-tuning, autonomously learning biological features including exon-intron boundaries, transcription factor binding sites, and protein structural elements. The model also enables controllable generation of genomic sequences and epigenomic structure through inference-time search.

Hyena-Based Models

Available Models

The Hyena architecture is utilized in various models, with Evo 2 being a prominent example. Evo 2 is available in 1B, 7B, and 40B parameter configurations, each with a corresponding NeMo recipe (hyena_1b, hyena_7b, and hyena_40b).

Training Recipes

We provide pre-defined recipes for pre-training and fine-tuning Hyena-based models using NeMo 2.0 and NeMo-Run. Each recipe configures a run.Partial for one of the nemo.collections.llm API functions introduced in NeMo 2.0. The recipes are hosted in the recipes folder (for example, hyena_1b.py).

Pre-Training:

from nemo.collections import llm

# For 1B model
pretrain_1b = llm.hyena_1b.pretrain_recipe(
    name="hyena_1b_pretraining",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallel_size=1,
    global_batch_size=8,
    micro_batch_size=1,
    vocab_file="/path/to/vocab.json",
)

# For 7B model
pretrain_7b = llm.hyena_7b.pretrain_recipe(
    name="hyena_7b_pretraining",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallelism=8,
    vocab_file="/path/to/vocab.json",
)

# For 40B model
pretrain_40b = llm.hyena_40b.pretrain_recipe(
    name="hyena_40b_pretraining",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallelism=8,
    vocab_file="/path/to/vocab.json",
)

# Configure and assign your dataloader
dataloader = a_function_that_configures_your_custom_dataset(
    gbs=8,  # Adjust as needed for your model
    mbs=1,  # Adjust as needed for your model
    seq_length=pretrain_1b.model.config.seq_length,  # Use the appropriate model
)
pretrain_1b.data = dataloader  # Assign to whichever model you're using

Fine-Tuning:

from nemo.collections import llm

# For 1B model
finetune_1b = llm.hyena_1b.finetune_recipe(
    resume_path="/path/to/nemo/checkpoint",
    name="hyena_1b_finetuning",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallel_size=1,
    global_batch_size=8,
    micro_batch_size=1,
    vocab_file="/path/to/vocab.json",
)

# For 7B model
finetune_7b = llm.hyena_7b.finetune_recipe(
    resume_path="/path/to/nemo/checkpoint",
    name="hyena_7b_finetuning",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallelism=8,
    vocab_file="/path/to/vocab.json",
)

# For 40B model
finetune_40b = llm.hyena_40b.finetune_recipe(
    resume_path="/path/to/nemo/checkpoint",
    name="hyena_40b_finetuning",
    dir="/path/to/checkpoints",
    num_nodes=1,
    num_gpus_per_node=8,
    tensor_parallelism=8,
    vocab_file="/path/to/vocab.json",
)

# Configure and assign your dataloader
dataloader = a_function_that_configures_your_custom_dataset(
    gbs=8,  # Adjust as needed for your model
    mbs=1,  # Adjust as needed for your model
    seq_length=finetune_1b.model.config.seq_length,  # Use the appropriate model
)
finetune_1b.data = dataloader  # Assign to whichever model you're using

Note

For pre-training and fine-tuning, the recipes use placeholder datamodules for the data argument. You are expected to replace these with your custom dataset.
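For example, the placeholder dataloader function used above could be implemented along the following lines. This is a minimal sketch only: it assumes your corpus has already been preprocessed into Megatron-style indexed datasets, and the llm.PreTrainingDataModule arguments shown are illustrative and should be checked against your installed NeMo version (Evo 2 workflows in BioNeMo use their own genomic data modules).

import nemo_run as run
from nemo.collections import llm

def a_function_that_configures_your_custom_dataset(gbs, mbs, seq_length):
    # Hypothetical example: wrap a preprocessed, Megatron-style indexed
    # dataset in NeMo 2.0's PreTrainingDataModule. Adjust the paths,
    # split ratios, and tokenizer settings for your own data.
    return run.Config(
        llm.PreTrainingDataModule,
        paths=["/path/to/preprocessed/dataset_text_document"],
        seq_length=seq_length,
        global_batch_size=gbs,
        micro_batch_size=mbs,
        split="98,1,1",
    )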

Note

The configuration in the recipes is done using the NeMo-Run run.Config and run.Partial configuration objects. Please review the NeMo-Run documentation to learn more about its configuration and execution system.
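Because each recipe is a run.Partial, individual fields can be overridden before execution. The attribute paths below follow the common NeMo 2.0 recipe layout (trainer and optimizer sub-configs) and are meant as a hedged illustration; confirm the exact attribute names against the recipe you are using.

# Illustrative overrides on a recipe's run.Partial (attribute names
# assume the standard NeMo 2.0 recipe layout; verify for your version).
pretrain_1b.trainer.max_steps = 100000
pretrain_1b.trainer.val_check_interval = 1000
pretrain_1b.optim.config.lr = 3e-4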

Running the Training:

Once you have your final configuration ready, you can execute it on any of the NeMo-Run supported executors:

import nemo_run as run

# For pre-training - choose the appropriate model
run.run(pretrain_1b, executor=run.LocalExecutor())   # For 1B model
# or
run.run(pretrain_7b, executor=run.LocalExecutor())   # For 7B model
# or
run.run(pretrain_40b, executor=run.LocalExecutor())  # For 40B model

# For fine-tuning - choose the appropriate model
run.run(finetune_1b, executor=run.LocalExecutor())   # For 1B model
# or
run.run(finetune_7b, executor=run.LocalExecutor())   # For 7B model
# or
run.run(finetune_40b, executor=run.LocalExecutor())  # For 40B model

Alternatively, you can run it directly in the same Python process:

# Choose the appropriate model
run.run(pretrain_1b, direct=True)  # For 1B pre-training
# or
run.run(finetune_7b, direct=True)  # For 7B fine-tuning
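Beyond run.LocalExecutor and direct execution, NeMo-Run also supports cluster executors for multi-node jobs. The sketch below uses NeMo-Run's Slurm executor with placeholder account, partition, host, and container values; consult the NeMo-Run documentation for the full and current set of executor options.

import nemo_run as run

# Hedged sketch of a Slurm launch (all values are placeholders).
executor = run.SlurmExecutor(
    account="your_account",
    partition="your_partition",
    nodes=1,
    ntasks_per_node=8,
    time="04:00:00",
    container_image="nvcr.io/nvidia/nemo:latest",
    tunnel=run.SSHTunnel(host="login-node", user="username", job_dir="/path/to/remote/jobs"),
)

run.run(pretrain_7b, executor=executor)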

BioNeMo Integration with Evo 2

NVIDIA’s BioNeMo Framework provides specialized support for Evo 2 models in genomics and biological applications. BioNeMo adapts the Hyena architecture specifically for biological sequence modeling tasks.

The BioNeMo Evo 2 documentation provides comprehensive details on working with Evo 2 within the framework.

For users interested in applying Evo 2 to their own biological data, BioNeMo provides a fine-tuning tutorial that walks through the workflow end to end.

The BioNeMo implementation achieves comparable or better accuracy than the original models, with the BioNeMo Evo 2 7B model reaching an AUROC of 0.87 on BRCA1 variant effect prediction tasks.