sparkrun tune


sparkrun tune sglang <recipe> [options]

sparkrun tune vllm <recipe> [options]

Runs Triton kernel autotuning inside the recipe’s container on a single host. This generates tuned tile configurations (BLOCK_M/N/K, warps, stages) for fused MoE kernels at each tensor parallel (TP) size. The resulting configs are saved locally and auto-mounted in future sparkrun run invocations, so no manual configuration is needed.

Tuning is particularly beneficial for Mixture-of-Experts (MoE) models, where the fused MoE kernel is a performance-critical path. Default Triton configs are generic; tuning finds the best configuration for your specific hardware and model combination.

A tuning run performs the following steps:

1. Launches a tuning container on the target host using the recipe’s container image
2. Clones benchmark scripts (from SGLang or vLLM source) inside the container (skipped with --skip-clone)
3. Detects the Triton version for compatibility
4. Runs autotuning for each requested TP size (sequentially or in parallel)
5. Saves configs to ~/.cache/sparkrun/tuning/{sglang,vllm}/ on the host
6. Cleans up the tuning container

On subsequent sparkrun run invocations, tuning configs are automatically detected and mounted into the container with the appropriate environment variable (SGLANG_MOE_CONFIG_DIR for SGLang, VLLM_TUNED_CONFIG_FOLDER for vLLM).
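Because auto-mounting depends on configs being present on the host, it can help to check the cache directories before a run. The following is a minimal sketch, assuming the default output location (no --output-dir override); the helper name has_tuning_configs is hypothetical and is not part of sparkrun.

```shell
#!/bin/sh
# Report whether tuned configs exist for a runtime ("sglang" or "vllm").
# The directory layout is the default one described above; the function
# name is hypothetical and only used for illustration.
has_tuning_configs() {
    dir="${HOME}/.cache/sparkrun/tuning/$1"
    # Succeeds only if the directory exists and contains at least one entry.
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

for runtime in sglang vllm; do
    if has_tuning_configs "$runtime"; then
        echo "$runtime: tuned configs found"
    else
        echo "$runtime: no tuned configs"
    fi
done
```

If a runtime reports no configs, sparkrun run has nothing to mount for it and the default Triton configs apply.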

Option	Description
--hosts / -H	Comma-separated host list (only the first host is used)
--hosts-file	File with hosts (one per line)
--cluster	Use a saved cluster by name
--tp	TP size(s) to tune (repeatable; default: 1,2,4,8)
--image	Override container image
--output-dir	Override tuning config output directory
--skip-clone	Skip cloning benchmark scripts (if already in image)
--parallel / -j	Run N tuning jobs concurrently (default: 1 = sequential)
--dry-run / -n	Show what would be done without executing


sparkrun tune sglang

# Tune for all default TP sizes (1, 2, 4, 8)

sparkrun tune sglang qwen3.5-35b-bf16-sglang -H 127.0.0.1

# Tune for a specific TP size

sparkrun tune sglang qwen3.5-35b-bf16-sglang -H 127.0.0.1 --tp 2

# Tune for multiple specific TP sizes

sparkrun tune sglang qwen3.5-35b-bf16-sglang -H 127.0.0.1 --tp 1 --tp 2 --tp 4

# Tune with 4 parallel jobs

sparkrun tune sglang qwen3.5-35b-bf16-sglang -H 127.0.0.1 -j4

# Preview without running

sparkrun tune sglang qwen3.5-35b-bf16-sglang -H 127.0.0.1 --dry-run

Requires an SGLang recipe (the recipe’s runtime must be sglang). Configs are saved to ~/.cache/sparkrun/tuning/sglang/ and auto-mounted via SGLANG_MOE_CONFIG_DIR.


sparkrun tune vllm

# Tune for all default TP sizes (1, 2, 4, 8)

sparkrun tune vllm qwen3-moe-vllm -H 127.0.0.1

# Tune for a specific TP size

sparkrun tune vllm qwen3-moe-vllm -H 127.0.0.1 --tp 4

# Tune for multiple specific TP sizes

sparkrun tune vllm qwen3-moe-vllm -H 127.0.0.1 --tp 1 --tp 2 --tp 4

# Tune with 4 parallel jobs

sparkrun tune vllm qwen3-moe-vllm -H 127.0.0.1 -j4

# Preview without running

sparkrun tune vllm qwen3-moe-vllm -H 127.0.0.1 --dry-run

Accepts any vLLM recipe variant (vllm-distributed or vllm-ray). Configs are saved to ~/.cache/sparkrun/tuning/vllm/ and auto-mounted via VLLM_TUNED_CONFIG_FOLDER.

Configs are stored on the host at:

Runtime	Host path	Container path	Env variable
SGLang	~/.cache/sparkrun/tuning/sglang/	/root/sglang_tuning_configs	SGLANG_MOE_CONFIG_DIR
vLLM	~/.cache/sparkrun/tuning/vllm/	/root/vllm_tuning_configs	VLLM_TUNED_CONFIG_FOLDER

Once tuning configs exist on a host, every sparkrun run for the matching runtime will mount them automatically. No recipe changes or extra flags are needed.
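To make the path mapping concrete, the sketch below prints Docker flags that would roughly correspond to the automatic mount for each runtime. This is an illustration under assumptions, not sparkrun's actual implementation: it assumes the environment variable is set to the container path, and the function name mount_flags is hypothetical. The host paths, container paths, and variable names are taken from the table above.

```shell
#!/bin/sh
# Print docker flags approximating the auto-mount for a runtime.
# Assumption: the env variable points at the container path.
mount_flags() {
    case "$1" in
        sglang)
            container_path=/root/sglang_tuning_configs
            env_var=SGLANG_MOE_CONFIG_DIR
            ;;
        vllm)
            container_path=/root/vllm_tuning_configs
            env_var=VLLM_TUNED_CONFIG_FOLDER
            ;;
        *)
            echo "unknown runtime: $1" >&2
            return 1
            ;;
    esac
    echo "-v ${HOME}/.cache/sparkrun/tuning/$1:${container_path} -e ${env_var}=${container_path}"
}

mount_flags sglang
mount_flags vllm
```

Use --dry-run on the real command to see what sparkrun itself would execute.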

Tuning runs on a single host even if you specify a cluster — only the first host is used.
Each TP size produces its own config. Tune for the TP sizes you actually plan to use.
Use -j N (for example, -j4) to run multiple TP sizes in parallel and reduce total tuning time.
Use --dry-run to preview the Docker and tuning commands before committing to a long run.
Tuning can take hours per TP size depending on the model.
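Since each TP size can take hours, one reasonable workflow is to tune only the sizes you need and preview the run first. The sketch below builds repeated --tp flags from a list and prints the preview and real commands without executing them. The helper name tp_flags is hypothetical; the recipe name and host are reused from the examples above.

```shell
#!/bin/sh
# Build repeated --tp flags from a space-separated list of TP sizes.
# The function name is hypothetical and only used for illustration.
tp_flags() {
    flags=""
    for size in $1; do
        # Append "--tp N", adding a separating space after the first flag.
        flags="${flags:+$flags }--tp $size"
    done
    echo "$flags"
}

# Recipe name and host are taken from the examples above.
cmd="sparkrun tune sglang qwen3.5-35b-bf16-sglang -H 127.0.0.1 $(tp_flags "2 4")"

echo "$cmd --dry-run"   # preview the Docker and tuning commands first
echo "$cmd -j2"         # then tune both TP sizes concurrently
```

The script only prints the commands, so it is safe to run anywhere; copy the printed lines into a shell on the tuning host once the preview looks right.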