sparkrun tune
```bash
sparkrun tune sglang <recipe> [options]
sparkrun tune vllm <recipe> [options]
```
Run Triton kernel autotuning inside the recipe's container on a single host. This searches for the best-performing tile configurations (BLOCK_M/N/K, number of warps, number of stages) for the fused MoE kernels at each tensor-parallel (TP) size. The resulting configs are saved locally and mounted automatically in future sparkrun run invocations, so no manual configuration is needed.
Tuning is particularly beneficial for Mixture-of-Experts (MoE) models, where the fused MoE kernel is a performance-critical path. Default Triton configs are generic; tuning finds the best configuration for your specific hardware and model combination.
When run, sparkrun tune:
- Launches a tuning container on the target host using the recipe's container image
- Clones benchmark scripts (from SGLang or vLLM source) inside the container
- Detects Triton version for compatibility
- Runs autotuning for each requested TP size (sequentially or in parallel)
- Saves configs to ~/.cache/sparkrun/tuning/{sglang,vllm}/ on the host
- Cleans up the tuning container
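The "sequentially or in parallel" behavior controlled by --parallel / -j can be pictured with a small bash sketch. This is an illustration of the scheduling idea only, not sparkrun's implementation: `tune_one_tp` and `run_tuning` are hypothetical names, and `tune_one_tp` merely echoes a line where the real tool would run the cloned benchmark script for one TP size. `xargs -P` caps how many jobs run at once, so a limit of 1 processes the TP sizes strictly one after another.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a single tuning job. In sparkrun this step would run
# the SGLang/vLLM benchmark script inside the container; here it only echoes.
tune_one_tp() {
  local tp="$1"
  echo "tp=$tp done"
}
export -f tune_one_tp  # make the function visible to the bash processes xargs spawns

# Hypothetical dispatcher: run_tuning JOBS TP...
# Feeds one TP size per line to xargs; -P limits concurrency to JOBS (1 = sequential).
run_tuning() {
  local jobs="$1"
  shift
  printf '%s\n' "$@" | xargs -P "$jobs" -I{} bash -c 'tune_one_tp "$1"' _ {}
}

# Usage: the default TP sizes, four jobs at a time (analogous to -j4)
run_tuning 4 1 2 4 8
```

With a limit of 1 the output order matches the input order; with a higher limit, jobs may finish in any order.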
On subsequent sparkrun run invocations, tuning configs are automatically detected and mounted into the container with the appropriate environment variable (SGLANG_MOE_CONFIG_DIR for SGLang, VLLM_TUNED_CONFIG_FOLDER for vLLM).
| Option | Description |
|---|---|
| --hosts / -H | Comma-separated host list (only the first host is used) |
| --hosts-file | File with hosts (one per line) |
| --cluster | Use a saved cluster by name |
| --tp | TP size(s) to tune (repeatable; default: 1,2,4,8) |
| --image | Override container image |
| --output-dir | Override tuning config output directory |
| --skip-clone | Skip cloning benchmark scripts (if already in image) |
| --parallel / -j | Run N tuning jobs concurrently (default: 1 = sequential) |
| --dry-run / -n | Show what would be done without executing |
```bash
# Tune for all default TP sizes (1, 2, 4, 8)
sparkrun tune sglang qwen3.5-35b-bf16-sglang -H 127.0.0.1
# Tune for a specific TP size
sparkrun tune sglang qwen3.5-35b-bf16-sglang -H 127.0.0.1 --tp 2
# Tune for multiple specific TP sizes
sparkrun tune sglang qwen3.5-35b-bf16-sglang -H 127.0.0.1 --tp 1 --tp 2 --tp 4
# Tune with 4 parallel jobs
sparkrun tune sglang qwen3.5-35b-bf16-sglang -H 127.0.0.1 -j4
# Preview without running
sparkrun tune sglang qwen3.5-35b-bf16-sglang -H 127.0.0.1 --dry-run
```
Requires an SGLang recipe (the recipe’s runtime must be sglang). Configs are saved to ~/.cache/sparkrun/tuning/sglang/ and auto-mounted via SGLANG_MOE_CONFIG_DIR.
```bash
# Tune for all default TP sizes (1, 2, 4, 8)
sparkrun tune vllm qwen3-moe-vllm -H 127.0.0.1
# Tune for a specific TP size
sparkrun tune vllm qwen3-moe-vllm -H 127.0.0.1 --tp 4
# Tune for multiple specific TP sizes
sparkrun tune vllm qwen3-moe-vllm -H 127.0.0.1 --tp 1 --tp 2 --tp 4
# Tune with 4 parallel jobs
sparkrun tune vllm qwen3-moe-vllm -H 127.0.0.1 -j4
# Preview without running
sparkrun tune vllm qwen3-moe-vllm -H 127.0.0.1 --dry-run
```
Accepts any vLLM recipe variant (vllm-distributed or vllm-ray). Configs are saved to ~/.cache/sparkrun/tuning/vllm/ and auto-mounted via VLLM_TUNED_CONFIG_FOLDER.
Configs are stored on the host and mounted into the container at the following locations:
| Runtime | Host path | Container path | Env variable |
|---|---|---|---|
| SGLang | ~/.cache/sparkrun/tuning/sglang/ | /root/sglang_tuning_configs | SGLANG_MOE_CONFIG_DIR |
| vLLM | ~/.cache/sparkrun/tuning/vllm/ | /root/vllm_tuning_configs | VLLM_TUNED_CONFIG_FOLDER |
Once tuning configs exist on a host, every sparkrun run for the matching runtime will mount them automatically. No recipe changes or extra flags are needed.
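If you want to reproduce the auto-mount by hand (for example, when starting a container with a plain docker run outside sparkrun), each row of the table above corresponds to one bind mount plus one environment variable. The bash sketch below derives those flags from the runtime name. It is hedged on several points: `tuning_flags` is a hypothetical helper, the "directory exists and is non-empty" test is an assumption about how sparkrun decides that configs exist, and the exact command sparkrun issues may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print docker flags matching the
# host path -> container path -> env variable mapping for one runtime.
# Prints nothing when no tuning configs are present on this host.
tuning_flags() {
  local runtime="$1"
  local host_dir="$HOME/.cache/sparkrun/tuning/$runtime"
  local container_dir env_var
  case "$runtime" in
    sglang) container_dir=/root/sglang_tuning_configs; env_var=SGLANG_MOE_CONFIG_DIR ;;
    vllm)   container_dir=/root/vllm_tuning_configs;   env_var=VLLM_TUNED_CONFIG_FOLDER ;;
    *)      echo "unknown runtime: $runtime" >&2; return 1 ;;
  esac
  # Assumption: "configs exist" means the host directory exists and is non-empty.
  if [ -d "$host_dir" ] && [ -n "$(ls -A "$host_dir")" ]; then
    printf '%s\n' "-v $host_dir:$container_dir -e $env_var=$container_dir"
  fi
}

# Usage: show the flags an SGLang container on this host would need
tuning_flags sglang
```

Pointing the environment variable at the container-side mount point, rather than at the host path, is what lets the runtime inside the container find the files.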
- Tuning runs on a single host even if you specify a cluster; only the first host is used.
- Each TP size produces its own config. Tune for the TP sizes you actually plan to use.
- Use -j4 to run multiple TP sizes in parallel and reduce total tuning time.
- Use --dry-run to preview the Docker and tuning commands before committing to a long run.
- Depending on the model, tuning can take hours per TP size.