sparkrun tune


sparkrun tune sglang <recipe> [options]

sparkrun tune vllm <recipe> [options]

Run Triton kernel autotuning inside the recipe’s container on a single host. This generates tuned tile configurations (BLOCK_M/N/K, warps, stages) for fused MoE kernels at each tensor-parallel (TP) size. The resulting configs are saved locally and auto-mounted in future sparkrun run invocations, so no manual configuration is needed.

Tuning is particularly beneficial for Mixture-of-Experts (MoE) models, where the fused MoE kernel is a performance-critical path. Default Triton configs are generic; tuning searches for the best-performing configuration for your specific hardware and model combination.

The command performs the following steps:

  1. Launches a tuning container on the target host using the recipe’s container image
  2. Clones benchmark scripts (from SGLang or vLLM source) inside the container
  3. Detects Triton version for compatibility
  4. Runs autotuning for each requested TP size (sequentially or in parallel)
  5. Saves configs to ~/.cache/sparkrun/tuning/{sglang,vllm}/ on the host
  6. Cleans up the tuning container

On subsequent sparkrun run invocations, tuning configs are automatically detected and mounted into the container with the appropriate environment variable (SGLANG_MOE_CONFIG_DIR for SGLang, VLLM_TUNED_CONFIG_FOLDER for vLLM).

Option            Description
--hosts / -H      Comma-separated host list (only the first host is used)
--hosts-file      File with hosts (one per line)
--cluster         Use a saved cluster by name
--tp              TP size(s) to tune (repeatable; default: 1,2,4,8)
--image           Override container image
--output-dir      Override tuning config output directory
--skip-clone      Skip cloning benchmark scripts (if already in image)
--parallel / -j   Run N tuning jobs concurrently (default: 1 = sequential)
--dry-run / -n    Show what would be done without executing
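Because --tp is repeatable, a list of TP sizes has to be expanded into one --tp flag per size. The following is a minimal POSIX shell sketch of that expansion. The helper name tp_flags is hypothetical and not part of sparkrun; the recipe name and host are reused from the examples on this page, and the command is only previewed with --dry-run.

```shell
# tp_flags is a hypothetical helper, not part of sparkrun.
# It expands a list of TP sizes into repeated --tp flags,
# e.g. "1 2 4" becomes "--tp 1 --tp 2 --tp 4".
tp_flags() {
  flags=""
  for size in "$@"; do
    # Append "--tp <size>", adding a separating space after the first flag.
    flags="${flags:+$flags }--tp $size"
  done
  printf '%s\n' "$flags"
}

# Preview a tuning run for TP sizes 2 and 4, but only if sparkrun is installed.
# The unquoted command substitution is intentional so the flags split into words.
if command -v sparkrun >/dev/null 2>&1; then
  sparkrun tune sglang qwen3.5-35b-bf16-sglang -H 127.0.0.1 $(tp_flags 2 4) --dry-run
fi
```

Dropping --dry-run from the last command would start the actual tuning run, so it is worth checking the previewed plan first.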

Examples (SGLang):

# Tune for all default TP sizes (1, 2, 4, 8)
sparkrun tune sglang qwen3.5-35b-bf16-sglang -H 127.0.0.1

# Tune for a specific TP size
sparkrun tune sglang qwen3.5-35b-bf16-sglang -H 127.0.0.1 --tp 2

# Tune for multiple specific TP sizes
sparkrun tune sglang qwen3.5-35b-bf16-sglang -H 127.0.0.1 --tp 1 --tp 2 --tp 4

# Tune with 4 parallel jobs
sparkrun tune sglang qwen3.5-35b-bf16-sglang -H 127.0.0.1 -j4

# Preview without running
sparkrun tune sglang qwen3.5-35b-bf16-sglang -H 127.0.0.1 --dry-run

Requires an SGLang recipe (the recipe’s runtime must be sglang). Configs are saved to ~/.cache/sparkrun/tuning/sglang/ and auto-mounted via SGLANG_MOE_CONFIG_DIR.


Examples (vLLM):

# Tune for all default TP sizes (1, 2, 4, 8)
sparkrun tune vllm qwen3-moe-vllm -H 127.0.0.1

# Tune for a specific TP size
sparkrun tune vllm qwen3-moe-vllm -H 127.0.0.1 --tp 4

# Tune for multiple specific TP sizes
sparkrun tune vllm qwen3-moe-vllm -H 127.0.0.1 --tp 1 --tp 2 --tp 4

# Tune with 4 parallel jobs
sparkrun tune vllm qwen3-moe-vllm -H 127.0.0.1 -j4

# Preview without running
sparkrun tune vllm qwen3-moe-vllm -H 127.0.0.1 --dry-run

Accepts any vLLM recipe variant (vllm-distributed or vllm-ray). Configs are saved to ~/.cache/sparkrun/tuning/vllm/ and auto-mounted via VLLM_TUNED_CONFIG_FOLDER.

Tuning configs are stored on the host and mounted into the container at these paths:

Runtime   Host path                          Container path                Env variable
SGLang    ~/.cache/sparkrun/tuning/sglang/   /root/sglang_tuning_configs   SGLANG_MOE_CONFIG_DIR
vLLM      ~/.cache/sparkrun/tuning/vllm/     /root/vllm_tuning_configs     VLLM_TUNED_CONFIG_FOLDER

Once tuning configs exist on a host, every sparkrun run for the matching runtime will mount them automatically. No recipe changes or extra flags are needed.
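To see whether a host already has tuning configs that sparkrun run would pick up, the host directories can be inspected directly. The sketch below is a hypothetical helper (tuning_status is not a sparkrun command) and assumes the default host paths listed above; if --output-dir was used during tuning, pass that base directory as the first argument. It only checks that each directory is non-empty, since the file layout inside it is not described here.

```shell
# tuning_status is a hypothetical helper, not part of sparkrun.
# It reports, per runtime, whether the host tuning directory contains any files.
# Optional first argument: base directory (defaults to the documented host path).
tuning_status() {
  base="${1:-$HOME/.cache/sparkrun/tuning}"
  for runtime in sglang vllm; do
    dir="$base/$runtime"
    # A directory counts as populated if it exists and lists at least one entry.
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
      echo "$runtime: configs found"
    else
      echo "$runtime: no configs"
    fi
  done
}

tuning_status
```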