Server Arguments — SGLang

Server Arguments#

This page lists the command-line arguments that configure the behavior and performance of the language model server at deployment. They let you customize key aspects of the server, including model selection, parallelism policies, memory management, and optimization techniques. To see every available argument, run python3 -m sglang.launch_server --help

Common launch commands#

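The two commands below serve one model across two nodes (--nnodes 2) with a tensor-parallel size of 4 (--tp 4). Both nodes use the same --dist-init-addr; only --node-rank differs.

Node 0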

python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --dist-init-addr sgl-dev-0:50000 --nnodes 2 --node-rank 0

Node 1

python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --dist-init-addr sgl-dev-0:50000 --nnodes 2 --node-rank 1
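Since the per-node commands differ only in --node-rank, a small script can generate them instead of editing each by hand. The following is a minimal sketch, not part of SGLang: the helper name build_launch_command is hypothetical, and it uses only the flags and values that appear in the commands above.

```python
import shlex


def build_launch_command(
    node_rank,
    nnodes=2,
    tp=4,
    model_path="meta-llama/Meta-Llama-3-8B-Instruct",
    dist_init_addr="sgl-dev-0:50000",
):
    """Return the argv list for one node (hypothetical helper, not an SGLang API).

    Only flags shown in the launch commands above are emitted.
    """
    if not 0 <= node_rank < nnodes:
        raise ValueError(f"node_rank must be in [0, {nnodes}), got {node_rank}")
    return [
        "python", "-m", "sglang.launch_server",
        "--model-path", model_path,
        "--tp", str(tp),
        "--dist-init-addr", dist_init_addr,
        "--nnodes", str(nnodes),
        "--node-rank", str(node_rank),
    ]


if __name__ == "__main__":
    # Print one shell-ready command per node.
    for rank in range(2):
        print(f"Node {rank}: {shlex.join(build_launch_command(rank))}")
```

Each returned list can be passed to subprocess.run on the corresponding node, or printed with shlex.join and pasted into a shell. Every node must receive the same --dist-init-addr and --nnodes values.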

Please consult the documentation below and server_args.py to learn more about the arguments you may provide when launching a server.

Model, processor and tokenizer#

Memory and scheduling#

Other runtime options#

Logging#

Data parallelism#

Multi-node distributed serving#

Model override args#

LoRA#

Kernel backend#

Speculative decoding#

Expert parallelism#

Optimization/debug options#

Prefill decode disaggregation#