
Server Arguments

This page provides a list of server arguments used in the command line to configure the behavior and performance of the language model server during deployment. These arguments enable users to customize key aspects of the server, including model selection, parallelism policies, memory management, and optimization techniques.

Common launch commands

Node 0

python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --dist-init-addr sgl-dev-0:50000 --nnodes 2 --node-rank 0

Node 1

python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --tp 4 --dist-init-addr sgl-dev-0:50000 --nnodes 2 --node-rank 1

Please consult the documentation below and server_args.py to learn more about the arguments you may provide when launching a server.

Model, processor and tokenizer
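For example, a minimal launch that pins the tokenizer, allows custom model code, and caps the context window (flag names as found in server_args.py; verify against your installed version):

```shell
# Serve a model with an explicit tokenizer path, remote code enabled,
# and a capped context length.
python -m sglang.launch_server \
  --model-path meta-llama/Meta-Llama-3-8B-Instruct \
  --tokenizer-path meta-llama/Meta-Llama-3-8B-Instruct \
  --trust-remote-code \
  --context-length 8192
```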

Serving: HTTP & API

HTTP server configuration
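A typical sketch binding the server to all interfaces on a chosen port (`--host` and `--port` are standard launch flags; defaults may differ across versions):

```shell
# Expose the HTTP server on all interfaces, port 30000.
python -m sglang.launch_server \
  --model-path meta-llama/Meta-Llama-3-8B-Instruct \
  --host 0.0.0.0 \
  --port 30000
```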

API configuration
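To require authentication on the OpenAI-compatible endpoints, a key can be set at launch (assuming the `--api-key` flag in your version; clients then send it as a bearer token):

```shell
# Require an API key; clients must send "Authorization: Bearer my-secret-key".
python -m sglang.launch_server \
  --model-path meta-llama/Meta-Llama-3-8B-Instruct \
  --api-key my-secret-key
```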

Parallelism

Tensor parallelism
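Tensor parallelism shards each layer's weights across GPUs. A single-node sketch across two GPUs (`--tp` as in the launch commands above):

```shell
# Shard the model weights across 2 GPUs on one node.
python -m sglang.launch_server \
  --model-path meta-llama/Meta-Llama-3-8B-Instruct \
  --tp 2
```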

Data parallelism
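Data parallelism runs full replicas of the model and load-balances requests between them, trading memory for throughput. A sketch assuming the `--dp` flag (check server_args.py for the exact spelling in your version):

```shell
# Run 2 full model replicas; incoming requests are balanced across them.
python -m sglang.launch_server \
  --model-path meta-llama/Meta-Llama-3-8B-Instruct \
  --dp 2
```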

Expert parallelism
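Expert parallelism applies to Mixture-of-Experts models, distributing experts across GPUs. A sketch assuming the `--ep-size` flag and a MoE checkpoint (both the flag name and model are illustrative; consult server_args.py):

```shell
# Distribute MoE experts across 2 GPUs (MoE models only).
python -m sglang.launch_server \
  --model-path mistralai/Mixtral-8x7B-Instruct-v0.1 \
  --tp 2 \
  --ep-size 2
```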

Memory and scheduling
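A common tuning sketch: lower the fraction of GPU memory reserved for static weights and KV cache if you hit out-of-memory errors, and cap concurrency (flags assumed from server_args.py):

```shell
# Reserve 80% of GPU memory for weights + KV cache and
# limit the scheduler to 256 concurrently running requests.
python -m sglang.launch_server \
  --model-path meta-llama/Meta-Llama-3-8B-Instruct \
  --mem-fraction-static 0.8 \
  --max-running-requests 256
```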

Other runtime options

Logging
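A sketch for verbose debugging output, assuming the `--log-level` and `--log-requests` flags:

```shell
# Verbose server logs plus per-request logging.
python -m sglang.launch_server \
  --model-path meta-llama/Meta-Llama-3-8B-Instruct \
  --log-level debug \
  --log-requests
```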

Multi-node distributed serving

LoRA
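LoRA adapters can be attached at launch and selected per request by name. A sketch assuming the `--lora-paths name=path` syntax (the adapter path here is a placeholder):

```shell
# Load a LoRA adapter under the name "lora0"; requests can then
# select it by that name.
python -m sglang.launch_server \
  --model-path meta-llama/Meta-Llama-3-8B-Instruct \
  --lora-paths lora0=/path/to/adapter
```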

Kernel backend
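The attention and sampling kernel implementations can be selected explicitly. A sketch assuming the `--attention-backend` and `--sampling-backend` flags with the flashinfer backend:

```shell
# Use FlashInfer kernels for both attention and sampling.
python -m sglang.launch_server \
  --model-path meta-llama/Meta-Llama-3-8B-Instruct \
  --attention-backend flashinfer \
  --sampling-backend flashinfer
```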

Constrained decoding
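Constrained decoding restricts generation to outputs matching a grammar or JSON schema supplied with the request. A sketch assuming the `--grammar-backend` flag with the xgrammar backend:

```shell
# Enable grammar-constrained generation via the xgrammar backend;
# requests can then pass a JSON schema or regex constraint.
python -m sglang.launch_server \
  --model-path meta-llama/Meta-Llama-3-8B-Instruct \
  --grammar-backend xgrammar
```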

Speculative decoding
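Speculative decoding uses a small draft model to propose tokens that the target model verifies in parallel. A sketch assuming the `--speculative-algorithm` and `--speculative-draft-model-path` flags; the draft model shown is illustrative:

```shell
# EAGLE-style speculative decoding with a separate draft model.
python -m sglang.launch_server \
  --model-path meta-llama/Meta-Llama-3-8B-Instruct \
  --speculative-algorithm EAGLE \
  --speculative-draft-model-path yuhuili/EAGLE-LLaMA3-Instruct-8B
```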

Debug options
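When isolating correctness issues, major optimizations can be switched off one at a time. A sketch assuming the `--disable-cuda-graph` and `--disable-radix-cache` flags:

```shell
# Disable CUDA graph capture and the radix prefix cache to rule
# them out while debugging (expect lower performance).
python -m sglang.launch_server \
  --model-path meta-llama/Meta-Llama-3-8B-Instruct \
  --disable-cuda-graph \
  --disable-radix-cache
```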

Note: For the best possible performance, we recommend staying with the defaults and using these options only for debugging.

Optimization
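A sketch combining two common optimizations, torch.compile and weight quantization (assuming the `--enable-torch-compile` and `--quantization` flags; supported quantization methods vary by version and hardware):

```shell
# Compile the model with torch.compile and serve FP8-quantized weights.
python -m sglang.launch_server \
  --model-path meta-llama/Meta-Llama-3-8B-Instruct \
  --enable-torch-compile \
  --quantization fp8
```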

Note: Some of these options are still at an experimental stage.