
Installation

Lightllm is a Python-based inference framework, with operators implemented in Triton.

Requirements

Installing with Docker

The easiest way to install Lightllm is to use the official Docker image, which you can pull and run directly:

$ # Pull the official image
$ docker pull ghcr.io/modeltc/lightllm:main
$
$ # Run the image
$ docker run -it --gpus all -p 8080:8080 \
        --shm-size 32g -v your_local_path:/data/ \
        ghcr.io/modeltc/lightllm:main /bin/bash
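Once inside the container, you can launch the API server against a model mounted under /data/. A minimal sketch, assuming a checkpoint has been placed at /data/llama-7b (the model path and flag values here are illustrative, not required names):

$ # Serve a model from the mounted volume (path and values are illustrative)
$ python -m lightllm.server.api_server --model_dir /data/llama-7b \
        --host 0.0.0.0 --port 8080 --tp 1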

You can also build and run the image manually from source:

$ # Manually build the image
$ docker build -t <image_name> .
$
$ # Run the image
$ docker run -it --gpus all -p 8080:8080 \
        --shm-size 32g -v your_local_path:/data/ \
        <image_name> /bin/bash
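Before launching anything inside the container, it can be worth confirming that your GPUs are actually visible to it. A quick sanity check, assuming the NVIDIA Container Toolkit is installed on the host:

$ # Verify GPU passthrough; <image_name> is whatever tag you built above
$ docker run --rm --gpus all <image_name> nvidia-smi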

Alternatively, you can use a script to automatically build and run the image:

$ # View script parameters
$ python tools/quick_launch_docker.py --help

Note

If you are using multiple GPUs, you may need to increase the --shm-size setting above.
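For example, a run that exposes several GPUs with a larger shared-memory segment might look like the following (the 64g value is a suggested starting point, not a measured requirement):

$ docker run -it --gpus all -p 8080:8080 \
        --shm-size 64g -v your_local_path:/data/ \
        ghcr.io/modeltc/lightllm:main /bin/bash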

Installing from Source

You can also install Lightllm from source:

$ # (Recommended) Create a new conda environment
$ conda create -n lightllm python=3.9 -y
$ conda activate lightllm
$
$ # Download the latest source code for Lightllm
$ git clone https://github.com/ModelTC/lightllm.git
$ cd lightllm
$
$ # Install Lightllm's dependencies
$ pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu124
$
$ # Install Lightllm
$ python setup.py install

NOTE: If you are using a torch build for CUDA 11.x instead, run pip install nvidia-nccl-cu12==2.20.5 so that torch CUDA graphs work.
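After installation, a quick smoke test confirms that the package imports and that torch can see a GPU. This only verifies the environment, not end-to-end serving:

$ python -c "import lightllm; print('lightllm imported OK')"
$ python -c "import torch; print('CUDA available:', torch.cuda.is_available())"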

Note

The Lightllm code has been tested on various GPUs, including V100, A100, A800, 4090, and H800. If you are using A100, A800, or similar GPUs, it is recommended to install triton==3.1.0:

$ pip install triton==3.1.0 --no-deps
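To confirm which Triton version is actually in use afterwards:

$ python -c "import triton; print(triton.__version__)"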