ms-swift: Use PEFT or full-parameter training to run CPT/SFT/DPO/GRPO on 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4v, Phi4, ...) (AAAI 2025)

SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning)

ModelScope Community Website


Paper | English Documentation | Chinese Documentation


☎ Groups

You can contact us and join the community discussion through the following groups:

Discord Group | WeChat Group

📝 Introduction

🍲 ms-swift is an official framework provided by the ModelScope community for fine-tuning and deploying large language models and multi-modal large models. It currently supports the training (pre-training, fine-tuning, human alignment), inference, evaluation, quantization, and deployment of 500+ large models and 200+ multi-modal large models. These large language models (LLMs) include models such as Qwen3, Qwen3-MoE, Qwen2.5, InternLM3, GLM4, Mistral, DeepSeek-R1, Yi1.5, TeleChat2, Baichuan2, and Gemma2. The multi-modal LLMs include models such as Qwen2.5-VL, Qwen2-Audio, Llama4, Llava, InternVL3, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, and GOT-OCR2.

🍔 Additionally, ms-swift incorporates the latest training technologies, including lightweight techniques such as LoRA, QLoRA, Llama-Pro, LongLoRA, GaLore, Q-GaLore, LoRA+, LISA, DoRA, FourierFt, ReFT, UnSloth, and Liger, as well as human alignment training methods like DPO, GRPO, RM, PPO, KTO, CPO, SimPO, and ORPO. ms-swift supports acceleration of inference, evaluation, and deployment modules using vLLM and LMDeploy, and it supports model quantization with technologies like GPTQ, AWQ, and BNB. Furthermore, ms-swift offers a Gradio-based Web UI and a wealth of best practices.



🛠️ Installation

To install using pip:

```shell
pip install ms-swift -U
```

To install from source:

```shell
# Alternatively, install directly from the repository:
# pip install git+https://github.com/modelscope/ms-swift.git
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .
```

Running Environment:

| Package      | Range         | Recommended     | Notes                                     |
|--------------|---------------|-----------------|-------------------------------------------|
| python       | >=3.9         | 3.10            |                                           |
| cuda         |               | cuda12          | No need to install if using CPU, NPU, MPS |
| torch        | >=2.0         |                 |                                           |
| transformers | >=4.33        | 4.51.3          |                                           |
| modelscope   | >=1.23        |                 |                                           |
| peft         | >=0.11,<0.16  |                 |                                           |
| trl          | >=0.13,<0.19  | 0.18            | RLHF                                      |
| deepspeed    | >=0.14        | 0.14.5 / 0.16.9 | Training                                  |
| vllm         | >=0.5.1       | 0.8.5.post1     | Inference/Deployment/Evaluation           |
| lmdeploy     | >=0.5         | 0.8             | Inference/Deployment/Evaluation           |
| evalscope    | >=0.11        |                 | Evaluation                                |

For more optional dependencies, you can refer to here.
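
As a quick sanity check, you can print the installed versions of the core dependencies against the table above (a minimal sketch; the distribution names are assumed to match the table rows):

```python
# Print installed versions of the core dependencies listed in the table above.
import importlib.metadata as md

for pkg in ("torch", "transformers", "modelscope", "peft", "trl",
            "deepspeed", "vllm", "lmdeploy", "evalscope"):
    try:
        print(f"{pkg}: {md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")
```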

🚀 Quick Start

10 minutes of self-cognition fine-tuning of Qwen2.5-7B-Instruct on a single 3090 GPU:

Command Line Interface

```shell
# ~22GB of GPU memory
CUDA_VISIBLE_DEVICES=0 \
swift sft \
    --model Qwen/Qwen2.5-7B-Instruct \
    --train_type lora \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#500' \
              'AI-ModelScope/alpaca-gpt4-data-en#500' \
              'swift/self-cognition#500' \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --gradient_accumulation_steps 16 \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --system 'You are a helpful assistant.' \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4 \
    --model_author swift \
    --model_name swift-robot
```


After training is complete, use the following command to infer with the trained weights:

Using the interactive command line for inference:

```shell
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --adapters output/vx-xxx/checkpoint-xxx \
    --stream true \
    --temperature 0 \
    --max_new_tokens 2048
```

Merge the LoRA weights and use vLLM for inference acceleration:

```shell
CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --adapters output/vx-xxx/checkpoint-xxx \
    --stream true \
    --merge_lora true \
    --infer_backend vllm \
    --max_model_len 8192 \
    --temperature 0 \
    --max_new_tokens 2048
```

Finally, use the following command to push the model to ModelScope:

```shell
CUDA_VISIBLE_DEVICES=0 \
swift export \
    --adapters output/vx-xxx/checkpoint-xxx \
    --push_to_hub true \
    --hub_model_id '<your-model-id>' \
    --hub_token '<your-sdk-token>' \
    --use_hf false
```
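
Once pushed, the weights can be pulled back from the ModelScope hub, for example with `snapshot_download` (a sketch; `<your-model-id>` stands for the id used in the export command above):

```python
# Download the pushed model from the ModelScope hub.
from modelscope import snapshot_download

# '<your-model-id>' is the placeholder id used when pushing above.
model_dir = snapshot_download('<your-model-id>')
print(model_dir)  # local path of the downloaded checkpoint
```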

Web-UI

The Web-UI is a zero-code training and deployment solution built on Gradio. For more details, you can check here.

```shell
SWIFT_UI_LANG=en swift web-ui
```


Using Python

ms-swift also supports training and inference using Python. Below is pseudocode for training and inference. For more details, you can refer to here.

Training:

```python
# A sketch of the training loop; the import paths below are assumptions
# based on the ms-swift package layout -- see the full example in the docs.
from swift.llm import get_model_tokenizer, get_template, load_dataset, EncodePreprocessor
from swift.trainers import Seq2SeqTrainer
from swift.tuners import Swift

# Retrieve the model and template, and attach a trainable LoRA module
model, tokenizer = get_model_tokenizer(model_id_or_path, ...)
template = get_template(model.model_meta.template, tokenizer, ...)
model = Swift.prepare_model(model, lora_config)

# Download and load the dataset, then encode the text into tokens
train_dataset, val_dataset = load_dataset(dataset_id_or_path, ...)
train_dataset = EncodePreprocessor(template=template)(train_dataset, num_proc=num_proc)
val_dataset = EncodePreprocessor(template=template)(val_dataset, num_proc=num_proc)

# Train the model (training_args is a Seq2SeqTrainingArguments instance)
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    data_collator=template.data_collator,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    template=template,
)
trainer.train()
```

Inference:

```python
# Perform inference using the native PyTorch engine
# (import path assumed as above).
from swift.llm import PtEngine, RequestConfig, InferRequest

engine = PtEngine(model_id_or_path, adapters=[lora_checkpoint])
infer_request = InferRequest(messages=[{'role': 'user', 'content': 'who are you?'}])
request_config = RequestConfig(max_tokens=max_new_tokens, temperature=temperature)

resp_list = engine.infer([infer_request], request_config)
print(f'response: {resp_list[0].choices[0].message.content}')
```

✨ Usage

Here is a minimal example of training to deployment using ms-swift. For more details, you can check the examples.

Useful Links
- 🔥 Command Line Parameters
- Supported Models and Datasets
- Custom Models, 🔥 Custom Datasets
- LLM Tutorial

Training

Supported Training Methods:

Each method can be combined with full-parameter, LoRA, or QLoRA training and, depending on the method, DeepSpeed, multi-node, and multi-modal setups:

- Pre-training
- Instruction-supervised fine-tuning
- DPO training
- GRPO training
- Reward model training
- PPO training
- KTO training
- CPO training
- SimPO training
- ORPO training
- Classification model training
- Embedding model training

Pre-training:

```shell
# 8 x A100 GPUs
NPROC_PER_NODE=8 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
swift pt \
    --model Qwen/Qwen2.5-7B \
    --dataset swift/chinese-c4 \
    --streaming true \
    --train_type full \
    --deepspeed zero2 \
    --output_dir output \
    --max_steps 10000 \
    ...
```

Fine-tuning:

```shell
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model Qwen/Qwen2.5-7B-Instruct \
    --dataset AI-ModelScope/alpaca-gpt4-data-en \
    --train_type lora \
    --output_dir output \
    ...
```

RLHF:

```shell
CUDA_VISIBLE_DEVICES=0 swift rlhf \
    --rlhf_type dpo \
    --model Qwen/Qwen2.5-7B-Instruct \
    --dataset hjh0119/shareAI-Llama3-DPO-zh-en-emoji \
    --train_type lora \
    --output_dir output \
    ...
```

Inference

```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --stream true \
    --infer_backend pt \
    --max_new_tokens 2048
```

Inference with LoRA adapters:

```shell
CUDA_VISIBLE_DEVICES=0 swift infer \
    --model Qwen/Qwen2.5-7B-Instruct \
    --adapters swift/test_lora \
    --stream true \
    --infer_backend pt \
    --temperature 0 \
    --max_new_tokens 2048
```

Interface Inference

```shell
CUDA_VISIBLE_DEVICES=0 swift app \
    --model Qwen/Qwen2.5-7B-Instruct \
    --stream true \
    --infer_backend pt \
    --max_new_tokens 2048
```

Deployment

```shell
CUDA_VISIBLE_DEVICES=0 swift deploy \
    --model Qwen/Qwen2.5-7B-Instruct \
    --infer_backend vllm
```
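
The deployed server exposes an OpenAI-compatible API. Here is a minimal client sketch, assuming the default address `http://127.0.0.1:8000/v1` (adjust host, port, and model name to your deployment):

```python
# Query the server started by `swift deploy` through its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(api_key='EMPTY', base_url='http://127.0.0.1:8000/v1')
resp = client.chat.completions.create(
    model='Qwen/Qwen2.5-7B-Instruct',
    messages=[{'role': 'user', 'content': 'Hello, who are you?'}],
    max_tokens=256,
)
print(resp.choices[0].message.content)
```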

Sampling

```shell
CUDA_VISIBLE_DEVICES=0 swift sample \
    --model LLM-Research/Meta-Llama-3.1-8B-Instruct \
    --sampler_engine pt \
    --num_return_sequences 5 \
    --dataset AI-ModelScope/alpaca-gpt4-data-zh#5
```

Evaluation

```shell
CUDA_VISIBLE_DEVICES=0 swift eval \
    --model Qwen/Qwen2.5-7B-Instruct \
    --infer_backend lmdeploy \
    --eval_backend OpenCompass \
    --eval_dataset ARC_c
```

Quantization

```shell
CUDA_VISIBLE_DEVICES=0 swift export \
    --model Qwen/Qwen2.5-7B-Instruct \
    --quant_bits 4 --quant_method awq \
    --dataset AI-ModelScope/alpaca-gpt4-data-zh \
    --output_dir Qwen2.5-7B-Instruct-AWQ
```
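
The exported directory can then be loaded like any other checkpoint, for example with transformers (a sketch; AWQ inference additionally requires the autoawq package):

```python
# Load the AWQ-quantized export produced by the command above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    'Qwen2.5-7B-Instruct-AWQ',  # the output_dir from the export command
    torch_dtype='auto',
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained('Qwen2.5-7B-Instruct-AWQ')
```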

Push Model

```shell
swift export \
    --model <model-id-or-path> \
    --push_to_hub true \
    --hub_model_id '<your-model-id>' \
    --hub_token '<your-sdk-token>'
```

🏛 License

This framework is licensed under the Apache License (Version 2.0). For models and datasets, please refer to the original resource pages and follow their corresponding licenses.

📎 Citation

```bibtex
@misc{zhao2024swiftascalablelightweightinfrastructure,
      title={SWIFT: A Scalable lightWeight Infrastructure for Fine-Tuning},
      author={Yuze Zhao and Jintao Huang and Jinghan Hu and Xingjun Wang and Yunlin Mao and Daoze Zhang and Zeyinzi Jiang and Zhikai Wu and Baole Ai and Ang Wang and Wenmeng Zhou and Yingda Chen},
      year={2024},
      eprint={2408.05517},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2408.05517},
}
```

Star History

Star History Chart