Kwai-Keye/Keye

Meet Keye-VL-2.0-30B-A3B — the latest 30B-class flagship base model in the Keye series, purpose-built to push the frontier of long-video understanding and to unlock the first generation of Agent capabilities in the Keye family.

Highlights

[Figure: Video Benchmark Comparison]

As the first multi-modal model to land DSA in production, Keye-VL-2.0-30B-A3B delivers nearly lossless reasoning over 256K ultra-long context. It tops video understanding benchmarks at its scale and consistently rivals — or surpasses — top-tier closed-source models on fine-grained temporal perception. More importantly, it is the first Keye base model to ship with a built-in Agent collaboration mechanism, demonstrating solid system-level orchestration in Search, Tool, and Code scenarios.

Model Performance on Benchmarks

We compare Keye-VL-2.0-30B-A3B against leading open- and closed-source models (Qwen3.5-35B-A3B, InternVL3.5-241B-A28B, GPT-5-mini, Qwen3-VL 30B-A3B / 32B / 235B-A22B) across seven capability dimensions: Video, Coding, Agent, Math & Reasoning, STEM, Instruction Following, and General VQA.

[Figure: Performance Comparison]

Selected highlights (see the technical report for the full table):

At the 30B scale, Keye-VL-2.0-30B-A3B not only outperforms open-source models with 200B+ parameters (e.g., Qwen3-VL-235B-A22B) on temporal understanding, but also matches, and in places exceeds, leading closed-source models.

Quickstart

Environment Setup

Option 1 — Recommended: prebuilt Docker image

docker run -it --gpus all kwaikeye/kwai-keye-vl:keye_vl_v2_30b_a3b

Option 2 — Install from source

SGLang (custom branch)

git clone -b keye-vl-v2-30b-release https://github.com/Kwai-Keye/sglang.git
cd sglang
pip install -e "python[all]"
cd ..

DeepGEMM (Keye support branch)

git clone -b keye_support https://github.com/Kwai-Keye/DeepGEMM.git
cd DeepGEMM
bash install.sh
cd ..

EffectiveKernels

git clone https://github.com/Kwai-Keye/EffectiveKernels.git
cd EffectiveKernels
pip install -e . --no-deps --no-build-isolation
cd ..

Minimal Launch (H800)

python3 -m sglang.launch_server \
    --model-path=MODEL_NAME \
    --tp-size=2 \
    --trust-remote-code \
    --mem-fraction-static=0.8

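This launches a standard SGLang service, so any OpenAI-compatible client can call it. SGLang's default port is 30000, whereas the client examples below use port 8000; pass --port 8000 at launch or adjust BASE_URL to match.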
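One way to check that the service is up, and to find the model name it serves, is the OpenAI-style GET /v1/models endpoint. The sketch below assumes the server exposes that endpoint (usual for OpenAI-compatible servers, but not shown in this README); the helper names `extract_model_ids` and `list_models` are illustrative only.

```python
import json
import urllib.request


def extract_model_ids(models_response: dict) -> list:
    """Return the model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def list_models(base_url: str, timeout: float = 10.0) -> list:
    """Query GET {base_url}/v1/models and return the served model ids.

    Assumption: the server implements the OpenAI-compatible models endpoint.
    """
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return extract_model_ids(json.load(resp))


# Usage (needs a running server; the address is a placeholder):
#   print(list_models("http://MASTER_NODE_IP:8000"))
```

If the call succeeds, one of the returned ids can be used as the "model" field of a chat-completion request.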

Client Usage

Below are example SGLang inference scripts for both image and video inputs.

The sampling parameters in these scripts (temperature, top_k, and so on) are for demonstration only and are not recommended settings. Tune them for your own use case.

The video frame-sampling parameters can also be customized. min_pixels and max_pixels set the lower and upper bounds on each frame's pixel count, and therefore on its token count, while video_total_pixels caps the total budget for the entire video input.

If fps is not specified, the default value is 2.0.
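To get a feel for how these budgets interact, the sketch below converts pixel budgets into approximate token counts. It assumes that one visual token covers a 28 x 28 pixel patch, which is an assumption made for illustration and is not stated in this README; the budget values and helper names are likewise illustrative.

```python
# Assumption for illustration: one visual token per 28 x 28 pixel patch.
PATCH_PIXELS = 28 * 28


def pixels_to_tokens(pixels: int) -> int:
    """Convert a pixel budget into an approximate visual-token count."""
    return pixels // PATCH_PIXELS


def sampled_frames(duration_s: float, fps: float = 2.0) -> int:
    """Number of frames sampled from a clip of the given length."""
    return int(duration_s * fps)


def frames_at_max_resolution(video_total_pixels: int, max_pixels: int) -> int:
    """Frames that fit in the total budget if every frame uses max_pixels."""
    return video_total_pixels // max_pixels


# Illustrative budgets, expressed as token counts times the patch area.
min_pixels = 128 * 28 * 28
max_pixels = 512 * 28 * 28
video_total_pixels = 180 * 1024 * 28 * 28

print(pixels_to_tokens(min_pixels))          # 128
print(pixels_to_tokens(max_pixels))          # 512
print(pixels_to_tokens(video_total_pixels))  # 184320
print(frames_at_max_resolution(video_total_pixels, max_pixels))  # 360
print(sampled_frames(600))                   # 1200 frames for a 10-minute clip
```

Under these assumptions, a 10-minute clip sampled at 2 fps produces more frames (1200) than fit at max_pixels (360), so the per-frame resolution would have to drop to stay within video_total_pixels. How the server actually resolves this is not described here.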

Image Input

import json

import requests

# Replace MASTER_NODE_IP with the address of the node running the server.
BASE_URL = "http://MASTER_NODE_IP:8000"


def generate(messages):
    """Send a chat-completion request and return the parsed JSON response."""
    payload = {
        "model": "",
        "messages": messages,
        "n": 1,
        "temperature": 0.0,
        "max_tokens": 256,
        "top_k": 1,
        "ignore_eos": False,
        "skip_special_tokens": True,
    }
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        headers={"Content-Type": "application/json"},
        data=json.dumps(payload),
        timeout=1800,
    )
    resp.raise_for_status()
    return resp.json()


# Example: image + text
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": "https://raw.githubusercontent.com/sgl-project/sglang/main/assets/logo.png"},
            },
            {"type": "text", "text": "Describe this image in detail."},
        ],
    }
]

result = generate(messages)
print(result["choices"][0]["message"]["content"])
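The example above passes an HTTP URL. For a local image, a common pattern with OpenAI-compatible vision endpoints is to inline the bytes as a base64 data URL. The sketch below assumes the server accepts data: URLs in image_url, which this README does not confirm; both helper names are hypothetical.

```python
import base64


def image_bytes_to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_image_messages(image_url: str, prompt: str) -> list:
    """Build a single-turn message list with one image and one text part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "text", "text": prompt},
            ],
        }
    ]


# Usage (the file path is a placeholder):
#   with open("photo.png", "rb") as f:
#       url = image_bytes_to_data_url(f.read())
#   messages = build_image_messages(url, "Describe this image in detail.")
```

The resulting messages list has the same shape as the one in the example, so it can be passed to the same request function unchanged.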

Video Input

import json

import requests

# Replace MASTER_NODE_IP with the address of the node running the server.
BASE_URL = "http://MASTER_NODE_IP:8000"

# Placeholder: replace with the URL of your video.
video_url = "VIDEO_URL"


def generate(messages):
    """Send a chat-completion request and return the parsed JSON response."""
    payload = {
        "model": "",
        "messages": messages,
        "n": 1,
        "temperature": 0.1,
        "max_tokens": 32760,
        "top_p": 0.001,
        "ignore_eos": False,
        "skip_special_tokens": True,
    }
    resp = requests.post(
        f"{BASE_URL}/v1/chat/completions",
        headers={"Content-Type": "application/json"},
        data=json.dumps(payload),
        timeout=1800,
    )
    resp.raise_for_status()
    return resp.json()


# Example: video + text
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video_url",
                "video_url": {
                    "url": video_url,
                    "preprocess_kwargs": {
                        "fps": 2.0,
                        "min_pixels": 128 * 28 * 28,
                        "max_pixels": 512 * 28 * 28,
                        "video_total_pixels": 180 * 1024 * 28 * 28,
                    },
                },
            },
            {"type": "text", "text": "Describe this video."},
        ],
    },
]

result = generate(messages)
print(result["choices"][0]["message"]["content"])
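Because the examples cap generation with max_tokens, it can help to check whether a reply was cut off. The sketch below assumes the response follows the OpenAI chat-completion shape (choices[0].finish_reason and an optional usage object); `summarize_completion` is a hypothetical helper, not part of SGLang or this repository.

```python
def summarize_completion(response: dict) -> dict:
    """Extract the reply text, truncation flag, and token usage from a response.

    Assumption: OpenAI-style fields, where finish_reason == "length" means
    generation stopped because max_tokens was reached.
    """
    choice = response["choices"][0]
    usage = response.get("usage") or {}
    return {
        "text": choice["message"]["content"],
        "truncated": choice.get("finish_reason") == "length",
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
    }


# Usage with the parsed JSON of a chat-completion response:
#   summary = summarize_completion(result)
#   if summary["truncated"]:
#       print("Reply hit max_tokens; consider raising the limit.")
```

For long videos, prompt_tokens also gives a rough check on how much of the context the sampled frames consumed.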

Acknowledgement

Kwai Keye-VL is developed based on the codebases of the following projects: SigLIP, Qwen3, Qwen2.5-VL, VLMEvalKit. We sincerely thank these projects for their outstanding work.