PaddleOCR-VL Usage Guide

Introduction

PaddleOCR-VL is a state-of-the-art (SOTA), resource-efficient model tailored for document parsing. Its core component is PaddleOCR-VL-0.9B, a compact yet powerful vision-language model (VLM). It combines a NaViT-style dynamic-resolution visual encoder with the ERNIE-4.5-0.3B language model to recognize document elements accurately.
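
The sketch below makes "dynamic resolution" concrete using the generic NaViT ("Patch n' Pack") idea:

- Each image is cut into fixed-size patches while keeping close to its native aspect ratio.
- Patches from several images are concatenated into one sequence.
- Each patch is tagged with the index of the image it came from.

The 14-pixel patch size, the optional `max_patches` budget, and the helper names are illustrative assumptions. They are not PaddleOCR-VL's actual preprocessing configuration.

```python
import math
from typing import List, Optional, Tuple


def patch_grid(height: int, width: int, patch_size: int = 14,
               max_patches: Optional[int] = None) -> Tuple[int, int]:
    """Return (rows, cols) of patches for an image kept near its native resolution.

    patch_size and max_patches are illustrative assumptions, not PaddleOCR-VL's config.
    """
    if max_patches is not None and (height / patch_size) * (width / patch_size) > max_patches:
        # Shrink both sides by the same factor so the aspect ratio is preserved.
        scale = math.sqrt(max_patches * patch_size ** 2 / (height * width))
        height, width = height * scale, width * scale
    # Round down to whole patches, keeping at least one patch per side.
    rows = max(1, int(height // patch_size))
    cols = max(1, int(width // patch_size))
    return rows, cols


def pack_images(sizes: List[Tuple[int, int]], patch_size: int = 14) -> List[int]:
    """NaViT-style packing: one flat patch sequence, each patch tagged with its source image index."""
    image_ids: List[int] = []
    for idx, (height, width) in enumerate(sizes):
        rows, cols = patch_grid(height, width, patch_size)
        image_ids.extend([idx] * (rows * cols))
    return image_ids


# A tall receipt and a wide table keep their own shapes instead of being squashed to one square size.
print(patch_grid(1120, 420))  # (80, 30)
print(patch_grid(420, 1120))  # (30, 80)
print(len(pack_images([(1120, 420), (420, 1120)])))  # 4800
```

With this scheme, sequence length grows with an image's actual area rather than with a fixed canvas size. Small crops stay cheap, and dense pages keep their detail.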

Installing vLLM

```bash
uv venv
source .venv/bin/activate
# Until the v0.11.1 release, vLLM must be installed from the nightly build
uv pip install -U vllm --pre \
    --extra-index-url https://wheels.vllm.ai/nightly \
    --extra-index-url https://download.pytorch.org/whl/cu129 \
    --index-strategy unsafe-best-match
```
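
After installing, you can check which vLLM build is active in the environment. The sketch below reads the installed version with the standard-library `importlib.metadata` and applies a coarse heuristic. It assumes that builds whose release segment is 0.11.1 or later are recent enough. That includes nightly pre-releases with version strings such as `0.11.1rc2.dev160+g...`. The cutoff mapping is an assumption, and the helper names are hypothetical.

```python
import re
from importlib import metadata
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a version such as '0.11.1rc2.dev160+g1234abc'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def is_recent_enough(version: str, minimum: Tuple[int, int, int] = (0, 11, 1)) -> bool:
    """Heuristic: the release segment (ignoring rc/dev suffixes) is at least `minimum`."""
    return release_tuple(version) >= minimum


def check_vllm() -> str:
    """Describe the vLLM build installed in the current environment."""
    try:
        installed = metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return "vLLM is not installed in this environment"
    try:
        ok = is_recent_enough(installed)
    except ValueError:
        return f"vLLM {installed}: could not parse the version"
    return f"vLLM {installed}: " + ("looks recent enough" if ok else "older than 0.11.1, install the nightly build")


print(check_vllm())
```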

Deploying PaddleOCR-VL

```bash
vllm serve PaddlePaddle/PaddleOCR-VL \
    --trust-remote-code \
    --max-num-batched-tokens 16384 \
    --no-enable-prefix-caching \
    --mm-processor-cache-gb 0
```
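
Loading the model can take a while, so a script that starts the server and sends requests right away may fail. The sketch below uses only the Python standard library to wait for readiness. It polls the `/health` endpoint that vLLM's OpenAI-compatible server exposes at the root URL (not under `/v1`) until it returns HTTP 200. The function name, the default timeout, and the polling interval are illustrative choices.

```python
import time
import urllib.request


def wait_until_healthy(server_root: str = "http://localhost:8000",
                       timeout: float = 600.0, interval: float = 5.0) -> None:
    """Poll GET {server_root}/health until it answers 200, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    url = f"{server_root.rstrip('/')}/health"
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return
        except OSError:
            # Connection refused, HTTP error, or socket timeout: the server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{url} did not become healthy within {timeout} s")
        time.sleep(interval)
```

For example, call `wait_until_healthy("http://localhost:8000")` before creating the OpenAI client.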

Querying with the OpenAI API Client

```python
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
    timeout=3600
)

# Task-specific base prompts
TASKS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
}

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
                }
            },
            {
                "type": "text",
                "text": TASKS["ocr"]
            }
        ]
    }
]

response = client.chat.completions.create(
    model="PaddlePaddle/PaddleOCR-VL",
    messages=messages,
    temperature=0.0,
)
print(f"Generated text: {response.choices[0].message.content}")
```
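
The request above points `image_url` at a public URL. For a local file, a common pattern with OpenAI-compatible servers, including vLLM, is to inline the image as a base64 `data:` URL. This is a sketch: `image_to_data_url` and `build_messages` are hypothetical helpers. The MIME type is guessed from the file extension, falling back to PNG.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Dict, List

# Same task prompts as in the client example.
TASKS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
}


def image_to_data_url(data: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data: URL usable in an image_url content part."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def build_messages(image_path: str, task: str = "ocr") -> List[Dict]:
    """Build a single-turn chat request that pairs a local image with a task prompt."""
    mime = mimetypes.guess_type(image_path)[0] or "image/png"
    url = image_to_data_url(Path(image_path).read_bytes(), mime)
    return [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": url}},
            {"type": "text", "text": TASKS[task]},
        ],
    }]
```

The result can be passed straight to the existing call: `client.chat.completions.create(model="PaddlePaddle/PaddleOCR-VL", messages=build_messages("receipt.png", task="table"), temperature=0.0)`.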

Offline Inference with vLLM and PP-DocLayoutV2

The examples above run PaddleOCR-VL inference through vLLM alone. In practice, PaddleOCR-VL is usually paired with the PP-DocLayoutV2 layout-detection model. This pairing uses the model's full capabilities and produces results that match PaddlePaddle's official examples.

Tip

Use separate virtual environments for vLLM and PaddlePaddle to prevent dependency conflicts. You may encounter the error "The model PaddleOCR-VL-0.9B does not exist." This means the client is requesting the model under a name the server does not serve. To fix it, add --served-model-name PaddleOCR-VL-0.9B to your vLLM launch command.
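
To see which names the running server accepts, you can query the OpenAI-compatible `/v1/models` endpoint, which lists model ids. By default, vLLM serves the model under the name passed to `vllm serve`, here `PaddlePaddle/PaddleOCR-VL`. The sketch below uses only the standard library, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List


def parse_model_ids(payload: Dict) -> List[str]:
    """Pull the model ids out of an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def served_model_ids(base_url: str = "http://localhost:8000/v1") -> List[str]:
    """Fetch the names the running server accepts in the `model` field of requests."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/models", timeout=10) as resp:
        return parse_model_ids(json.load(resp))


def needs_served_name(ids: List[str], expected: str = "PaddleOCR-VL-0.9B") -> bool:
    """True when `expected` is not served, i.e. the --served-model-name fix applies."""
    return expected not in ids
```

For example: `if needs_served_name(served_model_ids()): print("restart vllm serve with --served-model-name PaddleOCR-VL-0.9B")`. After renaming, direct OpenAI client requests must also use the new name in `model=`.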

Install PaddlePaddle and PaddleOCR

```bash
uv pip install paddlepaddle-gpu==3.2.1 --extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu126/
uv pip install -U "paddleocr[doc-parser]"
uv pip install safetensors
```

The example below uses the vLLM server as the element-recognition backend and PP-DocLayoutV2 for layout detection during offline inference.

```python
from paddleocr import PaddleOCRVL

# Local directory containing the PP-DocLayoutV2 layout-detection model
doclayout_model_path = "/path/to/your/PP-DocLayoutV2/"

# Element recognition is delegated to the running vLLM server
pipeline = PaddleOCRVL(vl_rec_backend="vllm-server",
                       vl_rec_server_url="http://localhost:8000/v1",
                       layout_detection_model_name="PP-DocLayoutV2",
                       layout_detection_model_dir=doclayout_model_path)

output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png")

# Save each result both as structured JSON and as Markdown
for i, res in enumerate(output):
    res.save_to_json(save_path=f"output_{i}.json")
    res.save_to_markdown(save_path=f"output_{i}.md")
```
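
The same pipeline object can be reused across many documents. The sketch below runs every image in a folder through `pipeline.predict` and writes one JSON file and one Markdown file per result. It uses only the `predict`, `save_to_json`, and `save_to_markdown` calls shown in the example. The folder layout, the hypothetical `parse_directory` helper, and the set of accepted file extensions are assumptions for illustration.

```python
from pathlib import Path
from typing import List, Union

# Extensions treated as page images (an assumption; extend as needed).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def parse_directory(pipeline, input_dir: Union[str, Path],
                    output_dir: Union[str, Path] = "output") -> List[Path]:
    """Run pipeline.predict on each image in input_dir; save JSON and Markdown per result."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    # Sorting keeps the processing order deterministic across runs.
    images = sorted(p for p in Path(input_dir).iterdir()
                    if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES)
    for image in images:
        for i, res in enumerate(pipeline.predict(str(image))):
            stem = out / f"{image.stem}_{i}"
            res.save_to_json(save_path=f"{stem}.json")
            res.save_to_markdown(save_path=f"{stem}.md")
            written.append(stem)
    return written
```

For example, call `parse_directory(pipeline, "/path/to/scans", "output")` with the `pipeline` constructed above. Building the pipeline once avoids reloading the layout model for every file.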

Configuration Tips