DeepSeek-OCR Usage Guide

Introduction

DeepSeek-OCR is a frontier OCR model exploring optical context compression for LLMs.

Installing vLLM

```bash
uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto
```

Running DeepSeek-OCR

Offline OCR tasks

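This section shows how to run DeepSeek-OCR for offline batch OCR with vLLM's `LLM` class. The example disables prefix caching and the multimodal processor cache, registers `NGramPerReqLogitsProcessor` as a logits processor, and passes its per-request settings (`ngram_size`, `window_size`, `whitelist_token_ids`) through `SamplingParams.extra_args`.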

```python
from vllm import LLM, SamplingParams
from vllm.model_executor.models.deepseek_ocr import NGramPerReqLogitsProcessor
from PIL import Image

# Create model instance
llm = LLM(
    model="deepseek-ai/DeepSeek-OCR",
    enable_prefix_caching=False,
    mm_processor_cache_gb=0,
    logits_processors=[NGramPerReqLogitsProcessor],
)

# Prepare batched input with your image files
image_1 = Image.open("path/to/your/image_1.png").convert("RGB")
image_2 = Image.open("path/to/your/image_2.png").convert("RGB")
prompt = "<image>\nFree OCR."

model_input = [
    {
        "prompt": prompt,
        "multi_modal_data": {"image": image_1},
    },
    {
        "prompt": prompt,
        "multi_modal_data": {"image": image_2},
    },
]

sampling_param = SamplingParams(
    temperature=0.0,
    max_tokens=8192,
    # ngram logit processor args
    extra_args=dict(
        ngram_size=30,
        window_size=90,
        whitelist_token_ids={128821, 128822},  # whitelist: <td>, </td>
    ),
    skip_special_tokens=False,
)

# Generate output
model_outputs = llm.generate(model_input, sampling_param)

# Print output
for output in model_outputs:
    print(output.outputs[0].text)
```
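
The `extra_args` above configure `NGramPerReqLogitsProcessor`, which, judging from its name and parameters, keeps the model from looping on repeated token sequences. The standalone sketch below illustrates the general technique of windowed no-repeat-n-gram blocking under that assumption. The function `banned_next_tokens` is hypothetical and is not vLLM's implementation: given the tokens generated so far, it scans the last `window_size` tokens and returns every token that would complete an `ngram_size`-gram already present in that window, skipping whitelisted ids.

```python
from typing import Iterable, Sequence, Set


def banned_next_tokens(
    history: Sequence[int],
    ngram_size: int,
    window_size: int,
    whitelist: Iterable[int] = (),
) -> Set[int]:
    """Hypothetical sketch of windowed no-repeat-n-gram blocking.

    Returns the token ids that, if generated next, would repeat an
    `ngram_size`-token sequence already present in the last `window_size`
    tokens of `history`. Ids in `whitelist` are never banned.
    """
    if ngram_size < 2:
        raise ValueError("ngram_size must be at least 2")
    # No complete n-gram fits in the window, or there is no full prefix yet.
    if window_size < ngram_size or len(history) < ngram_size - 1:
        return set()

    window = list(history[-window_size:])
    # The last (ngram_size - 1) tokens form the prefix the next token would extend.
    prefix = list(history[-(ngram_size - 1):])
    allowed = set(whitelist)

    banned: Set[int] = set()
    # Slide over every complete n-gram inside the window.
    for start in range(len(window) - ngram_size + 1):
        ngram = window[start:start + ngram_size]
        if ngram[:-1] == prefix and ngram[-1] not in allowed:
            banned.add(ngram[-1])
    return banned


# The prefix [1, 2] was previously followed by 3, so 3 is banned next.
print(banned_next_tokens([1, 2, 3, 1, 2], ngram_size=3, window_size=10))  # {3}
```

In an actual logits processor, the returned ids would have their logits set to negative infinity before sampling. With `ngram_size=30`, only long verbatim repeats are blocked; whitelisting the `<td>` and `</td>` ids presumably keeps legitimately repetitive table markup from being suppressed.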

Online OCR serving

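This section shows how to serve DeepSeek-OCR through vLLM's OpenAI-compatible API server: start the server with the same logits processor and cache settings as the offline example, then send requests with the OpenAI Python client, passing the n-gram settings in `vllm_xargs`.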

```bash
vllm serve deepseek-ai/DeepSeek-OCR \
    --logits_processors vllm.model_executor.models.deepseek_ocr:NGramPerReqLogitsProcessor \
    --no-enable-prefix-caching \
    --mm-processor-cache-gb 0
```
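
Loading the model can take a while after `vllm serve` starts, so a client script may want to wait until the server is ready before sending requests. The sketch below polls the server's `/health` endpoint using only the standard library. The helper `wait_for_server` and its defaults are illustrative assumptions; the port matches the `base_url` used in the client example.

```python
import http.client
import time
import urllib.request


def wait_for_server(base_url: str = "http://localhost:8000",
                    timeout: float = 600.0,
                    interval: float = 2.0) -> bool:
    """Hypothetical helper: poll `<base_url>/health` until it answers HTTP 200.

    Returns True once the server is ready, or False after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, server still loading (a non-2xx reply raises
            # HTTPError, a subclass of OSError), or the probe itself timed out.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, call `wait_for_server()` once at the top of the client script and exit early if it returns `False`.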

```python
import time
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
    timeout=3600,
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
                },
            },
            {
                "type": "text",
                "text": "Free OCR.",
            },
        ],
    }
]

start = time.time()
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-OCR",
    messages=messages,
    max_tokens=2048,
    temperature=0.0,
    extra_body={
        "skip_special_tokens": False,
        # args used to control custom logits processor
        "vllm_xargs": {
            "ngram_size": 30,
            "window_size": 90,
            # whitelist: <td>, </td>
            "whitelist_token_ids": [128821, 128822],
        },
    },
)
print(f"Response costs: {time.time() - start:.2f}s")
print(f"Generated text: {response.choices[0].message.content}")
```
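
The client request above passes a public image URL. To OCR a local file instead, one common approach with OpenAI-compatible servers is to inline the image as a base64 `data:` URL in the same `image_url` field. The helper below is a hypothetical sketch of that approach (`build_ocr_messages` is not part of vLLM or the OpenAI SDK); it infers the MIME type from the file extension and reuses the "Free OCR." prompt.

```python
import base64
import mimetypes
from pathlib import Path


def build_ocr_messages(image_path: str, prompt: str = "Free OCR.") -> list:
    """Hypothetical helper: build chat messages embedding a local image as a data URL."""
    path = Path(image_path)
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type for {path.name!r}")
    # Base64-encode the raw file bytes; the server decodes them back into an image.
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:{mime};base64,{encoded}"},
                },
                {"type": "text", "text": prompt},
            ],
        }
    ]
```

To use it, replace the `messages` list in the client code with `messages = build_ocr_messages("path/to/your/image_1.png")`. Base64 encoding inflates the payload by roughly a third, so large scans produce correspondingly larger requests.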

Configuration Tips