GitHub - bytedance/Dolphin: The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.

Dolphin-v2 is an enhanced universal document parsing model that substantially improves upon the original Dolphin. It handles both digital-born and photographed documents through a document-type-aware two-stage architecture with scalable anchor prompting.

📑 Overview

Document image parsing is challenging due to diverse document types and complexly intertwined elements such as text paragraphs, figures, formulas, tables, and code blocks. Dolphin-v2 addresses these challenges through a document-type-aware two-stage approach:

  1. 🔍 Stage 1: Document type classification (digital vs. photographed) + layout analysis with reading order prediction
  2. 🧩 Stage 2: Hybrid parsing strategy - holistic parsing for photographed documents, parallel element-wise parsing for digital documents
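
The two stages above can be pictured as a simple dispatch. The following is a minimal sketch, not the repository's implementation: the `Element` class, the `parse_page` function, and the two callbacks are hypothetical names, and a thread pool stands in for the model's parallel element decoding.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Element:
    """One layout element produced by Stage 1 (hypothetical structure)."""
    kind: str           # e.g. "text", "table", "formula", "code"
    reading_order: int  # position in the predicted reading order


def parse_page(
    doc_type: str,
    elements: List[Element],
    parse_holistic: Callable[[], str],
    parse_element: Callable[[Element], str],
) -> str:
    """Stage 2 dispatch: holistic for photographed pages, element-wise for digital ones."""
    if doc_type == "photographed":
        # Photographed pages are parsed as a whole in a single pass.
        return parse_holistic()
    # Digital pages: parse every element independently, in reading order.
    ordered = sorted(elements, key=lambda e: e.reading_order)
    with ThreadPoolExecutor() as pool:
        # map() preserves input order, so the output follows the reading order.
        parts = list(pool.map(parse_element, ordered))
    return "\n\n".join(parts)
```

Calling `parse_page("digital", elements, ...)` with stub callbacks returns the per-element outputs joined in reading order, while `"photographed"` bypasses the element list entirely.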

Dolphin achieves promising performance across diverse page-level and element-level parsing tasks while ensuring superior efficiency through its lightweight architecture and parallel parsing mechanism.

📅 Changelog

📈 Performance

Comprehensive evaluation of document parsing on OmniDocBench (v1.5)

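| Model | Size | Overall↑ | Text (Edit)↓ | Formula (CDM)↑ | Table (TEDS)↑ | Table (TEDS-S)↑ | Read Order (Edit)↓ |
|---|---|---|---|---|---|---|---|
| Dolphin | 0.3B | 74.67 | 0.125 | 67.85 | 68.70 | 77.77 | 0.124 |
| Dolphin-1.5 | 0.3B | 85.06 | 0.085 | 79.44 | 84.25 | 88.06 | 0.071 |
| Dolphin-v2 | 3B | 89.78 | 0.054 | 87.63 | 87.02 | 90.48 | 0.054 |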
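
To give a feel for the Edit columns, here is a minimal sketch of a normalized edit distance, assuming "Edit" denotes Levenshtein distance divided by the length of the longer string. The benchmark's exact normalization and text preprocessing may differ, and the function names are illustrative only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Scale the distance into [0, 1]; lower is better."""
    longest = max(len(pred), len(ref))
    return edit_distance(pred, ref) / longest if longest else 0.0
```

Under this assumption, a score of 0.054 would correspond to roughly five character edits per hundred characters of the longer string.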

🛠️ Installation

  1. Clone the repository:
    git clone https://github.com/ByteDance/Dolphin.git
    cd Dolphin
  2. Install the dependencies:
    pip install -r requirements.txt
  3. Download the pre-trained Dolphin-v2 model:
    Visit our Hugging Face model card, or download the model with one of the following:

    # Download the model from the Hugging Face Hub
    git lfs install
    git clone https://huggingface.co/ByteDance/Dolphin-v2 ./hf_model

    # Or use the Hugging Face CLI
    pip install huggingface_hub
    huggingface-cli download ByteDance/Dolphin-v2 --local-dir ./hf_model
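
The same download can also be scripted from Python. This is a minimal sketch using `snapshot_download` from the `huggingface_hub` package. The `download_model` wrapper is a hypothetical helper, not part of this repository, and it reuses the repo id and target directory from the commands above.

```python
from pathlib import Path


def download_model(repo_id: str = "ByteDance/Dolphin-v2",
                   local_dir: str = "./hf_model") -> Path:
    """Fetch all files of the model repo into local_dir and return that path."""
    # Imported lazily so this module loads even before huggingface_hub is installed.
    from huggingface_hub import snapshot_download

    path = snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return Path(path)
```

Calling `download_model()` requires network access and places the model files in `./hf_model`, the directory the inference commands below expect.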

⚡ Inference

Dolphin provides two inference frameworks with support for two parsing granularities:

📄 Page-level Parsing

# Process a single document image
python demo_page.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs/page_1.png

# Process a single PDF document
python demo_page.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs/page_6.pdf

# Process all documents in a directory
python demo_page.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs

# Process with a custom batch size for parallel element decoding
python demo_page.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs \
    --max_batch_size 8
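
One way to picture what a batch size limit does during parallel element decoding is chunking. The sketch below assumes `--max_batch_size` caps how many elements are decoded together in one pass. The `batched` helper is hypothetical and is not taken from `demo_page.py`.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], max_batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most max_batch_size items, keeping order."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    for start in range(0, len(items), max_batch_size):
        yield list(items[start:start + max_batch_size])
```

With 20 elements on a page and a limit of 8, this yields chunks of 8, 8, and 4. Under the stated assumption, a larger limit means fewer passes at the cost of more memory per pass.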

🧩 Element-level Parsing

# Process element images (specify element_type: table, formula, text, or code)
python demo_element.py --model_path ./hf_model --save_dir ./results \
    --input_path \
    --element_type [table|formula|text|code]
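
When scripting many element crops, it can help to validate the element type before launching the demo. This is a minimal sketch that only assembles the command line shown above. The `build_element_command` helper and the example path `crop.png` are hypothetical, while the flags and the four allowed types are the ones listed in this section.

```python
import sys
from typing import List

# The four element types named in the command above.
ELEMENT_TYPES = ("table", "formula", "text", "code")


def build_element_command(input_path: str, element_type: str,
                          model_path: str = "./hf_model",
                          save_dir: str = "./results") -> List[str]:
    """Return the argv list for demo_element.py, rejecting unknown element types."""
    if element_type not in ELEMENT_TYPES:
        raise ValueError(f"element_type must be one of {ELEMENT_TYPES}")
    return [
        sys.executable, "demo_element.py",
        "--model_path", model_path,
        "--save_dir", save_dir,
        "--input_path", input_path,
        "--element_type", element_type,
    ]
```

From the repository root, the result can be passed to `subprocess.run(build_element_command("crop.png", "table"), check=True)`.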

🎨 Layout Parsing

# Process a single document image
python demo_layout.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs/page_1.png

# Process a single PDF document
python demo_layout.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs/page_6.pdf

# Process all documents in a directory
python demo_layout.py --model_path ./hf_model --save_dir ./results \
    --input_path ./demo/page_imgs

🌟 Key Features

📮 Notice

Call for Bad Cases: If you encounter cases where the model performs poorly, we would greatly appreciate it if you could share them by opening an issue. We are continuously working to optimize and improve the model.

💖 Acknowledgement

We would like to acknowledge the following open-source projects that provided inspiration and reference for this work:

📝 Citation

If you find this code useful for your research, please use the following BibTeX entry.

@article{feng2025dolphin,
  title={Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting},
  author={Feng, Hao and Wei, Shu and Fei, Xiang and Shi, Wei and Han, Yingdong and Liao, Lei and Lu, Jinghui and Wu, Binghong and Liu, Qi and Lin, Chunhui and others},
  journal={arXiv preprint arXiv:2505.14059},
  year={2025}
}
