GitHub - alibaba/Logics-Parsing

💻 HomePage | 🤗 Model | 🤖 Demo

LogicsDocBench results

OmniDocBench-v1.5 results

Updates

Introduction

Logics-Parsing-v2 is the successor to the previously proposed Logics-Parsing (v1). It inherits all the core capabilities of the v1 model and handles complex documents more effectively. It also extends support to Parsing-2.0 scenarios, enabling structured parsing of sheet music, flowcharts, and code/pseudocode blocks.

LogicsDocBench overview

Key Features

v1

v2

Benchmark

v1

Existing document-parsing benchmarks often provide limited coverage of complex layouts and STEM content. To address this, we constructed an in-house benchmark comprising 1,078 page-level images across nine major categories and over twenty sub-categories. Logics-Parsing achieves the lowest overall edit distance on this benchmark (0.124 for English, 0.145 for Chinese).

| Model Type | Methods | Overall Edit ↓ (EN) | Overall Edit ↓ (ZH) | Text Edit ↓ (EN) | Text Edit ↓ (ZH) | Formula Edit ↓ (EN) | Formula Edit ↓ (ZH) | Table TEDS ↑ (EN) | Table TEDS ↑ (ZH) | Table Edit ↓ (EN) | Table Edit ↓ (ZH) | ReadOrder Edit ↓ (EN) | ReadOrder Edit ↓ (ZH) | Chemistry Edit ↓ (ALL) | HandWriting Edit ↓ (ALL) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Pipeline Tools | doc2x | 0.209 | 0.188 | 0.128 | 0.194 | 0.377 | 0.321 | 81.1 | 85.3 | 0.148 | 0.115 | 0.146 | 0.122 | 1.0 | 0.307 |
| | Textin | 0.153 | 0.158 | 0.132 | 0.190 | 0.185 | 0.223 | 76.7 | 86.3 | 0.176 | 0.113 | 0.118 | 0.104 | 1.0 | 0.344 |
| | mathpix* | 0.128 | 0.146 | 0.128 | 0.152 | 0.06 | 0.142 | 86.2 | 86.6 | 0.120 | 0.127 | 0.204 | 0.164 | 0.552 | 0.263 |
| | PP_StructureV3 | 0.220 | 0.226 | 0.172 | 0.29 | 0.272 | 0.276 | 66 | 71.5 | 0.237 | 0.193 | 0.201 | 0.143 | 1.0 | 0.382 |
| | Mineru2 | 0.212 | 0.245 | 0.134 | 0.195 | 0.280 | 0.407 | 67.5 | 71.8 | 0.228 | 0.203 | 0.205 | 0.177 | 1.0 | 0.387 |
| | Marker | 0.324 | 0.409 | 0.188 | 0.289 | 0.285 | 0.383 | 65.5 | 50.4 | 0.593 | 0.702 | 0.23 | 0.262 | 1.0 | 0.50 |
| | Pix2text | 0.447 | 0.547 | 0.485 | 0.577 | 0.312 | 0.465 | 64.7 | 63.0 | 0.566 | 0.613 | 0.424 | 0.534 | 1.0 | 0.95 |
| Expert VLMs | Dolphin | 0.208 | 0.256 | 0.149 | 0.189 | 0.334 | 0.346 | 72.9 | 60.1 | 0.192 | 0.35 | 0.160 | 0.139 | 0.984 | 0.433 |
| | dots.ocr | 0.186 | 0.198 | 0.115 | 0.169 | 0.291 | 0.358 | 79.5 | 82.5 | 0.172 | 0.141 | 0.165 | 0.123 | 1.0 | 0.255 |
| | MonkeyOcr | 0.193 | 0.259 | 0.127 | 0.236 | 0.262 | 0.325 | 78.4 | 74.7 | 0.186 | 0.294 | 0.197 | 0.180 | 1.0 | 0.623 |
| | OCRFlux | 0.252 | 0.254 | 0.134 | 0.195 | 0.326 | 0.405 | 58.3 | 70.2 | 0.358 | 0.260 | 0.191 | 0.156 | 1.0 | 0.284 |
| | Gotocr | 0.247 | 0.249 | 0.181 | 0.213 | 0.231 | 0.318 | 59.5 | 74.7 | 0.38 | 0.299 | 0.195 | 0.164 | 0.969 | 0.446 |
| | Olmocr | 0.341 | 0.382 | 0.125 | 0.205 | 0.719 | 0.766 | 57.1 | 56.6 | 0.327 | 0.389 | 0.191 | 0.169 | 1.0 | 0.294 |
| | SmolDocling | 0.657 | 0.895 | 0.486 | 0.932 | 0.859 | 0.972 | 18.5 | 1.5 | 0.86 | 0.98 | 0.413 | 0.695 | 1.0 | 0.927 |
| | Logics-Parsing | 0.124 | 0.145 | 0.089 | 0.139 | 0.106 | 0.165 | 76.6 | 79.5 | 0.165 | 0.166 | 0.136 | 0.113 | 0.519 | 0.252 |
| General VLMs | Qwen2VL-72B | 0.298 | 0.342 | 0.142 | 0.244 | 0.431 | 0.363 | 64.2 | 55.5 | 0.425 | 0.581 | 0.193 | 0.182 | 0.792 | 0.359 |
| | Qwen2.5VL-72B | 0.233 | 0.263 | 0.162 | 0.24 | 0.251 | 0.257 | 69.6 | 67 | 0.313 | 0.353 | 0.205 | 0.204 | 0.597 | 0.349 |
| | Doubao-1.6 | 0.188 | 0.248 | 0.129 | 0.219 | 0.273 | 0.336 | 74.9 | 69.7 | 0.180 | 0.288 | 0.171 | 0.148 | 0.601 | 0.317 |
| | GPT-5 | 0.242 | 0.373 | 0.119 | 0.36 | 0.398 | 0.456 | 67.9 | 55.8 | 0.26 | 0.397 | 0.191 | 0.28 | 0.88 | 0.46 |
| | Gemini2.5 pro | 0.185 | 0.20 | 0.115 | 0.155 | 0.288 | 0.326 | 82.6 | 80.3 | 0.154 | 0.182 | 0.181 | 0.136 | 0.535 | 0.26 |

* Tested on the v3/PDF Conversion API (August 2025 deployment).
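
Most columns in the table above report an edit distance between the model output and the reference, where lower is better. As a rough illustration of how such a score behaves, the sketch below computes a character-level Levenshtein distance and scales it to the range 0 to 1. The function names and the choice to normalize by the longer string's length are assumptions made for this example; the benchmark's actual evaluation code may tokenize, match, or normalize differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(prediction: str, reference: str) -> float:
    """Edit distance scaled to [0, 1]; 0.0 means an exact match.

    Assumption: normalize by the length of the longer string.
    """
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(prediction, reference) / longest
```

Under this normalization, a value of 1.0 means that no character of the longer string survives the alignment, and a value near 0 means the prediction is almost identical to the reference.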

Comparisons on LogicsDocBench

We introduce LogicsDocBench, a new comprehensive evaluation benchmark comprising 900 carefully selected PDF pages that cover both traditional Parsing-1.0 document tasks and the newly introduced Parsing-2.0 scenarios. The benchmark is designed to better assess models' ability to parse complex and diverse real-world documents. The dataset is organized into three core document subsets:

For Parsing-1.0 tasks, we adopt the same evaluation protocols as OmniDocBench-v1.5 to ensure fairness and consistency across benchmarks. For Parsing-2.0, we report fine-grained results using edit distance for each subcategory and compute an overall score as follows:

$$\text{Overall} = \frac{\text{Parsing1.0}^{\text{Overall}} \times 3 + (1-\text{Chemistry}^{\text{Edit}})\times 100 + (1-\text{Code}^{\text{Edit}})\times 100 + (1-\text{Chart}^{\text{Edit}})\times 100 + (1-\text{Music}^{\text{Edit}})\times 100}{7}$$
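
The overall score can be sketched in a few lines of Python. The function name and argument names below are hypothetical, and the sketch assumes that the Parsing-1.0 overall score is already on a 0 to 100 scale while the four Parsing-2.0 edit distances lie between 0 and 1, which is what the formula's scaling by 100 suggests.

```python
def logicsdocbench_overall(
    parsing1_overall: float,  # Parsing-1.0 overall score, assumed to be in [0, 100]
    chemistry_edit: float,    # Parsing-2.0 edit distances, assumed to be in [0, 1]
    code_edit: float,
    chart_edit: float,
    music_edit: float,
) -> float:
    """Combine the Parsing-1.0 score and four Parsing-2.0 edit distances."""
    # Turn each edit distance (lower is better) into a 0-100 score (higher is better).
    parsing2_scores = [
        (1.0 - edit) * 100.0
        for edit in (chemistry_edit, code_edit, chart_edit, music_edit)
    ]
    # Parsing-1.0 carries weight 3/7; each Parsing-2.0 subcategory carries 1/7.
    return (3.0 * parsing1_overall + sum(parsing2_scores)) / 7.0
```

With made-up inputs, `logicsdocbench_overall(80.0, 0.5, 0.5, 0.5, 0.5)` evaluates to (240 + 200) / 7, roughly 62.86. The Parsing-1.0 result therefore accounts for three sevenths of the total, and each of the four Parsing-2.0 subcategories accounts for one seventh.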

The full evaluation of document parsing on LogicsDocBench is listed below:

The histogram below provides a more intuitive visualization of the advantages of our Logics-Parsing-v2 model in both Parsing-1.0 and 2.0 scenarios.

Comparisons on OmniDocBench-v1.5

We also provide the experimental results of our newly proposed Logics-Parsing-v2 model on the widely recognized open-source benchmark OmniDocBench-v1.5. As shown in the table below, Logics-Parsing-v2 achieves highly competitive performance.

* The model results in the table are sourced from the official OmniDocBench website.

Quick Start

v1

1. Installation

conda create -n logis-parsing python=3.10
conda activate logis-parsing
pip install -r requirement.txt

2. Download Model Weights

# Download our model from Modelscope.
pip install modelscope
python download_model.py -t modelscope
# Download our model from huggingface.
pip install huggingface_hub
python download_model.py -t huggingface

3. Inference

python3 inference.py --image_path PATH_TO_INPUT_IMG --output_path PATH_TO_OUTPUT --model_path PATH_TO_MODEL

v2

1. Installation

conda create -n logis-parsing-v2 python=3.10
conda activate logis-parsing-v2
pip install -r requirements.txt

2. Download Model Weights

# Download our model from Modelscope.
pip install modelscope
python download_model_v2.py -t modelscope

# Download our model from huggingface.
pip install huggingface_hub
python download_model_v2.py -t huggingface

3. Inference

python3 inference_v2.py --image_path PATH_TO_INPUT_IMG --output_path PATH_TO_OUTPUT --model_path PATH_TO_MODEL
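
The command above processes a single image. To parse a whole folder, one option is a small wrapper that builds the same command once per image. The sketch below is not part of the repository: `build_commands`, `run_all`, the set of image extensions, and the `.txt` output suffix are all hypothetical, and it assumes that `--output_path` names one output file per image. Only the three flags and the script name come from the command shown above.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: which file extensions count as input images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_commands(image_dir, output_dir, model_path,
                   script="inference_v2.py", output_suffix=".txt"):
    """Return one inference command (as an argument list) per image in image_dir.

    Assumption: --output_path expects a file path; output_suffix is a placeholder.
    """
    commands = []
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not an image
        output = Path(output_dir) / (image.stem + output_suffix)
        commands.append([
            sys.executable, script,
            "--image_path", str(image),
            "--output_path", str(output),
            "--model_path", str(model_path),
        ])
    return commands


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

A call such as `run_all(build_commands("pages", "results", "PATH_TO_MODEL"))` would then process every image in `pages`. Each invocation starts a new Python process, so the model weights are presumably reloaded for every image; for large batches it may be worth adapting the inference script itself to loop over inputs.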

Showcases

Acknowledgments

We would like to acknowledge the following open-source projects that provided inspiration and reference for this work: