Logics-MLLM/Logics-Parsing-v2

LogicsDocBench results

OmniDocBench-v1.5 results

Updates

[2026/02/13] 🚀🚀🚀🚀🚀 We release the Logics-Parsing-v2 model.
[2025/09/25] 🚀🚀🚀 We release the Logics-Parsing model.

Introduction

Logics-Parsing-v2 is the successor to Logics-Parsing (v1). It inherits all the core capabilities of the v1 model and handles complex documents more reliably. It also adds support for Parsing-2.0 scenarios, enabling structured parsing of music sheets, flowcharts, and code/pseudocode blocks.

LogicsDocBench overview

Key Features

Effortless End-to-End Processing
- End-to-end recognition and parsing of a wide range of document elements within a single model.
- Handles complex-layout and text-dense documents, such as newspapers and magazines, with high precision.
Advanced Content Recognition
- Smaller in size, greater in performance, delivering more accurate and structured parsing of tables and scientific formulas.
- Introducing Parsing-2.0: natively supports parsing of diverse structured content, including flowcharts, music sheets and pseudocode blocks.
Rich, Structured HTML Output
- Transforms documents into concise HTML -- capturing not just content, but also element types, spatial layouts, and semantic hierarchy.
- More scientific and intuitive formats for structured elements -- such as Mermaid for flowcharts and ABC notation for musical scores.
State-of-the-Art Performance
- State-of-the-art results on both benchmarks reported here: Logics-Parsing-v2 achieves the top overall score on our in-house benchmark LogicsDocBench (overall score: 82.16) and on the public benchmark OmniDocBench-v1.5 (overall score: 93.23).
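Because the model emits HTML, its output can be inspected with ordinary HTML tooling. The sketch below tallies element tags and `class` values using only the Python standard library. Note that the `SAMPLE` markup, the class names in it, and the helper names `ElementTally` and `tally_elements` are invented for illustration; they are not the model's documented output schema.

```python
from collections import Counter
from html.parser import HTMLParser


class ElementTally(HTMLParser):
    """Counts start tags and class names seen in an HTML string."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.classes = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        for name, value in attrs:
            # A class attribute may hold several space-separated names.
            if name == "class" and value:
                self.classes.update(value.split())


def tally_elements(html_text):
    """Return (tag counts, class counts) for the given HTML text."""
    parser = ElementTally()
    parser.feed(html_text)
    parser.close()
    return parser.tags, parser.classes


# Hypothetical markup, NOT real model output: it only exercises the parser.
SAMPLE = (
    '<div class="title"><h1>Report</h1></div>'
    '<p class="text">Body</p>'
    "<table><tr><td>1</td></tr></table>"
)

if __name__ == "__main__":
    tags, classes = tally_elements(SAMPLE)
    print(dict(tags))
    print(dict(classes))
```

A tally like this is a quick way to check which element types a parsed page contains before writing any format-specific post-processing.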

Benchmark

Comparisons on LogicsDocBench

We introduce LogicsDocBench, a new comprehensive evaluation benchmark comprising 900 carefully selected PDF pages, covering both traditional Parsing-1.0 document tasks and the newly introduced Parsing-2.0 scenarios. The benchmark is designed to better assess how well models parse complex and diverse real-world documents. The dataset is organized into three core document subsets:

STEM Documents (218 pages):
Focuses on high-difficulty academic and educational content, spanning over ten domains including physics, mathematics, engineering, and interdisciplinary sciences. This subset evaluates deep understanding of mathematical formulas, technical terminology, and structured knowledge representation.
Complex Layouts (459 pages):
Includes challenging real-world layouts such as multi-column text, cross-page tables, vertical writing, and mixed text-image arrangements. This subset comprehensively evaluates a model's layout analysis abilities.
Parsing-2.0 Content (223 pages):
Targets modern digital and semi-structured content that poses significant challenges for traditional OCR systems, including:
- Chemical molecular formulas
- Music sheets
- Code and pseudocode blocks
- Flowcharts and mind maps

For Parsing-1.0 tasks, we adopt the same evaluation protocols as OmniDocBench-v1.5 to ensure fairness and consistency across benchmarks. For Parsing-2.0, we report fine-grained results using edit distance for each subcategory, and compute an overall score as follows:

$$
\text{Overall} = \frac{\text{Parsing1.0}^{\text{Overall}} \times 3 + (1-\text{Chemistry}^{\text{Edit}})\times 100 + (1-\text{Code}^{\text{Edit}})\times 100 + (1-\text{Chart}^{\text{Edit}})\times 100 + (1-\text{Music}^{\text{Edit}})\times 100}{7}
$$
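The overall score above can be sketched in a few lines of Python. `overall_score` follows the formula directly: the Parsing-1.0 overall score gets weight 3, each of the four Parsing-2.0 edit distances is converted to a 0-100 score, and the sum is divided by 7. `normalized_edit_distance` is an assumption for illustration: it uses Levenshtein distance divided by the longer string length, which may differ from the benchmark's actual normalization. The function names and the numbers in the usage lines are made up and are not reported results.

```python
def normalized_edit_distance(pred: str, ref: str) -> float:
    """Levenshtein distance divided by the longer length (0.0 means identical).

    Assumption: this normalization is illustrative and may not match the
    benchmark's exact definition.
    """
    if not pred and not ref:
        return 0.0
    prev = list(range(len(ref) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if p == r else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / max(len(pred), len(ref))


def overall_score(parsing1_overall: float,
                  chemistry_edit: float,
                  code_edit: float,
                  chart_edit: float,
                  music_edit: float) -> float:
    """Weighted overall score: Parsing-1.0 counts 3x, each 2.0 category 1x."""
    edits = (chemistry_edit, code_edit, chart_edit, music_edit)
    parsing2_terms = sum((1.0 - e) * 100.0 for e in edits)
    return (parsing1_overall * 3.0 + parsing2_terms) / 7.0


if __name__ == "__main__":
    # Made-up inputs, purely to exercise the functions.
    print(normalized_edit_distance("kitten", "sitting"))
    print(overall_score(90.0, 0.1, 0.2, 0.3, 0.4))
```

With this weighting, Parsing-1.0 contributes three sevenths of the overall score and each Parsing-2.0 category contributes one seventh.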

The comprehensive evaluation of document parsing on LogicsDocBench is listed below:

The histogram below provides a more intuitive visualization of the advantages of our Logics-Parsing-v2 model in both Parsing-1.0 and 2.0 scenarios.

Comparisons on OmniDocBench-v1.5

We also report results for Logics-Parsing-v2 on the widely used open-source benchmark OmniDocBench-v1.5. As shown in the table below, Logics-Parsing-v2 achieves the highest scores among all compared approaches.

* The model results in the table are sourced from the official OmniDocBench website.

Quick Start

1. Installation

conda create -n logics-parsing-v2 python=3.10
conda activate logics-parsing-v2

pip install -r requirements.txt

2. Download Model Weights

# Download our model from ModelScope.
pip install modelscope
python download_model_v2.py -t modelscope

# Download our model from Hugging Face.
pip install huggingface_hub
python download_model_v2.py -t huggingface

3. Inference

python3 inference_v2.py --image_path PATH_TO_INPUT_IMG --output_path PATH_TO_OUTPUT --model_path PATH_TO_MODEL
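To process a folder of page images, the command above can be wrapped in a small driver. The sketch below is an illustration rather than part of the repository: it uses only the three flags shown above, and it assumes that `--output_path` takes one file per image. The helper names (`build_command`, `plan_batch`, `run_batch`), the `.html` output suffix, and the list of image suffixes are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Assumed set of input image types; adjust to what the script accepts.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_command(image_path, output_path, model_path, script="inference_v2.py"):
    """Build the argv list for one inference_v2.py invocation."""
    return [
        sys.executable, script,
        "--image_path", str(image_path),
        "--output_path", str(output_path),
        "--model_path", str(model_path),
    ]


def plan_batch(input_dir, output_dir, model_path):
    """Return one command per image in input_dir, in sorted order."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    images = sorted(
        p for p in input_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    # Assumption: one output file per image, named after the image stem.
    return [
        build_command(img, output_dir / (img.stem + ".html"), model_path)
        for img in images
    ]


def run_batch(input_dir, output_dir, model_path):
    """Run the planned commands one by one, stopping at the first failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for cmd in plan_batch(input_dir, output_dir, model_path):
        subprocess.run(cmd, check=True)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python batch_infer.py INPUT_DIR OUTPUT_DIR PATH_TO_MODEL
    run_batch(sys.argv[1], sys.argv[2], sys.argv[3])
```

Launching one process per image reloads the model weights each time, so this is only practical for small batches. For larger jobs, loading the model once inside a single Python process would be the better design.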

Showcases

Acknowledgments

We would like to acknowledge the following open-source projects that provided inspiration and reference for this work: