GitHub - PaddlePaddle/PaddleOCR: Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

PaddleOCR converts PDF documents and images into structured, LLM-ready data (JSON/Markdown) with industry-leading accuracy. With 70k+ stars and adoption by top-tier projects like Dify, RAGFlow, and Cherry Studio, PaddleOCR is the bedrock for building intelligent RAG and agentic applications.

🚀 Key Features

📄 Intelligent Document Parsing (LLM-Ready)

Transforming messy visuals into structured data for the LLM era.

🔍 Universal Text Recognition (Scene OCR)

The global gold standard for high-speed, multilingual text spotting.

PaddleOCR Architecture

🛠️ Developer-Centric Ecosystem

📣 Recent updates

🔥 2026.06.11: Release of PaddleOCR 3.7.0

History Log

🚀 Quick Start

Step 1: Try Online

The official PaddleOCR website provides an interactive Experience Center and APIs. No setup is required; one click is enough to try it.

👉 Visit Official Website

Step 2: Local Deployment

For local usage, please refer to the following documentation based on your needs:

🧩 More Features

🔄 Quick Overview of Execution Results

PP-OCRv5

PP-OCRv5 Demo

PP-StructureV3

PP-StructureV3 Demo

PaddleOCR-VL

PaddleOCR-VL Demo

✨ Stay Tuned

⭐ Star this repository to keep up with exciting updates and new releases, including powerful OCR and document parsing capabilities! ⭐

Star-Project

👩‍👩‍👧‍👦 Community

😃 Awesome Projects Leveraging PaddleOCR

PaddleOCR wouldn't be where it is today without its incredible community! 💗 A massive thank you to all our longtime partners, new collaborators, and everyone who's poured their passion into PaddleOCR, whether we've named you or not. Your support fuels our fire!

👩‍👩‍👧‍👦 Contributors

🌟 Star

Star-history

📄 License

This project is released under the Apache 2.0 license.

🎓 Citation

@misc{cui2025paddleocr30technicalreport,
  title={PaddleOCR 3.0 Technical Report},
  author={Cheng Cui and Ting Sun and Manhui Lin and Tingquan Gao and Yubo Zhang and Jiaxuan Liu and Xueqing Wang and Zelun Zhang and Changda Zhou and Hongen Liu and Yue Zhang and Wenyu Lv and Kui Huang and Yichao Zhang and Jing Zhang and Jun Zhang and Yi Liu and Dianhai Yu and Yanjun Ma},
  year={2025},
  eprint={2507.05595},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2507.05595},
}

@misc{cui2025paddleocrvlboostingmultilingualdocument,
  title={PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model},
  author={Cheng Cui and Ting Sun and Suyin Liang and Tingquan Gao and Zelun Zhang and Jiaxuan Liu and Xueqing Wang and Changda Zhou and Hongen Liu and Manhui Lin and Yue Zhang and Yubo Zhang and Handong Zheng and Jing Zhang and Jun Zhang and Yi Liu and Dianhai Yu and Yanjun Ma},
  year={2025},
  eprint={2510.14528},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.14528},
}

@misc{cui2026paddleocrvl15multitask09bvlm,
  title={PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing},
  author={Cheng Cui and Ting Sun and Suyin Liang and Tingquan Gao and Zelun Zhang and Jiaxuan Liu and Xueqing Wang and Changda Zhou and Hongen Liu and Manhui Lin and Yue Zhang and Yubo Zhang and Yi Liu and Dianhai Yu and Yanjun Ma},
  year={2026},
  eprint={2601.21957},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2601.21957},
}

@misc{zhang2026paddleocrvl16expandingfrontierdocument,
  title={PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training},
  author={Zelun Zhang and Hongen Liu and Suyin Liang and Yubo Zhang and Yiqing Xiang and Jiaxuan Liu and Ting Sun and Manhui Lin and Yue Zhang and Changda Zhou and Tingquan Gao and Cheng Cui and Yi Liu and Dianhai Yu and Yanjun Ma},
  year={2026},
  eprint={2606.03264},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2606.03264},
}