docTR: Document Text Recognition
State-of-the-art Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 & PyTorch
DocTR provides an easy and powerful way to extract valuable information from your documents:
- 🧾 for automation: seamlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents.
- 👩‍🔬 for research: quickly compare your own architectures' speed & performance with state-of-the-art models on public datasets.
Main Features
- 🤖 Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters
- ⚡ User-friendly, 3 lines of code to load a document and extract text with a predictor
- 🚀 State-of-the-art performance on public document datasets, comparable with Google Vision/AWS Textract
- ⚡ Optimized for inference speed on both CPU & GPU
- 🐦 Light package, minimal dependencies
- 🛠️ Actively maintained by Mindee
- 🏭 Easy integration (available templates for browser demo & API deployment)
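The three-line usage promised above, and the shape of what it returns, can be sketched as follows. The `doctr` calls are shown in comments since they require the package (and pretrained weights) to be installed; the mocked `export` dict below is illustrative, following the nested pages → blocks → lines → words layout that the predictor's `export()` method produces:

```python
# Three-line usage (assumes `python -m pip install python-doctr`):
#   from doctr.io import DocumentFile
#   from doctr.models import ocr_predictor
#   result = ocr_predictor(pretrained=True)(DocumentFile.from_images("page.jpg"))
#
# `result.export()` yields a nested dict: pages -> blocks -> lines -> words.
# Below, a mocked export illustrates flattening it into (word, confidence) pairs.

export = {  # illustrative stand-in for result.export()
    "pages": [
        {
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {"value": "Invoice", "confidence": 0.99},
                                {"value": "No.", "confidence": 0.97},
                                {"value": "1234", "confidence": 0.95},
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}


def flatten_words(export: dict) -> list[tuple[str, float]]:
    """Collect (value, confidence) for every word, preserving reading order."""
    return [
        (word["value"], word["confidence"])
        for page in export["pages"]
        for block in page["blocks"]
        for line in block["lines"]
        for word in line["words"]
    ]


words = flatten_words(export)
print(" ".join(w for w, _ in words))  # -> Invoice No. 1234
```

Keeping the block/line hierarchy (rather than a flat word list) is what makes the output usable for downstream Natural Language Understanding tasks, since layout carries meaning in forms and receipts.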
Model zoo
Text detection models
- DBNet from “Real-time Scene Text Detection with Differentiable Binarization”
- LinkNet from “LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation”
- FAST from “FAST: Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation”
Text recognition models
- SAR from “Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition”
- CRNN from “An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition”
- MASTER from “MASTER: Multi-Aspect Non-local Network for Scene Text Recognition”
- ViTSTR from “Vision Transformer for Fast and Efficient Scene Text Recognition”
- PARSeq from “Scene Text Recognition with Permuted Autoregressive Sequence Models”
Supported datasets
- FUNSD from “FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents”.
- CORD from “CORD: A Consolidated Receipt Dataset for Post-OCR Parsing”.
- SROIE from ICDAR 2019.
- IIIT-5k from CVIT.
- Street View Text from “End-to-End Scene Text Recognition”.
- SynthText from Visual Geometry Group.
- SVHN from “Reading Digits in Natural Images with Unsupervised Feature Learning”.
- IC03 from ICDAR 2003.
- IC13 from ICDAR 2013.
- IMGUR5K from “TextStyleBrush: Transfer of Text Aesthetics from a Single Example”.
- MJSynth from “Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition”.
- IIITHWS from “Generating Synthetic Data for Text Recognition”.
- WILDRECEIPT from “Spatial Dual-Modality Graph Reasoning for Key Information Extraction”.