Releases · mindee/doctr

v1.0.1

v1.0.0

Note: docTR 1.0.0 requires python >= 3.10

What's Changed

Breaking Change

TensorFlow has been removed as a supported backend. docTR now comes with PyTorch as the default and only deep learning backend.

The installation options torch and tf have been removed. You can now install docTR simply with:

This will install docTR with PyTorch support by default.
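After installing, one quick way to confirm that both docTR and its PyTorch backend are importable is a standard-library check. This is only a minimal sketch: the helper name `backend_status` is made up for illustration, and it assumes nothing beyond the import names `doctr` and `torch`.

```python
import importlib.util


def backend_status(modules=("doctr", "torch")):
    """Return a mapping of module name -> True if the module can be found."""
    # find_spec returns None for a top-level module that is not installed
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, available in backend_status().items():
        print(f"{name}: {'found' if available else 'missing'}")
```

The check does not import the packages, so it stays fast even when PyTorch is installed.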

Training script filenames have been updated to remove backend-specific extensions. For example:

recognition/train_pytorch.py → recognition/train.py
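If you have automation (CI jobs, wrapper scripts) that still calls the old filenames, the rename can be sketched as a small path rewrite. The helper below is hypothetical and assumes every training script follows the same `_pytorch` suffix pattern as the single example above; check the repository for the actual filenames.

```python
from pathlib import PurePosixPath

# Assumption: only the "_pytorch" suffix was dropped, as in the example above
OLD_SUFFIX = "_pytorch"


def migrate_script_path(old_path: str) -> str:
    """Map a pre-1.0.0 script path to its backend-agnostic name."""
    path = PurePosixPath(old_path)
    stem = path.stem
    if stem.endswith(OLD_SUFFIX):
        stem = stem[: -len(OLD_SUFFIX)]
    # Paths without the suffix are returned unchanged
    return str(path.with_name(stem + path.suffix))


print(migrate_script_path("recognition/train_pytorch.py"))  # recognition/train.py
```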

New features

What's Changed

Breaking Changes 🛠

Bug Fixes

Improvements

Miscellaneous

New Contributors

Full Changelog: v0.12.0...v1.0.0

v0.12.0

Note: docTR 0.12.0 requires python >= 3.10
Note: docTR 0.12.0 requires either TensorFlow >= 2.15.0 or PyTorch >= 2.0.0

Warning

TensorFlow Backend Deprecation Notice

Using docTR with TensorFlow as a backend is deprecated and will be removed in the next major release (v1.0.0).
We recommend switching to the PyTorch backend, which is more actively maintained and supports the latest features and models.
Alternatively, you can use OnnxTR, which does not require TensorFlow or PyTorch.

This decision was made based on several considerations:

Warning

This release is the last minor release supporting TensorFlow as a backend.

What's changed

New features

# NEW
model = vitstr_small(pretrained=False, pretrained_backbone=False)
model.from_pretrained("")  # local path or url to .pt or .h5

# Instead of depending on the backend
reco_params = torch.load('', map_location="cpu")
reco_model.load_state_dict(reco_params)

# Or with TensorFlow
reco_model.load_weights(..)

What's Changed

Breaking Changes 🛠

New Features

Bug Fixes

Improvements

Miscellaneous

New Contributors

Full Changelog: v0.11.0...v0.12.0

v0.11.0

Note: docTR 0.11.0 requires python >= 3.10
Note: docTR 0.11.0 requires either TensorFlow >= 2.15.0 or PyTorch >= 2.0.0

What's changed

New features

Compile your model

Compiling your PyTorch models with torch.compile optimizes the model by converting it to a graph representation and applying backends that can improve performance.
This process can make inference faster and reduce memory overhead during execution.

Further information can be found in the PyTorch documentation

import torch
from doctr.models import (
    ocr_predictor,
    vitstr_small,
    fast_base,
    mobilenet_v3_small_crop_orientation,
    mobilenet_v3_small_page_orientation,
    crop_orientation_predictor,
    page_orientation_predictor,
)

# Compile the models
detection_model = torch.compile(fast_base(pretrained=True).eval())
recognition_model = torch.compile(vitstr_small(pretrained=True).eval())
crop_orientation_model = torch.compile(mobilenet_v3_small_crop_orientation(pretrained=True).eval())
page_orientation_model = torch.compile(mobilenet_v3_small_page_orientation(pretrained=True).eval())

# Build the predictor from the compiled detection and recognition models
predictor = ocr_predictor(detection_model, recognition_model, assume_straight_pages=False)

# NOTE: Only required for non-straight pages (assume_straight_pages=False) and non-disabled orientation classification
# Set the orientation predictors
predictor.crop_orientation_predictor = crop_orientation_predictor(crop_orientation_model)
predictor.page_orientation_predictor = page_orientation_predictor(page_orientation_model)

# `doc` is assumed to be a document loaded beforehand, e.g. with DocumentFile.from_pdf(...)
compiled_out = predictor(doc)
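Whether compilation pays off depends on the model and hardware, so it is worth measuring. The following is a minimal timing sketch, not part of docTR: `mean_latency` is a hypothetical helper that works with any callable. The warm-up runs matter because the first call to a compiled model typically includes the compilation itself.

```python
import time
from typing import Any, Callable


def mean_latency(fn: Callable[[Any], Any], arg: Any, warmup: int = 1, runs: int = 5) -> float:
    """Return the mean wall-clock time in seconds of fn(arg) over `runs` calls."""
    # Warm-up calls are excluded from timing (they absorb one-off compilation cost)
    for _ in range(warmup):
        fn(arg)
    start = time.perf_counter()
    for _ in range(runs):
        fn(arg)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Stand-in workload; with docTR you would pass a predictor and a loaded document
    print(f"{mean_latency(sum, range(100_000)):.6f} s per call")
```

With docTR, a call such as `mean_latency(predictor, doc)` could be run once for an eager predictor and once for a compiled one to compare the two.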

What's Changed

New Features

Bug Fixes

Improvements

Miscellaneous

New Contributors

Full Changelog: v0.10.0...v0.11.0

v0.10.0

Note: docTR 0.10.0 requires python >= 3.9
Note: docTR 0.10.0 requires either TensorFlow >= 2.15.0 or PyTorch >= 2.0.0

What's Changed

Soft Breaking Changes (TensorFlow backend only) 🛠

NOTE: Please update your custom trained models and the models you uploaded to the HuggingFace hub; this will be the last release supporting manual loading from /weights.

New features

Disable page orientation classification

from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True, assume_straight_pages=False, disable_page_orientation=True)

Disable crop orientation classification

from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True, assume_straight_pages=False, disable_crop_orientation=True)

Loading custom exported orientation classification models

You can now load your custom trained orientation models. The following snippet demonstrates how:

from doctr.io import DocumentFile
from doctr.models import ocr_predictor, mobilenet_v3_small_page_orientation, mobilenet_v3_small_crop_orientation
from doctr.models.classification.zoo import crop_orientation_predictor, page_orientation_predictor

custom_page_orientation_model = mobilenet_v3_small_page_orientation("")
custom_crop_orientation_model = mobilenet_v3_small_crop_orientation("")

predictor = ocr_predictor(pretrained=True, assume_straight_pages=False, detect_orientation=True)

# Overwrite the default orientation models
predictor.crop_orientation_predictor = crop_orientation_predictor(custom_crop_orientation_model)
predictor.page_orientation_predictor = page_orientation_predictor(custom_page_orientation_model)

What's Changed

Breaking Changes 🛠

New Features

Bug Fixes

Improvements

Miscellaneous

New Contributors

Full Changelog: v0.9.0...v0.10.0

v0.9.0

v0.8.1

v0.8.0

v0.7.0

Note: doctr 0.7.0 requires either TensorFlow >= 2.11.0 or PyTorch >= 1.12.0.
Note: We will release the missing PyTorch checkpoints with 0.7.1

What's Changed

Breaking Changes 🛠

New features

Addition of the KIE predictor

The KIE predictor is more flexible than the OCR predictor because its detection model can detect multiple classes in a document. For example, you can have a detection model that detects only dates and addresses in a document.

The KIE predictor makes it possible to combine a multi-class detector with a recognition model, with the whole pipeline already set up for you.

from doctr.io import DocumentFile
from doctr.models import kie_predictor

# Model
model = kie_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
# Analyze
result = model(doc)

predictions = result.pages[0].predictions
for class_name in predictions.keys():
    list_predictions = predictions[class_name]
    for prediction in list_predictions:
        print(f"Prediction for {class_name}: {prediction}")

The KIE predictor returns its results for each page as a dictionary: each key is a class name and its value is the list of predictions for that class.
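Because the per-page result is a plain dictionary keyed by class name, post-processing is ordinary dictionary handling. The sketch below keeps the most confident prediction per class. It runs on a hypothetical stand-in class, `FakePrediction`, and assumes that each prediction exposes `value` and `confidence` attributes; verify this against the objects your docTR version returns.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FakePrediction:
    """Hypothetical stand-in for a KIE prediction (assumed attributes)."""
    value: str
    confidence: float


def best_per_class(predictions: Dict[str, List[FakePrediction]]) -> Dict[str, str]:
    """Keep the text of the most confident prediction for each class."""
    best = {}
    for class_name, preds in predictions.items():
        if preds:  # skip classes with no detections on this page
            best[class_name] = max(preds, key=lambda p: p.confidence).value
    return best


page_predictions = {
    "date": [FakePrediction("2023-01-05", 0.91), FakePrediction("2023-O1-05", 0.40)],
    "address": [FakePrediction("12 Main Street", 0.88)],
    "total": [],
}
print(best_per_class(page_predictions))
# {'date': '2023-01-05', 'address': '12 Main Street'}
```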

What's Changed

Breaking Changes 🛠

New Features

Bug Fixes

Improvements

Miscellaneous

New Contributors

Full Changelog: v0.6.0...v0.7.0

v0.6.0

Highlights of the release:

Note: doctr 0.6.0 requires either TensorFlow >= 2.9.0 or PyTorch >= 1.8.0.

Full integration with Huggingface Hub (docTR meets Huggingface)


from doctr.io import DocumentFile
from doctr.models import ocr_predictor, from_hub
image = DocumentFile.from_images(['data/example.jpg'])
# Load a custom detection model from huggingface hub
det_model = from_hub('Felix92/doctr-torch-db-mobilenet-v3-large')
# Load a custom recognition model from huggingface hub
reco_model = from_hub('Felix92/doctr-torch-crnn-mobilenet-v3-large-french')
# You can easily plug these models into the OCR predictor
predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
result = predictor(image)

# Push a model to the hub
from doctr.models import recognition, login_to_hub, push_to_hf_hub
login_to_hub()
my_awesome_model = recognition.crnn_mobilenet_v3_large(pretrained=True)
push_to_hf_hub(my_awesome_model, model_name='doctr-crnn-mobilenet-v3-large-french-v1', task='recognition', arch='crnn_mobilenet_v3_large')

Documentation: https://mindee.github.io/doctr/using_doctr/sharing_models.html

Predefined datasets can also be used for the recognition task

from doctr.datasets import CORD
# Crop boxes as is (can contain irregular boxes)
train_set = CORD(train=True, download=True, recognition_task=True)
# Crop rotated boxes (always regular)
train_set = CORD(train=True, download=True, use_polygons=True, recognition_task=True)
img, target = train_set[0]

Documentation: https://mindee.github.io/doctr/using_doctr/using_datasets.html

New models (both frameworks)

Bug fixes for recognition models

ONNX support (experimental)

NOTE: a full production pipeline with ONNX / build is planned for 0.7.0 (the models can only be exported up to the logits, without any post-processing included)
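Since an exported model stops at the logits, turning its output into text is left to the caller. For a CTC-based recognizer such as CRNN, that step can be sketched as greedy decoding: take the best class at each time step, collapse repeats, and drop the blank symbol. This is an illustration and not docTR's own post-processor. The function name, the toy vocabulary and the choice of the last index as the blank are all assumptions.

```python
from typing import Sequence


def ctc_greedy_decode(logits: Sequence[Sequence[float]], vocab: str) -> str:
    """Greedy CTC decoding for one sequence of shape (time_steps, len(vocab) + 1)."""
    blank = len(vocab)  # assumption: blank is the last class index
    # Best class index at each time step
    best = [max(range(len(step)), key=step.__getitem__) for step in logits]
    chars = []
    prev = None
    for idx in best:
        # Emit a character only when the class changes and is not the blank
        if idx != prev and idx != blank:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars)


# Toy logits for vocab "ab": a, a, blank, a, b, b
A, B, BLANK = [5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]
print(ctc_greedy_decode([A, A, BLANK, A, B, B], "ab"))  # aab
```

Attention-based or transformer-based recognizers need a different decoding step, so this sketch applies only to CTC-style outputs.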

Further features

What's Changed

Breaking Changes 🛠

New Features

Bug Fixes

Improvements

Miscellaneous
