Changelog of v1.x (MMOCR 1.0.1 documentation)

v1.0.0 (04/06/2023)

We are excited to announce the first official release of MMOCR 1.0, with numerous enhancements, bug fixes, and the introduction of new dataset support!

🌟 Highlights

🆕 New Features & Enhancements

📝 Docs

🛠️ Bug Fixes

🎉 New Contributors

Thank you to all the contributors for making this release possible! We’re excited about the new features and enhancements in this version, and we’re looking forward to your feedback and continued support. Happy coding! 🚀

Full Changelog: https://github.com/open-mmlab/mmocr/compare/v1.0.0rc6...v1.0.0

v1.0.0rc6 (03/07/2023)

Highlights

  1. Two new models, ABCNet v2 (inference only) and SPTS, have been added to the projects/ folder.
  2. Announcing Inferencer, a unified inference interface in OpenMMLab that gives everyone easy access to quick inference with all the pre-trained weights. Docs
  3. Users can use test-time augmentation for text recognition tasks. Docs
  4. Batch augmentation, a technique used in SPTS, is now supported through BatchAugSampler.
  5. Dataset Preparer has been refactored to allow more flexible configurations. In addition, users can now prepare text recognition datasets in LMDB format. Docs
  6. Some text spotting datasets have been revised to improve correctness and consistency with common practice.
  7. Potential spurious warnings from shapely have been eliminated.
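
The batch augmentation mentioned in item 4 can be illustrated with a minimal, framework-free sketch. The general idea is that each sample index is yielded several times in a row, so that one batch contains several independently augmented views of the same image. The function name `repeated_indices` and its parameters are hypothetical and chosen for illustration only; this is not the BatchAugSampler API or implementation.

```python
import random
from typing import Iterator, Optional


def repeated_indices(dataset_size: int,
                     num_repeats: int = 3,
                     shuffle: bool = True,
                     seed: Optional[int] = 0) -> Iterator[int]:
    """Yield every dataset index `num_repeats` times in a row.

    Hypothetical illustration of batch augmentation sampling; this is
    not the actual BatchAugSampler implementation.
    """
    indices = list(range(dataset_size))
    if shuffle:
        # A seeded RNG keeps the order reproducible across runs.
        random.Random(seed).shuffle(indices)
    for index in indices:
        for _ in range(num_repeats):
            yield index


# With a batch size divisible by num_repeats, each batch holds
# num_repeats copies of the same index, and a random augmentation
# pipeline would transform each copy differently.
order = list(repeated_indices(dataset_size=4, num_repeats=2, shuffle=False))
```

Keeping the batch size divisible by the repeat count is the design choice that guarantees all views of one sample land in the same batch.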

Dependency

This version requires MMEngine >= 0.6.0, MMCV >= 2.0.0rc4 and MMDet >= 3.0.0rc5.
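
As a rough sketch of how these minimum versions could be checked in a script, the snippet below compares version strings of the form `X.Y.Z` or `X.Y.ZrcN`. The helper names `version_key` and `meets_minimum` are hypothetical, and the parser deliberately supports only these two forms; it is not how MMOCR itself validates its dependencies.

```python
import re
from typing import Tuple

# Minimum versions quoted in the changelog entry above.
MINIMUM_VERSIONS = {
    "mmengine": "0.6.0",
    "mmcv": "2.0.0rc4",
    "mmdet": "3.0.0rc5",
}

_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:rc(\d+))?$")


def version_key(version: str) -> Tuple[int, int, int, int, int]:
    """Turn 'X.Y.Z' or 'X.Y.ZrcN' into a sortable tuple.

    A release candidate sorts before the final release of the same
    X.Y.Z, so the fourth field is 0 for 'rc' and 1 for a final release.
    """
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, rc = match.groups()
    if rc is None:
        return (int(major), int(minor), int(patch), 1, 0)
    return (int(major), int(minor), int(patch), 0, int(rc))


def meets_minimum(package: str, installed: str) -> bool:
    """Return True if `installed` satisfies the minimum for `package`."""
    return version_key(installed) >= version_key(MINIMUM_VERSIONS[package])
```

In practice the installed version string could be read with `importlib.metadata.version`, and a full-featured parser such as `packaging.version.parse` would cover version formats that this sketch rejects.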

New Features & Enhancements

Docs

Bug Fixes

New Contributors

Full Changelog: https://github.com/open-mmlab/mmocr/compare/v1.0.0rc5...v1.0.0rc6

v1.0.0rc5 (01/06/2023)

Highlights

  1. Two models, Aster and SVTR, are added to our model zoo. The full implementation of ABCNet is also available now.
  2. Dataset Preparer supports 5 more datasets: CocoTextV2, FUNSD, TextOCR, NAF, SROIE.
  3. We have added four more text recognition transforms and two helper transforms. See https://github.com/open-mmlab/mmocr/pull/1646 https://github.com/open-mmlab/mmocr/pull/1632 https://github.com/open-mmlab/mmocr/pull/1645 for details.
  4. The FixInvalidPolygon transform has become better at dealing with invalid polygons and can now handle more irregular annotations. As a result, a complete training cycle on the TotalText dataset can run without errors. The weights of DBNet and FCENet pretrained on TotalText have also been released.
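
To give a feel for the kind of annotation problem mentioned in item 4, here is a minimal sketch that flags degenerate polygons stored as flat coordinate lists `[x1, y1, x2, y2, ...]`. It is an assumption-laden illustration, not FixInvalidPolygon's actual logic: the function names are hypothetical, and only two simple cases are covered (too few distinct points, near-zero area).

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def to_points(flat: Sequence[float]) -> List[Point]:
    """Convert [x1, y1, x2, y2, ...] into [(x1, y1), (x2, y2), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("a flat polygon needs an even number of values")
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


def shoelace_area(points: Sequence[Point]) -> float:
    """Absolute polygon area computed with the shoelace formula."""
    if len(points) < 3:
        return 0.0
    total = 0.0
    # Pair each vertex with the next one, wrapping around at the end.
    for (x1, y1), (x2, y2) in zip(points, list(points[1:]) + [points[0]]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def is_degenerate(flat: Sequence[float], min_area: float = 1e-6) -> bool:
    """Flag polygons with fewer than 3 distinct points or ~zero area.

    Self-intersections are NOT detected here; a geometry library would
    be needed for that.
    """
    points = to_points(flat)
    if len(set(points)) < 3:
        return True
    return shoelace_area(points) < min_area
```

A real fix-up step would also need to decide what to do with a flagged polygon (repair it, replace it with a bounding shape, or drop it); the sketch only covers detection.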

New Features & Enhancements

Docs

Bug Fixes

New Contributors

Full Changelog: https://github.com/open-mmlab/mmocr/compare/v1.0.0rc4...v1.0.0rc5

v1.0.0rc4 (12/06/2022)

Highlights

  1. Dataset Preparer can automatically generate base dataset configs at the end of the preparation process, and supports 6 more datasets: IIIT5k, CUTE80, ICDAR2013, ICDAR2015, SVT, SVTP.
  2. Introducing our projects/ folder. Implementing new models and features in OpenMMLab’s algorithm libraries has long been criticized as troublesome because of the rigorous code quality requirements, which can hinder fast iteration on SOTA models and may discourage community members from sharing their latest work here. We now introduce the projects/ folder, where experimental features, frameworks, and models can be placed as long as they satisfy a minimum code quality requirement. Everyone is welcome to post their implementation of any great idea in this folder! We have also added the first example project to illustrate what we expect a good project to have (check out the raw content of README.md for more info!).
  3. Inside the projects/ folder, we are releasing the preview version of ABCNet, the first text spotting model implemented in MMOCR. It is inference-only for now, but the full implementation will be available very soon.

New Features & Enhancements

Docs

Bug Fixes

New Contributors

Full Changelog: https://github.com/open-mmlab/mmocr/compare/v1.0.0rc3...v1.0.0rc4

v1.0.0rc3 (11/03/2022)

Highlights

  1. We release several pretrained models using oCLIP-ResNet as the backbone, which is a ResNet variant trained with oCLIP and can significantly boost the performance of text detection models.
  2. Preparing datasets is troublesome and tedious, especially in the OCR domain, where multiple datasets are usually required. To free our users from this laborious work, we designed Dataset Preparer, which gets a set of datasets ready for use with a single command. Dataset Preparer consists of a series of reusable modules, each responsible for one of the standardized phases of the preparation process, which shortens the development cycle for supporting new datasets.
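
The modular design described in item 2 can be sketched as a chain of small, reusable stages that each handle one phase of dataset preparation. Everything in the snippet is hypothetical (the stage names `obtain`, `pair`, and `pack`, the file names, and the dictionary keys); it illustrates the general pattern under those assumptions rather than Dataset Preparer's real modules.

```python
from typing import Callable, Dict, List

# Each stage takes the shared state and returns an updated copy.
Stage = Callable[[Dict], Dict]


def obtain(state: Dict) -> Dict:
    """Pretend to download raw files; here we only record their names."""
    return {**state, "files": ["img_1.jpg", "gt_img_1.txt"]}


def pair(state: Dict) -> Dict:
    """Match every image with its annotation file."""
    images = [f for f in state["files"] if f.endswith(".jpg")]
    pairs = [(img, "gt_" + img.replace(".jpg", ".txt")) for img in images]
    return {**state, "pairs": pairs}


def pack(state: Dict) -> Dict:
    """Convert the pairs into one unified annotation structure."""
    data_list = [
        {"img_path": img, "ann_path": ann} for img, ann in state["pairs"]
    ]
    return {**state, "annotations": {"data_list": data_list}}


def run_pipeline(stages: List[Stage], state: Dict) -> Dict:
    """Run the stages in order, feeding each one the previous result."""
    for stage in stages:
        state = stage(state)
    return state


result = run_pipeline([obtain, pair, pack], {"dataset": "toy"})
```

Supporting a new dataset in such a design mostly means swapping in a different stage (for example a different `pair` function) while reusing the rest of the chain.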

New Features & Enhancements

Docs

Bug Fixes

New Contributors

Full Changelog: https://github.com/open-mmlab/mmocr/compare/v1.0.0rc2...v1.0.0rc3

v1.0.0rc2 (10/14/2022)

This release relaxes the version requirement of MMEngine to >=0.1.0, < 1.0.0.

v1.0.0rc1 (10/09/2022)

Highlights

This release fixes a severe bug that led to inaccurate metric reports in multi-GPU training. We have released the weights for all the text recognition models in the MMOCR 1.0 architecture. The inference shorthands for them have also been added back to ocr.py. In addition, more documentation chapters are now available.

New Features & Enhancements

Docs

Bug Fixes

New Contributors

Full Changelog: https://github.com/open-mmlab/mmocr/compare/v1.0.0rc0...v1.0.0rc1

v1.0.0rc0 (09/01/2022)

We are excited to announce the release of MMOCR 1.0.0rc0. MMOCR 1.0.0rc0 is the first version of MMOCR 1.x, a part of the OpenMMLab 2.0 projects. Built upon the new training engine, MMOCR 1.x unifies the interfaces of datasets, models, evaluation, and visualization, and offers faster training and testing.

Highlights

  1. New engines. MMOCR 1.x is based on MMEngine, which provides a general and powerful runner that allows more flexible customizations and significantly simplifies the entrypoints of high-level interfaces.
  2. Unified interfaces. As a part of the OpenMMLab 2.0 projects, MMOCR 1.x unifies and refactors the interfaces and internal logic of training, testing, datasets, models, evaluation, and visualization. All the OpenMMLab 2.0 projects share the same design in those interfaces and logic to allow the emergence of multi-task/modality algorithms.
  3. Cross project calling. Benefiting from the unified design, you can use the models implemented in other OpenMMLab projects, such as MMDet. We provide an example of how to use MMDetection’s Mask R-CNN through MMDetWrapper. Check our documents for more details. More wrappers will be released in the future.
  4. Stronger visualization. We provide a series of useful tools which are mostly based on brand-new visualizers. As a result, it is more convenient for the users to explore the models and datasets now.
  5. More documentation and tutorials. We have added more documentation and tutorials to help users get started more smoothly. Read them here.

Breaking Changes

We briefly list the major breaking changes here. We will update the migration guide to provide complete details and migration instructions.

Dependencies

Training and testing

Configs

Dataset

The Dataset classes implemented in MMOCR 1.x all inherit from BaseDetDataset, which inherits from BaseDataset in MMEngine. There are several changes to Dataset in MMOCR 1.x.

Data Transforms

The data transforms in MMOCR 1.x all inherit from those in MMCV>=2.0.0rc0, which follow a new convention in OpenMMLab 2.0 projects. The changes are listed below:

Model

The models in MMOCR 1.x all inherit from BaseModel in MMEngine, which defines a new convention for models in OpenMMLab 2.0 projects. Users can refer to the model tutorial in MMEngine for more details. Accordingly, there are several changes, as follows:

Evaluation

MMOCR 1.x implements dedicated metrics for each task, which are managed by the Evaluator to carry out the evaluation. In addition, users can build an evaluator in MMOCR 1.x to conduct offline evaluation, i.e., to evaluate predictions that may not have been produced by MMOCR, provided that the predictions follow our dataset conventions. More details can be found in the Evaluation Tutorial in MMEngine.
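
The split between metrics and an evaluator, and the idea of offline evaluation, can be sketched without any framework. The classes `WordAccuracy` and `ToyEvaluator` below are hypothetical stand-ins written for illustration; they only mimic the general pattern (metrics accumulate per-batch results, the evaluator drives them and merges the scores) and are not the MMEngine or MMOCR classes.

```python
from typing import Dict, List


class WordAccuracy:
    """Toy metric: fraction of predictions that exactly match the label."""

    def __init__(self) -> None:
        self.results: List[bool] = []

    def process(self, predictions: List[str], labels: List[str]) -> None:
        """Accumulate per-sample results for one batch."""
        for pred, label in zip(predictions, labels):
            self.results.append(pred == label)

    def compute(self) -> Dict[str, float]:
        """Reduce the accumulated results to the final metric."""
        if not self.results:
            return {"word_acc": 0.0}
        return {"word_acc": sum(self.results) / len(self.results)}


class ToyEvaluator:
    """Feeds saved predictions to every metric and merges their outputs."""

    def __init__(self, metrics: List[WordAccuracy]) -> None:
        self.metrics = metrics

    def offline_evaluate(self,
                         predictions: List[str],
                         labels: List[str],
                         chunk_size: int = 2) -> Dict[str, float]:
        # Process the saved predictions chunk by chunk, like batches.
        for start in range(0, len(predictions), chunk_size):
            end = start + chunk_size
            for metric in self.metrics:
                metric.process(predictions[start:end], labels[start:end])
        scores: Dict[str, float] = {}
        for metric in self.metrics:
            scores.update(metric.compute())
        return scores


# Predictions could come from any OCR system, not necessarily MMOCR.
scores = ToyEvaluator([WordAccuracy()]).offline_evaluate(
    predictions=["hello", "w0rld", "ocr", "text"],
    labels=["hello", "world", "ocr", "text"],
)
```

Because the evaluator only sees predictions and labels, the model that produced the predictions never has to be loaded, which is what makes the evaluation "offline".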

Visualization

The visualization functions have been removed in MMOCR 1.x. Instead, in OpenMMLab 2.0 projects, we use Visualizer to visualize data. MMOCR 1.x implements TextDetLocalVisualizer, TextRecogLocalVisualizer, and KIELocalVisualizer to allow visualization of ground truths, model predictions, feature maps, etc., at any place, for the three tasks supported in MMOCR. It also supports dumping the visualization data to external visualization backends such as Tensorboard and Wandb. Check our Visualization Document for more details.

Improvements

Ongoing changes

  1. Test-time augmentation: supported in MMOCR 0.x, it is not yet implemented in this version because of time constraints. We will support it in a following release with a new and simplified design.
  2. Inference interfaces: a unified inference interface will be supported in the future to ease the use of released models.
  3. Interfaces of useful tools that can be used in notebooks: more of the useful tools implemented in the tools/ directory will get Python interfaces so that they can be used in notebooks and in downstream libraries.
  4. Documentation: we will add more design docs, tutorials, and migration guidance so that the community can dive deep into our new design, participate in future development, and smoothly migrate downstream libraries to MMOCR 1.x.