New 'rebased' implementation of PyMuPDF · pymupdf/PyMuPDF · Discussion #2680

New 'rebased' implementation of PyMuPDF

Overview

We are migrating PyMuPDF to a new 'rebased' implementation which uses the MuPDF Python and C++ APIs, instead of the MuPDF C API used by the original 'classic' implementation of PyMuPDF.

The rebased implementation will behave identically to the classic implementation, and will not require any changes to user code.

Advantages of the rebased implementation compared to classic

User access to the underlying MuPDF Python API.
- The MuPDF Python API will be available as fitz.mupdf. This is not possible with classic PyMuPDF, and gives users additional flexibility.
Simplified implementation.
- The underlying MuPDF C++/Python APIs use automatic reference counting, automatic contexts, and native C++ and Python exceptions, which makes the rebased implementation simpler than the classic implementation.
- This also helps development of new PyMuPDF functionality.
Optional tracing of MuPDF C function calls using environment variables.
- This is a feature of the MuPDF C++ and Python APIs, and can be very useful during development and when reporting bugs.
Possible future support for multithreaded use.
- The classic implementation of PyMuPDF is explicitly single-threaded, but the MuPDF C++/Python APIs have full support for threads with automatic per-thread contexts.
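As a rough illustration of the first and third points, the sketch below turns on call tracing and then checks whether the MuPDF Python API is reachable as fitz.mupdf. It assumes the tracing variable is named MUPDF_trace, as described in MuPDF's language-bindings documentation, and that it has to be set before the module is imported; treat both as assumptions to check against your installed version. The try/except only keeps the snippet runnable where the rebased module is not installed.

```python
import os

# Assumption: MUPDF_trace is the tracing variable read by the MuPDF
# C++/Python bindings, and it must be set before the module is imported.
os.environ.setdefault("MUPDF_trace", "1")

try:
    import fitz_new as fitz  # rebased implementation during stage 1
except ImportError:
    fitz = None  # rebased implementation is not installed here

# With the rebased implementation, the MuPDF Python API is exposed as
# fitz.mupdf; classic PyMuPDF has no such attribute.
mupdf = getattr(fitz, "mupdf", None)
print("MuPDF Python API available:", mupdf is not None)
```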

Migration to the rebased implementation

We will migrate to the new rebased implementation in the following stages:

Stage 1: one or more releases containing two modules, fitz and fitz_new.
- By default, import fitz gives the classic implementation.
- You can try the rebased implementation with import fitz_new as fitz (no other changes to your code are needed).
- PyMuPDF-1.23.3 is the first release of stage 1.
Stage 2: one or more releases containing two modules, fitz and fitz_old.
- By default, import fitz gives the rebased implementation.
- Force use of the classic implementation with import fitz_old as fitz.
Stage 3: subsequent releases will have module fitz only.
- import fitz gives the rebased implementation.
- The classic implementation is not available.
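Code that has to run across more than one stage can choose the module at import time. The following is a minimal sketch, not part of PyMuPDF: import_first is a hypothetical helper name, and the only assumption it makes is the module naming described in the stages above.

```python
import importlib


def import_first(*module_names):
    """Return the first importable module among module_names, or None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


# Prefer the rebased implementation in every stage:
# stage 1 ships it as fitz_new, stages 2 and 3 ship it as fitz.
fitz = import_first("fitz_new", "fitz")
if fitz is None:
    print("PyMuPDF is not installed")
```

To prefer the classic implementation for as long as it is shipped, the same helper can be called with the names in the order "fitz_old", "fitz". In stage 3 this falls through to the rebased fitz, because the classic implementation is no longer available.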

During stage 1 we would like you to try out the rebased implementation by using import fitz_new as fitz, and report any issues you come across.

When the rebased implementation is thought to work as well as the classic implementation, we will move to stage 2, where users will get the rebased implementation by default. If users come across problems with the rebased implementation in stage 2, they can revert to the classic implementation by using import fitz_old as fitz. It is important that users report any such problems so we can fix the rebased implementation.

Finally, when we are fully confident that the rebased implementation is working for all users, we will move to stage 3, where only the rebased implementation will be available.

Impact on users

Users of PyMuPDF will be able to carry on using PyMuPDF throughout the migration without making any changes to their code.

The rebased implementation passes PyMuPDF's test suite, but the test suite cannot check everything, so some users may come across issues during the migration, especially at stage 2, where the rebased implementation becomes the default.

The best way to protect against this happening is to try out the rebased implementation in stage 1 by using import fitz_new as fitz, and report any problems so they can be fixed.
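When reporting a problem, it helps to state exactly which implementation and versions were in use. The sketch below gathers that information; describe_environment is a hypothetical helper, and the attribute names VersionBind and VersionFitz are assumed to be present in both implementations, so the code falls back to "unknown" if they are not.

```python
import platform
import sys


def describe_environment(module):
    """Collect details worth including in an issue report (sketch)."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        # __name__ is the real module name even when imported under an
        # alias, so it shows whether fitz, fitz_new or fitz_old was used.
        "module": module.__name__,
        # Assumption: these version attributes exist; otherwise "unknown".
        "pymupdf_version": getattr(module, "VersionBind", "unknown"),
        "mupdf_version": getattr(module, "VersionFitz", "unknown"),
    }


try:
    import fitz_new as fitz  # rebased implementation during stage 1
except ImportError:
    fitz = None

if fitz is not None:
    print(describe_environment(fitz))
```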