Features Comparison - PyMuPDF 1.26.0 documentation

Feature Matrix

The following table illustrates how PyMuPDF compares with other typical solutions.

Feature PyMuPDF pikepdf PyPDF2 pdfrw pdfplumber / pdfminer
Supports Multiple Document Formats PDF XPS EPUB MOBI FB2 CBZ SVG TXT Image DOCX XLSX PPTX HWPX (see note) PDF PDF PDF PDF
Implementation Python and C Python and C++ Python Python Python
Render Document Pages All document types No rendering No rendering No rendering No rendering
Write Text to PDF Page See: Page.insert_htmlbox or: Page.insert_textbox or: TextWriter
Supports CJK characters
Extract Text All document types PDF only PDF only
Extract Text as Markdown (.md) All document types
Extract Tables All document types PDF only
Extract Vector Graphics All document types Limited
Draw Vector Graphics (PDF)
Based on Existing, Mature Library MuPDF QPDF
Automatic Repair of Damaged PDFs
Encrypted PDFs Limited Limited
Linearized PDFs
Incremental Updates
Integrates with Jupyter and IPython Notebooks
Joining / Merging PDF with other Document Types All document types PDF only PDF only PDF only PDF only
OCR API for Seamless Integration with Tesseract All document types
Integrated Checkpoint / Restart Feature (PDF)
PDF Optional Content
PDF Embedded Files Limited Limited
PDF Redactions
PDF Annotations Full Limited
PDF Form Fields Create, read, update Limited, no creation
PDF Page Labels Read-only
Support Font Sub-Setting
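
The "Write Text to PDF Page" row of the table names Page.insert_textbox as one option. A minimal sketch of that call, assuming PyMuPDF is installed and importable as pymupdf; the function name write_text_page and the rectangle coordinates are illustrative choices, not part of the library:

```python
def write_text_page(dst, text):
    """Create a one-page PDF at `dst` with `text` placed in a text box.

    Returns the value of Page.insert_textbox: the unused height of the
    rectangle, or a negative number if the text did not fit.
    """
    # Imported lazily so this sketch can be loaded without PyMuPDF installed.
    import pymupdf

    doc = pymupdf.open()   # new, empty PDF
    page = doc.new_page()  # default page size (A4 portrait)
    # Illustrative rectangle with a 72-point margin on each side.
    rect = pymupdf.Rect(72, 72, page.rect.width - 72, page.rect.height - 72)
    leftover = page.insert_textbox(rect, text, fontsize=11)
    doc.save(dst)
    doc.close()
    return leftover
```

A negative return value means nothing was written, so a caller would typically enlarge the rectangle or reduce the font size and try again.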

Note

A note about Office document types (DOCX, XLSX, PPTX) and Hangul documents (HWPX): these documents can be loaded into PyMuPDF, and you will receive a Document object.

There are some caveats:

When saving the result, a faithful representation of the original layout cannot be expected.

Therefore, these input files are mostly useful for text extraction.


Performance

To benchmark PyMuPDF across a range of tasks, a test suite with a fixed set of 8 PDFs is used to obtain performance timings. Together the files contain 7,031 pages of text and images.
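
The benchmark code itself is not shown here. As a rough sketch of how such timings could be collected, assuming a task is any callable that takes a file path, the hypothetical helper time_task below sums the wall-clock time over a set of files:

```python
import time
from pathlib import Path
from typing import Callable, Iterable


def time_task(task: Callable[[Path], object], pdf_paths: Iterable[Path]) -> float:
    """Return the total wall-clock seconds spent running `task` on every file."""
    total = 0.0
    for path in pdf_paths:
        start = time.perf_counter()
        task(path)  # e.g. copy, extract text, or render one document
        total += time.perf_counter() - start
    return total
```

Timing each file separately keeps file discovery and other loop overhead out of the measurement.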

Here are current results, grouped by task:

Copying

This refers to opening a document and then saving it to a new file. This test measures the speed of reading a PDF and re-writing as a new PDF. This process is also at the core of functions like merging / joining multiple documents. The numbers below therefore apply to PDF joining and merging.
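
A minimal sketch of what the copy task and a related merge could look like in PyMuPDF, assuming the library is installed and importable as pymupdf. The function names copy_pdf and merge_pdfs are illustrative, and this is not necessarily the code used for the timings:

```python
def copy_pdf(src, dst):
    """Open `src` and save it unchanged as `dst`; return the page count."""
    # Imported lazily so this sketch can be loaded without PyMuPDF installed.
    import pymupdf

    with pymupdf.open(src) as doc:
        doc.save(dst)
        return doc.page_count


def merge_pdfs(sources, dst):
    """Append all pages of every PDF in `sources` and save the result as `dst`."""
    import pymupdf

    with pymupdf.open() as merged:  # new, empty PDF
        for src in sources:
            with pymupdf.open(src) as doc:
                merged.insert_pdf(doc)  # copies all pages of `doc`
        merged.save(dst)
        return merged.page_count
```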

The results for all 7,031 pages are:

PyMuPDF

PDFrw

PikePDF

PyPDF2

Text Extraction

This refers to extracting simple, plain text from every page of the document and storing it in a text file.
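
In PyMuPDF, this task could be sketched as follows, assuming the library is installed and importable as pymupdf. The function name extract_text_to_file and the form-feed page separator are choices made for this illustration:

```python
def extract_text_to_file(src, txt_path):
    """Write the plain text of every page of `src` to `txt_path`.

    Returns the number of pages processed.
    """
    # Imported lazily so this sketch can be loaded without PyMuPDF installed.
    import pymupdf

    with pymupdf.open(src) as doc, open(txt_path, "w", encoding="utf-8") as out:
        for page in doc:
            out.write(page.get_text())  # plain-text extraction
            out.write("\f")             # form feed marks the page boundary (sketch choice)
        return doc.page_count
```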

The results for all 7,031 pages are:

PyMuPDF

XPDF

PyPDF2

PDFMiner

Rendering

This refers to making an image (like PNG) from every page of a document at a given DPI resolution. This feature is the basis for displaying a document in a GUI window.
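
A minimal rendering sketch, assuming PyMuPDF is installed and importable as pymupdf. The function name render_pages, the default of 150 DPI and the file naming scheme are illustrative and not taken from the benchmark:

```python
from pathlib import Path


def render_pages(src, out_dir, dpi=150):
    """Render every page of `src` to a PNG file in `out_dir`; return the file paths."""
    # Imported lazily so this sketch can be loaded without PyMuPDF installed.
    import pymupdf

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with pymupdf.open(src) as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)  # rasterize the page at the given resolution
            target = out_dir / f"page-{page.number + 1:04d}.png"
            pix.save(str(target))           # image format is taken from the file extension
            written.append(target)
    return written
```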

The results for all 7,031 pages are:

PyMuPDF and MuPDF are now available under both open-source AGPL and commercial license agreements. Please read the full text of the AGPL license agreement, available in the distribution material (file COPYING) and on the GNU license page, to ensure that your use case complies with the guidelines of the license. If you determine you cannot meet the requirements of the AGPL, please contact Artifex for more information regarding a commercial license.

Artifex is the exclusive commercial licensing agent for MuPDF.

Artifex, the Artifex logo, MuPDF, and the MuPDF logo are registered trademarks of Artifex Software Inc.


This documentation covers PyMuPDF v1.26.0 features as of 2025-05-22 00:00:01.

The major and minor versions of PyMuPDF and MuPDF will always be the same. Only the third qualifier (patch level) may deviate from that of MuPDF.

Typically, PyMuPDF is released more frequently than MuPDF, so the patch level of PyMuPDF will often be greater than that of the embedded MuPDF.

For example, PyMuPDF-1.24.5 contains MuPDF-1.24.2.

Also see pymupdf_version and mupdf_version.
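
The versioning rule above can be checked with a few lines of plain Python. The helper same_major_minor is a hypothetical name introduced for this sketch, and it assumes both versions are dotted strings such as "1.24.5":

```python
def same_major_minor(pymupdf_ver: str, mupdf_ver: str) -> bool:
    """Return True if both dotted version strings share major and minor numbers."""
    return pymupdf_ver.split(".")[:2] == mupdf_ver.split(".")[:2]


# The example pair from the text: only the patch level differs.
assert same_major_minor("1.24.5", "1.24.2")
```

With PyMuPDF installed, the same check can be applied to the values of pymupdf.pymupdf_version and pymupdf.mupdf_version.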