Installation - PDF4LLM

Requirements

PyMuPDF4LLM requires Python 3.8+. It is built on top of PyMuPDF, which is installed automatically as a dependency.
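If you are unsure which interpreter your environment uses, a short check like the one below can confirm it meets the stated minimum before you install. This is only a sketch: the helper name `meets_minimum` and the constant `MINIMUM_PYTHON` are illustrative and not part of PyMuPDF4LLM.

```python
import sys

# Minimum version taken from the requirement stated above (Python 3.8+).
MINIMUM_PYTHON = (3, 8)


def meets_minimum(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (or running) interpreter is at least `minimum`.

    Hypothetical helper for illustration; not provided by PyMuPDF4LLM.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuple comparison is lexicographic.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {current}: {status}")
```

Run it with the same interpreter you plan to install into, since a system Python and a virtual environment can differ.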


Basic Installation

Install PyMuPDF4LLM from PyPI using pip:

pip install pymupdf4llm

This gives you full access to Markdown, JSON, and plain text extraction from document files.


Optional Dependencies

OCR Support

OCR support enables automatic Optical Character Recognition for PDFs that contain scanned or image-based content. Tesseract is included by default. Rapid OCR and Paddle OCR are also supported as optional OCR engines; install them separately if you need them.
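To see whether an optional OCR engine is importable in the current environment, you can probe for its module without actually importing it. The sketch below rests on assumptions: the import names `rapidocr_onnxruntime` and `paddleocr` are guesses at how those engines are commonly packaged, and `detect_modules` is a hypothetical helper, not a PyMuPDF4LLM function. It says nothing about how PyMuPDF4LLM itself selects an engine.

```python
from importlib.util import find_spec

# Assumed import names for the optional engines; adjust to match your install.
OPTIONAL_OCR_MODULES = {
    "Rapid OCR": "rapidocr_onnxruntime",
    "Paddle OCR": "paddleocr",
}


def detect_modules(candidates=OPTIONAL_OCR_MODULES):
    """Map each label to True if its top-level module can be found.

    Hypothetical helper: find_spec locates the module without importing it.
    """
    return {
        label: find_spec(module) is not None
        for label, module in candidates.items()
    }


if __name__ == "__main__":
    for label, present in detect_modules().items():
        print(f"{label}: {'found' if present else 'not found'}")
```

A "not found" result only means the assumed module name is not importable here; the engine may be installed under a different package name.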


Verify Your Installation

import pymupdf4llm

print(pymupdf4llm.version)
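As an alternative check that does not import the package, the standard library's `importlib.metadata` can report which version pip recorded, or tell you that nothing is installed. This is a minimal sketch; `installed_version` is a hypothetical helper name, and the distribution names passed to it are assumed to match the PyPI project names.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of `distribution`, or None if absent.

    Hypothetical helper built on the standard library's importlib.metadata.
    """
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumed to match the PyPI project names.
    for name in ("pymupdf4llm", "PyMuPDF"):
        found = installed_version(name)
        print(f"{name}: {found if found is not None else 'not installed'}")
```

This is useful when the import itself fails, because it separates "the package is not installed" from "the package is installed but cannot be loaded".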

Next Steps