Installation - PDF4LLM
Requirements
PyMuPDF4LLM requires Python 3.8+. It is built on top of PyMuPDF, which is installed automatically as a dependency.
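As a quick pre-install check, the following minimal sketch confirms that the running interpreter meets the Python 3.8+ requirement stated above. The helper name `meets_minimum` and the constant `MINIMUM_PYTHON` are hypothetical, introduced here only for illustration; they are not part of PyMuPDF4LLM.

```python
import sys

# Minimum version taken from the requirement stated above (Python 3.8+).
# MINIMUM_PYTHON and meets_minimum are illustrative names, not library API.
MINIMUM_PYTHON = (3, 8)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {current}: {status}")
```

Running the script prints the interpreter version along with whether it satisfies the stated minimum.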
Basic Installation
Install PyMuPDF4LLM from PyPI using pip:
pip install pymupdf4llm
This gives you full access to Markdown, JSON, and plain text extraction from document files.
Optional Dependencies
OCR Support
OCR support enables automatic Optical Character Recognition for PDFs that contain scanned or image-based content. Tesseract is included by default. RapidOCR and PaddleOCR are also supported as optional OCR engines; install them separately if you need them.
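To see which optional OCR engines are already present in the current environment, a small sketch along these lines may help. The import names below (`rapidocr`, `rapidocr_onnxruntime`, `paddleocr`) are assumptions about how those engines are packaged, not something this page states; check each engine's own documentation for the authoritative package and module names. The helper `is_importable` is likewise hypothetical.

```python
from importlib.util import find_spec

# ASSUMPTION: candidate top-level module names for the optional engines.
# These are guesses for illustration; verify them against each engine's docs.
CANDIDATE_MODULES = {
    "RapidOCR": ("rapidocr", "rapidocr_onnxruntime"),
    "PaddleOCR": ("paddleocr",),
}


def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


for engine, modules in CANDIDATE_MODULES.items():
    available = any(is_importable(name) for name in modules)
    print(f"{engine}: {'found' if available else 'not found'}")
```

Using `find_spec` only locates the module rather than importing it, so the check stays fast even when an engine pulls in heavy dependencies.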
Verify Your Installation
import pymupdf4llm
print(pymupdf4llm.version)
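If you would rather not import the package, the installed versions can also be read from the distribution metadata. This is a minimal sketch using the standard-library `importlib.metadata` module (available from Python 3.8); the helper `installed_version` is a hypothetical name, and the distribution names `pymupdf4llm` and `PyMuPDF` are assumed to match the names published on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# ASSUMPTION: these are the PyPI distribution names for the package
# and its PyMuPDF dependency.
for name in ("pymupdf4llm", "PyMuPDF"):
    found = installed_version(name)
    print(f"{name}: {found if found is not None else 'not installed'}")
```

A "not installed" line for either name suggests the install step did not complete in the environment that is running the script.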