alleninstituteforai/olmocr - Docker Image

⁠A toolkit for converting PDFs and other image-based document formats into clean, readable, plain text format.

Try the online demo: https://olmocr.allenai.org/⁠

Features:

- Converts PDF, PNG, and JPEG-based documents into clean Markdown
- Supports equations, tables, handwriting, and complex formatting
- Automatically removes headers and footers
- Produces text in a natural reading order, even in the presence of figures, multi-column layouts, and insets
- Efficient: less than $200 USD per million pages converted
- Based on a 7B-parameter VLM, so it requires a GPU

⁠Description

This Docker image contains the olmOCR package. It provides a complete environment for document processing, OCR tasks, and text recognition with all dependencies pre-installed.

⁠Features

- Built on NVIDIA CUDA 11.8.0 with cuDNN support
- Python 3.11 environment with full GPU acceleration
- The following dependency groups are installed:
  - gpu: Support for GPU-accelerated processing
  - bench: Development tools for benchmarking

⁠Usage

⁠Pull the image

docker pull alleninstituteforai/olmocr:latest

⁠Run with GPU support

docker run --gpus all -it alleninstituteforai/olmocr:latest
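
Before starting a long job, it can help to confirm that the container actually sees a GPU. The sketch below is one way to do that, assuming the NVIDIA Container Toolkit is installed on the host and exposes `nvidia-smi` inside the container. The `gpu_check` helper is a hypothetical name, not part of olmOCR. By default it only prints the command so it can be reviewed first.

```shell
# Hypothetical helper (not part of olmOCR): check GPU visibility in the container.
# Assumes the NVIDIA Container Toolkit makes nvidia-smi available inside it.
IMAGE="alleninstituteforai/olmocr:latest"

gpu_check() {
  mode="${1:-print}"
  # --rm removes the throwaway container once nvidia-smi exits.
  set -- docker run --rm --gpus all "$IMAGE" nvidia-smi
  if [ "$mode" = "run" ]; then
    "$@"        # execute the command
  else
    echo "$*"   # print it for review
  fi
}
```

Calling `gpu_check` prints the command and `gpu_check run` executes it. An error at this step usually points to the host driver or container runtime setup rather than to olmOCR itself.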

⁠Mount local directories

docker run --gpus all -v /path/to/your/data:/data -it alleninstituteforai/olmocr:latest
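
When inputs and outputs live in different places, two mounts keep them apart. The following sketch assumes hypothetical container paths `/data/in` and `/data/out` and a hypothetical helper name `mount_cmd`. It mounts the input directory read-only (Docker's `:ro` suffix) so the container cannot modify source documents. Host paths are resolved to absolute paths first, because Docker interprets a bare name on the left side of `-v` as a named volume rather than a directory.

```shell
# Hypothetical helpers (not part of olmOCR); /data/in and /data/out are assumed paths.
IMAGE="alleninstituteforai/olmocr:latest"

# Resolve a directory to an absolute path; fails if it does not exist.
abs_dir() {
  (CDPATH= cd -- "$1" && pwd)
}

# $1: host input directory, $2: host output directory, $3: "run" to execute
mount_cmd() {
  in_dir="$(abs_dir "$1")" || return 1
  out_dir="$(abs_dir "$2")" || return 1
  mode="${3:-print}"
  set -- docker run --gpus all \
    -v "${in_dir}:/data/in:ro" \
    -v "${out_dir}:/data/out" \
    -it "$IMAGE"
  if [ "$mode" = "run" ]; then
    "$@"        # start an interactive container with both mounts
  else
    echo "$*"   # print the command for review
  fi
}
```

For example, `mount_cmd ./pdfs ./results` prints the full command with both directories expanded, and adding `run` as a third argument starts the container.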

⁠Run specific commands

docker run --gpus all -it alleninstituteforai/olmocr:latest python -m olmocr.any_module
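
As a concrete instance of running a command in the container, the sketch below assembles a conversion of a single PDF with `olmocr.pipeline`, the entry point shown in the olmOCR GitHub README. It rests on assumptions: that the image version in use ships this module with the same arguments (a workspace directory followed by `--pdfs`), and that `/data/workspace` is an acceptable output location. `convert_pdf` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of olmOCR). Assumes the image provides
# olmocr.pipeline as in the upstream README; /data/workspace is an assumed path.
IMAGE="alleninstituteforai/olmocr:latest"

# $1: host directory containing the PDF, $2: PDF file name, $3: "run" to execute
convert_pdf() {
  host_dir="$(CDPATH= cd -- "$1" && pwd)" || return 1
  pdf_name="$2"
  mode="${3:-print}"
  set -- docker run --rm --gpus all -v "${host_dir}:/data" "$IMAGE" \
    python -m olmocr.pipeline /data/workspace --pdfs "/data/${pdf_name}"
  if [ "$mode" = "run" ]; then
    "$@"        # run the conversion
  else
    echo "$*"   # print the command for review
  fi
}
```

For example, `convert_pdf /path/to/your/data report.pdf run` would process `report.pdf` (a placeholder name). Because the workspace sits inside the mounted directory, any results the pipeline writes there should remain on the host after the container exits.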

⁠Package Information

This image contains the olmOCR package, which requires Python 3.11 or higher and includes dependencies for document processing, PDF handling, image manipulation, and machine learning tasks.

⁠Source Code

Source code for olmOCR is available on GitHub: https://github.com/allenai/olmocr

⁠License

Apache License 2.0