alleninstituteforai/olmocr - Docker Image
A toolkit for converting PDFs and other image-based document formats into clean, readable, plain text format.
Try the online demo: https://olmocr.allenai.org/
Features:
- Convert PDF, PNG, and JPEG based documents into clean Markdown
- Support for equations, tables, handwriting, and complex formatting
- Automatically removes headers and footers
- Convert into text with a natural reading order, even in the presence of figures, multi-column layouts, and insets
- Efficient: costs less than $200 USD per million pages converted
- (Based on a 7B parameter VLM, so it requires a GPU)
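To put the cost figure in the list above into per-job terms, here is a minimal arithmetic sketch. It treats the stated "less than $200 USD per million pages" as a flat upper bound; the constant and the function name `estimated_max_cost_usd` are illustrative and not part of olmOCR, and real costs will depend on the GPU and hosting used.

```python
# Upper-bound cost estimate derived from the "< $200 USD per million pages" claim.
# Assumption: cost scales linearly with page count; actual cost depends on hardware.
COST_PER_MILLION_PAGES_USD = 200.0


def estimated_max_cost_usd(pages: int) -> float:
    """Return an upper-bound conversion cost in USD for `pages` pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * COST_PER_MILLION_PAGES_USD / 1_000_000


if __name__ == "__main__":
    for n in (1_000, 50_000, 1_000_000):
        print(f"{n:>9,} pages: at most ${estimated_max_cost_usd(n):,.2f}")
```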
Description
This Docker image contains the olmOCR package. It provides a complete environment for document processing, OCR tasks, and text recognition with all dependencies pre-installed.
Features
- Built on NVIDIA CUDA 11.8.0 with cuDNN support
- Python 3.11 environment with full GPU acceleration
- The following optional dependency groups are installed:
  - gpu: support for GPU-accelerated processing
  - bench: development tools for benchmarking
Usage
Pull the image
docker pull alleninstituteforai/olmocr:latest
Run with GPU support
docker run --gpus all -it alleninstituteforai/olmocr:latest
Mount local directories
docker run --gpus all -v /path/to/your/data:/data -it alleninstituteforai/olmocr:latest
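When the mount command above is generated from a script, the host side of `-v` should be an absolute path, because Docker treats a non-absolute name there as a named volume rather than a directory. The sketch below builds the same command as an argument list; the helper names `mount_flag` and `build_docker_run` are hypothetical, and the `/data` target simply mirrors the example above.

```python
import os
import shlex

IMAGE = "alleninstituteforai/olmocr:latest"


def mount_flag(host_dir, container_dir="/data", read_only=False):
    """Return the `-v` arguments for bind-mounting host_dir into the container."""
    # abspath turns a relative directory into an absolute one so Docker
    # performs a bind mount instead of creating a named volume.
    spec = f"{os.path.abspath(host_dir)}:{container_dir}"
    if read_only:
        spec += ":ro"  # standard Docker suffix for a read-only mount
    return ["-v", spec]


def build_docker_run(host_dir, container_dir="/data", image=IMAGE):
    """Compose the argv for an interactive GPU container with one mounted directory."""
    return ["docker", "run", "--gpus", "all",
            *mount_flag(host_dir, container_dir), "-it", image]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_docker_run("/path/to/your/data")))
```

Files written to `/data` inside the container then appear in the mounted host directory.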
Run specific commands
docker run --gpus all -it alleninstituteforai/olmocr:latest python -m olmocr.any_module
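For batch jobs or CI, the same invocation can be assembled in Python. This is a sketch under assumptions, not an olmOCR API: `build_module_command` and `run` are hypothetical helpers, and `olmocr.any_module` is the placeholder from the command above. The sketch omits `-it` by default, since `docker run -it` fails when no terminal is attached, and adds Docker's `--rm` flag so the container is removed once the command exits.

```python
import shlex
import subprocess

IMAGE = "alleninstituteforai/olmocr:latest"


def build_module_command(module, module_args=(), interactive=False, image=IMAGE):
    """Compose `docker run ... python -m <module> [args]` as an argv list."""
    argv = ["docker", "run", "--gpus", "all", "--rm"]
    if interactive:
        argv.append("-it")  # only useful when a terminal is attached
    argv.append(image)
    argv += ["python", "-m", module, *module_args]
    return argv


def run(argv):
    """Execute the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(argv, check=True).returncode


if __name__ == "__main__":
    # "olmocr.any_module" is a placeholder; substitute a real module name.
    print(shlex.join(build_module_command("olmocr.any_module")))
    # To launch the container, pass the same list to run().
```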
Package Information
This image contains the olmOCR package, which requires Python 3.11 or higher and includes dependencies for document processing, PDF handling, image manipulation, and machine learning tasks.
Source Code
Source code for olmOCR is available on GitHub: https://github.com/allenai/olmocr
License
Apache License 2.0