alleninstituteforai/olmocr - Docker Image

A toolkit for converting PDFs and other image-based document formats into clean, readable, plain text format.

Try the online demo: https://olmocr.allenai.org/

Description

This Docker image contains the olmOCR package. It provides a complete environment for document processing, OCR tasks, and text recognition with all dependencies pre-installed.

Usage

Pull the image:

    docker pull alleninstituteforai/olmocr:latest

Run with GPU support:

    docker run --gpus all -it alleninstituteforai/olmocr:latest

Mount a local directory into the container at /data:

    docker run --gpus all -v /path/to/your/data:/data -it alleninstituteforai/olmocr:latest

Run a specific command (olmocr.any_module is a placeholder for the module you want to run):

    docker run --gpus all -it alleninstituteforai/olmocr:latest python -m olmocr.any_module
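The GPU, volume-mount, and module options above can be combined into one command. The following is a minimal sketch of a helper that assembles such a command from Python. The function name build_run_command and its parameters are hypothetical and are not part of olmOCR. The module name olmocr.any_module and the path /path/to/your/data are the same placeholders used above.

```python
import os
import shlex

IMAGE = "alleninstituteforai/olmocr:latest"


def build_run_command(module, host_dir=None, container_dir="/data",
                      gpus="all", interactive=True):
    """Return the `docker run` argument list for running a Python module in the image.

    Hypothetical helper for illustration only; it just composes the flags
    shown in the Usage section and does not start a container.
    """
    cmd = ["docker", "run", "--gpus", gpus]
    if host_dir is not None:
        # Bind mounts expect an absolute host path.
        cmd += ["-v", f"{os.path.abspath(host_dir)}:{container_dir}"]
    if interactive:
        cmd.append("-it")
    cmd += [IMAGE, "python", "-m", module]
    return cmd


if __name__ == "__main__":
    # "olmocr.any_module" is a placeholder; substitute a real module name.
    cmd = build_run_command("olmocr.any_module", host_dir="/path/to/your/data")
    print(shlex.join(cmd))
    # To actually launch the container, pass the list to
    # subprocess.run(cmd, check=True) on a host with Docker and GPU support.
```

Building the command as a list and passing it to subprocess.run avoids shell quoting problems when the host path contains spaces.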

Package Information

This image contains the olmOCR package, which requires Python 3.11 or higher and includes dependencies for document processing, PDF handling, image manipulation, and machine learning tasks.
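The Python requirement matters mostly when installing olmOCR outside the image. As a rough sketch, an interpreter can be checked against the 3.11 minimum stated above. The names MIN_PYTHON and meets_minimum are illustrative and are not part of the olmOCR package.

```python
import sys

# Minimum version stated in the package information above.
MIN_PYTHON = (3, 11)


def meets_minimum(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (default: the running interpreter) is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); patch level does not affect the requirement.
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old for olmOCR"
    print(f"Python {current}: {status}")
```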

Source Code

Source code for olmOCR is available on GitHub: https://github.com/allenai/olmocr

License

Apache License 2.0