PDF Table Extraction for Humans — Camelot 2.0.0 documentation

Camelot: PDF Table Extraction for Humans

Release v2.0.0. (Installation)


Camelot is a Python library that can help you extract tables from PDFs.

Note

You can also check out Excalibur, the web interface to Camelot.


Extract tables from PDFs in just a few lines of code:

Try it yourself in our interactive quickstart notebook.

Or check out a simple example using this PDF.

    >>> import camelot
    >>> tables = camelot.read_pdf('foo.pdf')
    >>> tables
    <TableList n=1>
    >>> tables.export('foo.csv', f='csv', compress=True)  # json, excel, html, markdown, sqlite
    >>> tables[0]
    <Table shape=(7, 7)>
    >>> tables[0].parsing_report
    {
        'accuracy': 99.02,
        'whitespace': 12.24,
        'order': 1,
        'page': 1
    }
    >>> tables[0].to_csv('foo.csv')  # to_json, to_excel, to_html, to_markdown, to_sqlite
    >>> tables[0].df  # get a pandas DataFrame!
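The parsing report in the example above can also be used to screen extracted tables before further processing. The following is a minimal sketch, not part of Camelot: the helper name `select_good_tables` and both threshold values are hypothetical, and it assumes only that each table exposes a `parsing_report` dict with `accuracy` and `whitespace` keys, as in the example.

```python
from typing import Any, Iterable, List


def select_good_tables(
    tables: Iterable[Any],
    min_accuracy: float = 90.0,
    max_whitespace: float = 50.0,
) -> List[Any]:
    """Return the tables whose parsing report passes both thresholds.

    Hypothetical helper, not part of Camelot. It assumes each item has a
    `parsing_report` dict with 'accuracy' and 'whitespace' keys.
    """
    selected = []
    for table in tables:
        report = table.parsing_report
        accurate_enough = report["accuracy"] >= min_accuracy
        dense_enough = report["whitespace"] <= max_whitespace
        if accurate_enough and dense_enough:
            selected.append(table)
    return selected


# Usage with Camelot (requires camelot-py and a text-based PDF):
#   import camelot
#   good = select_good_tables(camelot.read_pdf("foo.pdf"))
#   frames = [table.df for table in good]
```

The thresholds are arbitrary starting points; suitable values depend on the documents being parsed.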

Camelot also comes packaged with a command-line interface!

Note

Camelot only works with text-based PDFs and not scanned documents. (As Tabula explains, “If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based”.)

You can check out some frequently asked questions here.

Why Camelot?

See comparison with similar libraries and tools.

The User Guide

This part of the documentation begins with some background information about why Camelot was created, takes you through some implementation details, and then focuses on step-by-step instructions for getting the most out of Camelot.

The API Documentation/Guide

If you are looking for information on a specific function, class, or method, this part of the documentation is for you.

The Contributor Guide

If you want to contribute to the project, this part of the documentation is for you.