IndexError in page.get_links()

Description of the bug

When the attached file is processed, calling page.get_links() leads to an IndexError for page 14.

How to reproduce the bug

The error was traced to the following lines in src/__init__.py:

    for i, v in enumerate(array.replace("null", "0").split()[1:]):
        t[i] = float(v)

For page 14 the array variable contains

/XYZ 116.00001 745.92 0 34 0 R/XYZ 116.00001 745.92 0 40 0 R/XYZ 116.00001 745.92 0 47 0 R/XYZ 116.00001 745.92 0 56 0 R/XYZ 116.00001 745.92 0 64 0 R/XYZ 116.00001 745.92 0

so the loop enumerates the following list:

['116.00001', '745.92', '0', '34', '0', 'R/XYZ', '116.00001', '745.92', '0', '40', '0', 'R/XYZ', '116.00001', '745.92', '0', '47', '0', 'R/XYZ', '116.00001', '745.92', '0', '56', '0', 'R/XYZ', '116.00001', '745.92', '0', '64', '0', 'R/XYZ', '116.00001', '745.92', '0']

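This list has 33 entries, far more than the three values (left, top, zoom) of a single /XYZ destination, so the assignment to t[i] eventually runs past the end of t and raises the IndexError.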
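
The failure can be reproduced without the PDF. The sketch below is not the library's code. It assumes that t is a three-element list holding left, top and zoom for one /XYZ destination. parse_naive and parse_first_xyz are hypothetical names used only for illustration, and the second function is one possible defensive variant rather than a proposed fix.

```python
# Value of `array` reported for page 14 (copied from this report).
array = (
    "/XYZ 116.00001 745.92 0 34 0 R"
    "/XYZ 116.00001 745.92 0 40 0 R"
    "/XYZ 116.00001 745.92 0 47 0 R"
    "/XYZ 116.00001 745.92 0 56 0 R"
    "/XYZ 116.00001 745.92 0 64 0 R"
    "/XYZ 116.00001 745.92 0"
)


def parse_naive(array):
    """Mimic the loop quoted above (hypothetical helper)."""
    # Assumption: t has exactly three slots (left, top, zoom).
    t = [0.0, 0.0, 0.0]
    for i, v in enumerate(array.replace("null", "0").split()[1:]):
        t[i] = float(v)  # fails once i reaches 3
    return t


def parse_first_xyz(array):
    """Defensive variant (hypothetical): read only the first /XYZ triple."""
    # Slicing to [1:4] caps the loop at three tokens, so t cannot overflow.
    tokens = array.replace("null", "0").split()[1:4]
    t = [0.0, 0.0, 0.0]
    for i, v in enumerate(tokens):
        t[i] = float(v)
    return t


# All tokens the original loop would try to store.
tokens = array.replace("null", "0").split()[1:]

try:
    parse_naive(array)
    naive_error = None
except IndexError as exc:
    naive_error = exc

print(len(tokens))             # number of tokens after the leading /XYZ
print(repr(naive_error))       # the IndexError raised by the naive loop
print(parse_first_xyz(array))  # values of the first destination only
```

Under that assumption, the naive loop fails at the fourth token ('34'), before it ever reaches the non-numeric 'R/XYZ'. Whether reading only the first triple is the right behaviour for this file is an open question, since the string appears to hold several concatenated destinations.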

index_error.pdf

PyMuPDF version

Built from source

Operating system

Linux

Python version

3.12