add_highlight_annot using clip generates "A Number is Out of Range" error in PDF
Describe the bug (mandatory)
I am trying to use page.add_highlight_annot with the clip option. The highlighting is placed as expected, but opening the resulting PDF triggers an "A Number is Out of Range" error. The clip is built from coordinates returned by page.get_text("words", textpage=textpage), so I am not sure how my clip could be illegal. If this is not a bug, what am I doing wrong?
To Reproduce (mandatory)
Import Fitz and read PDF
import fitz
pdfDoc = fitz.open('./test.pdf')
Get Text and Do Text Stuff with it (here we find the index of target)
page = pdfDoc[0]
textpage = page.get_textpage(clip=page.mediabox)
page_text_words = page.get_text("words", textpage=textpage)
# xi = list index of key word
for xi, x in enumerate(page_text_words):
    if x[4]=='pellentesque,':
        target_idx = xi
        print(xi, x)
# results seem reasonable
# 88 (242.8954315185547, 157.6929473876953, 308.8439025878906, 172.07373046875, 'pellentesque,', 3, 6, 5)
Get context around this target using the list index
context_span = 10
start_idx = target_idx-context_span
end_idx = target_idx+context_span
context_text = " ".join([x[4] for x in page_text_words[start_idx:(end_idx+1)] ])
# results seem reasonable
# 'et maximus urna. Nullam posuere feugiat orci non ullamcorper. Proin pellentesque, odio id facilisis mollis, sem risus suscipit ex, non aliquet'
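One thing worth guarding in the slice above: if the target word is among the first `context_span` words, `start_idx` goes negative and Python slicing then counts from the end of the list, which usually gives an empty or wrong context. A minimal sketch of a clamped window follows. `context_window` is a hypothetical helper name, and the sample words are made up for illustration, not taken from the PDF in this report.

```python
def context_window(words, target_idx, span):
    """Return (start, end) indices clamped to the list bounds; end is inclusive."""
    start = max(0, target_idx - span)
    end = min(len(words) - 1, target_idx + span)
    return start, end

# Hypothetical sample data, not from the reported PDF
sample = ["alpha", "beta", "gamma", "delta", "epsilon"]

# The target sits near the start, so an unclamped start index would be negative
start_idx, end_idx = context_window(sample, target_idx=1, span=3)
context_text = " ".join(sample[start_idx:end_idx + 1])
print(context_text)
```

In the report's case the target is at index 88 with a span of 10, so the unclamped indices happen to be valid. The clamp only matters for targets near either end of the word list.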
Build a clip for this context text
clip_rect = list(page_text_words[start_idx][:4])
for xi, x in enumerate(page_text_words[start_idx:(end_idx+1)]):
    if x[0]<clip_rect[0]:
        clip_rect[0]=x[0]
    if x[1]<clip_rect[1]:
        clip_rect[1]=x[1]
    if x[2]>clip_rect[2]:
        clip_rect[2]=x[2]
    if x[3]>clip_rect[3]:
        clip_rect[3]=x[3]
# results seem reasonable, even though the method is pretty ugly
# [72.02400207519531, 143.41297912597656, 540.11474609375, 186.353759765625]
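The min/max loop above can be written more compactly. This is a sketch only, assuming each word tuple starts with (x0, y0, x1, y1) as in the `get_text("words")` output shown earlier, and that the list is non-empty. `union_bbox` is a hypothetical helper name, and the second sample tuple is made up for illustration.

```python
def union_bbox(words):
    """Smallest [x0, y0, x1, y1] box enclosing every word tuple (non-empty input assumed)."""
    x0s, y0s, x1s, y1s = zip(*(w[:4] for w in words))
    return [min(x0s), min(y0s), max(x1s), max(y1s)]

words = [
    # word tuple as printed in this report
    (242.8954315185547, 157.6929473876953, 308.8439025878906, 172.07373046875,
     'pellentesque,', 3, 6, 5),
    # made-up neighbour, for illustration only
    (72.0, 143.0, 120.0, 157.0, 'example', 3, 5, 0),
]

clip_rect = union_bbox(words)
print(clip_rect)
```

This is only a tidier version of the same bounding-box computation. It is not expected to change the clip values or to affect the reported error.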
Use the clip to add a highlight annotation
x0,y0,x1,y1 = clip_rect
rect = fitz.Rect(x0,y0,x1,y1)
highlight = page.add_highlight_annot(quads=None, clip=rect)
highlight.update()
Save PDF
pdfDoc.save("./test_out.pdf", garbage=4, clean=True, deflate=True, deflate_images=True, deflate_fonts=True)
print("Info: Saved Annotated PDF ./test_out.pdf")
# opening this PDF shows the highlighting as expected but also pops up "A Number is Out of Range" error
Expected behavior (optional)
I would expect not to get the "A Number is Out of Range" error
Screenshots (optional)
At first the highlighting doesn't show, only the error. But once you click 'Ok' the error goes away and highlighting shows. Any scrolling brings the error prompt back up.
After clicking 'Ok' and then scrolling
Your configuration (mandatory)
3.10.8 (tags/v3.10.8:aaaf517, Oct 11 2022, 16:50:30) [MSC v.1933 64 bit (AMD64)]
win32
PyMuPDF 1.21.0: Python bindings for the MuPDF 1.21.0 library.
Version date: 2022-11-08 00:00:01.
Built for Python 3.10 on win32 (64-bit).
Additional context (optional)
As always, thank you for the support!