page.rect and text location wrong / differing from older version

Describe the bug (mandatory)

page.rect and the bbox of text seem to be wrong under certain circumstances. At the very least, they differ from the results obtained with an older version (1.16.18).

To Reproduce (mandatory)

Sample.pdf
The sample PDF has the following boxes (taken from its page dictionary):
/CropBox [ 30 30 565.32 811.92 ]
/MediaBox [ 0 0 595.32 841.92 ]

Using the following code with the current version (1.23.4):

import fitz  # PyMuPDF

doc = fitz.Document("Sample.pdf")
page = doc.load_page(0)
print(page.cropbox)
print(page.mediabox)
print(page.rect)

This results in:
page.cropbox -> Rect(30.0, 30.0, 565.32, 811.92)
page.mediabox -> Rect(0.0, 0.0, 595.32, 841.92)
page.rect -> Rect(0.0, 0.0, 595.3200073242188, 841.9199829101562)

page.rect is too large, and the text is misplaced: the text in the sample PDF, which starts at about the left border of the visible page, has a bbox of (30.787235260009766, ...).

Expected behavior (optional)

I expect the results that PyMuPDF 1.16.18 delivered, which look correct:
page.CropBox -> Rect(30.0, 30.0, 565.3200073242188, 811.9199829101562)
page.MediaBox -> Rect(0.0, 0.0, 595.3200073242188, 841.9199829101562)
page.rect -> Rect(0.0, 0.0, 535.3200073242188, 781.9199829101562)
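The expected page.rect can be reproduced from the CropBox alone. A minimal sketch, assuming (as the 1.16.18 output suggests) that for an unrotated page page.rect is the CropBox moved so that its top-left corner sits at (0, 0); `expected_page_rect` is a hypothetical helper, not a PyMuPDF function:

```python
# Sketch only (assumption): for an unrotated page, page.rect is the CropBox
# translated so its top-left corner is at (0, 0). This mirrors what PyMuPDF
# 1.16.18 reported for Sample.pdf; it is not taken from PyMuPDF's source code.

CROPBOX = (30.0, 30.0, 565.32, 811.92)   # /CropBox from Sample.pdf
MEDIABOX = (0.0, 0.0, 595.32, 841.92)    # /MediaBox from Sample.pdf


def expected_page_rect(box):
    """Hypothetical helper: return (0, 0, width, height) of the given box."""
    x0, y0, x1, y1 = box
    return (0.0, 0.0, x1 - x0, y1 - y0)


if __name__ == "__main__":
    print(expected_page_rect(CROPBOX))   # (0.0, 0.0, 535.32, 781.92), as in 1.16.18
    print(expected_page_rect(MEDIABOX))  # (0.0, 0.0, 595.32, 841.92), as in 1.23.4
```

Applied to the MediaBox, the same formula gives the Rect that 1.23.4 reports, which suggests that page.rect is now derived from the MediaBox rather than the CropBox for this file. (The trailing digits such as 535.3200073242188 in the PyMuPDF output come from single-precision storage.)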

The x0 coordinate of the text bbox should be almost zero, about bbox = (0.78, ...).
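The expected x0 of about 0.78 is the reported 30.787235260009766 minus the CropBox's left edge (30). A minimal sketch of that translation as a possible client-side workaround, assuming an unrotated page, box arrays in PDF coordinates (y axis up), and text bboxes that are currently reported relative to the MediaBox with the y axis pointing down; `cropbox_origin` and `to_cropbox_relative` are hypothetical helpers, not PyMuPDF API:

```python
# Sketch only (assumptions): unrotated page, /CropBox and /MediaBox given as
# PDF arrays (y axis pointing up), and text bboxes reported relative to the
# MediaBox with the y axis pointing down.

CROPBOX = (30.0, 30.0, 565.32, 811.92)   # /CropBox from Sample.pdf
MEDIABOX = (0.0, 0.0, 595.32, 841.92)    # /MediaBox from Sample.pdf


def cropbox_origin(cropbox, mediabox):
    """Hypothetical helper: top-left corner of the CropBox in y-down,
    MediaBox-relative coordinates."""
    cx0, _, _, cy1 = cropbox
    mx0, _, _, my1 = mediabox
    # x keeps its direction; y is flipped and measured down from the MediaBox top
    return (cx0 - mx0, my1 - cy1)


def to_cropbox_relative(bbox, origin):
    """Hypothetical helper: shift an (x0, y0, x1, y1) bbox so that the
    CropBox top-left corner becomes (0, 0)."""
    x0, y0, x1, y1 = bbox
    ox, oy = origin
    return (x0 - ox, y0 - oy, x1 - ox, y1 - oy)


if __name__ == "__main__":
    origin = cropbox_origin(CROPBOX, MEDIABOX)  # approximately (30.0, 30.0)
    text_x0 = 30.787235260009766                # x0 of the text bbox reported by 1.23.4
    print(round(text_x0 - origin[0], 3))        # 0.787
```

Only x0 of the text bbox is given above, so the demo shifts just that value; `to_cropbox_relative` would apply the same offset to a complete bbox.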

Your configuration (mandatory)

Additional context (optional)

Is this new behaviour (compared to the old version) expected? If so, is there a flag, option, or workaround to get results comparable to v1.16.18 in the current v1.23.4?

Thanks in advance!