Image extraction bugs with newer versions
Describe the bug (mandatory)
In the newer versions 1.22.0 and 1.22.1, it looks like certain image formats such as PAM are no longer handled properly.
To Reproduce (mandatory)
Running the following code gives different results with versions 1.19.6 and 1.22.0:
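import fitz

width, height = 13, 37
# Build an all-zero grayscale pixmap (no alpha) from raw samples.
image = fitz.Pixmap(fitz.csGRAY, width, height, b"\x00" * (width * height), False)
# Open the pixmap as a PAM image document and convert it to PDF.
with fitz.Document(stream=image.tobytes(output="pam"), filetype="pam") as doc:
    test_pdf_bytes = doc.convert_to_pdf()
# Reopen the PDF and extract the embedded image.
with fitz.Document(stream=test_pdf_bytes) as doc:
    page = doc[0]
    img_xref = page.get_images()[0][0]
    img_bytes = doc.extract_image(img_xref)["image"]
    print(img_bytes)
    fitz.Pixmap(img_bytes)  # this is the line that fails on 1.22.0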
With 1.19.6, this runs without error and prints:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\r\x00\x00\x00%\x08\x00\x00\x00\x00\xbc\x7f<\xfb\x00\x00\x00\tpHYs\x00\x00\x0e\xc4\x00\x00\x0e\xc4\x01\x95+\x0e\x1b\x00\x00\x00\x11IDATx\x9ccd@\x06\x8c\xa3\xbc\x11\xc9\x03\x00(x\x00&i\xc7\xc3\xfb\x00\x00\x00\x00IEND\xaeB`\x82'
With 1.22.0, the last line raises an error:
in Pixmap.__init__(self, *args)
7136 def __init__(self, *args):
7137 """Pixmap(colorspace, irect, alpha) - empty pixmap.
7138 Pixmap(colorspace, src) - copy changing colorspace.
7139 Pixmap(src, width, height,[clip]) - scaled copy, float dimensions.
(...)
7145 Pixmap(PDFdoc, xref) - from an image xref in a PDF document.
7146 """
-> 7148 _fitz.Pixmap_swiginit(self, _fitz.new_Pixmap(*args))
RuntimeError: unknown image file format
and the printed bytes are very different:
b'&\xa0\x9f\xff\xff\xff\xff\xff\xff\xff\xff\xe0\x02\x00 '
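To make the difference between the two outputs concrete, here is a small stdlib-only sketch that checks whether a byte string starts like a PNG file and, if so, reads the dimensions from its IHDR chunk. The helper name `png_header` is made up for this illustration and is not part of PyMuPDF. `old_prefix` is only the first 33 bytes of the 1.19.6 output shown above, and `new_bytes` is the full 1.22.0 output.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_header(data: bytes):
    """Return (width, height, bit_depth, color_type) if data starts like a PNG, else None.

    Hypothetical helper for this report, not a PyMuPDF function.
    """
    # 8-byte signature + 4-byte chunk length + 4-byte chunk type + 10 bytes we read
    if len(data) < 26 or not data.startswith(PNG_SIGNATURE):
        return None
    if data[12:16] != b"IHDR":
        return None
    # IHDR payload starts at offset 16: width and height are big-endian uint32,
    # followed by one byte each for bit depth and color type.
    return struct.unpack(">IIBB", data[16:26])


# First 33 bytes of the 1.19.6 output (signature plus the complete IHDR chunk).
old_prefix = (
    b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"
    b"\x00\x00\x00\r\x00\x00\x00%\x08\x00\x00\x00\x00\xbc\x7f<\xfb"
)
# Full 1.22.0 output.
new_bytes = b"&\xa0\x9f\xff\xff\xff\xff\xff\xff\xff\xff\xe0\x02\x00 "

print(png_header(old_prefix))  # (13, 37, 8, 0)
print(png_header(new_bytes))   # None
print(len(new_bytes))          # 15
```

So the 1.19.6 result is a 13 x 37, 8-bit grayscale PNG matching the input pixmap, while the 1.22.0 result has no PNG signature and, at 15 bytes, is also far shorter than the 481 bytes that 13 * 37 uncompressed 8-bit samples would need.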
Your configuration (mandatory)
- Operating system, potentially version and bitness
- Python version, bitness
- PyMuPDF version, installation method (wheel or generated from source).
For example, the output of print(sys.version, "\n", sys.platform, "\n", fitz.__doc__) would be sufficient (for the first two bullets).
> print(sys.version, "\n", sys.platform, "\n", fitz.__doc__)
3.9.13 (main, Sep 8 2022, 09:21:48)
[GCC 9.4.0]
linux
PyMuPDF 1.22.0: Python bindings for the MuPDF 1.22.0 library.
Version date: 2023-04-14 00:00:01.
Built for Python 3.9 on linux (64-bit).
Installed via pip install pymupdf==1.22.0