Image extraction bugs with newer versions

Describe the bug (mandatory)

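In the newer versions 1.22.0 and 1.22.1, certain image formats such as PAM no longer seem to be handled properly: an image extracted from a PDF that was created from a PAM file can no longer be loaded back into a Pixmap.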

To Reproduce (mandatory)

Running the following gives different results with versions 1.19.6 and 1.22.0:

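import fitz

# Create a 13x37 all-black grayscale pixmap without alpha
width, height = 13, 37
image = fitz.Pixmap(fitz.csGRAY, width, height, b"\x00" * (width * height), False)

# Round-trip through PAM: open the PAM bytes as a document and convert it to PDF
with fitz.Document(stream=image.tobytes(output="pam"), filetype="pam") as doc:
    test_pdf_bytes = doc.convert_to_pdf()

# Reopen the PDF, extract the embedded image and load it back as a Pixmap
with fitz.Document(stream=test_pdf_bytes) as doc:
    page = doc[0]
    img_xref = page.get_images()[0][0]
    img_bytes = doc.extract_image(img_xref)["image"]
    print(img_bytes)
    fitz.Pixmap(img_bytes)  # fails on 1.22.0 with "unknown image file format"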

With 1.19.6, this runs without error and prints a PNG file:

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\r\x00\x00\x00%\x08\x00\x00\x00\x00\xbc\x7f<\xfb\x00\x00\x00\tpHYs\x00\x00\x0e\xc4\x00\x00\x0e\xc4\x01\x95+\x0e\x1b\x00\x00\x00\x11IDATx\x9ccd@\x06\x8c\xa3\xbc\x11\xc9\x03\x00(x\x00&i\xc7\xc3\xfb\x00\x00\x00\x00IEND\xaeB`\x82'

With 1.22.0, the last line raises an error:

in Pixmap.__init__(self, *args)
   7136 def __init__(self, *args):
   7137     """Pixmap(colorspace, irect, alpha) - empty pixmap.
   7138     Pixmap(colorspace, src) - copy changing colorspace.
   7139     Pixmap(src, width, height,[clip]) - scaled copy, float dimensions.
   (...)
   7145     Pixmap(PDFdoc, xref) - from an image xref in a PDF document.
   7146     """
-> 7148     _fitz.Pixmap_swiginit(self, _fitz.new_Pixmap(*args))

RuntimeError: unknown image file format

and the printed bytes are very different (they no longer begin with the PNG signature):

b'&\xa0\x9f\xff\xff\xff\xff\xff\xff\xff\xff\xe0\x02\x00 '
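One quick way to confirm that the extracted data is no longer a recognizable image file is to look at its leading "magic" bytes. The sketch below uses a hypothetical helper, `sniff_image_format`, which is not part of PyMuPDF; it only knows the standard PNG, JPEG and PAM headers and assumes the bytes are checked exactly as returned by `extract_image(...)["image"]`.

```python
# Hypothetical helper (not part of PyMuPDF): guess an image file format
# from the signature bytes at the start of the data.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"P7\n": "pam",
}


def sniff_image_format(data: bytes):
    """Return the format name whose signature prefixes `data`, or None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


# Bytes printed by the reproducer (the 1.19.6 value is truncated to its header)
bytes_1_19_6 = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"
bytes_1_22_0 = b"&\xa0\x9f\xff\xff\xff\xff\xff\xff\xff\xff\xe0\x02\x00 "

print(sniff_image_format(bytes_1_19_6))  # png
print(sniff_image_format(bytes_1_22_0))  # None
```

The 1.22.0 bytes match none of these signatures, which is consistent with the "unknown image file format" error. Printing `doc.extract_image(img_xref)["ext"]` alongside the bytes may also show which format PyMuPDF believes the extracted stream has.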

Your configuration (mandatory)


> print(sys.version, "\n", sys.platform, "\n", fitz.__doc__)
3.9.13 (main, Sep  8 2022, 09:21:48)
[GCC 9.4.0]
 linux

PyMuPDF 1.22.0: Python bindings for the MuPDF 1.22.0 library.
Version date: 2023-04-14 00:00:01.
Built for Python 3.9 on linux (64-bit).

Installed via pip install pymupdf==1.22.0