open stream can raise a FzErrorFormat error instead of FileDataError

Description of the bug

If I feed a .csv file to pymupdf.open, I get a FileDataError, as documented:

If you attempt to open an unsupported file then PyMuPDF will throw a file data error.

But if I instead pass the bytes of the same file via stream=, I get an FzErrorFormat, which the docs did not lead me to expect.

How to reproduce the bug

import pymupdf

with open('myfile.csv', 'rb') as f:
    file_bytes = f.read()

It probably doesn't matter what's in the CSV, but here's mine:

>> file_bytes
b'A,B,C,D\r\n1,2,1,2\r\n2,2,1,2\r\n'

Now we try to open this:

>> pymupdf.open(stream=file_bytes)
---------------------------------------------------------------------------
FzErrorFormat                             Traceback (most recent call last)
<ipython-input-21-668e9798a921> in ?()
----> 1 pymupdf.open(stream=file_bytes)

~/.local/lib/python3.12/site-packages/pymupdf/__init__.py in ?(self, filename, stream, filetype, rect, width, height, fontsize)
   2884                     self.page_count2 = extra.page_count_pdf
   2885                 else:
   2886                     self.page_count2 = extra.page_count_fz
   2887         finally:
-> 2888             JM_mupdf_show_errors = JM_mupdf_show_errors_old

~/.local/lib/python3.12/site-packages/pymupdf/mupdf.py in ?(magic, stream)
  44292 
  44293         NOTE: The caller retains ownership of 'stream' - the document will take its
  44294         own reference if required.
  44295     """
> 44296     return _mupdf.fz_open_document_with_stream(magic, stream)

FzErrorFormat: code=7: no objects found

Contrast this with what happens when I open the file directly:

>> pymupdf.open("myfile.csv")
---------------------------------------------------------------------------
FzErrorUnsupported                        Traceback (most recent call last)
~/.local/lib/python3.12/site-packages/pymupdf/__init__.py in ?(self, filename, stream, filetype, rect, width, height, fontsize)
   2886                     self.page_count2 = extra.page_count_fz
   2887         finally:
-> 2888             JM_mupdf_show_errors = JM_mupdf_show_errors_old

~/.local/lib/python3.12/site-packages/pymupdf/mupdf.py in ?(filename)
  44271         filename: a path to a file as it would be given to open(2).
  44272     """
> 44273     return _mupdf.fz_open_document(filename)

FzErrorUnsupported: code=6: cannot find document handler for file: myfile.csv

The above exception was the direct cause of the following exception:

FileDataError                             Traceback (most recent call last)
<ipython-input-22-b19d9e4e2772> in ?()
----> 1 pymupdf.open("myfile.csv")

~/.local/lib/python3.12/site-packages/pymupdf/__init__.py in ?(self, filename, stream, filetype, rect, width, height, fontsize)
   2884                     self.page_count2 = extra.page_count_pdf
   2885                 else:
   2886                     self.page_count2 = extra.page_count_fz
   2887         finally:
-> 2888             JM_mupdf_show_errors = JM_mupdf_show_errors_old

FileDataError: Failed to open file 'myfile.csv'.

(We can see it still fails with FzErrorUnsupported, but this is ultimately re-raised as FileDataError, as documented.)
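
To compare the two code paths side by side without reading full tracebacks, here is a minimal sketch. The helper names `error_chain` and `compare_open_paths` are hypothetical and not part of PyMuPDF. The only PyMuPDF call used is `pymupdf.open`, with the same filename and `stream=` arguments as above. It assumes wrapped errors are linked through the standard `__cause__` / `__context__` attributes, as the "direct cause" line in the second traceback suggests.

```python
def error_chain(func, *args, **kwargs):
    """Call func and return the names of the exceptions it raises.

    The list is ordered outermost first and follows __cause__ / __context__,
    so a wrapped low-level error stays visible. Returns [] if the call succeeds.
    """
    try:
        func(*args, **kwargs)
    except Exception as exc:
        names = []
        seen = set()
        current = exc
        # Walk the chain, guarding against cycles.
        while current is not None and id(current) not in seen:
            seen.add(id(current))
            names.append(type(current).__name__)
            current = current.__cause__ or current.__context__
        return names
    return []


def compare_open_paths(path):
    """Report which exceptions each way of opening `path` raises."""
    import pymupdf  # imported lazily so error_chain has no dependencies

    with open(path, "rb") as f:
        data = f.read()
    return {
        "filename": error_chain(pymupdf.open, path),
        "stream": error_chain(pymupdf.open, stream=data),
    }
```

Calling `compare_open_paths("myfile.csv")` returns one list per code path. Judging from the tracebacks above, I would expect the "filename" list to start with FileDataError and the "stream" list to start with FzErrorFormat on 1.24.10, though I have not checked other versions.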

PyMuPDF version

1.24.10

Operating system

Linux

Python version

3.12