[PDFBOX-3442] OOM for single page pdf file

On TIKA-2045, a user posted a single-page document that causes an OutOfMemoryError (OOM) with -Xmx1g. I confirmed this with PDFBox's ExtractText.

This might be a memory leak related to the fonts. See this for some diagnostics I ran.

  1. res.diff (31/Jul/16 10:48, 2 kB)
  2. res2.diff (04/Aug/16 10:41, 5 kB), Tilman Hausherr
  3. res3.diff (05/Aug/16 16:32, 2 kB), Tilman Hausherr

duplicates

PDFBOX-3503 (Bug): 2.0 much slower than 1.8 for text extraction with certain PDF files

is depended upon by

TIKA-2045 (Bug): TIKA crashes / runs out of memory on simple PDF