pdfalto error: Syntax Warning: Invalid entry in bfchar block in ToUnicode CMap
Hi,
I'm getting the following error with a certain PDF:
ERROR [2022-06-07 08:02:33,838] org.grobid.core.process.ProcessPdfToXml: pdfalto process finished with error code: 143. [/opt/grobid/grobid-home/pdfalto/lin-64/pdfalto_server, -fullFontName, -noLineNumbers, -noImage, -annotation, -filesLimit, 2000, /opt/grobid/grobid-home/tmp/origin3690432459378499723.pdf, /opt/grobid/grobid-home/tmp/czDhswmAVc.lxml]
ERROR [2022-06-07 08:02:33,838] org.grobid.core.process.ProcessPdfToXml: pdfalto return message:
Syntax Warning: Invalid entry in bfchar block in ToUnicode CMap
Syntax Warning: Invalid entry in bfchar block in ToUnicode CMap
... (many more of these lines)
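A side note on the exit code in the log: on Unix-like systems, a child process killed by signal N is commonly reported with status 128 + N. Java's `Process.exitValue()` follows this convention, and GROBID launches pdfalto through Java. If that convention applies here, 143 would mean pdfalto was terminated by SIGTERM (15) rather than exiting on its own. The helper below is a minimal, hypothetical sketch (`describe_exit_code` is not part of GROBID) that decodes such a status under that assumption:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a child-process exit status.

    Assumes the common convention that a process terminated by
    signal N is reported with status 128 + N.
    """
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            # Not a known signal number: report the raw status.
            return f"exited with status {code}"
        return f"terminated by {sig.name} (signal {sig.value})"
    return f"exited with status {code}"


print(describe_exit_code(143))  # status taken from the log above
```

This only interprets the number. It does not say what sent the signal.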
This is the problematic PDF:
https://jyx.jyu.fi/bitstream/handle/123456789/81469/978-951-39-9321-4_vaitos10062022.pdf?sequence=1&isAllowed=y
It's a dissertation that contains multiple articles.
I'm calling GROBID with HTTPie like this: http -f POST :8070/api/processReferences input@'./978-951-39-9321-4_vaitos10062022.pdf;type=application/pdf'
The same problem also happens via the web UI.
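To reproduce the request without HTTPie, here is a minimal standard-library sketch of the same call. It sends the PDF as a multipart/form-data field named `input` with type `application/pdf`, which is what the `http -f ... input@'...;type=application/pdf'` command does. Assumptions: GROBID is reachable at `http://localhost:8070` (HTTPie's `:8070` shorthand). The names `GROBID_URL`, `build_multipart` and `process_references`, and the 300-second client timeout, are hypothetical choices for illustration only.

```python
import uuid
import urllib.request
from pathlib import Path

# Assumption: local GROBID instance, matching HTTPie's ":8070" shorthand.
GROBID_URL = "http://localhost:8070/api/processReferences"


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/pdf") -> tuple[bytes, str]:
    """Encode one file field as a multipart/form-data body.

    Returns the body bytes and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def process_references(pdf_path: str, timeout: float = 300.0) -> str:
    """POST a PDF to GROBID's processReferences endpoint and return the response text."""
    path = Path(pdf_path)
    body, ctype = build_multipart("input", path.name, path.read_bytes())
    req = urllib.request.Request(
        GROBID_URL, data=body, method="POST", headers={"Content-Type": ctype}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage: `process_references('./978-951-39-9321-4_vaitos10062022.pdf')`. This requires a running GROBID instance. A failing request raises `urllib.error.HTTPError`, so the server's status code can be inspected directly.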
OS: Debian 11
Grobid version: 0.7.1 (Docker image)
Any clues as to what might be causing this?