Memory limit for pdfalto subprocess in Grobid server not working with docker image
- What is your OS and architecture? Windows is not supported and Mac OS arm64 is not yet supported. For non-supported OS, you can use Docker (https://grobid.readthedocs.io/en/latest/Grobid-docker/) ---- Amazon Linux
- What is your Java version (`java --version`)? ---- Reproduced using the Docker images 0.7.3 and 0.7.2
We have a long-running Grobid server process that we run with -Xmx18G. Processing one batch of ~1000 PDFs consumes 7-10GB, and processing a second batch of ~1000 PDFs consumes another 7-10GB on top of that. Eventually the server gets killed with OOM.
We see this consistently: the server keeps consuming more and more memory and has to be restarted. Is there possibly a memory leak? Are there any knobs or workarounds we can try?
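In case it helps to narrow this down, here is a minimal sketch of how the growth between batches could be measured on Linux. It reads the resident set size (`VmRSS`) from `/proc/<pid>/status`, which covers the whole process rather than only the Java heap that `-Xmx` limits. The helper names (`parse_vm_rss_kib`, `read_rss_kib`, `rss_growth_gib`) are made up for illustration and are not part of Grobid; the PID of the server process has to be supplied by whoever runs it.

```python
import re
from pathlib import Path
from typing import Optional


def parse_vm_rss_kib(status_text: str) -> Optional[int]:
    """Return the VmRSS value in KiB from the text of /proc/<pid>/status.

    Returns None if the text has no VmRSS line (for example, a kernel thread).
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read the current resident set size of a process in KiB (Linux only).

    Returns None if the process does not exist or cannot be read.
    """
    try:
        text = Path(f"/proc/{pid}/status").read_text()
    except OSError:
        return None
    return parse_vm_rss_kib(text)


def rss_growth_gib(before_kib: int, after_kib: int) -> float:
    """Convert the difference between two VmRSS readings from KiB to GiB."""
    return (after_kib - before_kib) / (1024 * 1024)
```

Calling `read_rss_kib(pid)` once after each batch and passing consecutive readings to `rss_growth_gib` would show whether resident memory really steps up by several GB per batch. Running the same reading against any child process PIDs would show whether the growth sits in the JVM itself or in a subprocess.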