Handle incompleted/missclassified tables and figures by lfoppiano · Pull Request #1207 · grobidOrg/grobid

This PR fixes the silent dropping of text that the fulltext model has mis-classified as <table> or <figure> and that is then considered invalid at a second stage.

The approach is to revert each invalid table or figure to a paragraph directly in the fulltext sequence, rather than removing it from the output.
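
To illustrate the idea only, here is a minimal Python sketch, not the code from this PR. It assumes the fulltext sequence is a list of (token, label) pairs and that consecutive tokens with the same label form one segment. The function names and the validity rule in `is_valid_segment` are hypothetical stand-ins for whatever second-stage check is actually applied.

```python
from itertools import groupby

# Labels whose segments may be reverted to paragraphs when found invalid.
REVERTIBLE = {"<table>", "<figure>"}


def is_valid_segment(tokens):
    # Hypothetical validity rule (an assumption for this sketch):
    # a segment is valid if some token looks like a "Table"/"Fig." header.
    return any(tok.startswith(("Table", "Fig")) for tok in tokens)


def revert_invalid_segments(sequence, is_valid=is_valid_segment):
    """Relabel invalid table/figure segments as paragraphs.

    `sequence` is a list of (token, label) pairs. No token is removed;
    only the label of an invalid segment changes.
    """
    result = []
    # Group consecutive tokens that share the same label into segments.
    for label, group in groupby(sequence, key=lambda pair: pair[1]):
        tokens = [tok for tok, _ in group]
        if label in REVERTIBLE and not is_valid(tokens):
            label = "<paragraph>"  # keep the text, change only the label
        result.extend((tok, label) for tok in tokens)
    return result
```

For example, calling `revert_invalid_segments` on a sequence where the tokens "stray" and "words" are labelled <table> relabels both as <paragraph>, while a <figure> segment starting with "Figure" is left unchanged. The output always has the same tokens in the same order as the input, which is the point of the fix: nothing is dropped.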

PR #963 might solve the problem in the long term; however, this PR addresses the short-term timeline.