AIS Electronic Library (AISeL) - Proceedings of the International Conference on Information Systems Development (ISD): A Lightweight Approach to Table Recognition in Digital Invoices
Abstract
This study investigates table recognition techniques for digital documents, focusing on the challenges posed by diverse invoice layouts. A comparative evaluation of traditional pattern recognition and deep learning approaches highlighted their respective strengths and limitations. Special attention was given to ProjectionP, a proprietary lightweight method for resource-constrained environments, which combines morphological line extraction with pixel-based thresholding. A modified evaluation procedure adapted from ICDAR2013 was introduced to better balance over- and under-segmentation errors.
Comparative analysis showed that while Camelot leverages PDF metadata, it struggles with visual segmentation. Nanonets achieves high grid detection accuracy but can misplace text in complex tables. ProjectionP, optimized for desktop hardware, delivered competitive results, outperforming Camelot and matching Nanonets in specific cases.
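ProjectionP is proprietary and the abstract does not describe its internals, so the following is only a minimal sketch of the general idea of combining pixel-based thresholding with morphological line extraction. It is not the authors' implementation. All function names, the threshold of 128, and the minimum run length are assumptions chosen for illustration, and the image is assumed to be a grayscale grid of values from 0 to 255 held in plain Python lists.

```python
def binarize(gray, threshold=128):
    """Pixel-based thresholding: dark pixels (ink) become 1, the rest 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def open_horizontal(binary, min_len):
    """Morphological opening with a 1 x min_len structuring element.

    Equivalent to keeping only horizontal ink runs of at least min_len pixels,
    which removes text strokes and leaves long ruling lines.
    """
    out = []
    for row in binary:
        kept = [0] * len(row)
        start = None
        for x, v in enumerate(row + [0]):  # trailing 0 closes a run at the edge
            if v and start is None:
                start = x
            elif not v and start is not None:
                if x - start >= min_len:
                    kept[start:x] = [1] * (x - start)
                start = None
        out.append(kept)
    return out


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def line_positions(mask, min_fraction=0.5):
    """Projection profile: indices of rows whose ink covers min_fraction of the width.

    Adjacent hits are merged so a thick line yields a single position.
    """
    width = len(mask[0])
    hits = [y for y, row in enumerate(mask) if sum(row) >= min_fraction * width]
    groups = []
    for y in hits:
        if groups and y - groups[-1][-1] == 1:
            groups[-1].append(y)
        else:
            groups.append([y])
    return [g[len(g) // 2] for g in groups]


def detect_grid(gray, threshold=128, min_len=5):
    """Return (row_line_positions, column_line_positions) of a ruled table."""
    binary = binarize(gray, threshold)
    h_mask = open_horizontal(binary, min_len)
    # Vertical lines are found by running the same opening on the transpose.
    v_mask_t = open_horizontal(transpose(binary), min_len)
    return line_positions(h_mask), line_positions(v_mask_t)
```

A sketch like this only handles tables with visible ruling lines. Borderless invoice tables would need a different cue, such as whitespace gaps in the projection profile.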