Textual Information Localization and Retrieval in Document Images Based on Quadtree Decomposition

Abstract

Textual information extraction is a challenging issue in Information Retrieval. Two main approaches are commonly distinguished: texture-based and region-based. In this paper, we propose a method guided by quadtree decomposition. The principle of the method is to recursively decompose regions of a document image into four equal regions, starting from the image of the whole document. At each step of the decomposition process, an OCR engine is used to retrieve a given piece of textual information from the resulting regions. Experiments on real invoice data provide promising results.
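The recursive principle described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the stopping criterion, the OCR engine, or how text straddling a region boundary is handled, so all of those are assumptions here. The names `split_into_quadrants`, `locate_text`, `make_toy_ocr`, and the `min_size` parameter are hypothetical, and the "OCR engine" is a stand-in that simply reads characters from a text grid playing the role of the document image.

```python
from typing import Callable, List, Optional, Tuple

# A region is (top, left, height, width) in grid cells (assumed representation).
Region = Tuple[int, int, int, int]


def split_into_quadrants(region: Region) -> List[Region]:
    """Split a region into four (near-)equal quadrants."""
    top, left, height, width = region
    h2, w2 = height // 2, width // 2
    return [
        (top, left, h2, w2),                             # top-left
        (top, left + w2, h2, width - w2),                # top-right
        (top + h2, left, height - h2, w2),               # bottom-left
        (top + h2, left + w2, height - h2, width - w2),  # bottom-right
    ]


def locate_text(
    region: Region,
    target: str,
    ocr: Callable[[Region], str],
    min_size: int = 2,
) -> Optional[Region]:
    """Return the smallest visited region whose OCR output contains target.

    Assumptions (not taken from the paper): recursion stops when a region
    can no longer be split into quadrants of at least min_size cells per
    side, and when no quadrant contains the target (for example because
    the text straddles a boundary) the parent region is returned.
    """
    if target not in ocr(region):
        return None
    _, _, height, width = region
    if height < 2 * min_size or width < 2 * min_size:
        return region
    for quadrant in split_into_quadrants(region):
        found = locate_text(quadrant, target, ocr, min_size)
        if found is not None:
            return found
    return region


def make_toy_ocr(page: List[str]) -> Callable[[Region], str]:
    """Stand-in for a real OCR engine: reads characters from a text grid."""
    def ocr(region: Region) -> str:
        top, left, height, width = region
        rows = page[top:top + height]
        return "\n".join(row[left:left + width] for row in rows)
    return ocr


# Toy "document image": 8 rows by 16 columns, with one field of interest.
page = ["." * 16 for _ in range(8)]
page[6] = "." * 9 + "TOTAL" + "." * 2

whole_page: Region = (0, 0, len(page), len(page[0]))
print(locate_text(whole_page, "TOTAL", make_toy_ocr(page)))  # (4, 8, 4, 8)
```

In this toy run the search narrows the field down to the bottom-right quadrant of the page and stops there, because splitting that quadrant again would cut the word in two. A real pipeline would replace `make_toy_ocr` with calls to an actual OCR engine on cropped image regions.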

Author information

Authors and Affiliations

  1. EA2525-LIM, University of Reunion Island, Ile de La Reunion, Saint Denis, France
    Cynthia Pitou & Jean Diatta

Corresponding author

Correspondence to Cynthia Pitou.

Editor information

Editors and Affiliations

  1. Jacobs University Bremen, Bremen, Germany
    Adalbert F.X. Wilhelm
  2. Institute of Medical Systems Biology, Universität Ulm, Ulm, Baden-Württemberg, Germany
    Hans A. Kestler

Rights and permissions

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Pitou, C., Diatta, J. (2016). Textual Information Localization and Retrieval in Document Images Based on Quadtree Decomposition. In: Wilhelm, A., Kestler, H. (eds) Analysis of Large and Complex Data. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-25226-1_6
