Extraction of information from bill receipts using optical character recognition

2020 International Conference on Smart Electronics and Communication (ICOSEC), 2020

Abstract

This paper presents an application of optical character recognition (OCR) that extracts information from images of bills and receipts and stores it as machine-processable text in an organized manner for ease of access. It does this efficiently even when the bills carry watermarks or the images contain shadows. The application uses OpenCV for image processing and the Tesseract OCR engine for character recognition and text extraction. The image is first processed with OpenCV to remove any shadows or watermarks present in it. For longer invoices, an image bifurcation process allows the data to be extracted easily, which was not previously possible. The processed image is then passed to the Tesseract OCR engine, which retrieves the text present in it. The text is then searched for important information, such as the total amount spent and the date on the receipt, using string pr...
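The final step described in the abstract, searching the recognized text for the total amount and the date, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `find_total` and `find_date`, the regular expressions, the day/month/year reading of slash-separated dates, and the sample receipt are all assumptions made for illustration. The input is taken to be plain text already returned by an OCR engine.

```python
import re
from datetime import date

# Hypothetical patterns; real receipts vary widely in layout and wording.
AMOUNT_RE = re.compile(r"\d[\d,]*(?:\.\d{1,2})?")
DMY_RE = re.compile(r"\b(\d{1,2})[/-](\d{1,2})[/-](\d{4})\b")  # assumed day/month/year
ISO_RE = re.compile(r"\b(\d{4})-(\d{1,2})-(\d{1,2})\b")        # year-month-day


def find_total(text):
    """Return the last amount on the last line mentioning 'total', or None."""
    total = None
    for line in text.splitlines():
        lowered = line.lower()
        # Skip subtotal lines so they are not mistaken for the grand total.
        if "total" not in lowered or "subtotal" in lowered or "sub total" in lowered:
            continue
        amounts = AMOUNT_RE.findall(line)
        if amounts:
            # Drop thousands separators before converting to a number.
            total = float(amounts[-1].replace(",", ""))
    return total


def find_date(text):
    """Return the first valid date found (ISO form tried first), or None."""
    for pattern, order in ((ISO_RE, (0, 1, 2)), (DMY_RE, (2, 1, 0))):
        for match in pattern.finditer(text):
            parts = [int(group) for group in match.groups()]
            year, month, day = (parts[i] for i in order)
            try:
                return date(year, month, day)
            except ValueError:
                continue  # e.g. 31/02/2020 is not a real calendar date
    return None


if __name__ == "__main__":
    # Made-up OCR output, used only to exercise the two helpers.
    sample = "SUPER MART\nDate: 14/03/2020\nSubtotal 45.00\nTax 3.60\nTOTAL 48.60\n"
    print(find_total(sample))  # 48.6
    print(find_date(sample))   # 2020-03-14
```

Scanning line by line and keeping the last match reflects the common layout in which the grand total appears near the bottom of a receipt. OCR output is often noisy, so a practical version would likely need looser matching than shown here.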

Reena Sonkusare hasn't uploaded this paper.
