Input Fields Recognition in Documents Using Deep Learning Techniques – REVISTA GEINTEC-GESTAO INOVACAO E TECNOLOGIAS

Abstract

Identifying the input fields that appear on a document is a crucial requirement when digitizing it. This paper presents a deep-learning-based approach to detecting input fields in a form or document that consists of text, images and input fields such as text boxes and checkboxes. The forms were crawled and labelled manually to generate a dataset for training deep learning models. The YOLO V3 model is trained on the labelled dataset, which has four classes (static text, static image, input text, checkbox) and 1500 instances. Each instance is labelled with a bounding box. The paper covers detection of a limited set of input field types that generally appear on printed forms. We also discuss how such detection models can scale and sustain higher loads. Given a labelled dataset for other types of input fields, the existing YOLO V3 model can be trained for them as well. The model is trained for 3500 iterations and achieves an accuracy of 71 percent.
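As an illustration of the bounding-box labelling mentioned in the abstract, the sketch below converts one pixel-coordinate box into the normalized text line commonly used by YOLO training tools (class index, box centre, box size, all relative to the image dimensions). The abstract does not state the label format or the class ordering the authors used, so the `CLASSES` order, the function name `to_yolo_label` and the example coordinates are assumptions made purely for illustration.

```python
# Hypothetical class order: the abstract names these four classes
# but does not say which index each one was assigned.
CLASSES = ["static_text", "static_image", "input_text", "checkbox"]


def to_yolo_label(class_name, box, image_width, image_height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO-style label line.

    The returned line has the form
    "<class_id> <x_center> <y_center> <width> <height>",
    with all four box values normalized to the range [0, 1].
    """
    x_min, y_min, x_max, y_max = box
    class_id = CLASSES.index(class_name)

    # Box centre, normalized by the image size.
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height

    # Box size, normalized by the image size.
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height

    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


if __name__ == "__main__":
    # A made-up 20x20 pixel checkbox on a 1000x1000 pixel scanned form.
    print(to_yolo_label("checkbox", (100, 200, 120, 220), 1000, 1000))
```

Normalizing by the image size keeps the labels valid when a scanned form is resized before training, which is one reason this representation is widely used for YOLO-style detectors.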

Paper Details

PaperID: 2468

Authors: Atharv Nagarikar, Rahul Singh Dangi, Samrit Kumar Maity, Ashish Kuvelkar and Sanjay Wandhekar

Volume: 11

Keywords: Deep Learning, YOLO, OCR, Forms, Document’s Input Fields.

Year: 2021

Month: August

Pages: 4405-4415

REVISTA GEINTEC-GESTAO INOVACAO E TECNOLOGIAS

ISSN:2237-0722