LayoutLM

The LayoutLM model was proposed in the paper LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou. It's a simple but effective pre-training method of text and layout for document image understanding and information extraction tasks, such as form understanding and receipt understanding. It obtains state-of-the-art results on several downstream tasks, including form understanding, receipt understanding, and document image classification.

The abstract from the paper is the following:

Pre-training techniques have been verified successfully in a variety of NLP tasks in recent years. Despite the widespread use of pretraining models for NLP applications, they almost exclusively focus on text-level manipulation, while neglecting layout and style information that is vital for document image understanding. In this paper, we propose the LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents. Furthermore, we also leverage image features to incorporate words' visual information into LayoutLM. To the best of our knowledge, this is the first time that text and layout are jointly learned in a single framework for document-level pretraining. It achieves new state-of-the-art results in several downstream tasks, including form understanding (from 70.72 to 79.27), receipt understanding (from 94.02 to 95.24) and document image classification (from 93.07 to 94.42).

def normalize_bbox(bbox, width, height):
    return [
        int(1000 * (bbox[0] / width)),
        int(1000 * (bbox[1] / height)),
        int(1000 * (bbox[2] / width)),
        int(1000 * (bbox[3] / height)),
    ]

Here, width and height correspond to the width and height of the original document in which the token occurs. Those can be obtained using the Python Imaging Library (PIL), for example, as follows:
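A minimal sketch of how the page size obtained with PIL feeds into normalize_bbox; the file name document.png and the pixel box below are placeholders, not part of the original example:

from PIL import Image

# "document.png" stands in for the scanned page the token was detected on
image = Image.open("document.png")
width, height = image.size

word_box = [82, 41, 166, 64]  # hypothetical (x0, y0, x1, y1) box in pixels, e.g. from an OCR engine
normalized_box = normalize_bbox(word_box, width, height)  # coordinates rescaled to the 0-1000 range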

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with LayoutLM. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

class transformers.LayoutLMConfig

< source >

( vocab_size = 30522 hidden_size = 768 num_hidden_layers = 12 num_attention_heads = 12 intermediate_size = 3072 hidden_act = 'gelu' hidden_dropout_prob = 0.1 attention_probs_dropout_prob = 0.1 max_position_embeddings = 512 type_vocab_size = 2 initializer_range = 0.02 layer_norm_eps = 1e-12 pad_token_id = 0 position_embedding_type = 'absolute' use_cache = True max_2d_position_embeddings = 1024 **kwargs )

Parameters

This is the configuration class to store the configuration of a LayoutLMModel. It is used to instantiate a LayoutLM model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a configuration similar to that of the LayoutLM microsoft/layoutlm-base-uncased architecture.

Configuration objects inherit from BertConfig and can be used to control the model outputs. Read the documentation from BertConfig for more information.

Examples:

from transformers import LayoutLMConfig, LayoutLMModel

configuration = LayoutLMConfig()

model = LayoutLMModel(configuration)

configuration = model.config

class transformers.LayoutLMTokenizer

< source >

( vocab_file do_lower_case = True do_basic_tokenize = True never_split = None unk_token = '[UNK]' sep_token = '[SEP]' pad_token = '[PAD]' cls_token = '[CLS]' mask_token = '[MASK]' tokenize_chinese_chars = True strip_accents = None clean_up_tokenization_spaces = True **kwargs )

Parameters

Construct a LayoutLM tokenizer. Based on WordPiece.

This tokenizer inherits from PreTrainedTokenizer which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.
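As a quick orientation, here is a minimal sketch using the public microsoft/layoutlm-base-uncased checkpoint; note that the tokenizer only produces text inputs, while the bounding boxes are supplied separately, as in the model examples further below:

from transformers import LayoutLMTokenizer

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
encoding = tokenizer("Hello world")
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))  # ['[CLS]', 'hello', 'world', '[SEP]']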

build_inputs_with_special_tokens

< source >

( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None ) → List[int]

Parameters

List of input IDs with the appropriate special tokens.

Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and adding special tokens. A LayoutLM sequence has the following format:

single sequence: [CLS] X [SEP]

pair of sequences: [CLS] A [SEP] B [SEP]
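A short sketch of what this looks like in practice, assuming the microsoft/layoutlm-base-uncased vocabulary; the words are placeholders:

from transformers import LayoutLMTokenizer

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
ids_a = tokenizer.convert_tokens_to_ids(["hello", "world"])
ids_b = tokenizer.convert_tokens_to_ids(["again"])

single = tokenizer.build_inputs_with_special_tokens(ids_a)       # [CLS] hello world [SEP]
pair = tokenizer.build_inputs_with_special_tokens(ids_a, ids_b)  # [CLS] hello world [SEP] again [SEP]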

Converts a sequence of tokens (strings) into a single string.

create_token_type_ids_from_sequences

< source >

( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None ) → List[int]

Parameters

List of token type IDs according to the given sequence(s).

Create a mask from the two sequences passed to be used in a sequence-pair classification task. A LayoutLM sequence pair mask has the following format:

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence    | second sequence |

If token_ids_1 is None, this method only returns the first portion of the mask (0s).
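For instance, with the same hypothetical ids as above and the microsoft/layoutlm-base-uncased checkpoint:

from transformers import LayoutLMTokenizer

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
ids_a = tokenizer.convert_tokens_to_ids(["hello", "world"])
ids_b = tokenizer.convert_tokens_to_ids(["again"])

tokenizer.create_token_type_ids_from_sequences(ids_a)         # [0, 0, 0, 0] for [CLS] hello world [SEP]
tokenizer.create_token_type_ids_from_sequences(ids_a, ids_b)  # [0, 0, 0, 0, 1, 1] - the second sequence and its [SEP] get 1s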

get_special_tokens_mask

< source >

( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None already_has_special_tokens: bool = False ) → List[int]

Parameters

A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.

Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding special tokens using the tokenizer prepare_for_model method.
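A minimal sketch, again with placeholder words and the microsoft/layoutlm-base-uncased vocabulary:

from transformers import LayoutLMTokenizer

tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
ids = tokenizer.convert_tokens_to_ids(["hello", "world"])

tokenizer.get_special_tokens_mask(ids)  # [1, 0, 0, 1] - 1s mark where [CLS] and [SEP] will be inserted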

class transformers.LayoutLMTokenizerFast

< source >

( vocab_file = None tokenizer_file = None do_lower_case = True unk_token = '[UNK]' sep_token = '[SEP]' pad_token = '[PAD]' cls_token = '[CLS]' mask_token = '[MASK]' tokenize_chinese_chars = True strip_accents = None **kwargs )

Parameters

Construct a "fast" LayoutLM tokenizer (backed by HuggingFace's tokenizers library). Based on WordPiece.

This tokenizer inherits from PreTrainedTokenizerFast which contains most of the main methods. Users should refer to this superclass for more information regarding those methods.

build_inputs_with_special_tokens

< source >

( token_ids_0 token_ids_1 = None ) → List[int]

Parameters

List of input IDs with the appropriate special tokens.

Build model inputs from a sequence or a pair of sequences for sequence classification tasks by concatenating and adding special tokens. A LayoutLM sequence has the following format:

single sequence: [CLS] X [SEP]

pair of sequences: [CLS] A [SEP] B [SEP]

create_token_type_ids_from_sequences

< source >

( token_ids_0: typing.List[int] token_ids_1: typing.Optional[typing.List[int]] = None ) → List[int]

Parameters

List of token type IDs according to the given sequence(s).

Create a mask from the two sequences passed to be used in a sequence-pair classification task. A LayoutLM sequence pair mask has the following format:

0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence    | second sequence |

If token_ids_1 is None, this method only returns the first portion of the mask (0s).

LayoutLMModel

class transformers.LayoutLMModel

< source >

( config )

Parameters

The bare LayoutLM Model transformer outputting raw hidden-states without any specific head on top. The LayoutLM model was proposed in LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei and Ming Zhou.

This model is a PyTorch torch.nn.Module sub-class. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.LongTensor] = None bbox: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.FloatTensor] = None token_type_ids: typing.Optional[torch.LongTensor] = None position_ids: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.FloatTensor] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None encoder_hidden_states: typing.Optional[torch.FloatTensor] = None encoder_attention_mask: typing.Optional[torch.FloatTensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) β†’ transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.BaseModelOutputWithPoolingAndCrossAttentions or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (LayoutLMConfig) and inputs.

The LayoutLMModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Examples:

from transformers import AutoTokenizer, LayoutLMModel
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased")

words = ["Hello", "world"]
normalized_word_boxes = [637, 773, 693, 782], [698, 773, 733, 782]

token_boxes = []
for word, box in zip(words, normalized_word_boxes):
    word_tokens = tokenizer.tokenize(word)
    token_boxes.extend([box] * len(word_tokens))
# add bounding boxes for the [CLS] and [SEP] tokens
token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]

encoding = tokenizer(" ".join(words), return_tensors="pt")
input_ids = encoding["input_ids"]
attention_mask = encoding["attention_mask"]
token_type_ids = encoding["token_type_ids"]
bbox = torch.tensor([token_boxes])

outputs = model(
    input_ids=input_ids, bbox=bbox, attention_mask=attention_mask, token_type_ids=token_type_ids
)

last_hidden_states = outputs.last_hidden_state

LayoutLMForMaskedLM

class transformers.LayoutLMForMaskedLM

< source >

( config )

Parameters

LayoutLM Model with a language modeling head on top. The LayoutLM model was proposed in LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei and Ming Zhou.

This model is a PyTorch torch.nn.Module sub-class. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.LongTensor] = None bbox: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.FloatTensor] = None token_type_ids: typing.Optional[torch.LongTensor] = None position_ids: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.FloatTensor] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None labels: typing.Optional[torch.LongTensor] = None encoder_hidden_states: typing.Optional[torch.FloatTensor] = None encoder_attention_mask: typing.Optional[torch.FloatTensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) β†’ transformers.modeling_outputs.MaskedLMOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.MaskedLMOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (LayoutLMConfig) and inputs.

The LayoutLMForMaskedLM forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Examples:

from transformers import AutoTokenizer, LayoutLMForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMForMaskedLM.from_pretrained("microsoft/layoutlm-base-uncased")

words = ["Hello", "[MASK]"]
normalized_word_boxes = [637, 773, 693, 782], [698, 773, 733, 782]

token_boxes = []
for word, box in zip(words, normalized_word_boxes):
    word_tokens = tokenizer.tokenize(word)
    token_boxes.extend([box] * len(word_tokens))
token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]

encoding = tokenizer(" ".join(words), return_tensors="pt")
input_ids = encoding["input_ids"]
attention_mask = encoding["attention_mask"]
token_type_ids = encoding["token_type_ids"]
bbox = torch.tensor([token_boxes])

labels = tokenizer("Hello world", return_tensors="pt")["input_ids"]

outputs = model(
    input_ids=input_ids,
    bbox=bbox,
    attention_mask=attention_mask,
    token_type_ids=token_type_ids,
    labels=labels,
)

loss = outputs.loss

LayoutLMForSequenceClassification

class transformers.LayoutLMForSequenceClassification

< source >

( config )

Parameters

LayoutLM Model with a sequence classification head on top (a linear layer on top of the pooled output) e.g. for document image classification tasks such as the RVL-CDIP dataset.

The LayoutLM model was proposed in LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei and Ming Zhou.

This model is a PyTorch torch.nn.Module sub-class. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.LongTensor] = None bbox: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.FloatTensor] = None token_type_ids: typing.Optional[torch.LongTensor] = None position_ids: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.FloatTensor] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None labels: typing.Optional[torch.LongTensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) β†’ transformers.modeling_outputs.SequenceClassifierOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.SequenceClassifierOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (LayoutLMConfig) and inputs.

The LayoutLMForSequenceClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Examples:

from transformers import AutoTokenizer, LayoutLMForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMForSequenceClassification.from_pretrained("microsoft/layoutlm-base-uncased")

words = ["Hello", "world"]
normalized_word_boxes = [637, 773, 693, 782], [698, 773, 733, 782]

token_boxes = []
for word, box in zip(words, normalized_word_boxes):
    word_tokens = tokenizer.tokenize(word)
    token_boxes.extend([box] * len(word_tokens))
token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]

encoding = tokenizer(" ".join(words), return_tensors="pt")
input_ids = encoding["input_ids"]
attention_mask = encoding["attention_mask"]
token_type_ids = encoding["token_type_ids"]
bbox = torch.tensor([token_boxes])
sequence_label = torch.tensor([1])

outputs = model(
    input_ids=input_ids,
    bbox=bbox,
    attention_mask=attention_mask,
    token_type_ids=token_type_ids,
    labels=sequence_label,
)

loss = outputs.loss
logits = outputs.logits

LayoutLMForTokenClassification

class transformers.LayoutLMForTokenClassification

< source >

( config )

Parameters

LayoutLM Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for sequence labeling (information extraction) tasks such as the FUNSD dataset and the SROIE dataset.

The LayoutLM model was proposed in LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei and Ming Zhou.

This model is a PyTorch torch.nn.Module sub-class. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.LongTensor] = None bbox: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.FloatTensor] = None token_type_ids: typing.Optional[torch.LongTensor] = None position_ids: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.FloatTensor] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None labels: typing.Optional[torch.LongTensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) β†’ transformers.modeling_outputs.TokenClassifierOutput or tuple(torch.FloatTensor)

Parameters

A transformers.modeling_outputs.TokenClassifierOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (LayoutLMConfig) and inputs.

The LayoutLMForTokenClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Examples:

from transformers import AutoTokenizer, LayoutLMForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = LayoutLMForTokenClassification.from_pretrained("microsoft/layoutlm-base-uncased")

words = ["Hello", "world"]
normalized_word_boxes = [637, 773, 693, 782], [698, 773, 733, 782]

token_boxes = []
for word, box in zip(words, normalized_word_boxes):
    word_tokens = tokenizer.tokenize(word)
    token_boxes.extend([box] * len(word_tokens))
token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]

encoding = tokenizer(" ".join(words), return_tensors="pt")
input_ids = encoding["input_ids"]
attention_mask = encoding["attention_mask"]
token_type_ids = encoding["token_type_ids"]
bbox = torch.tensor([token_boxes])
token_labels = torch.tensor([1, 1, 0, 0]).unsqueeze(0)

outputs = model(
    input_ids=input_ids,
    bbox=bbox,
    attention_mask=attention_mask,
    token_type_ids=token_type_ids,
    labels=token_labels,
)

loss = outputs.loss
logits = outputs.logits

LayoutLMForQuestionAnswering

class transformers.LayoutLMForQuestionAnswering

< source >

( config has_visual_segment_embedding = True )

Parameters

LayoutLM Model with a span classification head on top for extractive question-answering tasks such as DocVQA (a linear layer on top of the final hidden-states output to compute span start logits and span end logits).

The LayoutLM model was proposed in LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei and Ming Zhou.

This model is a PyTorch torch.nn.Module sub-class. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.

forward

< source >

( input_ids: typing.Optional[torch.LongTensor] = None bbox: typing.Optional[torch.LongTensor] = None attention_mask: typing.Optional[torch.FloatTensor] = None token_type_ids: typing.Optional[torch.LongTensor] = None position_ids: typing.Optional[torch.LongTensor] = None head_mask: typing.Optional[torch.FloatTensor] = None inputs_embeds: typing.Optional[torch.FloatTensor] = None start_positions: typing.Optional[torch.LongTensor] = None end_positions: typing.Optional[torch.LongTensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) β†’ transformers.modeling_outputs.QuestionAnsweringModelOutput or tuple(torch.FloatTensor)

A transformers.modeling_outputs.QuestionAnsweringModelOutput or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (LayoutLMConfig) and inputs.

start_positions (torch.LongTensor of shape (batch_size,), optional): Labels for the position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside the sequence are not taken into account for computing the loss.

end_positions (torch.LongTensor of shape (batch_size,), optional): Labels for the position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length). Positions outside the sequence are not taken into account for computing the loss.

Example:

In the example below, we prepare a question + context pair for the LayoutLM model. The model predicts the span of the answer within the text parsed from the image.

from transformers import AutoTokenizer, LayoutLMForQuestionAnswering
from datasets import load_dataset
import torch

tokenizer = AutoTokenizer.from_pretrained("impira/layoutlm-document-qa", add_prefix_space=True)
model = LayoutLMForQuestionAnswering.from_pretrained("impira/layoutlm-document-qa", revision="1e3ebac")

dataset = load_dataset("nielsr/funsd", split="train", trust_remote_code=True)
example = dataset[0]
question = "what's his name?"
words = example["words"]
boxes = example["bboxes"]

encoding = tokenizer(
    question.split(), words, is_split_into_words=True, return_token_type_ids=True, return_tensors="pt"
)
# build one box per token: real word boxes for the context, [1000]*4 for [SEP], [0]*4 elsewhere
bbox = []
for i, s, w in zip(encoding.input_ids[0], encoding.sequence_ids(0), encoding.word_ids(0)):
    if s == 1:
        bbox.append(boxes[w])
    elif i == tokenizer.sep_token_id:
        bbox.append([1000] * 4)
    else:
        bbox.append([0] * 4)
encoding["bbox"] = torch.tensor([bbox])

word_ids = encoding.word_ids(0)
outputs = model(**encoding)
loss = outputs.loss
start_scores = outputs.start_logits
end_scores = outputs.end_logits
start, end = word_ids[start_scores.argmax(-1)], word_ids[end_scores.argmax(-1)]
print(" ".join(words[start : end + 1]))
# M. Hamann P. Harper, P. Martinez

TFLayoutLMModel

class transformers.TFLayoutLMModel

< source >

( config: LayoutLMConfig *inputs **kwargs )

Parameters

The bare LayoutLM Model transformer outputting raw hidden-states without any specific head on top.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input: having all inputs as keyword arguments (like PyTorch models), or having all inputs as a list, tuple or dict in the first positional argument.

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should "just work" for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three ways to gather all the input Tensors in the first positional argument: a single Tensor with input_ids only, a list of varying length with one or several input Tensors in the order given in the docstring, or a dictionary with one or several input Tensors associated with the input names given in the docstring (see the sketch below).

Note that when creating models and layers with subclassing you don't need to worry about any of this, as you can just pass inputs like you would to any other Python function!
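As an illustration of these input formats, here is a minimal sketch; the microsoft/layoutlm-base-uncased checkpoint and the all-zero bbox tensor are placeholders chosen only to keep the example short:

import tensorflow as tf
from transformers import AutoTokenizer, TFLayoutLMModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = TFLayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased")

encoding = tokenizer("Hello world", return_tensors="tf")
input_ids = encoding["input_ids"]
attention_mask = encoding["attention_mask"]
# one normalized (x0, y0, x1, y1) box per token; all zeros here purely as a placeholder
bbox = tf.zeros((1, input_ids.shape[1], 4), dtype=tf.int32)

# first format: keyword arguments, as with the PyTorch models
outputs = model(input_ids=input_ids, bbox=bbox, attention_mask=attention_mask)

# second format: everything gathered in the first positional argument
outputs = model([input_ids, bbox, attention_mask])  # list, in the order of the call signature
outputs = model({"input_ids": input_ids, "bbox": bbox, "attention_mask": attention_mask})  # dict keyed by input name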

call

< source >

( input_ids: TFModelInputType | None = None bbox: np.ndarray | tf.Tensor | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None encoder_hidden_states: np.ndarray | tf.Tensor | None = None encoder_attention_mask: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None training: Optional[bool] = False ) β†’ transformers.modeling_tf_outputs.TFBaseModelOutputWithPoolingAndCrossAttentions or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFBaseModelOutputWithPoolingAndCrossAttentions or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (LayoutLMConfig) and inputs.

The TFLayoutLMModel forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Examples:

from transformers import AutoTokenizer, TFLayoutLMModel
import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = TFLayoutLMModel.from_pretrained("microsoft/layoutlm-base-uncased")

words = ["Hello", "world"]
normalized_word_boxes = [637, 773, 693, 782], [698, 773, 733, 782]

token_boxes = []
for word, box in zip(words, normalized_word_boxes):
    word_tokens = tokenizer.tokenize(word)
    token_boxes.extend([box] * len(word_tokens))
token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]

encoding = tokenizer(" ".join(words), return_tensors="tf")
input_ids = encoding["input_ids"]
attention_mask = encoding["attention_mask"]
token_type_ids = encoding["token_type_ids"]
bbox = tf.convert_to_tensor([token_boxes])

outputs = model(
    input_ids=input_ids, bbox=bbox, attention_mask=attention_mask, token_type_ids=token_type_ids
)

last_hidden_states = outputs.last_hidden_state

TFLayoutLMForMaskedLM

class transformers.TFLayoutLMForMaskedLM

< source >

( config: LayoutLMConfig *inputs **kwargs )

Parameters

LayoutLM Model with a language modeling head on top.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input: having all inputs as keyword arguments (like PyTorch models), or having all inputs as a list, tuple or dict in the first positional argument.

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should "just work" for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three ways to gather all the input Tensors in the first positional argument: a single Tensor with input_ids only, a list of varying length with one or several input Tensors in the order given in the docstring, or a dictionary with one or several input Tensors associated with the input names given in the docstring.

Note that when creating models and layers with subclassing you don't need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

< source >

( input_ids: TFModelInputType | None = None bbox: np.ndarray | tf.Tensor | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) β†’ transformers.modeling_tf_outputs.TFMaskedLMOutput or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFMaskedLMOutput or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (LayoutLMConfig) and inputs.

The TFLayoutLMForMaskedLM forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Examples:

from transformers import AutoTokenizer, TFLayoutLMForMaskedLM
import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = TFLayoutLMForMaskedLM.from_pretrained("microsoft/layoutlm-base-uncased")

words = ["Hello", "[MASK]"]
normalized_word_boxes = [637, 773, 693, 782], [698, 773, 733, 782]

token_boxes = []
for word, box in zip(words, normalized_word_boxes):
    word_tokens = tokenizer.tokenize(word)
    token_boxes.extend([box] * len(word_tokens))
token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]

encoding = tokenizer(" ".join(words), return_tensors="tf")
input_ids = encoding["input_ids"]
attention_mask = encoding["attention_mask"]
token_type_ids = encoding["token_type_ids"]
bbox = tf.convert_to_tensor([token_boxes])

labels = tokenizer("Hello world", return_tensors="tf")["input_ids"]

outputs = model(
    input_ids=input_ids,
    bbox=bbox,
    attention_mask=attention_mask,
    token_type_ids=token_type_ids,
    labels=labels,
)

loss = outputs.loss

TFLayoutLMForSequenceClassification

class transformers.TFLayoutLMForSequenceClassification

< source >

( config: LayoutLMConfig *inputs **kwargs )

Parameters

LayoutLM Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input: having all inputs as keyword arguments (like PyTorch models), or having all inputs as a list, tuple or dict in the first positional argument.

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should "just work" for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three ways to gather all the input Tensors in the first positional argument: a single Tensor with input_ids only, a list of varying length with one or several input Tensors in the order given in the docstring, or a dictionary with one or several input Tensors associated with the input names given in the docstring.

Note that when creating models and layers with subclassing you don't need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

< source >

( input_ids: TFModelInputType | None = None bbox: np.ndarray | tf.Tensor | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) β†’ transformers.modeling_tf_outputs.TFSequenceClassifierOutput or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFSequenceClassifierOutput or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (LayoutLMConfig) and inputs.

The TFLayoutLMForSequenceClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Examples:

from transformers import AutoTokenizer, TFLayoutLMForSequenceClassification
import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = TFLayoutLMForSequenceClassification.from_pretrained("microsoft/layoutlm-base-uncased")

words = ["Hello", "world"]
normalized_word_boxes = [637, 773, 693, 782], [698, 773, 733, 782]

token_boxes = []
for word, box in zip(words, normalized_word_boxes):
    word_tokens = tokenizer.tokenize(word)
    token_boxes.extend([box] * len(word_tokens))
token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]

encoding = tokenizer(" ".join(words), return_tensors="tf")
input_ids = encoding["input_ids"]
attention_mask = encoding["attention_mask"]
token_type_ids = encoding["token_type_ids"]
bbox = tf.convert_to_tensor([token_boxes])
sequence_label = tf.convert_to_tensor([1])

outputs = model(
    input_ids=input_ids,
    bbox=bbox,
    attention_mask=attention_mask,
    token_type_ids=token_type_ids,
    labels=sequence_label,
)

loss = outputs.loss
logits = outputs.logits

TFLayoutLMForTokenClassification

class transformers.TFLayoutLMForTokenClassification

< source >

( config: LayoutLMConfig *inputs **kwargs )

Parameters

LayoutLM Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input: having all inputs as keyword arguments (like PyTorch models), or having all inputs as a list, tuple or dict in the first positional argument.

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should "just work" for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three ways to gather all the input Tensors in the first positional argument: a single Tensor with input_ids only, a list of varying length with one or several input Tensors in the order given in the docstring, or a dictionary with one or several input Tensors associated with the input names given in the docstring.

Note that when creating models and layers with subclassing you don't need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

< source >

( input_ids: TFModelInputType | None = None bbox: np.ndarray | tf.Tensor | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) β†’ transformers.modeling_tf_outputs.TFTokenClassifierOutput or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFTokenClassifierOutput or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (LayoutLMConfig) and inputs.

The TFLayoutLMForTokenClassification forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Examples:

import tensorflow as tf
from transformers import AutoTokenizer, TFLayoutLMForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")
model = TFLayoutLMForTokenClassification.from_pretrained("microsoft/layoutlm-base-uncased")

words = ["Hello", "world"]
normalized_word_boxes = [637, 773, 693, 782], [698, 773, 733, 782]

token_boxes = []
for word, box in zip(words, normalized_word_boxes):
    word_tokens = tokenizer.tokenize(word)
    token_boxes.extend([box] * len(word_tokens))
token_boxes = [[0, 0, 0, 0]] + token_boxes + [[1000, 1000, 1000, 1000]]

encoding = tokenizer(" ".join(words), return_tensors="tf")
input_ids = encoding["input_ids"]
attention_mask = encoding["attention_mask"]
token_type_ids = encoding["token_type_ids"]
bbox = tf.convert_to_tensor([token_boxes])
token_labels = tf.convert_to_tensor([1, 1, 0, 0])

outputs = model(
    input_ids=input_ids,
    bbox=bbox,
    attention_mask=attention_mask,
    token_type_ids=token_type_ids,
    labels=token_labels,
)

loss = outputs.loss
logits = outputs.logits

TFLayoutLMForQuestionAnswering

class transformers.TFLayoutLMForQuestionAnswering

< source >

( config: LayoutLMConfig *inputs **kwargs )

Parameters

LayoutLM Model with a span classification head on top for extractive question-answering tasks such as DocVQA (a linear layer on top of the final hidden-states output to compute span start logits and span end logits).

This model inherits from TFPreTrainedModel. Check the superclass documentation for the generic methods the library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, etc.).

This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.

TensorFlow models and layers in transformers accept two formats as input: having all inputs as keyword arguments (like PyTorch models), or having all inputs as a list, tuple or dict in the first positional argument.

The reason the second format is supported is that Keras methods prefer this format when passing inputs to models and layers. Because of this support, when using methods like model.fit() things should "just work" for you - just pass your inputs and labels in any format that model.fit() supports! If, however, you want to use the second format outside of Keras methods like fit() and predict(), such as when creating your own layers or models with the Keras Functional API, there are three ways to gather all the input Tensors in the first positional argument: a single Tensor with input_ids only, a list of varying length with one or several input Tensors in the order given in the docstring, or a dictionary with one or several input Tensors associated with the input names given in the docstring.

Note that when creating models and layers with subclassing you don't need to worry about any of this, as you can just pass inputs like you would to any other Python function!

call

< source >

( input_ids: TFModelInputType | None = None bbox: np.ndarray | tf.Tensor | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None head_mask: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None start_positions: np.ndarray | tf.Tensor | None = None end_positions: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) β†’ transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput or tuple(tf.Tensor)

Parameters

A transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput or a tuple of tf.Tensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration (LayoutLMConfig) and inputs.

The TFLayoutLMForQuestionAnswering forward method overrides the __call__ special method.

Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

Examples:

import tensorflow as tf
from transformers import AutoTokenizer, TFLayoutLMForQuestionAnswering
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("impira/layoutlm-document-qa", add_prefix_space=True)
model = TFLayoutLMForQuestionAnswering.from_pretrained("impira/layoutlm-document-qa", revision="1e3ebac")

dataset = load_dataset("nielsr/funsd", split="train", trust_remote_code=True)
example = dataset[0]
question = "what's his name?"
words = example["words"]
boxes = example["bboxes"]

encoding = tokenizer(
    question.split(), words, is_split_into_words=True, return_token_type_ids=True, return_tensors="tf"
)
# build one box per token: real word boxes for the context, [1000]*4 for [SEP], [0]*4 elsewhere
bbox = []
for i, s, w in zip(encoding.input_ids[0], encoding.sequence_ids(0), encoding.word_ids(0)):
    if s == 1:
        bbox.append(boxes[w])
    elif i == tokenizer.sep_token_id:
        bbox.append([1000] * 4)
    else:
        bbox.append([0] * 4)
encoding["bbox"] = tf.convert_to_tensor([bbox])

word_ids = encoding.word_ids(0)
outputs = model(**encoding)
loss = outputs.loss
start_scores = outputs.start_logits
end_scores = outputs.end_logits
start, end = word_ids[tf.math.argmax(start_scores, -1)[0]], word_ids[tf.math.argmax(end_scores, -1)[0]]
print(" ".join(words[start : end + 1]))
# M. Hamann P. Harper, P. Martinez