bert - Pretrained BERT model - MATLAB

Pretrained BERT model

Since R2023b

Syntax

[net,tokenizer] = bert
[net,tokenizer] = bert(Name=Value)

Description

A Bidirectional Encoder Representations from Transformers (BERT) model is a transformer neural network that can be fine-tuned for natural language processing tasks such as document classification and sentiment analysis. The network uses attention layers to analyze text in context and capture long-range dependencies between words.

[net,tokenizer] = bert returns a pretrained BERT-Base model and the corresponding tokenizer.


[net,tokenizer] = bert(Name=Value) specifies additional options using one or more name-value arguments.

Examples


Load Pretrained BERT Neural Network

Load a pretrained BERT-Base neural network and the corresponding tokenizer using the bert function. If the Text Analytics Toolbox™ Model for BERT-Base Network support package is not installed, then the function provides a link to the required support package in the Add-On Explorer. To install the support package, click the link, and then click Install.
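
[net,tokenizer] = bert;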

View the network properties.
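
net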

net = dlnetwork with properties:

     Layers: [129x1 nnet.cnn.layer.Layer]
Connections: [164x2 table]
 Learnables: [197x3 table]
      State: [0x3 table]
 InputNames: {'input_ids'  'attention_mask'  'seg_ids'}
OutputNames: {'enc12_layernorm2'}
Initialized: 1

View summary with summary.
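
summary(net)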

View the tokenizer.
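
tokenizer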

tokenizer = bertTokenizer with properties:

    IgnoreCase: 1
  StripAccents: 1
  PaddingToken: "[PAD]"
   PaddingCode: 1
    StartToken: "[CLS]"
     StartCode: 102
  UnknownToken: "[UNK]"
   UnknownCode: 101
SeparatorToken: "[SEP]"
 SeparatorCode: 103
   ContextSize: 512

Input Arguments


Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: [net,tokenizer] = bert(Model="tiny") returns a pretrained BERT-Tiny model and the corresponding tokenizer.

Model — BERT model

"base" (default) | "tiny" | "mini" | "small" | "large" | "multilingual"

BERT model, specified as "base", "tiny", "mini", "small", "large", or "multilingual". The default is "base", which returns the BERT-Base model.

Head — Model head

"none" (default) | "document-classifier"

Model head, specified as "none" or "document-classifier". If you specify "document-classifier", the model includes a document classification head, with the number of classes given by the NumClasses argument. The default is "none".

NumClasses — Number of classes for document classification head

2 (default) | positive integer

Number of classes for the document classification head, specified as a positive integer.

This option applies only when Head is "document-classifier".
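
For example, a call along these lines (with an illustrative class count of 5) returns a BERT-Tiny model with a document classification head:

[net,tokenizer] = bert(Model="tiny",Head="document-classifier",NumClasses=5);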

DropoutProbability — Probability of dropping out input elements in dropout layers

0.1 (default) | scalar in the range [0, 1)

Probability of dropping out input elements in dropout layers, specified as a scalar in the range [0, 1).

When you train a neural network with dropout layers, the layer randomly sets input elements to zero using the dropout mask rand(size(X)) < p, where X is the layer input and p is the layer dropout probability. The layer then scales the remaining elements by 1/(1-p).

This operation helps to prevent the network from overfitting [2], [3]. A higher number results in the network dropping more elements during training. At prediction time, the output of the layer is equal to its input.
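
A rough sketch of this operation in MATLAB (illustrative only, not the layer implementation; p and X are example values):

p = 0.1;                  % dropout probability
X = rand(4,3,"single");   % example layer input
mask = rand(size(X)) < p; % elements to drop out
Y = X.*~mask./(1-p);      % zero the dropped elements and rescale the rest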

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

AttentionDropoutProbability — Probability of dropping out input elements in attention layers

0.1 (default) | scalar in the range [0, 1)

Probability of dropping out input elements in attention layers, specified as a scalar in the range [0, 1).

When you train a neural network with attention layers, the layer randomly sets attention scores to zero using the dropout mask rand(size(scores)) < p, where scores is the matrix of attention scores and p is the layer dropout probability. The layer then scales the remaining elements by 1/(1-p).

This operation helps to prevent the network from overfitting [2], [3]. A higher number results in the network dropping more elements during training. At prediction time, the output of the layer is equal to its input.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

Output Arguments


net — Pretrained BERT model

dlnetwork object

Pretrained BERT model, returned as a dlnetwork (Deep Learning Toolbox) object.

tokenizer — BERT tokenizer

bertTokenizer object

BERT tokenizer, returned as a bertTokenizer object.
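
As a minimal usage sketch (assuming the encode object function of the bertTokenizer object), you can convert text into the token codes the network expects:

str = "Bidirectional Encoder Representations from Transformers";
tokenCodes = encode(tokenizer,str);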

References

[1] Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding." Preprint, submitted May 24, 2019. https://doi.org/10.48550/arXiv.1810.04805.

[2] Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." The Journal of Machine Learning Research 15, no. 1 (January 1, 2014): 1929–58.

[3] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks." Communications of the ACM 60, no. 6 (May 24, 2017): 84–90. https://doi.org/10.1145/3065386.

Version History

Introduced in R2023b