bert - Pretrained BERT model - MATLAB
Pretrained BERT model
Since R2023b
Syntax
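[net,tokenizer] = bert
[net,tokenizer] = bert(Name=Value)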
Description
A Bidirectional Encoder Representations from Transformers (BERT) model is a transformer neural network that can be fine-tuned for natural language processing tasks such as document classification and sentiment analysis. The network uses attention layers to analyze text in context and capture long-range dependencies between words.
[net,tokenizer] = bert
returns a pretrained BERT-Base model and the corresponding tokenizer.
[net,tokenizer] = bert(Name=Value)
specifies additional options using one or more name-value arguments.
Examples
Load Pretrained BERT Neural Network
Load a pretrained BERT-Base neural network and the corresponding tokenizer using the bert function. If the Text Analytics Toolbox™ Model for BERT-Base Network support package is not installed, then the function provides a link to the required support package in the Add-On Explorer. To install the support package, click the link, and then click Install.
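[net,tokenizer] = bert;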
View the network properties.
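net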
net = dlnetwork with properties:
Layers: [129x1 nnet.cnn.layer.Layer]
Connections: [164x2 table]
Learnables: [197x3 table]
State: [0x3 table]
InputNames: {'input_ids' 'attention_mask' 'seg_ids'}
OutputNames: {'enc12_layernorm2'}
Initialized: 1
View a summary of the network using the summary function.
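summary(net)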
View the tokenizer.
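tokenizer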
tokenizer = bertTokenizer with properties:
IgnoreCase: 1
StripAccents: 1
PaddingToken: "[PAD]"
PaddingCode: 1
StartToken: "[CLS]"
StartCode: 102
UnknownToken: "[UNK]"
UnknownCode: 101
SeparatorToken: "[SEP]"
SeparatorCode: 103
ContextSize: 512
Input Arguments
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: [net,tokenizer] = bert(Model="tiny") returns a pretrained BERT-Tiny model and the corresponding tokenizer.
Model — BERT model
"base" (default) | "tiny" | "mini" | "small" | "large" | "multilingual"
BERT model, specified as one of these options:
"base" — BERT-Base model. This option requires the Text Analytics Toolbox™ Model for BERT-Base Network support package. This model has 108.8 million learnable parameters.
"tiny" — BERT-Tiny model. This option requires the Text Analytics Toolbox Model for BERT-Tiny Network support package. This model has 4.3 million learnable parameters.
"mini" — BERT-Mini model. This option requires the Text Analytics Toolbox Model for BERT-Mini Network support package. This model has 11.1 million learnable parameters.
"small" — BERT-Small model. This option requires the Text Analytics Toolbox Model for BERT-Small Network support package. This model has 28.5 million learnable parameters.
"large" — BERT-Large model. This option requires the Text Analytics Toolbox Model for BERT-Large Network support package. This model has 334 million learnable parameters.
"multilingual" — BERT-Base multilingual model. This option requires the Text Analytics Toolbox Model for BERT-Base Multilingual Cased Network support package. This model has 177.2 million learnable parameters.
Head — Model head
"none" (default) | "document-classifier"
Model head, specified as one of these values:
"document-classifier" — Return a model with a document classification head. The head contains a fully connected layer with an output size of NumClasses and a softmax layer.
"none" — Return a headless model.
NumClasses — Number of classes for document classification head
2 (default) | positive integer
Number of classes for the document classification head, specified as a positive integer.
This option only applies when Head is "document-classifier".
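For example, this call returns a BERT-Tiny model with a document classification head for three classes (the class count here is arbitrary):
[net,tokenizer] = bert(Model="tiny",Head="document-classifier",NumClasses=3);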
DropoutProbability — Probability of dropping out input elements in dropout layers
0.1 (default) | scalar in the range [0, 1)
Probability of dropping out input elements in dropout layers, specified as a scalar in the range [0, 1).
When you train a neural network with dropout layers, the layer randomly sets input elements to zero using the dropout mask rand(size(X)) < p, where X is the layer input and p is the layer dropout probability. The layer then scales the remaining elements by 1/(1-p).
This operation helps to prevent the network from overfitting [2], [3]. A higher number results in the network dropping more elements during training. At prediction time, the output of the layer is equal to its input.
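As an illustration of the masking and scaling described above, here is a minimal sketch with example values for X and p (this is not part of the bert interface):
p = 0.1;                    % dropout probability
X = rand(4,4);              % example layer input
mask = rand(size(X)) < p;   % elements to drop out
Y = X;
Y(mask) = 0;                % set dropped elements to zero
Y = Y/(1-p);                % scale remaining elements by 1/(1-p)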
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
AttentionDropoutProbability — Probability of dropping out input elements in attention layers
0.1 (default) | scalar in the range [0, 1)
Probability of dropping out input elements in attention layers, specified as a scalar in the range [0, 1).
When you train a neural network with attention layers, the layer randomly sets attention scores to zero using the dropout mask rand(size(scores)) < p, where scores contains the attention scores and p is the layer dropout probability. The layer then scales the remaining elements by 1/(1-p).
This operation helps to prevent the network from overfitting [2], [3]. A higher number results in the network dropping more elements during training. At prediction time, the output of the layer is equal to its input.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Output Arguments
net — Pretrained BERT model
dlnetwork object
Pretrained BERT model, returned as a dlnetwork (Deep Learning Toolbox) object.
tokenizer — BERT tokenizer
bertTokenizer object
BERT tokenizer, returned as a bertTokenizer object.
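For example, you can convert text to token codes with the encode function of the bertTokenizer object (a usage sketch; the input string here is arbitrary):
str = "Text analytics with BERT.";
tokenCodes = encode(tokenizer,str);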
References
[1] Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding." Preprint, submitted May 24, 2019. https://doi.org/10.48550/arXiv.1810.04805.
[2] Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." The Journal of Machine Learning Research 15, no. 1 (January 1, 2014): 1929–58.
[3] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks." Communications of the ACM 60, no. 6 (May 24, 2017): 84–90. https://doi.org/10.1145/3065386.
Version History
Introduced in R2023b