bertDocumentClassifier
BERT document classifier
Since R2023b
Description
A Bidirectional Encoder Representations from Transformers (BERT) model is a transformer neural network that can be fine-tuned for natural language processing tasks such as document classification and sentiment analysis. The network uses attention layers to analyze text in context and capture long-range dependencies between words.
Creation
Description
`mdl` = bertDocumentClassifier creates a bertDocumentClassifier object.
`mdl` = bertDocumentClassifier(net,tokenizer) creates a bertDocumentClassifier object from the specified BERT neural network and tokenizer.
`mdl` = bertDocumentClassifier(___,Name=Value) sets the ClassNames property and additional options using one or more name-value arguments.
Input Arguments
net — BERT neural network
dlnetwork object
BERT neural network, specified as a dlnetwork (Deep Learning Toolbox) object.
If you specify the net argument, then you must not specify the Model argument. The network must have three sequence input layers with input sizes of one. The output size of the network must match the number of classes in the ClassNames property. The inputs in net.InputNames(1), net.InputNames(2), and net.InputNames(3) must be the inputs for the input data, the attention mask, and the segments, respectively.
tokenizer — BERT tokenizer
bertTokenizer object
BERT tokenizer, specified as a bertTokenizer object.
If you specify the tokenizer argument, then you must not specify the Model argument.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: bertDocumentClassifier(Model="tiny") creates a BERT-Tiny document classifier.
Model — BERT model
"base" (default) | "tiny" | "mini" | "small" | "large" | "multilingual"
BERT model, specified as one of these options:
- "base" — BERT-Base model. This option requires the Text Analytics Toolbox™ Model for BERT-Base Network support package. This model has 108.8 million learnable parameters.
- "tiny" — BERT-Tiny model. This option requires the Text Analytics Toolbox Model for BERT-Tiny Network support package. This model has 4.3 million learnable parameters.
- "mini" — BERT-Mini model. This option requires the Text Analytics Toolbox Model for BERT-Mini Network support package. This model has 11.1 million learnable parameters.
- "small" — BERT-Small model. This option requires the Text Analytics Toolbox Model for BERT-Small Network support package. This model has 28.5 million learnable parameters.
- "large" — BERT-Large model. This option requires the Text Analytics Toolbox Model for BERT-Large Network support package. This model has 334 million learnable parameters.
- "multilingual" — BERT-Base multilingual model. This option requires the Text Analytics Toolbox Model for BERT-Base Multilingual Cased Network support package. This model has 177.2 million learnable parameters.
If you specify the Model argument, then you must not specify the net and tokenizer arguments.
Tip
To customize the BERT neural network architecture, modify the dlnetwork (Deep Learning Toolbox) object output of the bert function and use the net and tokenizer arguments, as in the sketch that follows.
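For example, this sketch appends a classification head to a pretrained BERT-Tiny encoder and wraps the result in a classifier. It assumes a release in which addLayers and connectLayers accept dlnetwork objects; the layer names, the three-class head, and the class names are illustrative, and a complete head typically also pools the [CLS] token embedding before the fully connected layer.

% Minimal sketch, assuming the BERT-Tiny support package is installed.
[net,tokenizer] = bert(Model="tiny");

numClasses = 3;
outName = net.OutputNames{1};     % encoder output to attach the head to

head = [
    fullyConnectedLayer(numClasses,Name="fc_head")    % illustrative names
    softmaxLayer(Name="softmax_head")];

net = addLayers(net,head);
net = connectLayers(net,outName,"fc_head");

mdl = bertDocumentClassifier(net,tokenizer, ...
    ClassNames=["low" "medium" "high"]);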
DropoutProbability — Probability of dropping out input elements in dropout layers
0.1 (default) | scalar in the range [0, 1)
Probability of dropping out input elements in dropout layers, specified as a scalar in the range [0, 1).
When you train a neural network with dropout layers, the layer randomly sets input elements to zero using the dropout mask rand(size(X)) < p, where X is the layer input and p is the layer dropout probability. The layer then scales the remaining elements by 1/(1-p).
This operation helps to prevent the network from overfitting [2], [3]. A higher number results in the network dropping more elements during training. At prediction time, the output of the layer is equal to its input.
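As an illustration of this masking-and-rescaling rule (a standalone sketch, not code taken from the classifier):

% Drop elements with probability p, then rescale the survivors by
% 1/(1-p) so the expected value of the output matches the input.
p = 0.1;
X = rand(4,3,"single");        % stand-in for a layer input
mask = rand(size(X)) < p;      % true where an element is dropped
Y = X .* ~mask ./ (1 - p);     % zero dropped elements, rescale the rest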
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
AttentionDropoutProbability — Probability of dropping out input elements in attention layers
0.1 (default) | scalar in the range [0, 1)
Probability of dropping out input elements in attention layers, specified as a scalar in the range [0, 1).
When you train a neural network with attention layers, the layer randomly sets attention scores to zero using the dropout mask rand(size(scores)) < p, where scores contains the attention scores and p is the layer dropout probability. The layer then scales the remaining elements by 1/(1-p).
This operation helps to prevent the network from overfitting [2], [3]. A higher number results in the network dropping more elements during training. At prediction time, the output of the layer is equal to its input.
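For example, to raise both dropout probabilities when creating a classifier (the value 0.2 is an arbitrary illustrative choice, and Model="tiny" requires its support package):

mdl = bertDocumentClassifier(Model="tiny", ...
    DropoutProbability=0.2,AttentionDropoutProbability=0.2);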
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
Properties
Network — Pretrained BERT model
dlnetwork object
This property is read-only.
Pretrained BERT model, specified as a dlnetwork (Deep Learning Toolbox) object corresponding to the net or Model argument.
Tokenizer — BERT tokenizer
bertTokenizer object
This property is read-only.
BERT tokenizer, specified as a bertTokenizer object corresponding to the tokenizer or Model argument.
ClassNames — Class names
["positive" "negative"] (default) | categorical vector | string array | cell array of character vectors
Class names, specified as a categorical vector, a string array, or a cell array of character vectors.
If you specify the net argument, then the output size of the network must match the number of classes.
To set this property, use the corresponding name-value argument when you create the bertDocumentClassifier object. After you create a bertDocumentClassifier object, this property is read-only.
Data Types: string | cell | categorical
Object Functions
classify — Classify document using BERT document classifier
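A minimal sketch of the classify workflow, assuming string-array input is accepted; the example text and class names are illustrative, and a classifier that has not been fine-tuned produces arbitrary labels:

% Classify raw text with a (fine-tuned) classifier.
mdl = bertDocumentClassifier(ClassNames=["Leak" "Electrical Failure"]);
str = [
    "Coolant is pooling underneath the assembly."
    "The fuse blew as soon as the unit powered on."];
labels = classify(mdl,str)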
Examples
Create BERT Document Classifier for Training
Create a BERT document classifier that is ready for training.
mdl = bertDocumentClassifier
mdl = bertDocumentClassifier with properties:
Network: [1x1 dlnetwork]
Tokenizer: [1x1 bertTokenizer]
ClassNames: ["positive" "negative"]
View the class names.
mdl.ClassNames
ans = 1x2 string
"positive" "negative"
Specify BERT Document Classifier Classes
Create a BERT document classifier for the classes "Electrical Failure", "Leak", "Mechanical Failure", and "Software Failure".
classNames = ["Electrical Failure" "Leak" "Mechanical Failure" "Software Failure"];
mdl = bertDocumentClassifier(ClassNames=classNames)
mdl = bertDocumentClassifier with properties:
Network: [1x1 dlnetwork]
Tokenizer: [1x1 bertTokenizer]
ClassNames: ["Electrical Failure" "Leak" "Mechanical Failure" "Software Failure"]
View the class names.
mdl.ClassNames
ans = 1x4 string
"Electrical Failure" "Leak" "Mechanical Failure" "Software Failure"
References
[1] Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding." Preprint, submitted May 24, 2019. https://doi.org/10.48550/arXiv.1810.04805.
[2] Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." The Journal of Machine Learning Research 15, no. 1 (January 1, 2014): 1929–58.
[3] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks." Communications of the ACM 60, no. 6 (May 24, 2017): 84–90. https://doi.org/10.1145/3065386.
Version History
Introduced in R2023b