bertDocumentClassifier - BERT document classifier - MATLAB

BERT document classifier

Since R2023b

Description

A Bidirectional Encoder Representations from Transformer (BERT) model is a transformer neural network that can be fine-tuned for natural language processing tasks such as document classification and sentiment analysis. The network uses attention layers to analyze text in context and capture long-range dependencies between words.

Creation

Description

`mdl` = bertDocumentClassifier creates a bertDocumentClassifier object.


`mdl` = bertDocumentClassifier(net,tokenizer) creates a bertDocumentClassifier object from the specified BERT neural network and tokenizer.

`mdl` = bertDocumentClassifier(___,Name=Value) sets the ClassNames property and additional options using one or more name-value arguments.


Input Arguments


net — BERT neural network

dlnetwork object

BERT neural network, specified as a dlnetwork (Deep Learning Toolbox) object.

If you specify the net argument, then you must not specify the Model argument. The network must have three sequence input layers with input sizes of one. The output size of the network must match the number of classes in the ClassNames property. The inputs in net.InputNames(1), net.InputNames(2), and net.InputNames(3) must be the inputs for the input data, the attention mask, and the segments, respectively.

tokenizer — BERT tokenizer

bertTokenizer object

BERT tokenizer, specified as a bertTokenizer object.

If you specify the tokenizer argument, then you must not specify the Model argument.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: bertDocumentClassifier(Model="tiny") creates a BERT-Tiny document classifier.

Model — BERT model

"base" (default) | "tiny" | "mini" | "small" | "large" | "multilingual"

BERT model, specified as one of these options:

- "base" — BERT-Base model
- "tiny" — BERT-Tiny model
- "mini" — BERT-Mini model
- "small" — BERT-Small model
- "large" — BERT-Large model
- "multilingual" — multilingual BERT-Base model

If you specify the Model argument, then you must not specify the net and tokenizer arguments.

Tip

To customize the BERT neural network architecture, modify the dlnetwork (Deep Learning Toolbox) object output of the bert function and use the net and tokenizer arguments.
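The tip above can be sketched as follows. This is an illustrative outline, not a complete example: it assumes the bert function accepts the Model name-value argument and returns a dlnetwork and a bertTokenizer, and the class names and the network modification step are placeholders.

```matlab
% Sketch: build a custom classifier from the bert function output.
[net,tokenizer] = bert(Model="tiny");    % pretrained BERT-Tiny network and tokenizer
classNames = ["bug" "feature"];          % hypothetical class names

% ... modify net here so that its output size matches numel(classNames) ...

mdl = bertDocumentClassifier(net,tokenizer,ClassNames=classNames);
```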

DropoutProbability — Probability of dropping out input elements in dropout layers

0.1 (default) | scalar in the range [0, 1)

Probability of dropping out input elements in dropout layers, specified as a scalar in the range [0, 1).

When you train a neural network with dropout layers, the layer randomly sets input elements to zero using the dropout mask rand(size(X)) < p, where X is the layer input and p is the layer dropout probability. The layer then scales the remaining elements by 1/(1-p).

This operation helps to prevent the network from overfitting [2], [3]. A higher number results in the network dropping more elements during training. At prediction time, the output of the layer is equal to its input.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
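The dropout operation described above can be written out directly. This is a minimal sketch of the masking and rescaling steps, not the layer implementation; the input X is arbitrary example data.

```matlab
% Sketch of inverted dropout as described above.
p = 0.1;                      % dropout probability
X = rand(4,3);                % example layer input
mask = rand(size(X)) < p;     % elements to drop
Y = X;
Y(mask) = 0;                  % set dropped elements to zero
Y = Y ./ (1-p);               % scale the remaining elements by 1/(1-p)
```

At prediction time this operation is skipped, so the layer output equals its input.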

AttentionDropoutProbability — Probability of dropping out input elements in attention layers

0.1 (default) | scalar in the range [0, 1)

Probability of dropping out input elements in attention layers, specified as a scalar in the range [0, 1).

When you train a neural network with attention layers, the layer randomly sets attention scores to zero using the dropout mask rand(size(scores)) < p, where scores is the layer input and p is the layer dropout probability. The layer then scales the remaining elements by 1/(1-p).

This operation helps to prevent the network from overfitting [2], [3]. A higher number results in the network dropping more elements during training. At prediction time, the output of the layer is equal to its input.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

Properties


Network — Pretrained BERT model

dlnetwork object

This property is read-only.

Pretrained BERT model, specified as a dlnetwork (Deep Learning Toolbox) object corresponding to the net or Model argument.

Tokenizer — BERT tokenizer

bertTokenizer object

This property is read-only.

BERT tokenizer, specified as a bertTokenizer object corresponding to the tokenizer or Model argument.

ClassNames — Class names

["positive" "negative"] (default) | categorical vector | string array | cell array of character vectors

Class names, specified as a categorical vector, a string array, or a cell array of character vectors.

If you specify the net argument, then the output size of the network must match the number of classes.

To set this property, use the corresponding name-value argument when you create the bertDocumentClassifier object. After you create a bertDocumentClassifier object, this property is read-only.

Data Types: string | cell | categorical

Object Functions

classify Classify document using BERT document classifier
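A hedged sketch of inference with the classify object function follows. The string inputs are illustrative, and the sketch assumes classify accepts a string array of documents and returns a categorical array of class names.

```matlab
% Sketch: classify documents with a BERT document classifier.
mdl = bertDocumentClassifier;   % default classes: "positive", "negative"
str = ["The service was excellent." ...
       "The device stopped working after a week."];
labels = classify(mdl,str);     % categorical array of predicted classes
```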

Examples


Create BERT Document Classifier for Training

Create a BERT document classifier that is ready for training.

mdl = bertDocumentClassifier

mdl = bertDocumentClassifier with properties:

   Network: [1x1 dlnetwork]
 Tokenizer: [1x1 bertTokenizer]
ClassNames: ["positive"    "negative"]

View the class names.

mdl.ClassNames

ans = 1x2 string
    "positive"    "negative"

Specify BERT Document Classifier Classes

Create a BERT document classifier for the classes "Electrical Failure", "Leak", "Mechanical Failure", and "Software Failure".

classNames = ["Electrical Failure" "Leak" "Mechanical Failure" "Software Failure"];
mdl = bertDocumentClassifier(ClassNames=classNames)

mdl = bertDocumentClassifier with properties:

   Network: [1x1 dlnetwork]
 Tokenizer: [1x1 bertTokenizer]
ClassNames: ["Electrical Failure"    "Leak"    "Mechanical Failure"    "Software Failure"]

View the class names.

mdl.ClassNames

ans = 1x4 string
    "Electrical Failure"    "Leak"    "Mechanical Failure"    "Software Failure"

References

[1] Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding." Preprint, submitted May 24, 2019. https://doi.org/10.48550/arXiv.1810.04805.

[2] Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." The Journal of Machine Learning Research 15, no. 1 (January 1, 2014): 1929–58.

[3] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks." Communications of the ACM 60, no. 6 (May 24, 2017): 84–90. https://doi.org/10.1145/3065386.

Version History

Introduced in R2023b