documentEmbedding
Document embedding model to map documents to vectors
Since R2024a
Description
A document embedding maps documents to real vectors.
The vectors attempt to capture the semantic content of the full document, so similar documents have similar vectors. The document can be a sentence, a paragraph, or a longer text.
Creation
Create a document embedding from a pretrained model using the documentEmbedding function.
Syntax
Description
`emb` = documentEmbedding returns a document embedding that uses the all-MiniLM-L6-v2 sentence transformer model.
This function requires Deep Learning Toolbox™.
`emb` = documentEmbedding(Model=modelName) returns the document embedding model specified by the Model name-value argument.
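To illustrate the two syntaxes, the following minimal sketch creates the default model and the twelve-layer model, then embeds the same sentence with each. It assumes that Deep Learning Toolbox and both Text Analytics Toolbox model support packages are installed; the variable names and the sample sentence are illustrative only.

```matlab
% Default syntax: uses the all-MiniLM-L6-v2 model (six self-attention layers)
embSmall = documentEmbedding;

% Select the twelve-layer model explicitly with the Model name-value argument
embLarge = documentEmbedding(Model="all-MiniLM-L12-v2");

% Embed the same illustrative sentence with both models
str = "the quick brown fox jumped over the lazy dog";
vecSmall = embed(embSmall,str);
vecLarge = embed(embLarge,str);

% Both models output a 1-by-384 embedding vector per document
size(vecSmall)
size(vecLarge)
```

Although both vectors have the same size, each model defines its own vector space, so compare only embeddings produced by the same model.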
Input Arguments
modelName: Document embedding model
"all-MiniLM-L6-v2" (default) | "all-MiniLM-L12-v2"
Model name, specified as one of these values:
"all-MiniLM-L6-v2": Sentence transformer model with six self-attention layers. This model outputs a 1-by-384 embedding vector. This option requires the Text Analytics Toolbox™ Model for all-MiniLM-L6-v2 Network support package.
"all-MiniLM-L12-v2": Sentence transformer model with twelve self-attention layers. This model outputs a 1-by-384 embedding vector. This option requires the Text Analytics Toolbox Model for all-MiniLM-L12-v2 Network support package.
If the required support package is not installed, then the function provides a download link.
Object Functions
| Function | Description |
|---|---|
| embed | Map document to embedding vector |
Examples
Map Documents to Vectors
Load the pretrained document embedding all-MiniLM-L6-v2 using the documentEmbedding function. This model requires the Text Analytics Toolbox™ Model for all-MiniLM-L6-v2 Network support package. If this support package is not installed, then the function provides a download link.
emb = documentEmbedding(Model="all-MiniLM-L6-v2");
Create an array of input documents.
documents = [
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"];
Map the input documents to vectors using the embed function.
embeddedDocuments = embed(emb,documents);
To estimate how similar the documents are, compute the pairwise cosine similarities using the cosineSimilarity function.
similarities = cosineSimilarity(embeddedDocuments)
similarities = 3×3
1.0000 0.9840 0.5505
0.9840 1.0000 0.5524
0.5505 0.5524 1.0000
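For reference, cosine similarity between two embedding vectors u and v (here, two rows of embeddedDocuments, each of length 384) is their dot product divided by the product of their Euclidean norms:

$$
\operatorname{sim}(\mathbf{u},\mathbf{v}) = \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert_2\,\lVert\mathbf{v}\rVert_2} = \frac{\sum_{i=1}^{384} u_i v_i}{\sqrt{\sum_{i=1}^{384} u_i^2}\,\sqrt{\sum_{i=1}^{384} v_i^2}}
$$

Entry (i,j) of similarities is this value for documents i and j. The diagonal is therefore 1 and the matrix is symmetric. The two near-paraphrases (documents 1 and 2) score 0.9840, while the third sentence scores about 0.55 against each of them.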
Version History
Introduced in R2024a