documentEmbedding - Document embedding model to map documents to vectors - MATLAB

Document embedding model to map documents to vectors

Since R2024a

Description

A document embedding maps documents to real vectors.

The vectors attempt to capture the semantic content of the full document, so similar documents have similar vectors. The document can be a sentence, a paragraph, or a longer text.

Creation

Create a document embedding from a pretrained embedding using documentEmbedding.

Syntax

Description

`emb` = documentEmbedding returns a document embedding using the all-MiniLM-L6-v2 sentence transformers model.

This function requires Deep Learning Toolbox™.

`emb` = documentEmbedding(Model=modelName) returns the document embedding model specified by the Model name-value argument.

Input Arguments

modelName — Document embedding model

"all-MiniLM-L6-v2" (default) | "all-MiniLM-L12-v2"

Model name, specified as "all-MiniLM-L6-v2" or "all-MiniLM-L12-v2".

If the required support package is not installed, then the function provides a download link.
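
As a brief illustration of the Model name-value argument, the following sketch loads the non-default model from the list above and embeds one document. It assumes that the support package for all-MiniLM-L12-v2 is installed; the variable names emb and vec are illustrative only.

```matlab
% Load the all-MiniLM-L12-v2 model instead of the default all-MiniLM-L6-v2.
% Assumption: the required support package is already installed.
emb = documentEmbedding(Model="all-MiniLM-L12-v2");

% Map a single document to a vector using the embed object function.
vec = embed(emb,"the quick brown fox jumped over the lazy dog");
```

Omitting the Model argument gives the default all-MiniLM-L6-v2 model, so the argument is needed only when selecting the other model.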

Object Functions

embed - Map document to embedding vector

Examples

Map Documents to Vectors

Load the pretrained document embedding all-MiniLM-L6-v2 using the documentEmbedding function. This model requires the Text Analytics Toolbox™ Model for all-MiniLM-L6-v2 Network support package. If this support package is not installed, then the function provides a download link.

emb = documentEmbedding;

Create an array of input documents.

documents = [
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"];

Map the input documents to vectors using the embed function.

embeddedDocuments = embed(emb,documents);

To estimate how similar the documents are, compute the pairwise cosine similarities using cosineSimilarity.

similarities = cosineSimilarity(embeddedDocuments)

similarities = 3×3

1.0000    0.9840    0.5505
0.9840    1.0000    0.5524
0.5505    0.5524    1.0000
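
For intuition about what these values represent, the sketch below computes pairwise cosine similarities by hand: each row is scaled to unit Euclidean length, and the similarity matrix is the product of the normalized matrix with its transpose. The matrix X is made-up toy data standing in for embedded documents, not output of the model, and the one-row-per-document layout is an assumption of this sketch.

```matlab
% Toy stand-in for embedded documents: one row per document (assumed layout).
X = [1  2  2
     2  4  4
     2 -2  1];

% Scale each row to unit Euclidean length.
rowNorms = sqrt(sum(X.^2,2));
Xn = X ./ rowNorms;

% Entry (i,j) is the cosine of the angle between rows i and j.
S = Xn * Xn';
```

In this toy data, rows 1 and 2 point in the same direction, so their similarity is approximately 1, while row 3 is orthogonal to both, so its similarity to them is approximately 0. The diagonal is 1 because every vector is identical to itself, which matches the diagonal of the similarities matrix in the example.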

Version History

Introduced in R2024a