containsNgrams - Check if n-gram is member of documents - MATLAB

Check if n-gram is member of documents

Since R2022a

Syntax

tf = containsNgrams(documents,ngrams)

tf = containsNgrams(documents,ngrams,IgnoreCase=flag)

Description

`tf = containsNgrams(documents,ngrams)` returns 1 where any n-gram of `documents` matches `ngrams` and returns 0 otherwise.

`tf = containsNgrams(documents,ngrams,IgnoreCase=flag)` also specifies whether to ignore letter case when checking n-grams.

Examples

Create an array of tokenized documents.

documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"]);

Check for documents containing the n-gram ["a" "short"].

tf = containsNgrams(documents,["a" "short"])

tf = 2×1 logical array

   1
   0
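Because the result is a logical array with one element per document, it can be used to index back into the document array. The following is a minimal sketch of that pattern; the variable name `matched` is an illustrative choice and not part of the original example.

```matlab
% Recreate the tokenized documents from the example above (column vector).
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"]);

% Logical mask: true where a document contains the bigram "a short".
tf = containsNgrams(documents,["a" "short"]);

% Keep only the matching documents (illustrative variable name).
matched = documents(tf);
```

Here `matched` should hold only the first document, since the second one contains "a" and "short" but not as adjacent words.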

Input Arguments

ngrams: N-grams to check, specified as a string array, character vector, cell array, or pattern array.

If ngrams is a string array, cell array, or pattern array, then it has size numNgrams-by-maxN, where numNgrams is the number of n-grams and maxN is the length of the largest n-gram. If ngrams is a character vector, then it represents a single word (unigram).

The value of ngrams(i,j) corresponds to the jth word of the ith n-gram. If the number of words in the ith n-gram is less than maxN, then the remaining entries of the ith row of ngrams must be empty.

If ngrams contains multiple n-grams or patterns, then the function returns 1 where any of the n-grams appear in the corresponding document.

Example: ["An" ""; "An" "example"; "example" ""]

Data Types: string | char | cell
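The padding rule above can be illustrated with a short sketch that mixes a unigram and a bigram in one query. The third document and the particular n-grams are made up for illustration only.

```matlab
% Three tokenized documents; the third is a hypothetical non-matching case.
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "something else entirely"]);

% Each row is one n-gram. The unigram "second" is padded with "" so that
% every row has maxN = 2 entries.
ngrams = [
    "second" ""
    "an"     "example"];

% True where a document contains at least one of the listed n-grams.
tf = containsNgrams(documents,ngrams);
```

Under the rules described above, the first document should match through the bigram "an example", the second through the unigram "second", and the third should not match either row.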

flag: Option to ignore case, specified as a numeric or logical 1 (true) or 0 (false).

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical
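As a minimal sketch of the IgnoreCase option, the query below differs from the document text only in letter case. The documents are reused from the example earlier on this page; the capitalized query is an illustrative assumption.

```matlab
% Lowercase documents, as in the earlier example.
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"]);

% The query uses different capitalization than the documents.
% IgnoreCase=true asks the function to disregard letter case when matching.
tf = containsNgrams(documents,["A" "Short"],IgnoreCase=true);
```

With case ignored, the first document should match because it contains "a short", while the second should not.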

Version History

Introduced in R2022a