containsNgrams - Check if n-gram is member of documents - MATLAB
Check if n-gram is member of documents
Since R2022a
Syntax
tf = containsNgrams(documents,ngrams)
tf = containsNgrams(documents,ngrams,IgnoreCase=flag)
Description
`tf` = containsNgrams([documents](#d126e11629),[ngrams](#mw%5Fefdddbea-8427-4daf-abc7-0156179fc594)) returns 1 where any n-gram of `documents` matches `ngrams` and returns 0 otherwise.
tf = containsNgrams([documents](#d126e11629),[ngrams](#mw%5Fefdddbea-8427-4daf-abc7-0156179fc594),IgnoreCase=[flag](#mw%5Fb6b0c2dc-48df-43a5-b4ed-b5bd0a7f7ab5%5Fsep%5Fmw%5Fb8f02aef-cfb4-49bd-8d6b-95737f85fa97))
also specifies whether to ignore letter case when checking n-grams.
Examples
Create an array of tokenized documents.
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"]);
Check for documents containing the n-gram ["a" "short"].
tf = containsNgrams(documents,["a" "short"])
tf = 2×1 logical array
   1
   0
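Because `tf` is a logical array with one element per document, it can be used directly as an index. The following is a minimal sketch, assuming Text Analytics Toolbox is available; the variable names `matchingDocuments` and `numMatches` are illustrative and not part of the function's interface.

```matlab
% Recreate the documents from the example above as a 2-by-1 array.
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"]);

% Logical index of the documents that contain the bigram "a short".
tf = containsNgrams(documents,["a" "short"]);

% Keep only the matching documents and count them.
matchingDocuments = documents(tf)
numMatches = nnz(tf)
```

Here only the first document contains "a" immediately followed by "short", so `matchingDocuments` holds a single document.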
Input Arguments
documents - Input documents, specified as a tokenizedDocument array.
ngrams - N-grams to check, specified as one of these values:
- String array
- Character vector
- Cell array of character vectors
- pattern array
If `ngrams` is a string array, cell array, or pattern array, then it has size `numNgrams`-by-`maxN`, where `numNgrams` is the number of n-grams and `maxN` is the length of the largest n-gram. If `ngrams` is a character vector, then it represents a single word (unigram).
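The character vector and cell array forms described above can be sketched as follows. This is an illustrative example rather than an excerpt from the product documentation; the variable names `tfUnigram` and `tfBigram` are assumptions made for the sketch.

```matlab
% Two short documents stored as a 2-by-1 tokenizedDocument array.
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"]);

% A character vector is treated as a single word (unigram).
tfUnigram = containsNgrams(documents,'second')

% A 1-by-2 cell array of character vectors specifies one bigram.
tfBigram = containsNgrams(documents,{'a','short'})
```

Under the behavior described above, `tfUnigram` should flag only the second document and `tfBigram` only the first.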
The value of `ngrams(i,j)` corresponds to the `j`th word of the `i`th n-gram. If the number of words in the `i`th n-gram is less than `maxN`, then the remaining entries of the `i`th row of `ngrams` must be empty.
If `ngrams` contains multiple n-grams or patterns, then the function returns 1 where any of the n-grams appear in the corresponding document.
Example: ["An" ""; "An" "example"; "example" ""]
Data Types: string | char | cell
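To illustrate the padding rule for n-grams of different lengths, the sketch below checks for a unigram and a bigram in one call. The third document and the variable names are hypothetical additions for the illustration.

```matlab
% Three documents; the last one contains neither n-gram.
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "nothing relevant here"]);

% Each row is one n-gram. The unigram "second" is padded with an
% empty string so that every row has maxN = 2 entries.
ngrams = [
    "second" ""
    "a"      "short"];

% tf(k) is 1 if document k contains at least one of the n-grams.
tf = containsNgrams(documents,ngrams)
```

The first document contains "a short" and the second contains "second", so the expected result is 1 for the first two documents and 0 for the third.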
IgnoreCase - Option to ignore case, specified as one of these values:
- 0 (false) – Treat candidate matches that differ only by letter case as nonmatching.
- 1 (true) – Treat candidate matches that differ only by letter case as matching.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical
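The effect of the two flag values can be compared side by side. This is a minimal sketch with illustrative variable names; the first document deliberately starts with a capitalized "An" so that the two calls differ.

```matlab
% The first document begins with "An" (capital A).
documents = tokenizedDocument([
    "An example of a short sentence"
    "a second short sentence"]);

% Case-sensitive check: "an" does not match "An".
tfSensitive = containsNgrams(documents,["an" "example"],IgnoreCase=false)

% Case-insensitive check: "an" matches "An".
tfInsensitive = containsNgrams(documents,["an" "example"],IgnoreCase=true)
```

With `IgnoreCase=false` neither document matches, whereas with `IgnoreCase=true` the first document matches.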
Version History
Introduced in R2022a