docfun - Apply function to words in documents - MATLAB
Apply function to words in documents
Syntax
newDocuments = docfun(func,documents)
newDocuments = docfun(func,documents1,...,documentsN)
Description
newDocuments = docfun(func,documents) calls the function specified by the function handle func and passes the elements of documents as string vectors of words.
- If func accepts exactly one input argument, then the words of newDocuments(i) are the output of func(string(documents(i))).
- If func accepts two input arguments, then the words of newDocuments(i) are the output of func(string(documents(i)),details), where details contains the corresponding token details output by tokenDetails.
- If func changes the number of words in the document, then docfun removes the token details from that document.
docfun does not perform the calls to the function func in a specific order.
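As an illustration of the last bullet, the following is a minimal sketch of a single-input function that changes the number of words in each document. The anonymous function keepLongWords is a hypothetical helper introduced here for illustration and is not part of the toolbox. strlength and doclength are existing functions.

```matlab
% Sample documents (same text as in the examples on this page).
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);

% Hypothetical helper: keep only the words longer than two characters.
% It receives the words of one document as a string vector.
keepLongWords = @(words) words(strlength(words) > 2);

% Apply the helper to every document.
newDocuments = docfun(keepLongWords,documents);

% Number of words remaining in each document.
numWords = doclength(newDocuments);
```

Because keepLongWords drops words, the rule above implies that docfun removes the token details from the affected documents. With this input, each document keeps three words.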
newDocuments = docfun(func,documents1,...,documentsN) calls the function specified by the function handle func and passes the elements of documents1,...,documentsN as string vectors of words, where N is the number of inputs to the function func. The words of newDocuments(i) are the output of func(string(documents1(i)),...,string(documentsN(i))).
Each of documents1,...,documentsN must be the same size.
Examples
Apply reverse to each word in a document array.
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"])
documents = 2×1 tokenizedDocument:
6 tokens: an example of a short sentence
4 tokens: a second short sentence
func = @reverse;
newDocuments = docfun(func,documents)
newDocuments = 2×1 tokenizedDocument:
6 tokens: na elpmaxe fo a trohs ecnetnes
4 tokens: a dnoces trohs ecnetnes
Tag words by combining the words from one document array with another, using the string function plus.
Create the first tokenizedDocument array. Erase the punctuation and convert the text to lowercase.
str = [ ...
    "An example of a short sentence."
    "A second short sentence."];
str = erasePunctuation(str);
str = lower(str);
documents1 = tokenizedDocument(str)
documents1 = 2×1 tokenizedDocument:
6 tokens: an example of a short sentence
4 tokens: a second short sentence
Create the second tokenizedDocument array. The documents have the same number of words as the corresponding documents in documents1. The words of documents2 are part-of-speech (POS) tags for the corresponding words.
documents2 = tokenizedDocument([ ...
    "_det _noun _prep _det _adj _noun"
    "_det _adj _adj _noun"])
documents2 = 2×1 tokenizedDocument:
6 tokens: _det _noun _prep _det _adj _noun
4 tokens: _det _adj _adj _noun
func = @plus;
newDocuments = docfun(func,documents1,documents2)
newDocuments = 2×1 tokenizedDocument:
6 tokens: an_det example_noun of_prep a_det short_adj sentence_noun
4 tokens: a_det second_adj short_adj sentence_noun
The output is not the same as calling plus on the documents directly, which concatenates the documents instead of combining them word by word.
plus(documents1,documents2)
ans = 2×1 tokenizedDocument:
12 tokens: an example of a short sentence _det _noun _prep _det _adj _noun
8 tokens: a second short sentence _det _adj _adj _noun
Input Arguments
Function handle that accepts N string arrays as inputs and outputs a string array. func must accept string(documents1(i)),...,string(documentsN(i)) as input.
Function handle to apply to words in documents. The function must have one of the following syntaxes:
- newWords = func(words), where words is a string array of the words of a single document.
- newWords = func(words,details), where words is a string array of the words of a single document, and details is the corresponding table of token details given by tokenDetails.
- newWords = func(words1,...,wordsN), where words1,...,wordsN are string arrays of words.
Example: @reverse
Data Types: function_handle
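The two-input syntax newWords = func(words,details) can be sketched as follows. This is an illustrative example rather than one taken from the original page. The anonymous function tagWithType is a hypothetical helper, and the sketch assumes that details has a Type variable, as in the table returned by tokenDetails.

```matlab
% A short document containing letters and digits.
documents = tokenizedDocument("call 555 now");

% Hypothetical helper: append the token type to each word.
% reshape forces both inputs to row vectors so that the
% element-wise string concatenation lines up word by word.
tagWithType = @(words,details) ...
    reshape(words,1,[]) + "/" + reshape(string(details.Type),1,[]);

% docfun passes the token details as the second input because
% tagWithType accepts two input arguments.
newDocuments = docfun(tagWithType,documents);

% Inspect the tagged words as a string array.
taggedWords = string(newDocuments);
```

The helper returns one word per input word, so the number of words in each document is unchanged.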
Output Arguments
Version History
Introduced in R2017b