docfun - Apply function to words in documents - MATLAB

Apply function to words in documents

Syntax

newDocuments = docfun(func,documents)
newDocuments = docfun(func,documents1,...,documentsN)

Description

newDocuments = docfun(func,documents) calls the function specified by the function handle func on each element of documents, passing the words of that document to func as a string vector. The words of newDocuments(i) are the output of func(string(documents(i))).

docfun does not call func on the documents in any specific order.


newDocuments = docfun(func,documents1,...,documentsN) calls the function specified by the function handle func and passes the elements of documents1,…,documentsN to it as string vectors of words, where N is the number of inputs to the function func. The words of newDocuments(i) are the output of func(string(documents1(i)),...,string(documentsN(i))).

documents1,…,documentsN must all be the same size.


Examples


Apply reverse to each word in a document array.

documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"])

documents = 2×1 tokenizedDocument:

6 tokens: an example of a short sentence
4 tokens: a second short sentence

func = @reverse;
newDocuments = docfun(func,documents)

newDocuments = 2×1 tokenizedDocument:

6 tokens: na elpmaxe fo a trohs ecnetnes
4 tokens: a dnoces trohs ecnetnes
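func does not have to be a named function. As a small sketch that is not part of the original examples, the code below passes an anonymous function that converts every word to uppercase. Because each call receives the words of one document as a string vector, any function that works elementwise on string arrays, such as upper, can be used this way. The anonymous function is a hypothetical illustration.

```matlab
% Hypothetical illustration: pass an anonymous function as func.
% Each call receives the words of one document as a string vector.
documents = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);

% upper converts each element of the string vector to uppercase,
% so every word of each document is uppercased.
func = @(words) upper(words);
newDocuments = docfun(func,documents)
```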

Tag words by combining the words from one document array with another, using the string function plus.

Create the first tokenizedDocument array. Erase the punctuation and convert the text to lowercase.

str = [ ...
    "An example of a short sentence."
    "A second short sentence."];
str = erasePunctuation(str);
str = lower(str);
documents1 = tokenizedDocument(str)

documents1 = 2×1 tokenizedDocument:

6 tokens: an example of a short sentence
4 tokens: a second short sentence

Create the second tokenizedDocument array. Each document has the same number of words as the corresponding document in documents1. The words of documents2 are part-of-speech (POS) tags for the corresponding words in documents1.

documents2 = tokenizedDocument([ ...
    "_det _noun _prep _det _adj _noun"
    "_det _adj _adj _noun"])

documents2 = 2×1 tokenizedDocument:

6 tokens: _det _noun _prep _det _adj _noun
4 tokens: _det _adj _adj _noun

func = @plus;
newDocuments = docfun(func,documents1,documents2)

newDocuments = 2×1 tokenizedDocument:

6 tokens: an_det example_noun of_prep a_det short_adj sentence_noun
4 tokens: a_det second_adj short_adj sentence_noun

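This output differs from calling plus on the documents directly, which appends the words of each document in documents2 to the end of the corresponding document in documents1 instead of combining the words pairwise.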

plus(documents1,documents2)

ans = 2×1 tokenizedDocument:

12 tokens: an example of a short sentence _det _noun _prep _det _adj _noun
 8 tokens: a second short sentence _det _adj _adj _noun
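When func takes several inputs, docfun passes the words of documents1(i),…,documentsN(i) in that order. As a hedged sketch that is not part of the original examples, the anonymous function below takes the words and the tags as its two inputs and places each tag in front of its word instead of after it. The anonymous function and its input names words and tags are hypothetical.

```matlab
% Hypothetical illustration: an anonymous function with two inputs.
documents1 = tokenizedDocument([ ...
    "an example of a short sentence"
    "a second short sentence"]);
documents2 = tokenizedDocument([ ...
    "_det _noun _prep _det _adj _noun"
    "_det _adj _adj _noun"]);

% The first input receives the words of documents1(i) and the second
% receives the words of documents2(i). Putting tags first prefixes each
% word with its tag, for example "_det" and "an" become "_det_an".
func = @(words,tags) tags + "_" + words;
newDocuments = docfun(func,documents1,documents2)
```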

Input Arguments


func

Function handle to apply to words in documents. func must accept N string arrays as inputs, namely string(documents1(i)),...,string(documentsN(i)), and return a string array, which becomes the words of newDocuments(i). When you specify a single documents input, N is 1.

Example: @reverse

Data Types: function_handle

Output Arguments

Version History

Introduced in R2017b