addTypeDetails - Add token type details to documents - MATLAB

Add token type details to documents

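Syntax

updatedDocuments = addTypeDetails(documents)
updatedDocuments = addTypeDetails(documents,Name,Value)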

Description

updatedDocuments = addTypeDetails(documents) detects the token types in documents and updates the token details. The function adds type details only to tokens whose type is unknown. To get the token types from updatedDocuments, use tokenDetails.

updatedDocuments = addTypeDetails(documents,Name,Value) specifies additional options using one or more name-value pairs.

Tip

Use addTypeDetails before using the lower, upper, and erasePunctuation functions, because addTypeDetails uses information that these functions remove.
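
The ordering in this tip can be sketched as follows. The sample tokens are made up for illustration. The sketch adds type details first, then applies lower and erasePunctuation, so type detection sees the original casing and punctuation.

```matlab
% Manually tokenized text, so no token types are detected at construction.
% The tokens below are illustrative sample data.
str = ["Visit" "https://www.mathworks.com" "Today" "!"];
documents = tokenizedDocument(str,'TokenizeMethod','none');

% Add type details first, while case and punctuation are still intact.
documents = addTypeDetails(documents);

% Normalize afterward: convert to lowercase, then erase punctuation.
documents = lower(documents);
documents = erasePunctuation(documents);

% Inspect the remaining tokens and their types.
tdetails = tokenDetails(documents)
```

Reversing the order (normalizing first, then calling addTypeDetails) would run type detection on text that has already lost the casing and punctuation it relies on.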

Examples

Convert manually tokenized text into a tokenizedDocument object, setting the 'TokenizeMethod' option to 'none'.

str = ["For" "more" "information" "," "see" "https://www.mathworks.com" "."];
documents = tokenizedDocument(str,'TokenizeMethod','none')

documents = tokenizedDocument:

7 tokens: For more information , see https://www.mathworks.com .

View the token details using the tokenDetails function.

tdetails = tokenDetails(documents)

tdetails=7×2 table
               Token               DocumentNumber
    ___________________________    ______________

    "For"                                1
    "more"                               1
    "information"                        1
    ","                                  1
    "see"                                1
    "https://www.mathworks.com"          1
    "."                                  1

If you set 'TokenizeMethod' to 'none' in the call to the tokenizedDocument function, then it does not detect the types of the tokens. To add the token type details, use the addTypeDetails function.

documents = addTypeDetails(documents);

View the updated token details.

tdetails = tokenDetails(documents)

tdetails=7×3 table
               Token               DocumentNumber       Type
    ___________________________    ______________    ___________

    "For"                                1           letters
    "more"                               1           letters
    "information"                        1           letters
    ","                                  1           punctuation
    "see"                                1           letters
    "https://www.mathworks.com"          1           web-address
    "."                                  1           punctuation

Input Arguments

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'TopLevelDomains',["com" "net" "org"] specifies the top-level domains "com", "net", and "org" for web address detection.

TopLevelDomains: top-level domains to use for web address detection, specified as a character vector, string array, or cell array of character vectors.

If you do not specify TopLevelDomains, then the function uses the output of the topLevelDomains function.

Example: ["com" "net" "org"]

Data Types: char | string | cell
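
As a minimal sketch of this option, the following code restricts web address detection to three top-level domains. The sample tokens, including the example.com and example.xyz addresses, are made up for illustration. The detected types are not listed here; run the code and compare the Type values of the two address-like tokens.

```matlab
% Manually tokenized text with two address-like tokens (illustrative data).
str = ["See" "www.example.com" "and" "www.example.xyz"];
documents = tokenizedDocument(str,'TokenizeMethod','none');

% Detect token types, using only "com", "net", and "org" as the
% top-level domains for web address detection.
documents = addTypeDetails(documents,'TopLevelDomains',["com" "net" "org"]);

% Compare the Type column for the two address-like tokens.
tdetails = tokenDetails(documents)
```

In R2021a and later, the same call can be written as addTypeDetails(documents,TopLevelDomains=["com" "net" "org"]).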

Option to discard previously computed details and recompute them, specified as true or false.

Data Types: logical
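
The text above does not show the name of this option. The sketch below assumes that it is 'DiscardKnownValues'; check the name against your installed release before relying on it. The code tokenizes a string with the default method, which already detects token types, and then asks addTypeDetails to recompute them instead of skipping tokens whose type is known.

```matlab
% Default tokenization detects token types at construction.
documents = tokenizedDocument("For more information, see https://www.mathworks.com.");

% Recompute the type of every token, not only tokens of unknown type.
% 'DiscardKnownValues' is an assumed option name (not shown in the text above).
documents = addTypeDetails(documents,'DiscardKnownValues',true);

% View the recomputed token details.
tdetails = tokenDetails(documents)
```

Without this option, the call would leave the existing type details unchanged, because the function adds details only to tokens with unknown type.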

Output Arguments

Version History

Introduced in R2018b