addTypeDetails
Add token type details to documents
Syntax
Description
updatedDocuments = addTypeDetails(documents) detects the token types in documents and updates the token details. The function adds type details only to tokens with unknown type. To get the token types from updatedDocuments, use tokenDetails.
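As a minimal sketch of reading the token types back out, the snippet below reuses the tokens from the Examples section and selects the rows of the tokenDetails table whose Type is web-address. The variable names isWeb and webTokens are illustrative only, and the code assumes Text Analytics Toolbox is installed.

```matlab
% Build a document from manually tokenized text (no automatic type detection).
str = ["For" "more" "information" "," "see" "https://www.mathworks.com" "."];
documents = tokenizedDocument(str,'TokenizeMethod','none');

% Detect the token types, then read them back as a table.
updatedDocuments = addTypeDetails(documents);
tdetails = tokenDetails(updatedDocuments);

% Convert Type to string before comparing, then keep the matching tokens.
isWeb = string(tdetails.Type) == "web-address";
webTokens = tdetails.Token(isWeb)
```

The same pattern works for any other value that appears in the Type column, such as letters or punctuation.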
updatedDocuments = addTypeDetails(documents,Name,Value) specifies additional options using one or more name-value pairs.
Tip
Use addTypeDetails before using the lower, upper, and erasePunctuation functions, because addTypeDetails uses information that these functions remove.
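The ordering in this tip can be sketched as follows: detect the token types first, while the original casing and punctuation are still present, and only then normalize the text. The input tokens are made up for illustration, and the code assumes Text Analytics Toolbox is installed.

```matlab
% Manually tokenized text, so no token types are detected yet.
str = ["Visit" "https://www.mathworks.com" "for" "Details" "."];
documents = tokenizedDocument(str,'TokenizeMethod','none');

% Step 1: add type details while casing and punctuation are intact.
documents = addTypeDetails(documents);
tdetails = tokenDetails(documents);

% Step 2: normalize the text afterwards.
cleaned = lower(documents);
cleaned = erasePunctuation(cleaned)
```

Running the steps in the opposite order would leave addTypeDetails with less information to work from.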
Examples
Convert manually tokenized text into a tokenizedDocument object, setting the 'TokenizeMethod' option to 'none'.
str = ["For" "more" "information" "," "see" "https://www.mathworks.com" "."];
documents = tokenizedDocument(str,'TokenizeMethod','none')
documents = tokenizedDocument:
7 tokens: For more information , see https://www.mathworks.com .
View the token details using the tokenDetails function.
tdetails = tokenDetails(documents)
tdetails=7×2 table
Token                          DocumentNumber
___________________________    ______________
"For"                          1
"more"                         1
"information"                  1
","                            1
"see"                          1
"https://www.mathworks.com"    1
"."                            1
If you set 'TokenizeMethod' to 'none' in the call to the tokenizedDocument function, then it does not detect the types of the tokens. To add the token type details, use the addTypeDetails function.
documents = addTypeDetails(documents);
View the updated token details.
tdetails = tokenDetails(documents)
tdetails=7×3 table
Token DocumentNumber Type
___________________________ ______________ ___________
"For" 1 letters
"more" 1 letters
"information" 1 letters
"," 1 punctuation
"see" 1 letters
"https://www.mathworks.com" 1 web-address
"." 1 punctuation
Input Arguments
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'TopLevelDomains',["com" "net" "org"] specifies the top-level domains "com", "net", and "org" for web address detection.
TopLevelDomains
Top-level domains to use for web address detection, specified as a character vector, string array, or cell array of character vectors.
If you do not specify TopLevelDomains, then the function uses the output of the topLevelDomains function.
Example: ["com" "net" "org"]
Data Types: char | string | cell
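A short sketch of the TopLevelDomains option, restricting web address detection to a custom list of domains. The input tokens and the choice of domains are illustrative, and the code assumes Text Analytics Toolbox is installed.

```matlab
% Manually tokenized text containing two candidate web addresses.
str = ["See" "https://www.mathworks.com" "or" "https://www.example.org" "."];
documents = tokenizedDocument(str,'TokenizeMethod','none');

% Only treat "com" and "net" as top-level domains for web address detection.
documents = addTypeDetails(documents,'TopLevelDomains',["com" "net"]);

% Inspect the Type column to see how each token was classified.
tdetails = tokenDetails(documents)
```

Comparing the Type column against a run without the option shows which tokens depend on the domain list.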
Option to discard previously computed details and recompute them, specified as true or false.
Data Types: logical
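The following sketch shows how recomputing the details might look. The option name 'DiscardKnownValues' does not appear in the text above and is an assumption here, so check it against the function's current documentation. The code also assumes Text Analytics Toolbox is installed.

```matlab
% Manually tokenized text with type details added once.
str = ["See" "https://www.mathworks.com" "."];
documents = tokenizedDocument(str,'TokenizeMethod','none');
documents = addTypeDetails(documents);

% Assumption: the recompute option is named 'DiscardKnownValues'.
% Setting it to true discards the existing type details and detects them again.
documents = addTypeDetails(documents,'DiscardKnownValues',true);
tdetails = tokenDetails(documents)
```

Without this option, a second call leaves tokens that already have a known type unchanged, because the function only adds details to tokens with unknown type.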
Output Arguments
Version History
Introduced in R2018b