pattern - Patterns to search and match text - MATLAB
Patterns to search and match text
Since R2020b
Description
A pattern defines rules for matching text with text-searching functions like contains, matches, and extract. You can build a pattern expression using pattern functions, operators, and literal text. For example, MATLAB® release names start with "R", followed by the four-digit year, and then either "a" or "b". Define a pattern to match the format of the release names:
pat = "R" + digitsPattern(4) + ("a"|"b");
Match that pattern in a string:
str = ["String was introduced in R2016b."; "Pattern was added in R2020b."];
extract(str,pat)
ans = 2x1 string array
    "R2016b"
    "R2020b"
Creation
Patterns are composed of literal text and other patterns using the +, |, and ~ operators. You can also create common patterns using Object Functions, which use rules often associated with regular expressions:
- Character-Matching Patterns – Ranges of letters or digits, wildcards, or whitespaces, such as lettersPattern.
- Search Rules – How many times the pattern must occur, case sensitivity, optional patterns, and named expressions, such as asManyOfPattern and optionalPattern.
- Boundaries – Boundaries at the start or end of a run of specific characters, such as alphanumericBoundary. Boundary patterns can be negated using the ~ operator so that a match to the boundary prevents the pattern expression from matching.
- Pattern Organization – Define pattern structure and specify how pattern expressions are displayed, such as maskedPattern and namedPattern.
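As a minimal sketch of how these building blocks combine, the following composes literal text, pattern functions, and the + and | operators. The sample sentences and the variable names (verPat, qtyPat, colorPat) are illustrative assumptions, not part of the original page.

```matlab
% Literal text joined to pattern functions with the + operator.
verPat = "v" + digitsPattern + "." + digitsPattern;
versions = extract("Upgrade from v1.4 to v2.10 today.", verPat);

% Alternatives with the | operator: either unit may follow the digits.
qtyPat = digitsPattern + ("kg" | "g");
quantities = extract("Add 250g flour and 1kg sugar.", qtyPat);

% A search rule: optionalPattern makes the "u" optional.
colorPat = "colo" + optionalPattern("u") + "r";
nSpellings = count("color or colour", colorPat);
```

Here extract returns one row per match, so versions and quantities are 2x1 string arrays, and nSpellings counts both spellings.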
The pattern function also creates patterns with the syntax pat = pattern(txt), where txt is literal text that pat matches. The pattern function is useful for specifying the pattern type for function argument validation. However, it is rarely needed in other cases because MATLAB text-matching functions accept text inputs.
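The following sketch illustrates the point about text inputs: a pattern object created from literal text behaves like the text itself in a search function. The commented-out function countMatches is a hypothetical helper, shown only to suggest how the pattern class name might appear in an arguments block.

```matlab
txt = "abc";
pat = pattern(txt);                 % pattern object that matches the literal text "abc"

isPatObject = isa(pat, "pattern");  % true: pat is a pattern object
isTxtObject = isa(txt, "pattern");  % false: txt is still a string

% Text-matching functions accept either form, so the results agree.
sameResult = contains("xxabcxx", txt) == contains("xxabcxx", pat);

% Hypothetical helper (assumption, not from the original page) that uses
% the class name for argument validation:
%
% function n = countMatches(str, pat)
%     arguments
%         str string
%         pat pattern
%     end
%     n = count(str, pat);
% end
```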
Object Functions
Search Text
Function | Description |
---|---|
contains | Determine if pattern is in strings |
matches | Determine if pattern matches strings |
count | Count occurrences of pattern in strings |
endsWith | Determine if strings end with pattern |
startsWith | Determine if strings start with pattern |
Edit Text
Character-Matching Patterns
Search Rule Patterns
Boundary Patterns
Regular Expression Patterns
Pattern Organization
Examples
Search Text Using Patterns
lettersPattern is a typical character-matching pattern that matches letter characters. Create a pattern that matches one or more letter characters.
txt = ["This" "is a" "1x6" "string" "array" "."]; pat = lettersPattern;
Use contains to determine if characters matched by pat are present in each string. The output logical array shows that the first five strings in txt contain letters, but the sixth string does not.
contains(txt,pat)
ans = 1x6 logical array
1 1 1 1 1 0
Determine if text starts with the specified pattern. The output logical array shows that four of the strings in txt start with letters, but two strings do not.
startsWith(txt,pat)
ans = 1x6 logical array
1 1 0 1 1 0
Determine whether each string fully matches the specified pattern. The output logical array shows which of the strings in txt contain nothing but letters.
matches(txt,pat)
ans = 1x6 logical array
1 0 0 1 1 0
Count the number of times a pattern matched. The output numerical array shows how many times lettersPattern matched in each element of txt. Note that lettersPattern matches one or more letters, so a run of consecutive letters is a single match.
count(txt,pat)
ans = 1x6
1 2 1 1 1 0
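The same search functions also accept composed patterns. The sketch below applies them to a set of made-up file names (an assumption for illustration) to show how contains, startsWith, endsWith, matches, and count differ.

```matlab
files = ["report_2020.csv" "notes.txt" "2021_data.csv" "README"];

hasDigits   = contains(files, digitsPattern);    % digits anywhere in the name
startsDigit = startsWith(files, digitsPattern);  % name begins with digits
isCsv       = endsWith(files, ".csv");           % literal text works as a pattern

% matches requires the whole string to fit the pattern.
namePat  = lettersPattern + "_" + digitsPattern + ".csv";
fullName = matches(files, namePat);

% Each run of consecutive letters counts as one match.
letterRuns = count(files, lettersPattern);
```

Only "report_2020.csv" fits namePat in full, while two names end with ".csv" and two contain digits.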
Edit Text Using Patterns
digitsPattern is a typical character-matching pattern that matches digit characters. Create a pattern that matches digit characters.
txt = ["1 fish" "2 fish" "[1,0,0] fish" "[0,0,1] fish"]; pat = digitsPattern;
Use replace to edit pieces of text that match the pattern.
replace(txt,pat,"#")
ans = 1x4 string "# fish" "# fish" "[#,#,#] fish" "[#,#,#] fish"
Create a new piece of text by inserting an "!" character after matched digits.
insertAfter(txt,pat,"!")
ans = 1x4 string "1! fish" "2! fish" "[1!,0!,0!] fish" "[0!,0!,1!] fish"
Patterns can be created using the OR operator, |, with text. Erase text matched by the specified pattern.
txt = erase(txt,"," | "]" | "[")
txt = 1x4 string "1 fish" "2 fish" "100 fish" "001 fish"
Extract pat from the new text.
extract(txt,pat)
ans = 1x4 string "1" "2" "100" "001"
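A run of several digits is treated as a single match, which matters when editing text. The sketch below uses a made-up sentence (an assumption for illustration) to show replace, insertBefore, and extract acting once per run of digits.

```matlab
txt = "Order 66 shipped; order 1024 pending.";

% Replace each run of digits with a placeholder.
masked = replace(txt, digitsPattern, "<id>");

% Insert a marker before each run of digits.
tagged = insertBefore(txt, digitsPattern, "#");

% Extract the runs and convert them to numbers.
ids = str2double(extract(txt, digitsPattern));
```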
Count Characters in Text
Use patterns to count the occurrences of individual characters in a piece of text.
txt = "She sells sea shells by the sea shore.";
Create pat as a pattern object that matches individual letters or digits using alphanumericsPattern. Extract the pattern.
pat = alphanumericsPattern(1); letters = extract(txt,pat);
Display a histogram of the number of occurrences of each letter.
letters = lower(letters); letters = categorical(letters); histogram(letters)
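When a plot is not needed, the same counts can be tabulated numerically. The following is a minimal sketch using unique and accumarray; the variable names are illustrative assumptions.

```matlab
txt = "She sells sea shells by the sea shore.";

% One match per alphanumeric character, lowercased so "S" and "s" are pooled.
chars = lower(extract(txt, alphanumericsPattern(1)));

% Distinct characters and the index of each extracted character into that list.
[names, ~, idx] = unique(chars);

% Sum a 1 for every occurrence of each distinct character.
counts = accumarray(idx, 1);

nS = counts(names == "s");   % occurrences of the letter s
```

The variables names and counts line up row by row, so table(names, counts) would display them side by side.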
Hide Details When Displaying Complicated Patterns
Use maskedPattern to display a variable in place of a complicated pattern expression.
Build a pattern that matches simple arithmetic expressions composed of numbers and arithmetic operators.
mathSymbols = asManyOfPattern(digitsPattern | characterListPattern("+-*/="),1)
mathSymbols = pattern Matching:
asManyOfPattern(digitsPattern | characterListPattern("+-*/="),1)
Build a pattern that matches arithmetic expressions with whitespaces between characters using mathSymbols.
longExpressionPat = asManyOfPattern(mathSymbols + whitespacePattern) + mathSymbols
longExpressionPat = pattern Matching:
asManyOfPattern(asManyOfPattern(digitsPattern | characterListPattern("+-*/="),1) + whitespacePattern) + asManyOfPattern(digitsPattern | characterListPattern("+-*/="),1)
The displayed pattern expression is long and difficult to read. Use maskedPattern to display the variable name, mathSymbols, in place of the pattern expression.
mathSymbols = maskedPattern(mathSymbols); shortExpressionPat = asManyOfPattern(mathSymbols + whitespacePattern) + mathSymbols
shortExpressionPat = pattern Matching:
asManyOfPattern(mathSymbols + whitespacePattern) + mathSymbols
Use details to show more information
Create a string containing some arithmetic expressions, and then extract the pattern from the text.
txt = "What is the answer to 1 + 1? Oh, I know! 1 + 1 = 2!"; arithmetic = extract(txt,shortExpressionPat)
arithmetic = 2x1 string "1 + 1" "1 + 1 = 2"
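Masking changes only how a pattern is displayed, not what it matches. As a further sketch, the following masks a two-character hexadecimal pattern under the name hexByte and reuses it to match a MAC-style address. The address, the sentence, and the variable names are made-up assumptions for illustration.

```matlab
% One hexadecimal character, then exactly two of them masked as "hexByte".
hexDigit = characterListPattern("0123456789abcdefABCDEF");
hexByte  = maskedPattern(asManyOfPattern(hexDigit, 2, 2), "hexByte");

% Six bytes separated by colons: one byte, then exactly five ":" + byte groups.
macPat = hexByte + asManyOfPattern(":" + hexByte, 5, 5);

txt = "Device 00:1A:2b:3C:4d:5E joined.";
mac = extract(txt, macPat);
```

Displaying macPat shows the name hexByte in place of the longer asManyOfPattern expression.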
Specify Names and Descriptions for Complicated Patterns
Create a pattern from two named patterns. Naming patterns adds context to the display of the pattern.
Build two patterns: one that matches words that begin and end with the letter D, and one that matches words that begin and end with the letter R.
dWordsPat = letterBoundary + caseInsensitivePattern("d" + lettersPattern + "d") + letterBoundary; rWordsPat = letterBoundary + caseInsensitivePattern("r" + lettersPattern + "r") + letterBoundary;
Combine the two patterns into one that finds a word that starts and ends with D followed by a word that starts and ends with R.
dAndRWordsPat = dWordsPat + whitespacePattern + rWordsPat
dAndRWordsPat = pattern Matching:
letterBoundary + caseInsensitivePattern("d" + lettersPattern + "d") + letterBoundary + whitespacePattern + letterBoundary + caseInsensitivePattern("r" + lettersPattern + "r") + letterBoundary
This pattern is hard to read and does not convey much information about its purpose. Use namedPattern to designate the patterns as named patterns that display specified names and descriptions in place of the pattern expressions.
dWordsPat = namedPattern(dWordsPat,"dWords", "Words that start and end with D"); rWordsPat = namedPattern(rWordsPat,"rWords", "Words that start and end with R"); dAndRWordsPat = dWordsPat + whitespacePattern + rWordsPat
dAndRWordsPat = pattern Matching:
dWords + whitespacePattern + rWords
Using named patterns:
dWords: Words that start and end with D
rWords: Words that start and end with R
Use details to show more information
Create a string and extract the text that matches the pattern.
txt = "Dad, look at the divided river!"; words = extract(txt,dAndRWordsPat)
words = "divided river"
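The letterBoundary calls in this example restrict matches to whole words. A minimal sketch of that effect, using a made-up sentence (an assumption for illustration), compares counting a literal with counting the same literal between letter boundaries.

```matlab
txt = "The cat scattered the catalog near a cat.";

% Literal text matches anywhere, including inside longer words.
anywhere = count(txt, "cat");

% Boundaries on both sides keep only standalone occurrences.
wholeWord = count(txt, letterBoundary + "cat" + letterBoundary);
```

The literal search also finds "cat" inside "scattered" and "catalog", while the bounded pattern finds only the two standalone words.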
Match Email Addresses
Build an easy-to-read pattern to match email addresses.
Email addresses follow the structure username@domain.TLD, where username and domain are made up of identifiers separated by periods. Build a pattern that matches identifiers composed of any combination of alphanumeric characters and "_" characters. Use maskedPattern to name this pattern identifier.
identifier = asManyOfPattern(alphanumericsPattern(1) | "_", 1); identifier = maskedPattern(identifier);
Build patterns to match domains and subdomains composed of identifiers. Create a pattern that matches TLDs from a specified list.
subdomain = asManyOfPattern(identifier + ".") + identifier; domainName = namedPattern(identifier,"domainName"); tld = "com" | "org" | "gov" | "net" | "edu";
Build a pattern for matching the local part of an email, which matches one or more identifiers separated by periods. Build a pattern for matching the domain, TLD, and any potential subdomains by combining the previously defined patterns. Use namedPattern to assign each of these patterns to a named pattern.
username = asManyOfPattern(identifier + ".") + identifier;
domain = optionalPattern(namedPattern(subdomain) + ".") + ...
    domainName + "." + ...
    namedPattern(tld);
Combine all of the patterns into a single pattern expression. Use namedPattern to assign username, domain, and emailPattern to named patterns.
emailAddress = namedPattern(username) + "@" + namedPattern(domain); emailPattern = namedPattern(emailAddress)
emailPattern = pattern Matching emailAddress:
username + "@" + domain
Using named patterns:
emailAddress : username + "@" + domain
username : asManyOfPattern(identifier + ".") + identifier
domain : optionalPattern(subdomain + ".") + domainName + "." + tld
subdomain : asManyOfPattern(identifier + ".") + identifier
domainName: identifier
tld : "com" | "org" | "gov" | "net" | "edu"
Use details to show more information
Create a string that contains an email address, and then extract the pattern from the text.
txt = "You can reach me by email at John.Smith@department.organization.org"; extract(txt,emailPattern)
ans = "John.Smith@department.organization.org"
Named patterns allow dot-indexing in order to access named subpatterns. Use dot-indexing to assign a specific value to the named pattern domain.
emailPattern.emailAddress.domain = "mathworks.com"
emailPattern = pattern Matching emailAddress:
username + "@" + domain
Using named patterns:
emailAddress: username + "@" + domain
username : asManyOfPattern(identifier + ".") + identifier
domain : "mathworks.com"
Use details to show more information
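The same dot-indexing idea can be sketched on a smaller pattern. The example below follows the structure of the email example, with a named pattern that contains a named subpattern, but the names (date, year, month) and the sample text are assumptions made for illustration.

```matlab
% Two named subpatterns combined inside an outer named pattern.
yearPat  = namedPattern(digitsPattern(4), "year");
monthPat = namedPattern(digitsPattern(2), "month");
datePat  = namedPattern(yearPat + "-" + monthPat, "date");

txt = "Releases: 2019-03, 2020-09, 2021-03.";
allDates = extract(txt, datePat);

% Pin the year subpattern to literal text, then search again.
datePat.date.year = "2020";
only2020 = extract(txt, datePat);
```

After the assignment, the pattern matches only dates whose year is 2020, so the second extract call returns a single match.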
Extended Capabilities
Thread-Based Environment
Run code in the background using MATLAB® backgroundPool or accelerate code with Parallel Computing Toolbox™ ThreadPool.
Version History
Introduced in R2020b