regexpPattern - Pattern that matches specified regular expression - MATLAB
Pattern that matches specified regular expression
Since R2020b
Syntax
Description
pat = regexpPattern(expression) creates a pattern that matches the regular expression.
pat = regexpPattern(expression,Name,Value) specifies additional options with one or more name-value pair arguments. For example, you can specify 'IgnoreCase' as true to ignore case when matching.
Examples
Use regexpPattern to specify patterns using regular expressions that can be used as inputs for text-searching functions.
Find words that start with c, end with t, and contain one or more vowels in between.
txt = "bat cat can car coat court CUT ct CAT-scan"; expression = 'c[aeiou]+t';
The regular expression 'c[aeiou]+t' specifies this pattern:
- c must be the first character.
- c must be followed by one of the characters inside the brackets, [aeiou].
- The bracketed pattern must occur one or more times, as indicated by the + operator.
- t must be the last character, with no characters between the bracketed pattern and the t.
Extract the pattern. Note that the words CUT and CAT do not match because they are uppercase.
pat = regexpPattern(expression); extract(txt,pat)
ans = 2×1 string "cat" "coat"
Patterns created using regexpPattern can be combined with other pattern functions to create more complicated patterns. Use whitespacePattern and lettersPattern to create a new pattern that also matches words after the regular expression matches, and then extract the new pattern.
pat = regexpPattern(expression) + whitespacePattern + lettersPattern; extract(txt,pat)
ans = 2×1 string "cat can" "coat court"
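Because regexpPattern returns a pattern, it should also work with other text functions that accept patterns, not only extract. The following sketch reuses the example text with contains, count, and replace; the variable names tf, n, and masked are illustrative and not part of the original example.

```matlab
txt = "bat cat can car coat court CUT ct CAT-scan";
pat = regexpPattern('c[aeiou]+t');

tf = contains(txt, pat);            % does the text contain any match?
n = count(txt, pat);                % number of matches in the text
masked = replace(txt, pat, "***");  % replace each match with a placeholder
```

Here the same pattern object is built once and reused across calls, which keeps the regular expression in a single place.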
Create a string containing a newline character. Use the regular expression '.' to match any character except newline characters.
txt = "First Line" + newline + "Second Line"
txt = 
    "First Line
     Second Line"
The regular expression '.+' matches one or more of any character. By default, this includes newline characters. Count how many times the pattern matches.
expression = '.+'; pat = regexpPattern(expression); count(txt,pat)
Create a new regular expression pattern, but this time specify DotExceptNewline as true so that the pattern does not match newline characters. Count how many times the pattern matches.
pat = regexpPattern(expression,"DotExceptNewline",true); count(txt,pat)
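To see what each setting actually matches, the two patterns can be passed to extract instead of count. This is a minimal sketch that assumes the same text and the expression '.+' as in the example; allText and perLine are illustrative names.

```matlab
txt = "First Line" + newline + "Second Line";
expression = '.+';

% Default behavior: the dot also matches newline, so one match spans both lines.
allText = extract(txt, regexpPattern(expression));

% With DotExceptNewline set to true, each line is matched separately.
perLine = extract(txt, regexpPattern(expression, "DotExceptNewline", true));
```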
Create txt as a string.
txt = "Hello World";
The expression '. *' matches only individual characters, each optionally followed by spaces, because the whitespace between . and * is treated as a literal space. Create a pattern to match the regular expression '. *', and then extract the pattern.
expression = '. *'; pat = regexpPattern(expression); extract(txt,pat)
ans = 10×1 string "H" "e" "l" "l" "o " "W" "o" "r" "l" "d"
Create a new regular expression pattern, but this time specify FreeSpacing as true to ignore whitespace in the regular expression. Extract the new pattern.
pat = regexpPattern(expression,"FreeSpacing",true); extract(txt,pat)
Find words that start with c, end with t, and contain one or more vowels in between, regardless of case.
txt = "bat cat can car coat court CUT ct CAT-scan"; expression = 'c[aeiou]+t';
The regular expression 'c[aeiou]+t' specifies this pattern:
- c must be the first character.
- c must be followed by one of the characters inside the brackets, [aeiou].
- The bracketed pattern must occur one or more times, as indicated by the + operator.
- t must be the last character, with no characters between the bracketed pattern and the t.
Extract the pattern. Note that the words CUT and CAT do not match because they are uppercase.
pat = regexpPattern(expression); extract(txt,pat)
ans = 2×1 string "cat" "coat"
Create a new regular expression pattern, but this time specify IgnoreCase as true to ignore case when matching. Extract the new pattern.
pat = regexpPattern(expression,"IgnoreCase",true); extract(txt,pat)
ans = 4×1 string "cat" "coat" "CUT" "CAT"
The metacharacters ^ and $ can be used to specify line anchors or text anchors. The behavior that regexpPattern uses is specified by the Anchors option.
Create txt as a string containing newline characters.
txt = "cat" + newline + "bat" + newline + "rat";
The regular expression '^.+?$' matches one or more characters between two anchors. Create a pattern for this regular expression, and specify Anchors as "text" so that the ^ and $ anchors are treated as text anchors. Extract the pattern.
expression = '^.+?$'; pat = regexpPattern(expression,"Anchors","text"); extract(txt,pat)
Create a new regular expression pattern, but this time specify Anchors as "line" so that the ^ and $ anchors are treated as line anchors. Extract the new pattern.
pat = regexpPattern(expression,"Anchors","line"); extract(txt,pat)
ans = 3×1 string "cat" "bat" "rat"
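The difference between the two anchor modes can also be checked with count. The sketch below uses a stricter, illustrative expression '^.at$' (not taken from the example above) so that a match must begin and end exactly at an anchor; nText and nLine are hypothetical names.

```matlab
txt = "cat" + newline + "bat" + newline + "rat";
expression = '^.at$';   % one character followed by "at", anchored on both sides

% Text anchors: a three-character match cannot reach from the start
% of the text to its end, so no match is expected.
nText = count(txt, regexpPattern(expression, "Anchors", "text"));

% Line anchors: each line can satisfy the expression on its own.
nLine = count(txt, regexpPattern(expression, "Anchors", "line"));
```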
Input Arguments
Data Types: char | cell | string
Note
regexpPattern does not support back references, conditions based on back references, or dynamic regular expressions.
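When an expression needs one of these unsupported features, one option is to call regexp directly instead of building a pattern. The sketch below assumes a back reference is needed to find doubled letters; the sample text and the variable name doubled are illustrative.

```matlab
txt = "hello wood";

% \1 refers back to the character captured by (\w). The regexp function
% supports this construct, whereas regexpPattern does not.
doubled = regexp(txt, '(\w)\1', 'match');
```

The trade-off is that the result of regexp is not a pattern object, so it cannot be combined with functions such as whitespacePattern or lettersPattern.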
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'DotExceptNewline',true,'FreeSpacing',false
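The two calling forms should produce equivalent patterns. A small sketch, assuming R2021a or later for the Name=Value form; the sample text and variable names are illustrative:

```matlab
txt = "Cat cot CUT";

% Name=Value syntax (R2021a and later)
patNew = regexpPattern('c[aeiou]+t', IgnoreCase=true);

% Comma-separated syntax with a quoted name (also valid in earlier releases)
patOld = regexpPattern('c[aeiou]+t', 'IgnoreCase', true);

% Both patterns are expected to extract the same matches.
sameMatches = isequal(extract(txt, patNew), extract(txt, patOld));
```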
Dot matching of newline character, specified as the comma-separated pair consisting of 'DotExceptNewline' and a logical scalar. Set this option to 1 (true) to omit newline characters from dot matching.
Example: pat = regexpPattern('m.','DotExceptNewline',true)
Matching of whitespace characters, specified as the comma-separated pair consisting of 'FreeSpacing' and a logical scalar. Set this option to 1 (true) to ignore whitespace characters and comments in the regular expression when matching.
Example: pat = regexpPattern('m.','FreeSpacing',false)
Ignore case when matching, specified as the comma-separated pair consisting of 'IgnoreCase' and a logical scalar. Set this option to 1 (true) to match regardless of case.
Example: pat = regexpPattern('m.','IgnoreCase',true)
Metacharacter treatment, specified as the comma-separated pair consisting of 'Anchors' and one of these values:
Value | Description |
---|---|
'text' | Treat the metacharacters ^ and $ as text anchors. This anchors regular expression matches to the beginning or end of text, which might span multiple lines. |
'line' | Treat the metacharacters ^ and $ as line anchors. This anchors regular expression matches to the beginning or end of lines in the text. This option is useful when you have multiline text and do not want matches to span multiple lines. |
Example: pat = regexpPattern('\d+','Anchors','line')
Extended Capabilities
Version History
Introduced in R2020b