regexpPattern - Pattern that matches specified regular expression - MATLAB

Pattern that matches specified regular expression

Since R2020b

Syntax

Description

pat = regexpPattern(expression,Name,Value) specifies additional options with one or more name-value pair arguments. For example, you can specify 'IgnoreCase' as true to ignore case when matching.

Examples

Use regexpPattern to specify patterns using regular expressions that can be used as inputs for text-searching functions.

Find words that start with c, end with t, and contain one or more vowels in between.

txt = "bat cat can car coat court CUT ct CAT-scan"; expression = 'c[aeiou]+t';

The regular expression 'c[aeiou]+t' specifies this pattern:

c must be the first character.
c must be followed by one of the characters inside the brackets, [aeiou].
The bracketed pattern must occur one or more times, as indicated by the + operator.
t must be the last character, with no characters between the bracketed pattern and the t.
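As a quick check of the rules above, the following sketch tests a few whole words against the pattern with matches, which returns true only when the pattern matches the entire string. The word list is made up for illustration and is not part of the original example.

```matlab
% Hypothetical word list chosen to exercise each rule of 'c[aeiou]+t'
expression = 'c[aeiou]+t';
pat = regexpPattern(expression);
words = ["cat" "coat" "ct" "court" "CUT"];

% "ct" has no vowel, "court" has an r before the t, "CUT" is uppercase
tf = matches(words,pat)
```

Only "cat" and "coat" satisfy every rule, so only the first two elements of tf are true.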

Extract the pattern. Note that the words CUT and CAT do not match because matching is case sensitive by default and they are uppercase.

pat = regexpPattern(expression); extract(txt,pat)

ans = 2×1 string "cat" "coat"

Patterns created using regexpPattern can be combined with other pattern functions to create more complicated patterns. Use whitespacePattern and lettersPattern to create a new pattern that also matches words after the regular expression matches, and then extract the new pattern.

pat = regexpPattern(expression) + whitespacePattern + lettersPattern; extract(txt,pat)

ans = 2×1 string "cat can" "coat court"
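The same pattern can also be passed to other text-searching functions. The sketch below uses contains, count, and replace on the text from this example; the replacement word "dog" is an arbitrary choice for illustration.

```matlab
txt = "bat cat can car coat court CUT ct CAT-scan";
pat = regexpPattern('c[aeiou]+t');

tf = contains(txt,pat)            % true if the pattern matches anywhere in txt
n = count(txt,pat)                % number of matches ("cat" and "coat")
newTxt = replace(txt,pat,"dog")   % replace every match with "dog"
```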

Create a string containing a newline character. This example shows how to make the regular expression '.' match any character except newline characters.

txt = "First Line" + newline + "Second Line"

txt = 
    "First Line
     Second Line"

The regular expression '.+' matches one or more of any character including newline characters. Count how many times the pattern matches.

expression = '.+'; pat = regexpPattern(expression); count(txt,pat)

Create a new regular expression pattern, but this time specify DotExceptNewline as true so that the pattern does not match newline characters. Count how many times the pattern matches.

pat = regexpPattern(expression,"DotExceptNewline",true); count(txt,pat)

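Create txt as a string.

txt = "Hello World";

The expression '. *' matches a single character followed by zero or more spaces, because the whitespace between . and * is treated as a literal space that the * operator applies to. Create a pattern to match the regular expression '. *', and then extract the pattern.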

expression = '. *'; pat = regexpPattern(expression); extract(txt,pat)

ans = 10×1 string "H" "e" "l" "l" "o " "W" "o" "r" "l" "d"

Create a new regular expression pattern, but this time specify FreeSpacing as true to ignore whitespace in the regular expression. Extract the new pattern.

pat = regexpPattern(expression,"FreeSpacing",true); extract(txt,pat)

Find words that start with c, end with t, and contain one or more vowels in between, regardless of case.

txt = "bat cat can car coat court CUT ct CAT-scan"; expression = 'c[aeiou]+t';

The regular expression 'c[aeiou]+t' specifies this pattern:

c must be the first character.
c must be followed by one of the characters inside the brackets, [aeiou].
The bracketed pattern must occur one or more times, as indicated by the + operator.
t must be the last character, with no characters between the bracketed pattern and the t.

Extract the pattern. Note that the words CUT and CAT do not match because they are uppercase.

pat = regexpPattern(expression); extract(txt,pat)

ans = 2×1 string "cat" "coat"

Create a new regular expression pattern, but this time specify IgnoreCase as true to ignore case when matching. Extract the new pattern.

pat = regexpPattern(expression,"IgnoreCase",true); extract(txt,pat)

ans = 4×1 string "cat" "coat" "CUT" "CAT"

The metacharacters ^ and $ can be used to specify line anchors or text anchors. The behavior that regexpPattern uses is specified by the Anchors option.

Create txt as a string containing newline characters.

txt = "cat" + newline + "bat" + newline + "rat";

The regular expression '^.+?$' matches one or more characters between two anchors. Create a pattern for this regular expression, and specify Anchors as "text" so that the ^ and $ anchors are treated as text anchors. Extract the pattern.

expression = '^.+?$'; pat = regexpPattern(expression,"Anchors","text"); extract(txt,pat)

Create a new regular expression pattern, but this time specify Anchors as "line" so that the ^ and $ anchors are treated as line anchors. Extract the new pattern.

pat = regexpPattern(expression,"Anchors","line"); extract(txt,pat)

ans = 3×1 string "cat" "bat" "rat"

Input Arguments

Data Types: char | cell | string

Note

regexpPattern does not support back references, conditions based on back references, or dynamic regular expressions.
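When an expression needs a back reference, one possible workaround is to call the regexp function directly instead of building a pattern. The sketch below assumes that approach fits your workflow; the sample text is made up for illustration.

```matlab
% Hypothetical sample text containing doubled letters
txt = "hello balloon";

% regexp (not regexpPattern) accepts the \1 back reference, which here
% requires the second character to repeat the captured first character
doubled = regexp(txt,'(\w)\1','match')
```

The result of regexp is a list of matched substrings rather than a pattern object, so it cannot be combined with functions such as whitespacePattern or lettersPattern.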

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'DotExceptNewline',true,'FreeSpacing',false
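The two ways of writing name-value arguments can be compared side by side. This is a minimal sketch that assumes R2021a or later (needed for the Name=Value form); the sample text is made up for illustration.

```matlab
expression = 'c[aeiou]+t';
txt = "cat CUT coat";   % hypothetical sample text

% Name=Value syntax (R2021a and later)
pat1 = regexpPattern(expression,IgnoreCase=true);

% Comma-separated syntax with the name in quotes (also works before R2021a)
pat2 = regexpPattern(expression,'IgnoreCase',true);

% Both patterns extract the same matches
same = isequal(extract(txt,pat1),extract(txt,pat2))
```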

Dot matching of newline character, specified as the comma-separated pair consisting of 'DotExceptNewline' and a logical scalar. Set this option to 1 (true) to omit newline characters from dot matching.

Example: pat = regexpPattern('m.','DotExceptNewline',true)

White space handling, specified as the comma-separated pair consisting of 'FreeSpacing' and a logical scalar. Set this option to 1 (true) to ignore whitespace characters and comments in the regular expression when matching.

Example: pat = regexpPattern('m.','FreeSpacing',false)

Ignore case when matching, specified as the comma-separated pair consisting of 'IgnoreCase' and a logical scalar. Set this option to 1 (true) to match regardless of case.

Example: pat = regexpPattern('m.','IgnoreCase',true)

Metacharacter treatment, specified as the comma-separated pair consisting of 'Anchors' and one of these values:

Value	Description
'text'	Treat the metacharacters ^ and $ as text anchors. This anchors regular expression matches to the beginning or end of text, which might span multiple lines.
'line'	Treat the metacharacters ^ and $ as line anchors. This anchors regular expression matches to the beginning or end of lines in the text. This option is useful when you have multiline text and do not want matches to span multiple lines.

Example: pat = regexpPattern('\d+','Anchors','line')

Extended Capabilities

Version History

Introduced in R2020b