MySQL :: MySQL 5.7 Reference Manual :: 12.8.2 Regular Expressions (original) (raw)
12.8.2 Regular Expressions
Table 12.14 Regular Expression Functions and Operators
Name | Description |
---|---|
NOT REGEXP | Negation of REGEXP |
REGEXP | Whether string matches regular expression |
RLIKE | Whether string matches regular expression |
A regular expression is a powerful way of specifying a pattern for a complex search. This section discusses the operators available for regular expression matching and illustrates, with examples, some of the special characters and constructs that can be used for regular expression operations. See alsoSection 3.3.4.7, “Pattern Matching”.
MySQL uses Henry Spencer's implementation of regular expressions, which is aimed at conformance with POSIX 1003.2. MySQL uses the extended version to support regular expression pattern-matching operations in SQL statements. This section does not contain all the details that can be found in Henry Spencer'sregex(7)
manual page. That manual page is included in MySQL source distributions, in theregex.7
file under theregex
directory.
Regular Expression Function and Operator Descriptions
- expr NOT REGEXP pat,expr NOT RLIKE pat
This is the same asNOT (_`expr`_ REGEXP_`pat`_)
. - expr REGEXP pat,expr RLIKE pat
Returns 1 if the stringexpr
matches the regular expression specified by the pattern_pat
, 0 otherwise. If eitherexpr
_ or_pat
_ isNULL
, the return value isNULL
.
RLIKE is a synonym for REGEXP.
The pattern can be an extended regular expression, the syntax for which is discussed inRegular Expression Syntax. The pattern need not be a literal string. For example, it can be specified as a string expression or table column.
Note
MySQL uses C escape syntax in strings (for example,\n
to represent the newline character). If you want your_expr
_ or_pat
_ argument to contain a literal\
, you must double it. (Unless theNO_BACKSLASH_ESCAPES SQL mode is enabled, in which case no escape character is used.)
Regular expression operations use the character set and collation of the string expression and pattern arguments when deciding the type of a character and performing the comparison. If the arguments have different character sets or collations, coercibility rules apply as described inSection 10.8.4, “Collation Coercibility in Expressions”. If either argument is a binary string, the arguments are handled in case-sensitive fashion as binary strings.
mysql> SELECT 'Michael!' REGEXP '.*';
+------------------------+
| 'Michael!' REGEXP '.*' |
+------------------------+
| 1 |
+------------------------+
mysql> SELECT 'new*\n*line' REGEXP 'new\\*.\\*line';
+---------------------------------------+
| 'new*\n*line' REGEXP 'new\\*.\\*line' |
+---------------------------------------+
| 0 |
+---------------------------------------+
mysql> SELECT 'a' REGEXP '^[a-d]';
+---------------------+
| 'a' REGEXP '^[a-d]' |
+---------------------+
| 1 |
+---------------------+
Warning
The REGEXP andRLIKE operators work in byte-wise fashion, so they are not multibyte safe and may produce unexpected results with multibyte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.
Regular Expression Syntax
A regular expression describes a set of strings. The simplest regular expression is one that has no special characters in it. For example, the regular expressionhello
matches hello
and nothing else.
Nontrivial regular expressions use certain special constructs so that they can match more than one string. For example, the regular expression hello|world
contains the|
alternation operator and matches either the hello
or world
.
As a more complex example, the regular expressionB[an]*s
matches any of the stringsBananas
, Baaaaas
,Bs
, and any other string starting with aB
, ending with an s
, and containing any number of a
orn
characters in between.
A regular expression for theREGEXP operator may use any of the following special characters and constructs:
^
Match the beginning of a string.
mysql> SELECT 'fo\nfo' REGEXP '^fo$'; -> 0
mysql> SELECT 'fofo' REGEXP '^fo'; -> 1
$
Match the end of a string.
mysql> SELECT 'fo\no' REGEXP '^fo\no$'; -> 1
mysql> SELECT 'fo\no' REGEXP '^fo$'; -> 0
.
Match any character (including carriage return and newline).
mysql> SELECT 'fofo' REGEXP '^f.*$'; -> 1
mysql> SELECT 'fo\r\nfo' REGEXP '^f.*$'; -> 1
a*
Match any sequence of zero or morea
characters.
mysql> SELECT 'Ban' REGEXP '^Ba*n'; -> 1
mysql> SELECT 'Baaan' REGEXP '^Ba*n'; -> 1
mysql> SELECT 'Bn' REGEXP '^Ba*n'; -> 1
a+
Match any sequence of one or morea
characters.
mysql> SELECT 'Ban' REGEXP '^Ba+n'; -> 1
mysql> SELECT 'Bn' REGEXP '^Ba+n'; -> 0
a?
Match either zero or onea
character.
mysql> SELECT 'Bn' REGEXP '^Ba?n'; -> 1
mysql> SELECT 'Ban' REGEXP '^Ba?n'; -> 1
mysql> SELECT 'Baan' REGEXP '^Ba?n'; -> 0
de|abc
Alternation; match either of the sequencesde
orabc
.
mysql> SELECT 'pi' REGEXP 'pi|apa'; -> 1
mysql> SELECT 'axe' REGEXP 'pi|apa'; -> 0
mysql> SELECT 'apa' REGEXP 'pi|apa'; -> 1
mysql> SELECT 'apa' REGEXP '^(pi|apa)$'; -> 1
mysql> SELECT 'pi' REGEXP '^(pi|apa)$'; -> 1
mysql> SELECT 'pix' REGEXP '^(pi|apa)$'; -> 0
(abc)*
Match zero or more instances of the sequenceabc
.
mysql> SELECT 'pi' REGEXP '^(pi)*$'; -> 1
mysql> SELECT 'pip' REGEXP '^(pi)*$'; -> 0
mysql> SELECT 'pipi' REGEXP '^(pi)*$'; -> 1
{1}
,{2,3}
Repetition;{_`n`_}
and{_`m`_,_`n`_}
notation provide a more general way of writing regular expressions that match many occurrences of the previous atom (or “piece”) of the pattern.m
and_n
_ are integers.a*
Can be written asa{0,}
.a+
Can be written asa{1,}
.a?
Can be written asa{0,1}
.
To be more precise,a{_`n`_}
matches exactlyn
instances ofa
.a{_`n`_,}
matchesn
or more instances ofa
.a{_`m`_,_`n`_}
matchesm
through_n
_ instances ofa
, inclusive. If both_m
_ and_n
_ are given,m
must be less than or equal to_n
.m
andn
_ must be in the range from0
toRE_DUP_MAX
(default 255), inclusive.
mysql> SELECT 'abcde' REGEXP 'a[bcd]{2}e'; -> 0
mysql> SELECT 'abcde' REGEXP 'a[bcd]{3}e'; -> 1
mysql> SELECT 'abcde' REGEXP 'a[bcd]{1,10}e'; -> 1
[a-dX]
,[^a-dX]
Matches any character that is (or is not, if^
is used) eithera
,b
,c
,d
orX
. A-
character between two other characters forms a range that matches all characters from the first character to the second. For example,[0-9]
matches any decimal digit. To include a literal]
character, it must immediately follow the opening bracket[
. To include a literal-
character, it must be written first or last. Any character that does not have a defined special meaning inside a[]
pair matches only itself.
mysql> SELECT 'aXbc' REGEXP '[a-dXYZ]'; -> 1
mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]$'; -> 0
mysql> SELECT 'aXbc' REGEXP '^[a-dXYZ]+$'; -> 1
mysql> SELECT 'aXbc' REGEXP '^[^a-dXYZ]+$'; -> 0
mysql> SELECT 'gheis' REGEXP '^[^a-dXYZ]+$'; -> 1
mysql> SELECT 'gheisa' REGEXP '^[^a-dXYZ]+$'; -> 0
[.characters.]
Within a bracket expression (written using[
and]
), matches the sequence of characters of that collating element.characters
is either a single character or a character name likenewline
. The following table lists the permissible character names.
The following table shows the permissible character names and the characters that they match. For characters given as numeric values, the values are represented in octal.Name Character Name Character NUL 0 SOH 001 STX 002 ETX 003 EOT 004 ENQ 005 ACK 006 BEL 007 alert 007 BS 010 backspace '\b' HT 011 tab '\t' LF 012 newline '\n' VT 013 vertical-tab '\v' FF 014 form-feed '\f' CR 015 carriage-return '\r' SO 016 SI 017 DLE 020 DC1 021 DC2 022 DC3 023 DC4 024 NAK 025 SYN 026 ETB 027 CAN 030 EM 031 SUB 032 ESC 033 IS4 034 FS 034 IS3 035 GS 035 IS2 036 RS 036 IS1 037 US 037 space ' ' exclamation-mark '!' quotation-mark '"' number-sign '#' dollar-sign '$' percent-sign '%' ampersand '&' apostrophe '\'' left-parenthesis '(' right-parenthesis ')' asterisk '*' plus-sign '+' comma ',' hyphen '-' hyphen-minus '-' period '.' full-stop '.' slash '/' solidus '/' zero '0' one '1' two '2' three '3' four '4' five '5' six '6' seven '7' eight '8' nine '9' colon ':' semicolon ';' less-than-sign '<' equals-sign '=' greater-than-sign '>' question-mark '?' commercial-at '@' left-square-bracket '[' backslash '\\' reverse-solidus '\\' right-square-bracket ']' circumflex '^' circumflex-accent '^' underscore '_' low-line '_' grave-accent '`' left-brace '{' left-curly-bracket '{' vertical-line '|' right-brace '}' right-curly-bracket '}' tilde '~' DEL 177
mysql> SELECT '~' REGEXP '[[.~.]]'; -> 1
mysql> SELECT '~' REGEXP '[[.tilde.]]'; -> 1
[=character_class=]
Within a bracket expression (written using[
and]
),[=character_class=]
represents an equivalence class. It matches all characters with the same collation value, including itself. For example, ifo
and(+)
are the members of an equivalence class,[[=o=]]
,[[=(+)=]]
, and[o(+)]
are all synonymous. An equivalence class may not be used as an endpoint of a range.[:character_class:]
Within a bracket expression (written using[
and]
),[:character_class:]
represents a character class that matches all characters belonging to that class. The following table lists the standard class names. These names stand for the character classes defined in thectype(3)
manual page. A particular locale may provide other class names. A character class may not be used as an endpoint of a range.Character Class Name Meaning alnum Alphanumeric characters alpha Alphabetic characters blank Whitespace characters cntrl Control characters digit Digit characters graph Graphic characters lower Lowercase alphabetic characters print Graphic or space characters punct Punctuation characters space Space, tab, newline, and carriage return upper Uppercase alphabetic characters xdigit Hexadecimal digit characters
mysql> SELECT 'justalnums' REGEXP '[[:alnum:]]+'; -> 1
mysql> SELECT '!!' REGEXP '[[:alnum:]]+'; -> 0
[[:<:]]
,[[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in thealnum
class or an underscore (_
).
mysql> SELECT 'a word a' REGEXP '[[:<:]]word[[:>:]]'; -> 1
mysql> SELECT 'a xword a' REGEXP '[[:<:]]word[[:>:]]'; -> 0
To use a literal instance of a special character in a regular expression, precede it by two backslash (\) characters. The MySQL parser interprets one of the backslashes, and the regular expression library interprets the other. For example, to match the string 1+2
that contains the special +
character, only the last of the following regular expressions is the correct one:
mysql> SELECT '1+2' REGEXP '1+2'; -> 0
mysql> SELECT '1+2' REGEXP '1\+2'; -> 0
mysql> SELECT '1+2' REGEXP '1\\+2'; -> 1