15.2.14 Set Operations with UNION, INTERSECT, and EXCEPT (original) (raw)
15.2.14 Set Operations with UNION, INTERSECT, and EXCEPT
- Result Set Column Names and Data Types
- Set Operations with TABLE and VALUES Statements
- Set Operations using DISTINCT and ALL
- Set Operations with ORDER BY and LIMIT
- Limitations of Set Operations
SQL set operations combine the results of multiple query blocks into a single result. A query block, sometimes also known as a simple table, is any SQL statement that returns a result set, such asSELECT. MySQL 8.0 (8.0.19 and later) also supports TABLE andVALUES statements. See the individual descriptions of these statements elsewhere in this chapter for additional information.
The SQL standard defines the following three set operations:
- UNION: Combine all results from two query blocks into a single result, omitting any duplicates.
- INTERSECT: Combine only those rows which the results of two query blocks have in common, omitting any duplicates.
- EXCEPT: For two query blocks_
A
_ andB
, return all results fromA
which are not also present inB
, omitting any duplicates.
(Some database systems, such as Oracle, useMINUS
for the name of this operator. This is not supported in MySQL.)
MySQL has long supported UNION
; MySQL 8.0 adds support for INTERSECT
andEXCEPT
(MySQL 8.0.31 and later).
Each of these set operators supports an ALL
modifier. When the ALL
keyword follows a set operator, this causes duplicates to be included in the result. See the following sections covering the individual operators for more information and examples.
All three set operators also support a DISTINCT
keyword, which suppresses duplicates in the result. Since this is the default behavior for set operators, it is usually not necessary to specify DISTINCT
explicitly.
In general, query blocks and set operations can be combined in any number and order. A greatly simplified representation is shown here:
query_block [set_op query_block] [set_op query_block] ...
query_block:
SELECT | TABLE | VALUES
set_op:
UNION | INTERSECT | EXCEPT
This can be represented more accurately, and in greater detail, like this:
query_expression:
[with_clause] /* WITH clause */
query_expression_body
[order_by_clause] [limit_clause] [into_clause]
query_expression_body:
query_term
| query_expression_body UNION [ALL | DISTINCT] query_term
| query_expression_body EXCEPT [ALL | DISTINCT] query_term
query_term:
query_primary
| query_term INTERSECT [ALL | DISTINCT] query_primary
query_primary:
query_block
| '(' query_expression_body [order_by_clause] [limit_clause] [into_clause] ')'
query_block: /* also known as a simple table */
query_specification /* SELECT statement */
| table_value_constructor /* VALUES statement */
| explicit_table /* TABLE statement */
You should be aware that INTERSECT
is evaluated before UNION
or EXCEPT
. This means that, for example, TABLE x UNION TABLE y INTERSECT TABLE z
is always evaluated as TABLE x UNION (TABLE y INTERSECT TABLE z)
. SeeSection 15.2.8, “INTERSECT Clause”, for more information.
In addition, you should keep in mind that, while theUNION
and INTERSECT
set operators are commutative (ordering is not significant),EXCEPT
is not (order of operands affects the outcome). In other words, all of the following statements are true:
TABLE x UNION TABLE y
andTABLE y UNION TABLE x
produce the same result, although the ordering of the rows may differ. You can force them to be the same usingORDER BY
; seeORDER BY and LIMIT in Unions.TABLE x INTERSECT TABLE y
andTABLE y INTERSECT TABLE x
return the same result.TABLE x EXCEPT TABLE y
andTABLE y EXCEPT TABLE x
do not yield the same result. See Section 15.2.4, “EXCEPT Clause”, for an example.
More information and examples can be found in the sections that follow.
Result Set Column Names and Data Types
The column names for the result of a set operation are taken from the column names of the first query block. Example:
mysql> CREATE TABLE t1 (x INT, y INT);
Query OK, 0 rows affected (0.04 sec)
mysql> INSERT INTO t1 VALUES ROW(4,-2), ROW(5,9);
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> CREATE TABLE t2 (a INT, b INT);
Query OK, 0 rows affected (0.04 sec)
mysql> INSERT INTO t2 VALUES ROW(1,2), ROW(3,4);
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0
mysql> TABLE t1 UNION TABLE t2;
+------+------+
| x | y |
+------+------+
| 4 | -2 |
| 5 | 9 |
| 1 | 2 |
| 3 | 4 |
+------+------+
4 rows in set (0.00 sec)
mysql> TABLE t2 UNION TABLE t1;
+------+------+
| a | b |
+------+------+
| 1 | 2 |
| 3 | 4 |
| 4 | -2 |
| 5 | 9 |
+------+------+
4 rows in set (0.00 sec)
This is true for UNION
,EXCEPT
, and INTERSECT
queries.
Selected columns listed in corresponding positions of each query block should have the same data type. For example, the first column selected by the first statement should have the same type as the first column selected by the other statements. If the data types of corresponding result columns do not match, the types and lengths of the columns in the result take into account the values retrieved by all of the query blocks. For example, the column length in the result set is not constrained to the length of the value from the first statement, as shown here:
mysql> SELECT REPEAT('a',1) UNION SELECT REPEAT('b',20);
+----------------------+
| REPEAT('a',1) |
+----------------------+
| a |
| bbbbbbbbbbbbbbbbbbbb |
+----------------------+
Set Operations with TABLE and VALUES Statements
Beginning with MySQL 8.0.19, you can also use aTABLE statement orVALUES statement wherever you can employ the equivalent SELECT statement. Assume that tables t1
andt2
are created and populated as shown here:
CREATE TABLE t1 (x INT, y INT);
INSERT INTO t1 VALUES ROW(4,-2),ROW(5,9);
CREATE TABLE t2 (a INT, b INT);
INSERT INTO t2 VALUES ROW(1,2),ROW(3,4);
The preceding being the case, and disregarding the column names in the output of the queries beginning withVALUES, all of the followingUNION
queries yield the same result:
SELECT * FROM t1 UNION SELECT * FROM t2;
TABLE t1 UNION SELECT * FROM t2;
VALUES ROW(4,-2), ROW(5,9) UNION SELECT * FROM t2;
SELECT * FROM t1 UNION TABLE t2;
TABLE t1 UNION TABLE t2;
VALUES ROW(4,-2), ROW(5,9) UNION TABLE t2;
SELECT * FROM t1 UNION VALUES ROW(4,-2),ROW(5,9);
TABLE t1 UNION VALUES ROW(4,-2),ROW(5,9);
VALUES ROW(4,-2), ROW(5,9) UNION VALUES ROW(4,-2),ROW(5,9);
To force the column names to be the same, wrap the query block on the left-hand side in a SELECT
statement, and use aliases, like this:
mysql> SELECT * FROM (TABLE t2) AS t(x,y) UNION TABLE t1;
+------+------+
| x | y |
+------+------+
| 1 | 2 |
| 3 | 4 |
| 4 | -2 |
| 5 | 9 |
+------+------+
4 rows in set (0.00 sec)
Set Operations using DISTINCT and ALL
By default, duplicate rows are removed from results of set operations. The optional DISTINCT
keyword has the same effect but makes it explicit. With the optionalALL
keyword, duplicate-row removal does not occur and the result includes all matching rows from all queries in the union.
You can mix ALL
andDISTINCT
in the same query. Mixed types are treated such that a set operation usingDISTINCT
overrides any such operation usingALL
to its left. ADISTINCT
set can be produced explicitly by using DISTINCT
withUNION,INTERSECT, orEXCEPT, or implicitly by using the set operations with no following DISTINCT
or ALL
keyword.
In MySQL 8.0.19 and later, set operations work the same way when one or more TABLE statements,VALUES statements, or both, are used to generate the set.
Set Operations with ORDER BY and LIMIT
To apply an ORDER BY
orLIMIT
clause to an individual query block used as part of a union, intersection, or other set operation, parenthesize the query block, placing the clause inside the parentheses, like this:
(SELECT a FROM t1 WHERE a=10 AND b=1 ORDER BY a LIMIT 10)
UNION
(SELECT a FROM t2 WHERE a=11 AND b=2 ORDER BY a LIMIT 10);
(TABLE t1 ORDER BY x LIMIT 10)
INTERSECT
(TABLE t2 ORDER BY a LIMIT 10);
Use of ORDER BY
for individual query blocks or statements implies nothing about the order in which the rows appear in the final result because the rows produced by a set operation are by default unordered. Therefore, ORDER BY
in this context typically is used in conjunction with LIMIT
, to determine the subset of the selected rows to retrieve, even though it does not necessarily affect the order of those rows in the final result. IfORDER BY
appears withoutLIMIT
within a query block, it is optimized away because it has no effect in any case.
To use an ORDER BY
orLIMIT
clause to sort or limit the entire result of a set operation, place the ORDER BY
or LIMIT
after the last statement:
SELECT a FROM t1
EXCEPT
SELECT a FROM t2 WHERE a=11 AND b=2
ORDER BY a LIMIT 10;
TABLE t1
UNION
TABLE t2
ORDER BY a LIMIT 10;
If one or more individual statements make use of ORDER BY
, LIMIT
, or both, and, in addition, you wish to apply an ORDER BY, LIMIT, or both to the entire result, then each such individual statement must be enclosed in parentheses.
(SELECT a FROM t1 WHERE a=10 AND b=1)
EXCEPT
(SELECT a FROM t2 WHERE a=11 AND b=2)
ORDER BY a LIMIT 10;
(TABLE t1 ORDER BY a LIMIT 10)
UNION
TABLE t2
ORDER BY a LIMIT 10;
A statement with no ORDER BY
orLIMIT
clause does need to be parenthesized; replacing TABLE t2
with (TABLE t2)
in the second statement of the two just shown does not alter the result of the UNION
.
You can also use ORDER BY
andLIMIT
withVALUES statements in set operations, as shown in this example using themysql client:
mysql> VALUES ROW(4,-2), ROW(5,9), ROW(-1,3)
-> UNION
-> VALUES ROW(1,2), ROW(3,4), ROW(-1,3)
-> ORDER BY column_0 DESC LIMIT 3;
+----------+----------+
| column_0 | column_1 |
+----------+----------+
| 5 | 9 |
| 4 | -2 |
| 3 | 4 |
+----------+----------+
3 rows in set (0.00 sec)
(You should keep in mind that neither TABLE
statements nor VALUES
statements accept aWHERE
clause.)
This kind of ORDER BY
cannot use column references that include a table name (that is, names in_tblname
.colname
_ format). Instead, provide a column alias in the first query block, and refer to the alias in the ORDER BY
clause. (You can also refer to the column in the ORDER BY
clause using its column position, but such use of column positions is deprecated, and thus subject to eventual removal in a future MySQL release.)
If a column to be sorted is aliased, the ORDER BY
clause must refer to the alias, not the column name. The first of the following statements is permitted, but the second fails with anUnknown column 'a' in 'order clause'
error:
(SELECT a AS b FROM t) UNION (SELECT ...) ORDER BY b;
(SELECT a AS b FROM t) UNION (SELECT ...) ORDER BY a;
To cause rows in a UNION result to consist of the sets of rows retrieved by each query block one after the other, select an additional column in each query block to use as a sort column and add an ORDER BY
clause that sorts on that column following the last query block:
(SELECT 1 AS sort_col, col1a, col1b, ... FROM t1)
UNION
(SELECT 2, col2a, col2b, ... FROM t2) ORDER BY sort_col;
To maintain sort order within individual results, add a secondary column to the ORDER BY
clause:
(SELECT 1 AS sort_col, col1a, col1b, ... FROM t1)
UNION
(SELECT 2, col2a, col2b, ... FROM t2) ORDER BY sort_col, col1a;
Use of an additional column also enables you to determine which query block each row comes from. Extra columns can provide other identifying information as well, such as a string that indicates a table name.
Limitations of Set Operations
Set operations in MySQL are subject to some limitations, which are described in the next few paragraphs.
Set operations including SELECT statements have the following limitations:
HIGH_PRIORITY
in the firstSELECT
has no effect.HIGH_PRIORITY
in any subsequentSELECT
produces a syntax error.- Only the last
SELECT
statement can use anINTO
clause. However, the entireUNION
result is written to theINTO
output destination.
As of MySQL 8.0.20, these two UNION
variants containing INTO
are deprecated; you should expect support for them to be removed in a future version of MySQL:
- In the trailing query block of a query expression, use of
INTO
beforeFROM
produces a warning. Example:
... UNION SELECT * INTO OUTFILE 'file_name' FROM table_name;
- In a parenthesized trailing block of a query expression, use of
INTO
(regardless of its position relative toFROM
) produces a warning. Example:
... UNION (SELECT * INTO OUTFILE 'file_name' FROM table_name);
Those variants are deprecated because they are confusing, as if they collect information from the named table rather than the entire query expression (the UNION
).
Set operations with an aggregate function in an ORDER BY
clause are rejected withER_AGGREGATE_ORDER_FOR_UNION. Although the error name might suggest that this is exclusive toUNION
queries, the preceding is also true forEXCEPT
and INTERSECT
queries, as shown here:
mysql> TABLE t1 INTERSECT TABLE t2 ORDER BY MAX(x);
ERROR 3028 (HY000): Expression #1 of ORDER BY contains aggregate function and applies to a UNION, EXCEPT or INTERSECT
A locking clause (such as FOR UPDATE
orLOCK IN SHARE MODE
) applies to the query block it follows. This means that, in aSELECT
statement used with set operations, a locking clause can be used only if the query block and locking clause are enclosed in parentheses.