1.3 What Is New in MySQL 8.0 (original) (raw)

SELECT *  
    FROM t1  
    JOIN t2  
        ON t1.c1=t2.c1;  
SELECT *  
    FROM t1  
    JOIN t2  
        ON (t1.c1 = t2.c1 AND t1.c2 < t2.c2)  
    JOIN t3  
        ON (t2.c1 = t3.c1)  

Hash joins can also be used for Cartesian products—that is, when no join condition is specified.
You can see when the hash join optimization is being used for a particular query usingEXPLAIN FORMAT=TREE or EXPLAIN ANALYZE. (In MySQL 8.0.20 and later, you can also use EXPLAIN, omittingFORMAT=TREE.)
The amount of memory available to a hash join is limited by the value ofjoin_buffer_size. A hash join that requires more than this much memory is executed on disk; the number of disk files that can be used by an on-disk hash join is limited byopen_files_limit.
As of MySQL 8.0.19, thehash_join optimizer switch which was introduced in MySQL 8.0.18 no longer supported (hash_join=on still appears as part of the value of optimizer_switch, but setting it no longer has any effect). The HASH_JOIN andNO_HASH_JOIN optimizer hints are also no longer supported. The switch and the hint are both now deprecated; expect them to be removed in a future MySQL release. In MySQL 8.0.18 and later, hash joins can be disabled using the NO_BNL optimizer switch.
In MySQL 8.0.20 and later, block nested loop is no longer used in the MySQL server, and a hash join is employed any time a block nested loop would have been used previously, even when the query contains no equi-join conditions. This applies to inner non-equijoins, semijoins, antijoins, left outer joins, and right outer joins. Theblock_nested_loop flag for the optimizer_switch system variable as well as theBNL andNO_BNL optimizer hints are still supported, but henceforth control use of hash joins only. In addition, both inner and outer joins (including semijoins and antijoins) can now employ batched key access (BKA), which allocates join buffer memory incrementally so that individual queries need not use up large amounts of resources that they do not actually require for resolution. BKA for inner joins only is supported starting with MySQL 8.0.18.
MySQL 8.0.20 also replaces the executor used in previous versions of MySQL with the iterator executor. This work includes replacement of the old index subquery engines that governed queries of the form WHERE_`value`_ IN (SELECT_`column`_ FROM_`table`_ WHERE ...) for those IN queries which have not been optimized as semijoins, as well as queries materialized in the same form, which formerly depended on the old executor.
For more information and examples, seeSection 10.2.1.4, “Hash Join Optimization”. See alsoBatched Key Access Joins.

mysql> CREATE TABLE d (dt DATETIME, d DATE, t TIME);  
Query OK, 0 rows affected (0.62 sec)  
mysql> CREATE TABLE n (i INT, d DECIMAL, f FLOAT, dc DECIMAL);  
Query OK, 0 rows affected (0.51 sec)  
mysql> CREATE TABLE s (c CHAR(25), vc VARCHAR(25),  
    ->     bn BINARY(50), vb VARBINARY(50), b BLOB, t TEXT,  
    ->     e ENUM('a', 'b', 'c'), se SET('x' ,'y', 'z'));  
Query OK, 0 rows affected (0.50 sec)  
mysql> EXPLAIN FORMAT=TREE SELECT * from d JOIN n ON d.dt = n.i\G  
*************************** 1. row ***************************  
EXPLAIN: -> Inner hash join (cast(d.dt as double) = cast(n.i as double))  
(cost=0.70 rows=1)  
    -> Table scan on n  (cost=0.35 rows=1)  
    -> Hash  
        -> Table scan on d  (cost=0.35 rows=1)  
mysql> EXPLAIN FORMAT=TREE SELECT * from s JOIN d ON d.dt = s.c\G  
*************************** 1. row ***************************  
EXPLAIN: -> Inner hash join (d.dt = cast(s.c as datetime(6)))  (cost=0.72 rows=1)  
    -> Table scan on d  (cost=0.37 rows=1)  
    -> Hash  
        -> Table scan on s  (cost=0.35 rows=1)  
1 row in set (0.01 sec)  
mysql> EXPLAIN FORMAT=TREE SELECT * from n JOIN s ON n.d = s.c\G  
*************************** 1. row ***************************  
EXPLAIN: -> Inner hash join (cast(n.d as double) = cast(s.c as double))  (cost=0.70 rows=1)  
    -> Table scan on s  (cost=0.35 rows=1)  
    -> Hash  
        -> Table scan on n  (cost=0.35 rows=1)  
1 row in set (0.00 sec)  

Such casts can also be seen by executing EXPLAIN [FORMAT=TRADITIONAL], in which case it is also necessary to issue SHOW WARNINGS after executing theEXPLAIN statement.

INSERT INTO t SET a=9,b=5  
    ON DUPLICATE KEY UPDATE a=VALUES(a)+VALUES(b);  

Using the alias new for the new row, and, in some cases, the aliases m andn for this row's columns, theINSERT statement can be rewritten in many different ways, some examples of which are shown here:

INSERT INTO t SET a=9,b=5 AS new  
    ON DUPLICATE KEY UPDATE a=new.a+new.b;  
INSERT INTO t VALUES(9,5) AS new  
    ON DUPLICATE KEY UPDATE a=new.a+new.b;  
INSERT INTO t SET a=9,b=5 AS new(m,n)  
    ON DUPLICATE KEY UPDATE a=m+n;  
INSERT INTO t VALUES(9,5) AS new(m,n)  
    ON DUPLICATE KEY UPDATE a=m+n;  

For more information and examples, seeSection 15.2.7.2, “INSERT ... ON DUPLICATE KEY UPDATE Statement”.

SELECT a FROM t1 FORCE INDEX (i_a) FOR JOIN WHERE a=1 AND b=2;  
SELECT /*+ JOIN_INDEX(t1 i_a) */ a FROM t1 WHERE a=1 AND b=2;  

The optimizer hints listed previously follow the same basic rules for syntax and usage as existing index-level optimizer hints.
These optimizer hints are intended to replace FORCE INDEX and IGNORE INDEX, which we plan to deprecate in a future MySQL release, and subsequently to remove from MySQL. They do not implement a single exact equivalent for USE INDEX; instead, you can employ one or more ofNO_INDEX,NO_JOIN_INDEX,NO_GROUP_INDEX, orNO_ORDER_INDEX to achieve the same effect.
For further information and examples of use, seeIndex-Level Optimizer Hints.

CAST(  
    JSON_UNQUOTE( JSON_EXTRACT(json_doc, path) )  
    AS type  
);  

You can also specify ON EMPTY,ON ERROR, or both clauses, similar to those employed withJSON_TABLE().
You can use JSON_VALUE() to create an index on an expression on a JSON column like this:

CREATE TABLE t1(  
    j JSON,  
    INDEX i1 ( (JSON_VALUE(j, '$.id' RETURNING UNSIGNED)) )  
);  
INSERT INTO t1 VALUES ROW('{"id": "123", "name": "shoes", "price": "49.95"}');  

A query using this expression, such as that shown here, can make use of the index:

SELECT j->"$.name" as name, j->"$.price" as price  
    FROM t1  
    WHERE JSON_VALUE(j, '$.id' RETURNING UNSIGNED) = 123;  

In many cases, this is simpler than creating a generated column from the JSON column and then creating an index on the generated column.
For more information and examples, see the description ofJSON_VALUE().

mysql> CREATE USER 'mary'@'localhost' COMMENT 'This is Mary Smith\'s account';  
Query OK, 0 rows affected (0.33 sec)  
mysql> ALTER USER 'mary'@'localhost'  
    -≫     ATTRIBUTE '{"fname":"Mary", "lname":"Smith"}';  
Query OK, 0 rows affected (0.14 sec)  
mysql> ALTER USER 'mary'@'localhost'  
    -≫     ATTRIBUTE '{"email":"mary.smith@example.com"}';  
Query OK, 0 rows affected (0.12 sec)  
mysql> SELECT  
    ->    USER,  
    ->    HOST,  
    ->    ATTRIBUTE->>"$.fname" AS 'First Name',  
    ->    ATTRIBUTE->>"$.lname" AS 'Last Name',  
    ->    ATTRIBUTE->>"$.email" AS 'Email',  
    ->    ATTRIBUTE->>"$.comment" AS 'Comment'  
    -> FROM INFORMATION_SCHEMA.USER_ATTRIBUTES  
    -> WHERE USER='mary' AND HOST='localhost'\G  
*************************** 1. row ***************************  
      USER: mary  
      HOST: localhost  
First Name: Mary  
 Last Name: Smith  
     Email: mary.smith@example.com  
   Comment: This is Mary Smith's account  
1 row in set (0.00 sec)  

For more information and examples, seeSection 15.7.1.3, “CREATE USER Statement”, Section 15.7.1.1, “ALTER USER Statement”, andSection 28.3.46, “The INFORMATION_SCHEMA USER_ATTRIBUTES Table”.

REVOKE [IF EXISTS] privilege_or_role  
    ON object  
    FROM user_or_role [IGNORE UNKNOWN USER]  

IF EXISTS causes an unsuccessfulREVOKE statement to raise a warning instead of an error, as long as the named target user or role actually exists, despite any references in the statement to any roles or privileges which cannot be found.
IGNORE UNKNOWN USER causes an unsuccessful REVOKE to raise a warning rather than an error when the target user or role named in the statement cannot be found.
For further information and examples, seeSection 15.7.1.8, “REVOKE Statement”.

my_row_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT INVISIBLE PRIMARY KEY  

GIPK mode is not enabled by default. To enable it, set thesql_generate_invisible_primary_key server system variable to ON.
Generated invisible primary keys are normally visible in the output of statements such as SHOW CREATE TABLE and SHOW INDEX, as well as in MySQL Information Schema tables such as the COLUMNS andSTATISTICS tables. You can cause them to be hidden in such cases instead, by settingshow_gipk_in_create_table_and_information_schema to OFF.
As part of this work, a new--skip-generated-invisible-primary-key option is added to mysqldump andmysqlpump to exclude generated invisible primary keys, columns, and column values from their output.
GIPKs and replication between tables with or without primary keys. In MySQL Replication, a replica effectively ignores any setting forsql_generate_invisible_primary_key on the source, such that it has no effect on replicated tables. MySQL 8.0.32 and later makes it possible for the replica to add a generated invisible primary key to anyInnoDB table that otherwise, as replicated, has no primary key. You can do this by invokingCHANGE REPLICATION SOURCE TO ... REQUIRE_TABLE_PRIMARY_KEY_CHECK = GENERATE on the replica.
REQUIRE_TABLE_PRIMARY_KEY_CHECK = GENERATE is not compatible with MySQL Group Replication.
For further information, seeSection 15.1.20.11, “Generated Invisible Primary Keys”.

mysql> EXPLAIN FORMAT=TREE (  
    ->   (SELECT a, b, c FROM t ORDER BY a LIMIT 3) ORDER BY b LIMIT 2  
    -> ) ORDER BY c LIMIT 1\G  
*************************** 1. row ***************************  
EXPLAIN: -> Limit: 1 row(s)  (cost=5.55..5.55 rows=1)  
    -> Sort: c, limit input to 1 row(s) per chunk  (cost=2.50 rows=0)  
        -> Table scan on <result temporary>  (cost=2.50 rows=0)  
            -> Temporary table  (cost=5.55..5.55 rows=1)  
                -> Limit: 2 row(s)  (cost=2.95..2.95 rows=1)  
                    -> Sort: b, limit input to 2 row(s) per chunk  (cost=2.50 rows=0)  
                        -> Table scan on <result temporary>  (cost=2.50 rows=0)  
                            -> Temporary table  (cost=2.95..2.95 rows=1)  
                                -> Limit: 3 row(s)  (cost=0.35 rows=1)  
                                    -> Sort: t.a, limit input to 3 row(s) per chunk  (cost=0.35 rows=1)  
                                        -> Table scan on t  (cost=0.35 rows=1)  
1 row in set (0.00 sec)  

MySQL follows SQL standard semantics when collapsing bodies of parenthesized query expressions, so that a higher outer limit cannot override an inner lower one. For example,(SELECT ... LIMIT 5) LIMIT 10 can return no more than five rows.
The 63-level limit is imposed only after the MySQL Optimizer's parser has performed any simplifications or merges which it can.
For more information, seeSection 15.2.11, “Parenthesized Query Expressions”.

ANALYZE TABLE tbl_name  
  UPDATE HISTOGRAM ON col_name  
  USING DATA 'json_data'  

This statement creates or overwrites a histogram for column_colname_ of table_tblname_ using the histogram's JSON representation_jsondata_. After executing this statement, you can verify that the histogram was created or updated by querying the Information SchemaCOLUMN_STATISTICS table, like this:

SELECT HISTOGRAM FROM INFORMATION_SCHEMA.COLUMN_STATISTICS  
  WHERE TABLE_NAME='tbl_name'  
  AND COLUMN_NAME='col_name';  

The column value returned should be the same_jsondata_ used in the previousANALYZE TABLE statement.
This can be of use in cases where values deemed important are missed by the histogram sampling process. When this happens, you may want to modify the histogram or set your own histogram based on the complete data set. In addition, sampling a large user data set and building a histogram from it are resource-heavy operations which can impact user queries. With this enhancement, histogram generation can be moved off the (primary) server and performed on a replica instead; the generated histograms can then be assigned to the proper table columns on the source server.
For more information and examples, seeHistogram Statistics Analysis.