MySQL :: MySQL 8.0 Reference Manual :: 10.2.1.2 Range Optimization (original) (raw)

10.2.1.2 Range Optimization

The range access method uses a single index to retrieve a subset of table rows that are contained within one or several index value intervals. It can be used for a single-part or multiple-part index. The following sections describe conditions under which the optimizer uses range access.

Range Access Method for Single-Part Indexes

For a single-part index, index value intervals can be conveniently represented by corresponding conditions in theWHERE clause, denoted asrange conditions rather than “intervals.”

The definition of a range condition for a single-part index is as follows:

“Constant value” in the preceding descriptions means one of the following:

Here are some examples of queries with range conditions in the WHERE clause:

SELECT * FROM t1
  WHERE key_col > 1
  AND key_col < 10;

SELECT * FROM t1
  WHERE key_col = 1
  OR key_col IN (15,18,20);

SELECT * FROM t1
  WHERE key_col LIKE 'ab%'
  OR key_col BETWEEN 'bar' AND 'foo';

Some nonconstant values may be converted to constants during the optimizer constant propagation phase.

MySQL tries to extract range conditions from theWHERE clause for each of the possible indexes. During the extraction process, conditions that cannot be used for constructing the range condition are dropped, conditions that produce overlapping ranges are combined, and conditions that produce empty ranges are removed.

Consider the following statement, wherekey1 is an indexed column andnonkey is not indexed:

SELECT * FROM t1 WHERE
  (key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR
  (key1 < 'bar' AND nonkey = 4) OR
  (key1 < 'uux' AND key1 > 'z');

The extraction process for key key1 is as follows:

  1. Start with original WHERE clause:
(key1 < 'abc' AND (key1 LIKE 'abcde%' OR key1 LIKE '%b')) OR  
(key1 < 'bar' AND nonkey = 4) OR  
(key1 < 'uux' AND key1 > 'z')  
  1. Remove nonkey = 4 and key1 LIKE '%b' because they cannot be used for a range scan. The correct way to remove them is to replace them with TRUE, so that we do not miss any matching rows when doing the range scan. Replacing them with TRUE yields:
(key1 < 'abc' AND (key1 LIKE 'abcde%' OR TRUE)) OR  
(key1 < 'bar' AND TRUE) OR  
(key1 < 'uux' AND key1 > 'z')  
  1. Collapse conditions that are always true or false:
    • (key1 LIKE 'abcde%' OR TRUE) is always true
    • (key1 < 'uux' AND key1 > 'z') is always false
      Replacing these conditions with constants yields:
(key1 < 'abc' AND TRUE) OR (key1 < 'bar' AND TRUE) OR (FALSE)  

Removing unnecessary TRUE andFALSE constants yields:

(key1 < 'abc') OR (key1 < 'bar')  
  1. Combining overlapping intervals into one yields the final condition to be used for the range scan:
(key1 < 'bar')  

In general (and as demonstrated by the preceding example), the condition used for a range scan is less restrictive than the WHERE clause. MySQL performs an additional check to filter out rows that satisfy the range condition but not the full WHERE clause.

The range condition extraction algorithm can handle nestedAND/OR constructs of arbitrary depth, and its output does not depend on the order in which conditions appear inWHERE clause.

MySQL does not support merging multiple ranges for therange access method for spatial indexes. To work around this limitation, you can use a UNION with identicalSELECT statements, except that you put each spatial predicate in a differentSELECT.

Range Access Method for Multiple-Part Indexes

Range conditions on a multiple-part index are an extension of range conditions for a single-part index. A range condition on a multiple-part index restricts index rows to lie within one or several key tuple intervals. Key tuple intervals are defined over a set of key tuples, using ordering from the index.

For example, consider a multiple-part index defined askey1(_`keypart1`_,_`keypart2`_,_`keypart3`_), and the following set of key tuples listed in key order:

key_part1  key_part2  key_part3
  NULL       1          'abc'
  NULL       1          'xyz'
  NULL       2          'foo'
   1         1          'abc'
   1         1          'xyz'
   1         2          'abc'
   2         1          'aaa'

The condition _`keypart1`_ = 1 defines this interval:

(1,-inf,-inf) <= (key_part1,key_part2,key_part3) < (1,+inf,+inf)

The interval covers the 4th, 5th, and 6th tuples in the preceding data set and can be used by the range access method.

By contrast, the condition_`keypart3`_ = 'abc' does not define a single interval and cannot be used by the range access method.

The following descriptions indicate how range conditions work for multiple-part indexes in greater detail.

    key_part1 cmp const1  
AND key_part2 cmp const2  
AND ...  
AND key_partN cmp constN;  

Here, const1,const2, … are constants, cmp is one of the=,<=>, or IS NULL comparison operators, and the conditions cover all index parts. (That is, there are N conditions, one for each part of an_N_-part index.) For example, the following is a range condition for a three-partHASH index:

key_part1 = 1 AND key_part2 IS NULL AND key_part3 = 'foo'  

For the definition of what is considered to be a constant, seeRange Access Method for Single-Part Indexes.

key_part1 = 'foo' AND key_part2 >= 10 AND key_part3 > 10  

The single interval is:

('foo',10,-inf) < (key_part1,key_part2,key_part3) < ('foo',+inf,+inf)  

It is possible that the created interval contains more rows than the initial condition. For example, the preceding interval includes the value ('foo', 11, 0), which does not satisfy the original condition.

(key_part1 = 1 AND key_part2 < 2) OR (key_part1 > 5)  

The intervals are:

(1,-inf) < (key_part1,key_part2) < (1,2)  
(5,-inf) < (key_part1,key_part2)  

In this example, the interval on the first line uses one key part for the left bound and two key parts for the right bound. The interval on the second line uses only one key part. The key_len column in the EXPLAIN output indicates the maximum length of the key prefix used.
In some cases, key_len may indicate that a key part was used, but that might be not what you would expect. Suppose that_keypart1_ and_keypart2_ can beNULL. Then thekey_len column displays two key part lengths for the following condition:

key_part1 >= 1 AND key_part2 < 2  

But, in fact, the condition is converted to this:

key_part1 >= 1 AND key_part2 IS NOT NULL  

For a description of how optimizations are performed to combine or eliminate intervals for range conditions on a single-part index, seeRange Access Method for Single-Part Indexes. Analogous steps are performed for range conditions on multiple-part indexes.

Equality Range Optimization of Many-Valued Comparisons

Consider these expressions, where_colname_ is an indexed column:

col_name IN(val1, ..., valN)
col_name = val1 OR ... OR col_name = valN

Each expression is true if_colname_ is equal to any of several values. These comparisons are equality range comparisons (where the “range” is a single value). The optimizer estimates the cost of reading qualifying rows for equality range comparisons as follows:

With index dives, the optimizer makes a dive at each end of a range and uses the number of rows in the range as the estimate. For example, the expression_`colname`_ IN (10, 20, 30) has three equality ranges and the optimizer makes two dives per range to generate a row estimate. Each pair of dives yields an estimate of the number of rows that have the given value.

Index dives provide accurate row estimates, but as the number of comparison values in the expression increases, the optimizer takes longer to generate a row estimate. Use of index statistics is less accurate than index dives but permits faster row estimation for large value lists.

Theeq_range_index_dive_limit system variable enables you to configure the number of values at which the optimizer switches from one row estimation strategy to the other. To permit use of index dives for comparisons of up to N equality ranges, seteq_range_index_dive_limit to N + 1. To disable use of statistics and always use index dives regardless of_N_, seteq_range_index_dive_limit to 0.

To update table index statistics for best estimates, useANALYZE TABLE.

Prior to MySQL 8.0, there is no way of skipping the use of index dives to estimate index usefulness, except by using theeq_range_index_dive_limit system variable. In MySQL 8.0, index dive skipping is possible for queries that satisfy all these conditions:

For EXPLAIN FOR CONNECTION, the output changes as follows if index dives are skipped:

Without FOR CONNECTION,EXPLAIN output does not change when index dives are skipped.

After execution of a query for which index dives are skipped, the corresponding row in the Information SchemaOPTIMIZER_TRACE table contains an index_dives_for_range_access value ofskipped_due_to_force_index.

Skip Scan Range Access Method

Consider the following scenario:

CREATE TABLE t1 (f1 INT NOT NULL, f2 INT NOT NULL, PRIMARY KEY(f1, f2));
INSERT INTO t1 VALUES
  (1,1), (1,2), (1,3), (1,4), (1,5),
  (2,1), (2,2), (2,3), (2,4), (2,5);
INSERT INTO t1 SELECT f1, f2 + 5 FROM t1;
INSERT INTO t1 SELECT f1, f2 + 10 FROM t1;
INSERT INTO t1 SELECT f1, f2 + 20 FROM t1;
INSERT INTO t1 SELECT f1, f2 + 40 FROM t1;
ANALYZE TABLE t1;

EXPLAIN SELECT f1, f2 FROM t1 WHERE f2 > 40;

To execute this query, MySQL can choose an index scan to fetch all rows (the index includes all columns to be selected), then apply the f2 > 40 condition from the WHERE clause to produce the final result set.

A range scan is more efficient than a full index scan, but cannot be used in this case because there is no condition onf1, the first index column. However, as of MySQL 8.0.13, the optimizer can perform multiple range scans, one for each value of f1, using a method called Skip Scan that is similar to Loose Index Scan (see Section 10.2.1.17, “GROUP BY Optimization”):

  1. Skip between distinct values of the first index part,f1 (the index prefix).
  2. Perform a subrange scan on each distinct prefix value for the f2 > 40 condition on the remaining index part.

For the data set shown earlier, the algorithm operates like this:

  1. Get the first distinct value of the first key part (f1 = 1).
  2. Construct the range based on the first and second key parts (f1 = 1 AND f2 > 40).
  3. Perform a range scan.
  4. Get the next distinct value of the first key part (f1 = 2).
  5. Construct the range based on the first and second key parts (f1 = 2 AND f2 > 40).
  6. Perform a range scan.

Using this strategy decreases the number of accessed rows because MySQL skips the rows that do not qualify for each constructed range. This Skip Scan access method is applicable under the following conditions:

Use of Skip Scan is indicated in EXPLAIN output as follows:

Use of Skip Scan is indicated in optimizer trace output by a"skip scan" element of this form:

"skip_scan_range": {
  "type": "skip_scan",
  "index": index_used_for_skip_scan,
  "key_parts_used_for_access": [key_parts_used_for_access],
  "range": [range]
}

You may also see a"best_skip_scan_summary" element. If Skip Scan is chosen as the best range access variant, a"chosen_range_access_summary" is written. If Skip Scan is chosen as the overall best access method, a"best_access_path" element is present.

Use of Skip Scan is subject to the value of theskip_scan flag of theoptimizer_switch system variable. See Section 10.9.2, “Switchable Optimizations”. By default, this flag is on. To disable it, set skip_scan tooff.

In addition to using theoptimizer_switch system variable to control optimizer use of Skip Scan session-wide, MySQL supports optimizer hints to influence the optimizer on a per-statement basis. SeeSection 10.9.3, “Optimizer Hints”.

Range Optimization of Row Constructor Expressions

The optimizer is able to apply the range scan access method to queries of this form:

SELECT ... FROM t1 WHERE ( col_1, col_2 ) IN (( 'a', 'b' ), ( 'c', 'd' ));

Previously, for range scans to be used, it was necessary to write the query as:

SELECT ... FROM t1 WHERE ( col_1 = 'a' AND col_2 = 'b' )
OR ( col_1 = 'c' AND col_2 = 'd' );

For the optimizer to use a range scan, queries must satisfy these conditions:

For more information about the optimizer and row constructors, seeSection 10.2.1.22, “Row Constructor Expression Optimization”

Limiting Memory Use for Range Optimization

To control the memory available to the range optimizer, use therange_optimizer_max_mem_size system variable:

Warning    3170    Memory capacity of N bytes for  
                   'range_optimizer_max_mem_size' exceeded. Range  
                   optimization was not done for this query.  

For individual queries that exceed the available range optimization memory and for which the optimizer falls back to less optimal plans, increasing therange_optimizer_max_mem_size value may improve performance.

To estimate the amount of memory needed to process a range expression, use these guidelines:

SELECT COUNT(*) FROM t  
WHERE a=1 OR a=2 OR a=3 OR .. . a=N;  
SELECT COUNT(*) FROM t  
WHERE a=1 AND b=1 AND c=1 ... N;  
SELECT COUNT(*) FROM t  
WHERE a IN (1,2, ..., M) AND b IN (1,2, ..., N);  

Each literal value in anIN() list counts as a predicate combined with OR. If there are two IN() lists, the number of predicates combined withOR is the product of the number of literal values in each list. Thus, the number of predicates combined withOR in the preceding case is_M_ ×_N_.