10.2.2.3 Optimizing Subqueries with the EXISTS Strategy (original) (raw)

10.2.2.3 Optimizing Subqueries with the EXISTS Strategy

Certain optimizations are applicable to comparisons that use the IN (or =ANY) operator to test subquery results. This section discusses these optimizations, particularly with regard to the challenges that NULL values present. The last part of the discussion suggests how you can help the optimizer.

Consider the following subquery comparison:

outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)

MySQL evaluates queries “from outside to inside.” That is, it first obtains the value of the outer expression_outerexpr_, and then runs the subquery and captures the rows that it produces.

A very useful optimization is to “inform” the subquery that the only rows of interest are those where the inner expression innerexpr is equal to outerexpr. This is done by pushing down an appropriate equality into the subquery's WHERE clause to make it more restrictive. The converted comparison looks like this:

EXISTS (SELECT 1 FROM ... WHERE subquery_where AND outer_expr=inner_expr)

After the conversion, MySQL can use the pushed-down equality to limit the number of rows it must examine to evaluate the subquery.

More generally, a comparison of N values to a subquery that returns_N-value rows is subject to the same conversion. If oei andiei_ represent corresponding outer and inner expression values, this subquery comparison:

(oe_1, ..., oe_N) IN
  (SELECT ie_1, ..., ie_N FROM ... WHERE subquery_where)

Becomes:

EXISTS (SELECT 1 FROM ... WHERE subquery_where
                          AND oe_1 = ie_1
                          AND ...
                          AND oe_N = ie_N)

For simplicity, the following discussion assumes a single pair of outer and inner expression values.

The “pushdown” strategy just described works if either of these conditions is true:

... WHERE outer_expr IN (subquery)  

In this case, the WHERE clause rejects the row whether IN (_`subquery`_) returnsNULL or FALSE.

Suppose that outerexpr is known to be a non-NULL value but the subquery does not produce a row such that_outerexpr_ =innerexpr. Then_`outerexpr`_ IN (SELECT ...) evaluates as follows:

In this situation, the approach of looking for rows with_`outerexpr`_ =_`innerexpr`_ is no longer valid. It is necessary to look for such rows, but if none are found, also look for rows where_innerexpr_ isNULL. Roughly speaking, the subquery can be converted to something like this:

EXISTS (SELECT 1 FROM ... WHERE subquery_where AND
        (outer_expr=inner_expr OR inner_expr IS NULL))

The need to evaluate the extra IS NULL condition is why MySQL has theref_or_null access method:

mysql> EXPLAIN
       SELECT outer_expr IN (SELECT t2.maybe_null_key
                             FROM t2, t3 WHERE ...)
       FROM t1;
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: t1
...
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: t2
         type: ref_or_null
possible_keys: maybe_null_key
          key: maybe_null_key
      key_len: 5
          ref: func
         rows: 2
        Extra: Using where; Using index
...

The unique_subquery andindex_subquery subquery-specific access methods also have “orNULL” variants.

The additional OR ... IS NULL condition makes query execution slightly more complicated (and some optimizations within the subquery become inapplicable), but generally this is tolerable.

The situation is much worse when_outerexpr_ can beNULL. According to the SQL interpretation of NULL as “unknown value,” NULL IN (SELECT _`innerexpr`_ ...) should evaluate to:

For proper evaluation, it is necessary to be able to check whether the SELECT has produced any rows at all, so_`outerexpr`_ =_`innerexpr`_ cannot be pushed down into the subquery. This is a problem because many real world subqueries become very slow unless the equality can be pushed down.

Essentially, there must be different ways to execute the subquery depending on the value of_outerexpr_.

The optimizer chooses SQL compliance over speed, so it accounts for the possibility that_outerexpr_ might beNULL:

NULL IN (SELECT inner_expr FROM ... WHERE subquery_where)  

It is necessary to execute the originalSELECT here, without any pushed-down equalities of the kind mentioned previously.

outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)  

Be converted to this expression that uses a pushed-down condition:

EXISTS (SELECT 1 FROM ... WHERE subquery_where AND outer_expr=inner_expr)  

Without this conversion, subqueries are slow.

To solve the dilemma of whether or not to push down conditions into the subquery, the conditions are wrapped within“trigger” functions. Thus, an expression of the following form:

outer_expr IN (SELECT inner_expr FROM ... WHERE subquery_where)

Is converted into:

EXISTS (SELECT 1 FROM ... WHERE subquery_where
                          AND trigcond(outer_expr=inner_expr))

More generally, if the subquery comparison is based on several pairs of outer and inner expressions, the conversion takes this comparison:

(oe_1, ..., oe_N) IN (SELECT ie_1, ..., ie_N FROM ... WHERE subquery_where)

And converts it to this expression:

EXISTS (SELECT 1 FROM ... WHERE subquery_where
                          AND trigcond(oe_1=ie_1)
                          AND ...
                          AND trigcond(oe_N=ie_N)
       )

Each trigcond(_`X`_) is a special function that evaluates to the following values:

Note

Trigger functions are not triggers of the kind that you create with CREATE TRIGGER.

Equalities that are wrapped withintrigcond() functions are not first class predicates for the query optimizer. Most optimizations cannot deal with predicates that may be turned on and off at query execution time, so they assume anytrigcond(_`X`_) to be an unknown function and ignore it. Triggered equalities can be used by those optimizations:

When the optimizer uses a triggered condition to create some kind of index lookup-based access (as for the first two items of the preceding list), it must have a fallback strategy for the case when the condition is turned off. This fallback strategy is always the same: Do a full table scan. InEXPLAIN output, the fallback shows up as Full scan on NULL key in theExtra column:

mysql> EXPLAIN SELECT t1.col1,
       t1.col1 IN (SELECT t2.key1 FROM t2 WHERE t2.col2=t1.col2) FROM t1\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: t1
        ...
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: t2
         type: index_subquery
possible_keys: key1
          key: key1
      key_len: 5
          ref: func
         rows: 2
        Extra: Using where; Full scan on NULL key

If you run EXPLAIN followed bySHOW WARNINGS, you can see the triggered condition:

*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select `test`.`t1`.`col1` AS `col1`,
         <in_optimizer>(`test`.`t1`.`col1`,
         <exists>(<index_lookup>(<cache>(`test`.`t1`.`col1`) in t2
         on key1 checking NULL
         where (`test`.`t2`.`col2` = `test`.`t1`.`col2`) having
         trigcond(<is_not_null_test>(`test`.`t2`.`key1`))))) AS
         `t1.col1 IN (select t2.key1 from t2 where t2.col2=t1.col2)`
         from `test`.`t1`

The use of triggered conditions has some performance implications. A NULL IN (SELECT ...) expression now may cause a full table scan (which is slow) when it previously did not. This is the price paid for correct results (the goal of the trigger-condition strategy is to improve compliance, not speed).

For multiple-table subqueries, execution of NULL IN (SELECT ...) is particularly slow because the join optimizer does not optimize for the case where the outer expression is NULL. It assumes that subquery evaluations with NULL on the left side are very rare, even if there are statistics that indicate otherwise. On the other hand, if the outer expression might beNULL but never actually is, there is no performance penalty.

To help the query optimizer better execute your queries, use these suggestions:

outer_expr [NOT] IN (SELECT inner_expr FROM ...)  

with this expression:

(outer_expr IS NOT NULL) AND (outer_expr [NOT] IN (SELECT inner_expr FROM ...))  

Then NULL IN (SELECT ...) is never evaluated because MySQL stops evaluatingAND parts as soon as the expression result is clear.
Another possible rewrite:

[NOT] EXISTS (SELECT inner_expr FROM ...  
        WHERE inner_expr=outer_expr)  

Thesubquery_materialization_cost_based flag of the optimizer_switch system variable enables control over the choice between subquery materialization andIN-to-EXISTS subquery transformation. SeeSection 10.9.2, “Switchable Optimizations”.