MySQL :: MySQL 8.4 Reference Manual :: 14.19.1 Aggregate Function Descriptions (original) (raw)

14.19.1 Aggregate Function Descriptions

This section describes aggregate functions that operate on sets of values. They are often used with a GROUP BY clause to group values into subsets.

Unless otherwise stated, aggregate functions ignoreNULL values.

If you use an aggregate function in a statement containing noGROUP BY clause, it is equivalent to grouping on all rows. For more information, seeSection 14.19.3, “MySQL Handling of GROUP BY”.

Most aggregate functions can be used as window functions. Those that can be used this way are signified in their syntax description by[_`overclause`_], representing an optional OVER clause.overclause is described inSection 14.20.2, “Window Function Concepts and Syntax”, which also includes other information about window function usage.

For numeric arguments, the variance and standard deviation functions return a DOUBLE value. The SUM() andAVG() functions return aDECIMAL value for exact-value arguments (integer or DECIMAL), and a DOUBLE value for approximate-value arguments (FLOAT orDOUBLE).

The SUM() andAVG() aggregate functions do not work with temporal values. (They convert the values to numbers, losing everything after the first nonnumeric character.) To work around this problem, convert to numeric units, perform the aggregate operation, and convert back to a temporal value. Examples:

SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(time_col))) FROM tbl_name;
SELECT FROM_DAYS(SUM(TO_DAYS(date_col))) FROM tbl_name;

Functions such as SUM() orAVG() that expect a numeric argument cast the argument to a number if necessary. ForSET orENUM values, the cast operation causes the underlying numeric value to be used.

The BIT_AND(),BIT_OR(), andBIT_XOR() aggregate functions perform bit operations.

MySQL bit functions and operators permit binary string type arguments (BINARY,VARBINARY, and theBLOB types) and return a value of like type, which enables them to take arguments and produce return values larger than 64 bits. For discussion about argument evaluation and result types for bit operations, see the introductory discussion in Section 14.12, “Bit Functions and Operators”.

mysql> SELECT student_name, AVG(test_score)  
       FROM student  
       GROUP BY student_name;  
mysql> SELECT student.student_name,COUNT(*)  
       FROM student,course  
       WHERE student.student_id=course.student_id  
       GROUP BY student_name;  

COUNT(*) is somewhat different in that it returns a count of the number of rows retrieved, whether or not they containNULL values.
For transactional storage engines such asInnoDB, storing an exact row count is problematic. Multiple transactions may be occurring at the same time, each of which may affect the count.
InnoDB does not keep an internal count of rows in a table because concurrent transactions might“see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.
SELECT COUNT(*) FROM_`tblname`_ query performance for InnoDB tables is optimized for single-threaded workloads if there are no extra clauses such as WHERE orGROUP BY.
InnoDB processes SELECT COUNT(*) statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, InnoDB processes SELECT COUNT(*) statements by scanning the clustered index.
Processing of SELECT COUNT(*) statements takes some time if index records are not entirely in the buffer pool. For a faster count, create a counter table and let your application update it according to the inserts and deletes it does. However, this method may not scale well in situations where thousands of concurrent transactions are initiating updates to the same counter table. If an approximate row count is sufficient, useSHOW TABLE STATUS.
InnoDB handles SELECT COUNT(*) and SELECT COUNT(1) operations in the same way. There is no performance difference.
For MyISAM tables,COUNT(*) is optimized to return very quickly if theSELECT retrieves from one table, no other columns are retrieved, and there is noWHERE clause. For example:

mysql> SELECT COUNT(*) FROM student;  

This optimization only applies to MyISAM tables, because an exact row count is stored for this storage engine and can be accessed very quickly.COUNT(1) is only subject to the same optimization if the first column is defined as NOT NULL.

mysql> SELECT COUNT(DISTINCT results) FROM student;  

In MySQL, you can obtain the number of distinct expression combinations that do not contain NULL by giving a list of expressions. In standard SQL, you would have to do a concatenation of all expressions insideCOUNT(DISTINCT ...).

GROUP_CONCAT([DISTINCT] expr [,expr ...]  
             [ORDER BY {unsigned_integer | col_name | expr}  
                 [ASC | DESC] [,col_name ...]]  
             [SEPARATOR str_val])  
mysql> SELECT student_name,  
         GROUP_CONCAT(test_score)  
       FROM student  
       GROUP BY student_name;  

Or:

mysql> SELECT student_name,  
         GROUP_CONCAT(DISTINCT test_score  
                      ORDER BY test_score DESC SEPARATOR ' ')  
       FROM student  
       GROUP BY student_name;  

In MySQL, you can get the concatenated values of expression combinations. To eliminate duplicate values, use theDISTINCT clause. To sort values in the result, use the ORDER BY clause. To sort in reverse order, add the DESC (descending) keyword to the name of the column you are sorting by in the ORDER BY clause. The default is ascending order; this may be specified explicitly using the ASC keyword. The default separator between values in a group is comma (,). To specify a separator explicitly, use SEPARATOR followed by the string literal value that should be inserted between group values. To eliminate the separator altogether, specifySEPARATOR ''.
The result is truncated to the maximum length that is given by the group_concat_max_len system variable, which has a default value of 1024. The value can be set higher, although the effective maximum length of the return value is constrained by the value ofmax_allowed_packet. The syntax to change the value ofgroup_concat_max_len at runtime is as follows, where val is an unsigned integer:

SET [GLOBAL | SESSION] group_concat_max_len = val;  

The return value is a nonbinary or binary string, depending on whether the arguments are nonbinary or binary strings. The result type is TEXT orBLOB unlessgroup_concat_max_len is less than or equal to 512, in which case the result type isVARCHAR orVARBINARY.
If GROUP_CONCAT() is invoked from within the mysql client, binary string results display using hexadecimal notation, depending on the value of the--binary-as-hex. For more information about that option, see Section 6.5.1, “mysql — The MySQL Command-Line Client”.
See also CONCAT() andCONCAT_WS():Section 14.8, “String Functions and Operators”.

mysql> SELECT o_id, attribute, value FROM t3;  
+------+-----------+-------+  
| o_id | attribute | value |  
+------+-----------+-------+  
|    2 | color     | red   |  
|    2 | fabric    | silk  |  
|    3 | color     | green |  
|    3 | shape     | square|  
+------+-----------+-------+  
4 rows in set (0.00 sec)  
mysql> SELECT o_id, JSON_ARRAYAGG(attribute) AS attributes  
    -> FROM t3 GROUP BY o_id;  
+------+---------------------+  
| o_id | attributes          |  
+------+---------------------+  
|    2 | ["color", "fabric"] |  
|    3 | ["color", "shape"]  |  
+------+---------------------+  
2 rows in set (0.00 sec)  
mysql> SELECT o_id, attribute, value FROM t3;  
+------+-----------+-------+  
| o_id | attribute | value |  
+------+-----------+-------+  
|    2 | color     | red   |  
|    2 | fabric    | silk  |  
|    3 | color     | green |  
|    3 | shape     | square|  
+------+-----------+-------+  
4 rows in set (0.00 sec)  
mysql> SELECT o_id, JSON_OBJECTAGG(attribute, value)  
    -> FROM t3 GROUP BY o_id;  
+------+---------------------------------------+  
| o_id | JSON_OBJECTAGG(attribute, value)      |  
+------+---------------------------------------+  
|    2 | {"color": "red", "fabric": "silk"}    |  
|    3 | {"color": "green", "shape": "square"} |  
+------+---------------------------------------+  
2 rows in set (0.00 sec)  

Duplicate key handling. When the result of this function is normalized, values having duplicate keys are discarded. In keeping with the MySQL JSON data type specification that does not permit duplicate keys, only the last value encountered is used with that key in the returned object (“last duplicate key wins”). This means that the result of using this function on columns from a SELECT can depend on the order in which the rows are returned, which is not guaranteed.
When used as a window function, if there are duplicate keys within a frame, only the last value for the key is present in the result. The value for the key from the last row in the frame is deterministic if the ORDER BY specification guarantees that the values have a specific order. If not, the resulting value of the key is nondeterministic.
Consider the following:

mysql> CREATE TABLE t(c VARCHAR(10), i INT);  
Query OK, 0 rows affected (0.33 sec)  
mysql> INSERT INTO t VALUES ('key', 3), ('key', 4), ('key', 5);  
Query OK, 3 rows affected (0.10 sec)  
Records: 3  Duplicates: 0  Warnings: 0  
mysql> SELECT c, i FROM t;  
+------+------+  
| c    | i    |  
+------+------+  
| key  |    3 |  
| key  |    4 |  
| key  |    5 |  
+------+------+  
3 rows in set (0.00 sec)  
mysql> SELECT JSON_OBJECTAGG(c, i) FROM t;  
+----------------------+  
| JSON_OBJECTAGG(c, i) |  
+----------------------+  
| {"key": 5}           |  
+----------------------+  
1 row in set (0.00 sec)  
mysql> DELETE FROM t;  
Query OK, 3 rows affected (0.08 sec)  
mysql> INSERT INTO t VALUES ('key', 3), ('key', 5), ('key', 4);  
Query OK, 3 rows affected (0.06 sec)  
Records: 3  Duplicates: 0  Warnings: 0  
mysql> SELECT c, i FROM t;  
+------+------+  
| c    | i    |  
+------+------+  
| key  |    3 |  
| key  |    5 |  
| key  |    4 |  
+------+------+  
3 rows in set (0.00 sec)  
mysql> SELECT JSON_OBJECTAGG(c, i) FROM t;  
+----------------------+  
| JSON_OBJECTAGG(c, i) |  
+----------------------+  
| {"key": 4}           |  
+----------------------+  
1 row in set (0.00 sec)  

The key chosen from the last query is nondeterministic. If the query does not use GROUP BY (which usually imposes its own ordering regardless) and you prefer a particular key ordering, you can invokeJSON_OBJECTAGG() as a window function by including an OVER clause with anORDER BY specification to impose a particular order on frame rows. The following examples show what happens with and without ORDER BY for a few different frame specifications.
Without ORDER BY, the frame is the entire partition:

mysql> SELECT JSON_OBJECTAGG(c, i)  
       OVER () AS json_object FROM t;  
+-------------+  
| json_object |  
+-------------+  
| {"key": 4}  |  
| {"key": 4}  |  
| {"key": 4}  |  
+-------------+  

With ORDER BY, where the frame is the default of RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (in both ascending and descending order):

mysql> SELECT JSON_OBJECTAGG(c, i)  
       OVER (ORDER BY i) AS json_object FROM t;  
+-------------+  
| json_object |  
+-------------+  
| {"key": 3}  |  
| {"key": 4}  |  
| {"key": 5}  |  
+-------------+  
mysql> SELECT JSON_OBJECTAGG(c, i)  
       OVER (ORDER BY i DESC) AS json_object FROM t;  
+-------------+  
| json_object |  
+-------------+  
| {"key": 5}  |  
| {"key": 4}  |  
| {"key": 3}  |  
+-------------+  

With ORDER BY and an explicit frame of the entire partition:

mysql> SELECT JSON_OBJECTAGG(c, i)  
       OVER (ORDER BY i  
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)  
        AS json_object  
       FROM t;  
+-------------+  
| json_object |  
+-------------+  
| {"key": 5}  |  
| {"key": 5}  |  
| {"key": 5}  |  
+-------------+  

To return a particular key value (such as the smallest or largest), include a LIMIT clause in the appropriate query. For example:

mysql> SELECT JSON_OBJECTAGG(c, i)  
       OVER (ORDER BY i) AS json_object FROM t LIMIT 1;  
+-------------+  
| json_object |  
+-------------+  
| {"key": 3}  |  
+-------------+  
mysql> SELECT JSON_OBJECTAGG(c, i)  
       OVER (ORDER BY i DESC) AS json_object FROM t LIMIT 1;  
+-------------+  
| json_object |  
+-------------+  
| {"key": 5}  |  
+-------------+  

See Normalization, Merging, and Autowrapping of JSON Values, for additional information and examples.

mysql> SELECT student_name, MIN(test_score), MAX(test_score)  
       FROM student  
       GROUP BY student_name;  

For MAX(), MySQL currently compares ENUM andSET columns by their string value rather than by the string's relative position in the set. This differs from how ORDER BY compares them.

mysql> SELECT student_name, MIN(test_score), MAX(test_score)  
       FROM student  
       GROUP BY student_name;  

For MIN(), MySQL currently compares ENUM andSET columns by their string value rather than by the string's relative position in the set. This differs from how ORDER BY compares them.