Aggregate functions | Snowflake Documentation (original) (raw)
Aggregate functions operate on values across rows to perform mathematical calculations such as sum, average, counting, minimum/maximum values, standard deviation, and estimation, as well as some non-mathematical operations.
An aggregate function takes multiple rows (actually, zero, one, or more rows) as input and produces a single output. In contrast, scalar functions take one row as input and produce one row (one value) as output.
An aggregate function always returns exactly one row, even when the input contains zero rows. Typically, if the input contains zero rows, the output is NULL. However, an aggregate function could return 0
, an empty string, or some other value when passed zero rows.
List of functions (by sub-category)¶
Introductory example¶
The following example illustrates the difference between an aggregate function (AVG) and a scalar function (COS). The scalar function returns one output row for each input row, while the aggregate function returns one output row for multiple input rows:
Create a table and populate it with values:
CREATE TABLE simple (x INTEGER, y INTEGER); INSERT INTO simple (x, y) VALUES (10, 20), (20, 44), (30, 70);
Query the table:
SELECT x, y FROM simple ORDER BY x,y;
+----+----+ | X | Y | |----+----| | 10 | 20 | | 20 | 44 | | 30 | 70 | +----+----+
The scalar function returns one output row for each input row.
SELECT COS(x) FROM simple ORDER BY x;
+---------------+
COS(X) |
---|
-0.8390715291 |
0.4080820618 |
0.1542514499 |
+---------------+ |
The aggregate function returns one output row for multiple input rows:
SELECT SUM(x) FROM simple;
+--------+
SUM(X) |
---|
60 |
+--------+ |
Aggregate functions and NULL values¶
Some aggregate functions ignore NULL values. For example, AVG calculates the average of values 1
, 5
, and NULL
to be 3
, based on the following formula:
(1 + 5) / 2 = 3
In both the numerator and the denominator, only the two non-NULL values are used.
If all of the values passed to the aggregate function are NULL, then the aggregate function returns NULL.
Some aggregate functions can be passed more than one column. For example:
SELECT COUNT(col1, col2) FROM table1;
In these instances, the aggregate function ignores a row if any individual column is NULL.
For example, in the following query, COUNT returns 1
, not 4
, because three of the four rows contain at least one NULL value in the selected columns:
Create a table and populate it with values:
CREATE OR REPLACE TABLE test_null_aggregate_functions (x INT, y INT); INSERT INTO test_null_aggregate_functions (x, y) VALUES (1, 2), -- No NULLs. (3, NULL), -- One but not all columns are NULL. (NULL, 6), -- One but not all columns are NULL. (NULL, NULL); -- All columns are NULL.
Query the table:
SELECT COUNT(x, y) FROM test_null_aggregate_functions;
+-------------+
COUNT(X, Y) |
---|
1 |
+-------------+ |
If SUM is called with an expression that references two or more columns, and if one or more of those columns is NULL, then the expression evaluates to NULL, and the row is ignored:
SELECT SUM(x + y) FROM test_null_aggregate_functions;
+------------+
SUM(X + Y) |
---|
3 |
+------------+ |
This behavior differs from the behavior of GROUP BY, which does not discard rows when some columns are NULL:
SELECT x AS X_COL, y AS Y_COL FROM test_null_aggregate_functions GROUP BY x, y;
+-------+-------+ | X_COL | Y_COL | |-------+-------| | 1 | 2 | | 3 | NULL | | NULL | 6 | | NULL | NULL | +-------+-------+