MySQL Window Functions (original) (raw)
MySQL **Window Functions are advanced **SQL capabilities that enable expensive calculations across sets of rows related to the current row. Aggregate functions collapse the result set. These functions, in general, permit ranking, running totals, moving averages, and access to data from other rows within the same result set. Window functions are particularly helpful in analytical queries and reporting.
In this article, We will learn about **MySQL Window Functions with the help of examples and so on.
What are Window Functions?
- **Window functions in **SQL perform calculations across a set of table rows related to the current row.
- Unlike **aggregate functions which return a single value for a group of rows, window functions return a result for each row in the result set.
- These functions are highly useful for tasks such as **ranking, calculating running totals, and finding moving averages while still retaining individual row details.
**Syntax:
The basic syntax for a **window function in **MySQL is as follows:
window_function_name([expression]) OVER (
[PARTITION BY expression]
[ORDER BY expression [ASC|DESC]]
[ROWS or RANGE frame_clause]
)
**where,
- **window_function_name: which is nothing but the name of your window function, check this example which can be ROW_NUMBER,.
- **PARTITION BY: Result set is divided into partitions, and then the window function is applied.
- **ORDER BY: This specifies the order of the rows within each partition.
- **ROWS or RANGE — Defines the window of rows that should be considered for the calculation.
Common Window Functions in MySQL
Now we will be learning different Windows Functions in MySQL:
1. ROW_NUMBER()
This function is used to assigns a unique sequential integer to rows within a partition
**Example:
SELECT
employee_id,
department_id,
salary,
ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num
FROM employees;
**Output:
employee_id | department_id | salary | row_num |
---|---|---|---|
101 | 1 | 90000 | 1 |
102 | 1 | 85000 | 2 |
103 | 2 | 95000 | 1 |
104 | 2 | 70000 | 2 |
2. RANK() and DENSE_RANK()
The use of this function **id to leave gaps in the ranking when they are ties and also assigns a ranking within a partition.
**Example:
SELECT
employee_id,
department_id,
salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dense_rank
FROM employees;
**Output:
employee_id | department_id | salary | rank | dense_rank |
---|---|---|---|---|
101 | 1 | 90000 | 1 | 1 |
102 | 1 | 85000 | 2 | 2 |
103 | 1 | 85000 | 2 | 2 |
104 | 1 | 75000 | 4 | 3 |
3. SUM()
The use of this function is to calculate the sum of the columns with in a window.
**Example:
SELECT
employee_id,
salary,
SUM(salary) OVER (ORDER BY employee_id) AS cumulative_salary
FROM employees;
**Output:
employee_id | salary | cumulative_salary |
---|---|---|
101 | 50000 | 50000 |
102 | 60000 | 110000 |
103 | 70000 | 180000 |
104 | 80000 | 260000 |
4. AVG()
This function is responsible for the moving average of the across the set of rows.
**Example:
SELECT
employee_id,
salary,
AVG(salary) OVER (ORDER BY employee_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg
FROM employees;
**Output:
employee_id | salary | moving_avg |
---|---|---|
101 | 50000 | 50000.0 |
102 | 60000 | 55000.0 |
103 | 70000 | 60000.0 |
104 | 80000 | 70000.0 |
5. LEAD() and LAG()
LEAD() and LAG() functions allow you to access subsequent or previous rows' data without the need for self-joins.
**Example:
SELECT
employee_id,
salary,
LEAD(salary, 1) OVER (ORDER BY employee_id) AS next_salary,
LAG(salary, 1) OVER (ORDER BY employee_id) AS previous_salary
FROM employees;
**Output:
employee_id | salary | next_salary | previous_salary |
---|---|---|---|
101 | 50000 | 60000 | NULL |
102 | 60000 | 70000 | 50000 |
103 | 70000 | 80000 | 60000 |
104 | 80000 | NULL | 70000 |
MySQL Window Functions with Different Clauses
**Window Functions use the OVER()
clause, which can be further customized using ORDER BY
and PARTITION BY
clauses. Below are the different ways to use **MySQL window functions with these clauses, along with practical examples.
1. Using ORDER BY
The ORDER BY
clause within the OVER()
function is essential to determine the order in which the rows are processed. This ordering influences how window functions, such as ROW_NUMBER()
, RANK()
, and cumulative calculations, are applied.
**Example: The following query demonstrates how to calculate the cumulative salary for employees, ordered by their employee_id
.
SELECT
employee_id,
salary,
SUM(salary) OVER (ORDER BY employee_id) AS cumulative_salary
FROM employees;
**Output:
employee_id | salary | cumulative_salary |
---|---|---|
101 | 50000 | 50000 |
102 | 60000 | 110000 |
103 | 70000 | 180000 |
104 | 80000 | 260000 |
**Explanation:
SUM(salary)
calculates the cumulative sum of salaries.OVER (ORDER BY employee_id)
ensures that the rows are ordered byemployee_id
before applying the window function.- The result will show the
cumulative_salary
, which is the running total of salaries.
2. Using PARTITION BY
The PARTITION BY
clause is used to divide the result set into partitions. The window function is then applied independently to each partition. This is useful when you want to perform calculations within groups, such as department-wise cumulative salaries.
**Example: In this example, we calculate the dept_cumulative_salary
for each department, ordered by the salary in descending order.
SELECT
employee_id,
department_id,
salary,
SUM(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS dept_cumulative_salary
FROM employees;
**Output:
employee_id | department_id | salary | dept_cumulative_salary |
---|---|---|---|
105 | 1 | 90000 | 90000 |
104 | 1 | 85000 | 175000 |
103 | 1 | 75000 | 250000 |
108 | 2 | 95000 | 95000 |
107 | 2 | 70000 | 165000 |
106 | 2 | 60000 | 225000 |
Window Frames: ROWS and RANGE
The window frame also defines which set of rows is included when calling a window function.
**ROWS: This defines the number of rows in the frame.
**RANGE: Indicates the range of values to be included.
**Example of ROWS
SELECT
employee_id,
salary,
SUM(salary) OVER (ORDER BY employee_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sum_salary
FROM employees;
**Output:
employee_id | salary | sum_salary |
---|---|---|
101 | 50000 | 50000 |
102 | 60000 | 110000 |
103 | 70000 | 180000 |
104 | 80000 | 210000 |
**Example of RANGE
SELECT
employee_id,
salary,
SUM(salary) OVER (ORDER BY salary RANGE BETWEEN 1000 PRECEDING AND CURRENT ROW) AS sum_salary
FROM employees;
**Output:
employee_id | salary | sum_salary |
---|---|---|
101 | 50000 | 50000 |
102 | 60000 | 110000 |
103 | 70000 | 180000 |
104 | 80000 | 260000 |
Using Frame Clauses
If we talk about the frames clauses then it indicates that which **subset of rows the windows function applies the calculation. Let's see an example of it.
**Example:
SELECT
employee_id,
salary,
SUM(salary) OVER (ORDER BY employee_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sum_salary
FROM employees;
**Output:
employee_id | salary | sum_salary |
---|---|---|
101 | 50000 | 50000 |
102 | 60000 | 110000 |
103 | 70000 | 180000 |
104 | 80000 | 210000 |
Advantages of Window Functions
- **Efficiency: Accomplish complex calculations that otherwise may involve **multiple joins or **subqueries in an single query.
- **Flexibility: Apply functions across partitions of data. Very useful in **reporting, analytics.
- **Clarity: Simplify SQL queries by staying away from **convoluted logic and nested subqueries.
Limitations
- **Performance: In particular, window functions may be very dear in **computation, especially over large input datasets.
- One of the limitation is that it can take some time to become fluent with the syntax and behavior
Conclusion
Windows Function very powerful feature in MySQL. It extends **SQL's capabilities to run more complex and efficient queries. Window functions realize **complicated calculations and analyses directly in your SQL queries, reducing the need for additional application logics or processing. Although in some cases views can add complexity and sometimes raise performance considerations, the flexibility and clarity they bring to the table make them an important tool in any **SQL developer's arsenal.