Django Group By via Practical Examples (original) (raw)

Summary: in this tutorial, you’ll learn how to use Django Group By with aggregate functions to calculate aggregation for groups.

Introduction to the Django Group By #

The SQL GROUP BY clause groups the rows returned by a query into groups. Typically, you use aggregate functions like count, min, max, avg, and sum with the GROUP BY clause to return an aggregated value for each group.

Here’s the basic usage of the GROUP BY clause in a SELECT statement:

SELECT column_1, AGGREGATE(column_2) FROM table_name GROUP BY column1;Code language: SQL (Structured Query Language) (sql)

In Django, you can you the annotate() method with the values() to apply the aggregation on groups like this:

(Entity.objects .values('column_2') .annotate(value=AGGREGATE('column_1')) )Code language: Python (python)

In this syntax;

Notice that the order of calling values() and annotates() matter. If you do not call the values() method first and annotate() second, the expression won’t produce aggregate results.

We’ll use the Employee and Department models from the HR application for the demonstration. The Emloyee and Department models map to the hr_employee and hr_department tables in the database:

1) Django Group By with Count example #

The following example uses the values() and annotate() method to get the number of employees by department:

>>> (Employee.objects ... .values('department') ... .annotate(head_count=Count('department')) ... .order_by('department') ... ) SELECT "hr_employee"."department_id", COUNT("hr_employee"."department_id") AS "head_count" FROM "hr_employee" GROUP BY "hr_employee"."department_id" ORDER BY "hr_employee"."department_id" ASC LIMIT 21 Execution time: 0.001492s [Database: default] <QuerySet [{'department': 1, 'head_count': 30}, {'department': 2, 'head_count': 40}, {'department': 3, 'head_count': 28}, {'department': 4, 'head_count': 29}, {'department': 5, 'head_count': 29}, {'department': 6, 'head_count': 30}, {'department': 7, 'head_count': 34}]>Code language: Python (python)

How it works.

First, group the employees by department using the values() method:

values('department')Code language: Python (python)

Second, apply the Count() to each group:

annotate(head_count=Count('department'))Code language: Python (python)

Third, sort the objects in the QuerySet by department:

order_by('department')Code language: Python (python)

Behind the scenes, Django executes the SELECT statement with the GROUP BY clause:

SELECT "hr_employee"."department_id", COUNT("hr_employee"."department_id") AS "head_count" FROM "hr_employee" GROUP BY "hr_employee"."department_id" ORDER BY "hr_employee"."department_id" ASC LIMIT 21 Code language: SQL (Structured Query Language) (sql)

2) Django Group By with Sum example #

Similarly, you can use the Sum() aggregate to calculate the total salary of employees in each department:

>>> (Employee.objects ... .values('department') ... .annotate(total_salary=Sum('salary')) ... .order_by('department') ... ) SELECT "hr_employee"."department_id", SUM("hr_employee"."salary") AS "total_salary" FROM "hr_employee" GROUP BY "hr_employee"."department_id" ORDER BY "hr_employee"."department_id" ASC LIMIT 21 Execution time: 0.000927s [Database: default] <QuerySet [{'department': 1, 'total_salary': Decimal('3615341.00')}, {'department': 2, 'total_salary': Decimal('5141611.00')}, {'department': 3, 'total_salary': Decimal('3728988.00')}, {'department': 4, 'total_salary': Decimal('3955669.00')}, {'department': 5, 'total_salary': Decimal('4385784.00')}, {'department': 6, 'total_salary': Decimal('4735927.00')}, {'department': 7, 'total_salary': Decimal('4598788.00')}]> Code language: Python (python)

3) Django Group By with Min, Max, and Avg example #

The following example applies multiple aggregate functions to groups to get the lowest, average, and highest salary of employees in each department:

>>> (Employee.objects ... .values('department') ... .annotate( ... min_salary=Min('salary'), ... max_salary=Max('salary'), ... avg_salary=Avg('salary') ... ) ... .order_by('department') ... ) SELECT "hr_employee"."department_id", MIN("hr_employee"."salary") AS "min_salary", MAX("hr_employee"."salary") AS "max_salary", AVG("hr_employee"."salary") AS "avg_salary" FROM "hr_employee" GROUP BY "hr_employee"."department_id" ORDER BY "hr_employee"."department_id" ASC LIMIT 21 Execution time: 0.001670s [Database: default] <QuerySet [{'department': 1, 'min_salary': Decimal('45427.00'), 'max_salary': Decimal('149830.00'), 'avg_salary': Decimal('120511.366666666667')}, {'department': 2, 'min_salary': Decimal('46637.00'), 'max_salary': Decimal('243462.00'), 'avg_salary': Decimal('128540.275000000000')}, {'department': 3, 'min_salary': Decimal('40762.00'), 'max_salary': Decimal('248265.00'), 'avg_salary': Decimal('133178.142857142857')}, {'department': 4, 'min_salary': Decimal('43000.00'), 'max_salary': Decimal('238016.00'), 'avg_salary': Decimal('136402.379310344828')}, {'department': 5, 'min_salary': Decimal('42080.00'), 'max_salary': Decimal('246403.00'), 'avg_salary': Decimal('151233.931034482759')}, {'department': 6, 'min_salary': Decimal('58356.00'), 'max_salary': Decimal('248312.00'), 'avg_salary': Decimal('157864.233333333333')}, {'department': 7, 'min_salary': Decimal('40543.00'), 'max_salary': Decimal('238892.00'), 'avg_salary': Decimal('135258.470588235294')}]>Code language: Python (python)

4) Django group by with join example #

The following example uses the values() and annotate() methods to get the number of employees per department:

>>> (Department.objects ... .values('name') ... .annotate( ... head_count=Count('employee') ... ) ... ) SELECT "hr_department"."name", COUNT("hr_employee"."id") AS "head_count" FROM "hr_department" LEFT OUTER JOIN "hr_employee" ON ("hr_department"."id" = "hr_employee"."department_id") GROUP BY "hr_department"."name" LIMIT 21 Execution time: 0.001953s [Database: default] <QuerySet [{'name': 'Marketing', 'head_count': 28}, {'name': 'Finance', 'head_count': 29}, {'name': 'SCM', 'head_count': 29}, {'name': 'GA', 'head_count': 30}, {'name': 'Sales', 'head_count': 40}, {'name': 'IT', 'head_count': 30}, {'name': 'HR', 'head_count': 34}]>Code language: Python (python)

How it works.

Behind the scenes, Django uses a LEFT JOIN to join the hr_department table with hr_employee table and apply the COUNT() function to each group.

Django group by with having #

To apply a condition to the groups, you use the filter() method. For example, the following uses the filter() method to get the department with head counts are more than 30:

>>> (Department.objects ... .values('name') ... .annotate( ... head_count=Count('employee') ... ) ... .filter(head_count__gt=30) ... ) SELECT "hr_department"."name", COUNT("hr_employee"."id") AS "head_count" FROM "hr_department" LEFT OUTER JOIN "hr_employee" ON ("hr_department"."id" = "hr_employee"."department_id") GROUP BY "hr_department"."name" HAVING COUNT("hr_employee"."id") > 30 LIMIT 21 Execution time: 0.002893s [Database: default] <QuerySet [{'name': 'Sales', 'head_count': 40}, {'name': 'HR', 'head_count': 34}]> Code language: Python (python)

Behind the scenes, Django uses the HAVING clause to filter the group based on the condition that we pass to the filter() method:

SELECT "hr_department"."name", COUNT("hr_employee"."id") AS "head_count" FROM "hr_department" LEFT OUTER JOIN "hr_employee" ON ("hr_department"."id" = "hr_employee"."department_id") GROUP BY "hr_department"."name" HAVING COUNT("hr_employee"."id") > 30 Code language: Python (python)

Summary #

Was this tutorial helpful ?