Count estimate - PostgreSQL wiki (original) (raw)
count_estimate function
Works with PostgreSQL
Any version
Written in
sql
Depends on
Nothing
Authors: Erwin Brandstetter, Michael Fuhr
Comment by: Emanuel Calvo Franco
The basic SQL standard query to count the rows in a table is:
SELECT count(*) FROM table_name;
This can be rather slow because PostgreSQL has to check visibility for all rows, due to the MVCC model.
If you don't need an exact count, the current statistic from the catalog table pg_class
might be good enough and is much faster to retrieve for big tables.
SELECT reltuples AS estimate FROM pg_class WHERE relname = 'table_name';
estimate
100
Tables named "table_name"
can live in multiple schemas of a database, in which case you get multiple rows for this query. To overcome ambiguity:
SELECT reltuples::bigint AS estimate FROM pg_class WHERE oid = 'schema_name.table_name'::regclass;
The cast to bigint
formats the real
number nicely, especially for big counts.
Quoting the manual for Postgres 13 on pg_class.reltuples:
Number of live rows in the table. This is only an estimate used by the planner. It is updated by VACUUM
, ANALYZE
, and a few DDL commands such as CREATE INDEX
.
If you didn't ANALYZE
recently (after last changes), the estimates will be off more or less.
If you are running the autovacuum daemon as is the default for modern PostgreSQL, ANALYZE
is run automatically, too (except for temporary tables which need manual attention). So the estimates should be good unless you had major changes very recently.
For more sophisticated queries (other than counting all rows from a table), or if you cannot SELECT
from the catalog table pg_class
(which every user can by default), consider the following function, that takes advantage of JSON formatting added in Postgres 9.0. (Older versions can still benefit from plpgsql function by Michael Fuhr.)
⚠️ Do not pass an unsanitized `query` to this function, as it is subject to SQL injection.
CREATE OR REPLACE FUNCTION count_estimate( query text ) RETURNS integer LANGUAGE plpgsql AS DECLAREplanjsonb;BEGINEXECUTEFORMAT(′EXPLAIN(FORMATJSON)RETURNplan−>0−>′Plan′−>′PlanRows′;END;DECLARE plan jsonb; BEGIN EXECUTE FORMAT('EXPLAIN (FORMAT JSON) %s', query) INTO plan; RETURN plan->0->'Plan'->'Plan Rows'; END;DECLAREplanjsonb;BEGINEXECUTEFORMAT(′EXPLAIN(FORMATJSON)RETURNplan−>0−>′Plan′−>′PlanRows′;END;;
Demo:
CREATE TEMP TABLE tbl AS SELECT * FROM generate_series(1, 1000) AS t; ANALYZE tbl;
SELECT count_estimate('SELECT * FROM tbl WHERE t < 100');
count_estimate
100
EXPLAIN SELECT * FROM tbl WHERE t < 100;
QUERY PLAN
Seq Scan on tbl (cost=0.00..35.00 rows=100 width=4) Filter: (t < 100)
As you can see, it's an estimate - actual count would be 99.
Related web resources:
- Source material: Why PostgreSQL Instead of MySQL (also discusses how this is different in MySQL)
- Query alternatives on Stackoverflow: Fast way to discover the row count of a table in PostgreSQL