GitHub - ankane/dexter: The automatic indexer for Postgres (original) (raw)

Dexter

The automatic indexer for Postgres

Read about how it works or watch the talk

Build Status

Installation

First, install HypoPG on your database server. This doesn’t require a restart.

cd /tmp curl -L https://github.com/HypoPG/hypopg/archive/1.4.1.tar.gz | tar xz cd hypopg-1.4.1 make make install # may need sudo

And enable it in databases where you want to use Dexter:

See the installation notes if you run into issues.

Then install the command line tool with:

The command line tool is also available with Docker, Homebrew, or as a Linux package.

How to Use

Dexter needs a connection to your database and a source of queries (like pg_stat_statements) to process.

dexter -d dbname --pg-stat-statements

This finds slow queries and generates output like:

Started
Processing 189 new query fingerprints
Index found: public.genres_movies (genre_id)
Index found: public.genres_movies (movie_id)
Index found: public.movies (title)
Index found: public.ratings (movie_id)
Index found: public.ratings (rating)
Index found: public.ratings (user_id)

To be safe, Dexter will not create indexes unless you pass the --create flag. In this case, you’ll see:

Index found: public.ratings (user_id)
Creating index: CREATE INDEX CONCURRENTLY ON "public"."ratings" ("user_id")
Index created: 15243 ms

Connection Options

Dexter supports the same connection options as psql.

-h host -U user -p 5432 -d dbname

This includes URIs:

postgresql://user:pass@host:5432/dbname

and connection strings:

host=localhost port=5432 dbname=mydb

Always make sure your connection is secure when connecting to a database over a network you don’t fully trust.

Collecting Queries

Dexter can collect queries from a number of sources.

Query Stats

Enable pg_stat_statements in your database.

CREATE EXTENSION pg_stat_statements;

And use:

dexter --pg-stat-statements

Live Queries

Get live queries from the pg_stat_activity view with:

dexter --pg-stat-activity

Log Files

Enable logging for slow queries in your Postgres config file.

log_min_duration_statement = 10 # ms

And use:

dexter postgresql.log

Supports stderr, csvlog, and jsonlog formats.

For real-time indexing, pipe your logfile:

tail -F -n +1 postgresql.log | dexter --stdin

And pass --input-format csvlog or --input-format jsonlog if needed.

SQL Files

Pass a SQL file with:

dexter queries.sql

Pass a single query with:

dexter -s "SELECT * FROM ..."

Collection Options

To prevent one-off queries from being indexed, specify a minimum number of calls before a query is considered for indexing

You can do the same for total time a query has run

dexter --min-time 10 # minutes

When streaming logs, specify the time to wait between processing queries

dexter --interval 60 # seconds

Analyze

For best results, make sure your tables have been recently analyzed so statistics are up-to-date. You can ask Dexter to analyze tables it comes across that haven’t been analyzed in the past hour with:

Tables

You can exclude large or write-heavy tables from indexing with:

dexter --exclude table1,table2

Alternatively, you can specify which tables to index with:

dexter --include table3,table4

Debugging

See how Dexter is processing queries with:

dexter --log-sql --log-level debug2

Hosted Postgres

The hypopg extension, which Dexter needs to run, is available on these providers.

For other providers, see this guide. To request a new extension:

HypoPG Installation Notes

Postgres Location

If your machine has multiple Postgres installations, specify the path to pg_config with:

export PG_CONFIG=/Applications/Postgres.app/Contents/Versions/latest/bin/pg_config

Then re-run the installation instructions (run make clean before make if needed)

Missing Header

If compilation fails with fatal error: postgres.h: No such file or directory, make sure Postgres development files are installed on the server.

For Ubuntu and Debian, use:

sudo apt-get install postgresql-server-dev-15

Note: Replace 15 with your Postgres server version

Additional Installation Methods

Docker

Get the Docker image with:

docker pull ankane/dexter

And run it with:

docker run -ti ankane/dexter

For databases on the host machine, use host.docker.internal as the hostname (on Linux, this requires Docker 20.04+ and --add-host=host.docker.internal:host-gateway).

Homebrew

With Homebrew, you can use:

Future Work

Here are some ideas

Upgrading

Run:

To use master, run:

gem install specific_install gem specific_install https://github.com/ankane/dexter.git

Upgrade Notes

0.5.0

The --stdin option is now required to read queries from stdin.

tail -F -n +1 postgresql.log | dexter --stdin

Thanks

This software wouldn’t be possible without HypoPG, which allows you to create hypothetical indexes, and pg_query, which allows you to parse and fingerprint queries. A big thanks to Dalibo and Lukas Fittl respectively. Also, thanks to YugabyteDB for this article on how to explain normalized queries.

Research

This is known as the Index Selection Problem (ISP).

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

To get started with development, run:

git clone https://github.com/ankane/dexter.git cd dexter bundle install bundle exec rake install

To run tests, use:

createdb dexter_test bundle exec rake test