Powering next gen AI apps with Postgres 🚀 Learn More
Postgres

Postgres Support Recap: Investigating Postgres Query Performance

Where to begin when your Postgres queries start to perform poorly

Post image

Neon Support often receives support tickets related to Postgres query performance. Such issues could result from a myriad of factors ranging from missing indexes or lack of database maintenance to ineffective queries and joins or system resource limitations. Where should you start when trying to get to the bottom of an issue with such a wide range of potential causes? In this Postgres Support Recap, we outline three strategies we often recommend and ones you can use in your Postgres query performance investigations.

Specifically, we’ll take a look at these strategies:

For those who prefer a visual monitoring tool over running SQL queries, we’ll finish off this post by introducing a free and open-source Postgres query monitoring and analysis application called PgHero.

Without further ado, let’s get started! 

Strategy 1: Analyzing query performance with pg_stat_statements

pg_stat_statements is an open-source Postgres extension for monitoring and analyzing SQL query performance. It provides aggregated query statistics for executed SQL statements. The data collected includes the number of query executions, total execution time, rows returned by the query, and much more. 

This extension isn’t installed by default, so your first step is to install it and allow some time for data collection. To install the extension, you can run the following CREATE EXTENSION statement in a Postgres client such as psql that is connected to your database.

Once installed, you can run the following query to view the types of data that pg_stat_statements `collects:

For a description of each metric, you can refer to the official Postgres documentation: The pg_stat_statements View.

After allowing time for statistics to be collected, you can run queries like this one to start gathering data about your queries.

Long-running queries

This pg_stat_statements query returns the longest-running queries by mean execution time.

Long-running queries are candidates for optimization. As a next step, you can run EXPLAIN (ANALYZE) on each to identify opportunities for optimization, such as full table scans or inefficient joins.

There are many other useful queries you can run with pg_stat_statements, which you can find online and in our pg_stat_statements guide

Strategy 2: Checking your cache hit ratio

A cache hit ratio tells you what percentage of queries are served from memory. Queries not served from memory retrieve data from disk, which is more costly and can result in slower query performance.

In Postgres, you can query the cache hit ratio with an SQL statement similar to this one, which looks for shared buffer block hits.

In Neon, it’s a little different. Neon extends Postgres shared buffers with a local file cache (local to your Neon compute instance), so in order to query your cache hit ratio in Neon, you need to look at local file cache hits instead of shared buffer hits. 

The neon_stat_file_cache view

To enable querying local file cache statistics, Neon provides a neon_stat_file_cache view. To access this view, you must first install the neon extension:

After allowing time for statistics collection, you can issue the following query to view your cache hit ratio:

The ratio is calculated according to the following formula:

If the file_cache_hit_ratio is below 99%, your working set (your most frequently accessed data) may not be adequately in memory. This could be due to your Postgres instance not having sufficient memory.

If it’s a lack of memory, you can consider allocating more. In Postgres, this requires increasing your shared_buffers setting, assuming your system has memory resources to support it. In Neon, shared_buffers is always set to 128 MB. To increase available memory in Neon, you can increase the size of your compute. Larger computes have larger local file caches. For information about selecting an appropriate compute size in Neon, please refer to How to size your compute.

Please remember that the local file cache statistics are for the entire compute, not specific databases or tables. A Neon compute runs an instance of Postgres, which can have multiple databases and tables.

Strategy 3: Checking for table or index bloat

If there is some issue with Postgres autovacuum, this can lead to table and index bloat. 

Bloat refers to the condition where tables and indexes occupy more space on disk than is necessary for storing the data. Bloat can occur over time due to the way Postgres handles updates and deletes.

Table Bloat

When a row is updated, the database doesn’t overwrite the existing row. Instead, it marks the old row version as obsolete and creates a new version of the row elsewhere in the table. Similarly, when a row is deleted, it is not immediately removed; it’s just marked as inaccessible. The space occupied by these obsolete or deleted rows contributes to table bloat.

This mechanism supports Postgres MVCC (Multi-Version Concurrency Control), allowing for more efficient query processing without locking rows for reading. However, the downside is that it can lead to wasted space and decreased performance over time as the table grows larger than necessary.

Index Bloat

Indexes can also experience bloat. As rows are updated and deleted, the indexes that point to those rows can become inefficient. Index bloat happens because, similar to tables, indexes also retain pointers to obsolete row versions. Over time, the index can grow larger, consuming more space than necessary.

Index bloat can degrade the performance of read operations. Since indexes are used to speed up data retrieval, a bloated index can have the opposite effect, making queries slower.

Checking for bloat

There are SQL queries you can run to check for table and index bloat. There are several good sources for bloat check queries, including these:

Reducing bloat

To reduce table bloat, you can run the VACUUM command. VACUUM cleans up these obsolete records and makes space available for reuse within the table. For more aggressive space reclamation, you can use VACUUM FULL, but this command locks the table, which can be disruptive.

To remove index bloat, you can use the REINDEX command, which rebuilds the index from scratch. Be aware that this can be an intensive operation, especially for large indexes, as it requires an exclusive lock on the index.

Generally, you’ll want to perform these types of operations when it will have the least impact, or you’ll want to plan some maintenance downtime for a clean-up. 

PgHero: A Performance Dashboard for Postgres

PgHero is described as a performance dashboard for Postgres, and for anyone looking for a free and open-source monitoring tool, it’s a good alternative to running SQL queries from the command line.

What does PgHero provide?

A quick look at the interface gives you an idea of what you’ll find in PgHero.

Post image

Among other things, you can use PgHero to:

  • Identify long-running queries
  • Identify tables that require vacuuming
  • Identify duplicate or missing indexes
  • View connections by database and user
  • Explain, analyze, and visualize queries

How to install PgHero

PgHero supports installation with Docker, Linux, and Rails. Here, we’ll show how to install PgHero with Docker and connect it to a Neon database. 

Before you begin:

  • Ensure that you have the pg_stat_statements extension installed. PgHero uses it for query stats. See above.
  • Ensure that you have Docker installed. See Install Docker Engine for instructions.

PgHero is available on DockerHub. To install it, run:

Next, grab your Neon database connection string from the Connection Details widget in the Neon Dashboard.

Post image

Finally, run this command, replacing $NEON_DB with your Neon database connection string.

Then visit http://localhost:8080 in your browser to open the PgHero Dashboard.

Conclusion

In summary, managing and troubleshooting Postgres query performance can involve a variety of strategies. We’ve covered a few of them in this post, including using the pg_stat_statements extension to analyze query performance, evaluating your cache hit ratio, addressing table and index bloat, and utilizing tools like PgHero. We hope you find these strategies helpful in your own query performance investigations.
As always, thanks for reading and stay tuned for the next Postgres Support Recap!