How to Use Skip Scans in PostgreSQL 18

Stanley Ulili
Updated on October 21, 2025

PostgreSQL 18 introduces a powerful query optimization technique called skip scan that dramatically improves performance for queries on composite indexes where only the trailing columns are filtered. This feature addresses a long-standing limitation in database query planning, allowing the optimizer to efficiently use multi-column B-tree indexes even when the leading columns aren't specified in your WHERE clause. Skip scans achieve this by intelligently "skipping" through distinct values in the prefix columns, transforming what would have been a full table scan into a much more efficient index-based operation.

The skip scan optimization is particularly valuable in scenarios where you have composite indexes but frequently query only the non-leading columns. Before PostgreSQL 18, such queries would often bypass your carefully crafted indexes entirely, forcing the database to perform expensive sequential scans. PostgreSQL can now leverage these indexes effectively through skip scan, resulting in substantial performance improvements for a wide range of query patterns.

This article will walk you through understanding skip scans in PostgreSQL 18, demonstrating how they work under the hood and when they provide the most benefit. You'll learn how to identify situations where skip scans shine, recognize them in query plans, and understand the trade-offs involved in using this optimization technique.

Prerequisites

Before proceeding with this article, ensure you have PostgreSQL 18 installed on a test system where you can safely experiment:

 
psql --version
Output
psql (PostgreSQL) 18.0 (Ubuntu 18.0-1.pgdg24.04+3)

This guide assumes familiarity with basic PostgreSQL administration including creating databases, running SQL queries, and analyzing query plans. You should also understand fundamental concepts around database indexes and query optimization.

Getting started with skip scans

Start by creating a dedicated test database for experimenting with skip scan functionality. This keeps your test data isolated from production systems and makes cleanup straightforward.

Create a new database specifically for skip scan testing:

 
sudo -u postgres createdb skipscan_test

Connect to the newly created database:

 
sudo -u postgres psql skipscan_test

You should see the PostgreSQL prompt indicating you're connected:

Output
psql (18.0 (Ubuntu 18.0-1.pgdg24.04+3))
Type "help" for help.

skipscan_test=#

With the database connection established, you're ready to create test tables and explore how skip scans optimize query performance.

Understanding the problem skip scans solve

To appreciate what skip scans bring to the table, you need to understand the constraint they address. Traditional B-tree indexes in PostgreSQL work exceptionally well when you filter on the leftmost columns of a composite index, but they become less useful when you only filter on trailing columns.

Consider a composite index created on (status, created_at) for a users table. If you query for users by status, the index works perfectly. However, if you only filter by created_at without specifying status, PostgreSQL historically had two choices: perform a full table scan or scan the entire index. Neither option is ideal when the table contains millions of rows.
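To see the contrast concretely, compare the two query shapes against such an index. These statements are illustrative; the users table and its index are created in the steps that follow:

```sql
-- Filters on the leading column: the composite index applies directly.
SELECT * FROM users
WHERE status = 'active' AND created_at > '2024-01-01';

-- Filters only on the trailing column: before PostgreSQL 18, the planner
-- typically could not use an index on (status, created_at) here and fell
-- back to scanning the whole table.
SELECT * FROM users
WHERE created_at > '2024-01-01';
```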

Let's set up a practical example to demonstrate this. Within your skipscan_test database, create a sample table with a substantial number of rows:

 
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  status VARCHAR(20),
  created_at TIMESTAMP,
  email VARCHAR(100),
  last_login TIMESTAMP
);

You should see confirmation that the table was created:

Output
CREATE TABLE

Now populate the table with test data representing different user statuses and creation dates:

 
INSERT INTO users (status, created_at, email, last_login)
SELECT 
  CASE (random() * 3)::int
    WHEN 0 THEN 'active'
    WHEN 1 THEN 'inactive'
    ELSE 'pending'
  END,
  TIMESTAMP '2020-01-01' + (random() * (TIMESTAMP '2024-12-31' - TIMESTAMP '2020-01-01')),
  'user' || generate_series || '@example.com',
  TIMESTAMP '2024-01-01' + (random() * (NOW() - TIMESTAMP '2024-01-01'))
FROM generate_series(1, 1000000);

This command will take a moment to execute as it generates one million test records:

Output
INSERT 0 1000000

Next, create a composite index on status and created_at:

 
CREATE INDEX idx_users_status_created ON users (status, created_at);

The index creation will complete with a confirmation message:

Output
CREATE INDEX

Now run ANALYZE to update the table statistics so the query planner can make informed decisions:

 
ANALYZE users;
Output
ANALYZE

With this setup in place, let's examine how PostgreSQL handles different query patterns. Execute a query that filters only on created_at:

 
EXPLAIN ANALYZE
SELECT * FROM users
WHERE created_at > '2024-01-01'
ORDER BY created_at
LIMIT 100;

You'll see output showing the query execution plan and actual runtime statistics:

Output
                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=19715.25..19726.89 rows=100 width=50) (actual time=79.302..84.572 rows=100.00 loops=1)
   Buffers: shared hit=10384
   ->  Gather Merge  (cost=19715.25..43095.89 rows=200750 width=50) (actual time=79.293..84.549 rows=100.00 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=10384
         ->  Sort  (cost=18715.22..18924.34 rows=83646 width=50) (actual time=67.945..67.961 rows=84.00 loops=3)
               Sort Key: created_at
               Sort Method: top-N heapsort  Memory: 46kB
               Buffers: shared hit=10384
               Worker 0:  Sort Method: top-N heapsort  Memory: 48kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 46kB
               ->  Parallel Seq Scan on users  (cost=0.00..15518.33 rows=83646 width=50) (actual time=0.029..54.915 rows=66399.67 loops=3)
                     Filter: (created_at > '2024-01-01 00:00:00'::timestamp without time zone)
                     Rows Removed by Filter: 266934
                     Buffers: shared hit=10310
 Planning:
   Buffers: shared hit=48 read=1
 Planning Time: 1.608 ms
 Execution Time: 84.698 ms
(20 rows)

This output demonstrates exactly the problem skip scans solve. Notice the query performed a "Parallel Seq Scan" (parallel sequential scan) rather than using the idx_users_status_created index you created. Even though you built a composite index on (status, created_at), PostgreSQL couldn't leverage it effectively because the query only filters on created_at without specifying the leading status column.

The execution took about 85 milliseconds and accessed 10,384 buffer pages. The "Rows Removed by Filter: 266934" line confirms that each of the three parallel workers scanned hundreds of thousands of rows just to find 100 matching records. This is precisely the inefficiency that skip scans eliminate by allowing PostgreSQL to use composite indexes even when leading columns aren't in your WHERE clause.

How skip scans work automatically

PostgreSQL 18's skip scan feature works automatically without any configuration required. The query planner examines your queries and table statistics, then decides whether applying skip scan optimization to a B-tree index scan would be the most efficient execution strategy compared to alternatives like sequential scans or bitmap index scans.

Skip scan optimization works by applying constraints to every column in the index through repeated index searches, even when some columns lack explicit equality constraints in your query. When PostgreSQL detects a query that filters on trailing columns but not the prefix, it generates dynamic equality constraints internally that match every possible value in the leading column. This transforms a single query into multiple targeted index searches.

The process works like this: PostgreSQL first identifies all unique values in the prefix column (in our case, status). For each distinct value, it performs a targeted index scan as if you had explicitly filtered on that value. A query like WHERE created_at > '2024-01-01' effectively becomes multiple searches: WHERE status = 'active' AND created_at > '2024-01-01', WHERE status = 'inactive' AND created_at > '2024-01-01', and so on. Each search repositions within the index to skip directly to the relevant section.

The key insight is that this approach becomes efficient when the leading column has low cardinality (few distinct values). If status only has three possible values, the database performs just three index searches instead of scanning the entire table or index. Each individual search is highly selective and fast because it can use the full composite index effectively, navigating directly to matching tuples while skipping over irrelevant index sections.
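You can approximate this decomposition by hand. The rewrite below is only a sketch of what the planner does internally within a single skip scan; PostgreSQL does not actually plan three separate queries:

```sql
-- Conceptual equivalent of one skip scan over (status, created_at):
-- one prefix-constrained index search per distinct status value.
SELECT * FROM users WHERE status = 'active'   AND created_at > '2024-01-01'
UNION ALL
SELECT * FROM users WHERE status = 'inactive' AND created_at > '2024-01-01'
UNION ALL
SELECT * FROM users WHERE status = 'pending'  AND created_at > '2024-01-01';
```

Each branch can use the full composite index, which is exactly why the real optimization pays off when the number of branches stays small.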

Skip scans are most effective when you have:

- A composite index with a low-cardinality leading column
- Queries that filter on trailing columns but not the prefix
- A substantial number of rows where full table scans are prohibitively expensive
- So few distinct leading column values that the planner expects to skip over most of the index
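You can check how many distinct leading-column values the planner expects by querying the pg_stats view (statistics must be current, so run ANALYZE first):

```sql
-- n_distinct for the leading column; a small positive number
-- (here it should be 3) is what makes skip scan worthwhile.
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'users' AND attname = 'status';
```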

To verify whether your query is using a skip scan, examine the query plan using EXPLAIN:

 
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users
WHERE created_at > '2024-01-01'
ORDER BY created_at
LIMIT 100;

To spot a skip scan, look for the "Index Searches: N" line in the output - PostgreSQL 18 reports skip scans through this metric rather than through a dedicated plan node. In this case, however, you'll see that PostgreSQL is still using a parallel sequential scan:

Output
                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=19715.25..19726.89 rows=100 width=50) (actual time=87.391..92.365 rows=100 loops=1)
   Buffers: shared hit=10384
   ->  Gather Merge  (cost=19715.25..43095.89 rows=200750 width=50) (actual time=87.382..92.345 rows=100 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=10384
         ->  Sort  (cost=18715.22..18924.34 rows=83646 width=50) (actual time=70.001..70.017 rows=85 loops=3)
               Sort Key: created_at
               Sort Method: top-N heapsort  Memory: 46kB
               Buffers: shared hit=10384
               Worker 0:  Sort Method: top-N heapsort  Memory: 47kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 47kB
               ->  Parallel Seq Scan on users  (cost=0.00..15518.33 rows=83646 width=50) (actual time=0.052..56.667 rows=66400 loops=3)
                     Filter: (created_at > '2024-01-01 00:00:00'::timestamp without time zone)
                     Rows Removed by Filter: 266934
                     Buffers: shared hit=10310
 Planning Time: 1.075 ms
 Execution Time: 93.641 ms
(18 rows)

Notice the planner chose a "Parallel Seq Scan" rather than a skip-scan-driven index scan for this query. The execution took about 94 milliseconds and accessed 10,384 buffer pages. This demonstrates that skip scan isn't always selected, even when a composite index exists on (status, created_at).

The planner's decision depends on several factors including table statistics, data distribution, and estimated costs. In this case, PostgreSQL determined that a parallel sequential scan across three workers would be more efficient than skip scan. This might be because the selectivity of the filter (rows after '2024-01-01') is high enough that scanning in parallel is competitive with the skip scan approach.

To encourage the planner to consider skip scan, you might need different data characteristics (such as fewer matching rows), more distinct values in the leading column, or you could try adjusting the query to be more selective. The planner's automatic decision-making means it will choose skip scan only when cost estimates indicate it's the fastest approach for your specific data distribution.
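For experimentation, you can also temporarily discourage sequential scans and see what index-based plan the planner falls back to. This is a diagnostic technique only, not something to leave enabled in production:

```sql
SET enable_seqscan = off;  -- session-local; raises the cost of seq scans

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users
WHERE created_at > '2024-01-01'
ORDER BY created_at
LIMIT 100;

RESET enable_seqscan;      -- restore the default planner behavior
```

Comparing this forced plan's execution time against the sequential scan shows you how close the two strategies actually are for your data.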

Triggering skip scan with selective queries

The previous query didn't trigger skip scan because the filter on created_at > '2024-01-01' matched too many rows (about 200,000 out of 1 million). Skip scan becomes more attractive to the planner when queries are highly selective and return only a small percentage of total rows.

Let's try a more selective query that filters for recent data:

 
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users
WHERE created_at > '2024-12-01'
ORDER BY created_at
LIMIT 100;

This query targets only the last month of data in our dataset:

Output
                                                                 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=11987.34..11987.59 rows=100 width=50) (actual time=25.281..25.306 rows=100 loops=1)
   Buffers: shared hit=8233 read=72
   ->  Sort  (cost=11987.34..12029.06 rows=16687 width=50) (actual time=25.273..25.288 rows=100 loops=1)
         Sort Key: created_at
         Sort Method: top-N heapsort  Memory: 48kB
         Buffers: shared hit=8233 read=72
         ->  Bitmap Heap Scan on users  (cost=419.03..11349.57 rows=16687 width=50) (actual time=9.231..22.727 rows=16316 loops=1)
               Recheck Cond: (created_at > '2024-12-01 00:00:00'::timestamp without time zone)
               Heap Blocks: exact=8221
               Buffers: shared hit=8233 read=72
               ->  Bitmap Index Scan on idx_users_status_created  (cost=0.00..414.86 rows=16687 width=0) (actual time=6.785..6.786 rows=16316 loops=1)
                     Index Cond: (created_at > '2024-12-01 00:00:00'::timestamp without time zone)
                     Index Searches: 4
                     Buffers: shared hit=12 read=72
 Planning Time: 0.377 ms
 Execution Time: 25.390 ms
(16 rows)

With this more selective query matching only 16,316 rows (about 1.6% of the table), PostgreSQL chose a bitmap index scan. The key detail here is "Index Searches: 4": the index was probed roughly once per distinct status value (three in this table), with the extra search likely accounting for the positioning work needed to discover the next distinct value. This is skip scan behavior applied within a Bitmap Index Scan rather than through a dedicated plan node.

The execution time of 25.4 milliseconds is noticeably faster than the 94 milliseconds we saw with the parallel sequential scan earlier. The bitmap approach allows PostgreSQL to scan the index for each status value, build an in-memory bitmap of matching row locations, then fetch those rows in physical order for better I/O efficiency.

Let's try an even more selective query with a narrower time window:

 
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users
WHERE created_at > CURRENT_DATE - INTERVAL '7 days'
ORDER BY created_at
LIMIT 100;

This targets only the last week of data:

Output
                                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=389.72..389.97 rows=100 width=50) (actual time=0.155..0.156 rows=0 loops=1)
   Buffers: shared hit=12
   ->  Sort  (cost=389.72..389.97 rows=100 width=50) (actual time=0.153..0.154 rows=0 loops=1)
         Sort Key: created_at
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=12
         ->  Index Scan using idx_users_status_created on users  (cost=0.43..386.40 rows=100 width=50) (actual time=0.145..0.146 rows=0 loops=1)
               Index Cond: (created_at > (CURRENT_DATE - '7 days'::interval))
               Index Searches: 4
               Buffers: shared hit=12
 Planning Time: 0.221 ms
 Execution Time: 0.204 ms
(12 rows)

Now we see a regular "Index Scan using idx_users_status_created" with the telltale sign: "Index Searches: 4". This metric reveals that PostgreSQL performed four separate index probes - roughly one targeted search per distinct status value (three in this table), plus an extra probe for positioning between values. Although the plan shows no special node type, the "Index Searches: 4" metric demonstrates that the skip scan optimization is being applied to this B-tree index scan.

The query returned zero rows because our test data doesn't include records from the last seven days, but the execution plan shows exactly how skip scan works. The fast execution time of 0.204 milliseconds and minimal buffer usage (just 12 pages) confirm the efficiency of this approach compared to scanning the entire table or index.
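You can confirm why the query matched nothing by checking the newest timestamp in the test data:

```sql
-- The generated created_at values end at 2024-12-31, so a filter on the
-- last seven days from today's date finds no rows.
SELECT max(created_at) FROM users;
```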

The important insight here is that PostgreSQL 18's skip scan is an optimization applied inside B-tree index scans rather than a distinct plan node type. It can surface in a regular "Index Scan", an "Index Only Scan", or a "Bitmap Index Scan" that probes the index multiple times. The reliable indicator is the "Index Searches: N" metric in the execution plan, which shows how many times PostgreSQL repositioned within the index to find matching rows; a value greater than 1 for a query that omits the leading column confirms the skip scan optimization is active.

Understanding when skip scan is NOT triggered

While skip scan is a powerful optimization, it's not always the planner's choice even when composite indexes exist. Understanding why the planner chooses alternative strategies helps you design better indexes and set realistic expectations for query performance.

Our first query with created_at > '2024-01-01' demonstrated a common scenario where skip scan isn't selected. The filter matched roughly 200,000 rows out of 1 million - about 20% of the table. When queries return such a large percentage of rows, the planner often favors parallel sequential scans because they can efficiently process large result sets by dividing work across multiple CPU cores.

The cost model considers several factors when deciding against skip scan:

Query selectivity matters significantly. Skip scan excels at finding small needles in large haystacks, but becomes less attractive as the result set grows. If your query returns 20% of a table, performing repeated index searches for every distinct leading column value may cost more than simply scanning through the data in parallel. The planner compares the estimated number of index probes against the cost of reading data sequentially, and sequential access often wins for large result sets.

Leading column cardinality affects the decision. Our users table has only three distinct status values (active, inactive, pending), which makes skip scan viable when other conditions are favorable. If the leading column had hundreds or thousands of distinct values, skip scan would require that many separate index searches. Each search has overhead - positioning in the index, traversing B-tree pages, maintaining scan state - and this overhead accumulates quickly with high cardinality.

Consider what happens with a high-cardinality leading column. Create a new index with id (unique values) as the leading column:

 
CREATE INDEX idx_users_id_created ON users (id, created_at);
Output
CREATE INDEX

Now try querying on just the trailing column:

 
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users
WHERE created_at > '2024-12-01'
ORDER BY created_at
LIMIT 100;

The planner will almost certainly avoid skip scan for this index because performing one million separate index searches (one per distinct id value) would be far more expensive than alternative strategies. This is why placing low-cardinality columns at the beginning of composite indexes is crucial when you plan to query on trailing columns.
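Since this index exists only to illustrate the problem, drop it so it doesn't influence later planning decisions:

```sql
DROP INDEX idx_users_id_created;
```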

Alternative strategies may simply be cheaper. The bitmap index scan we saw for the created_at > '2024-12-01' query represents the planner's judgment that building an in-memory bitmap and fetching rows in physical order provides better performance than skip scan for that specific data distribution. Bitmap scans excel at combining multiple index conditions and accessing scattered data efficiently, especially when the result set is moderately sized.

Table size influences the decision too. On small tables with just a few thousand rows, sequential scans are often fastest regardless of available indexes. The overhead of index traversal - reading index pages, following pointers, checking visibility - can exceed the cost of simply reading all rows when tables fit comfortably in memory. PostgreSQL's cost model accounts for this, which is why you might see sequential scans on small tables even with perfect indexes.

Outdated statistics mislead the planner. The cost-based optimizer relies on table statistics to estimate row counts, data distribution, and correlation between columns. If these statistics are stale - perhaps because you loaded data recently without running ANALYZE - the planner might estimate costs incorrectly and choose suboptimal plans. Always run ANALYZE after significant data changes:

 
ANALYZE users;
Output
ANALYZE

Memory and I/O patterns matter. If most of your data is already in PostgreSQL's shared buffers or the operating system's page cache, sequential scans become much more attractive because they read data with excellent spatial locality. Skip scan might jump around the index and heap, potentially causing more cache misses. The planner considers the effective_cache_size parameter when estimating these costs, though it can't know exactly what's currently in cache.
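You can inspect the cache-size assumption the planner is working with:

```sql
-- The planner's estimate of memory available for caching data, used when
-- costing index scans against sequential scans; it is a configured
-- estimate, not a measurement of what is actually cached.
SHOW effective_cache_size;
```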

The key takeaway is that skip scan is one tool among many in PostgreSQL's optimization toolkit. The planner automatically selects the strategy it believes will execute fastest based on available information. When skip scan isn't chosen, it's usually because the planner has good reasons to prefer alternatives like parallel sequential scans, bitmap scans, or regular index scans. Understanding these trade-offs helps you design schemas and queries that work with the planner's cost model rather than against it.

Skip scan limitations

Skip scan in PostgreSQL 18 works specifically with B-tree indexes and has particular requirements for optimal effectiveness:

B-tree only - Skip scan applies exclusively to B-tree indexes, though this covers the majority of index use cases since B-tree is PostgreSQL's default and most common index type. Other index types like GiST, GIN, or BRIN don't support skip scan optimization.

Low cardinality preference - The optimization works best when omitted leading columns have relatively few distinct values. Performance benefits diminish significantly as cardinality increases in the prefix columns. If the leading column has thousands or millions of distinct values, skip scan becomes inefficient because it must perform that many separate index searches.
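Before relying on skip scan for a given index, it's worth measuring the actual cardinality of the prospective leading column. Using the test table from this guide:

```sql
-- Exact distinct counts for candidate leading columns; skip scan
-- performance degrades as this number grows.
SELECT COUNT(DISTINCT status) AS distinct_statuses,
       COUNT(DISTINCT id)     AS distinct_ids
FROM users;
```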

Trailing column conditions required - Skip scan requires at least one useful constraint on a later indexed column. Without conditions on trailing columns, there's nothing for the skip scan to optimize toward. The planner needs both omitted prefix columns and filtered trailing columns to apply this optimization.

Equality or range conditions - The feature works with both equality conditions (like category = 'Electronics') and range conditions (like sale_date > '2025-01-01') on trailing columns. However, the selectivity of these conditions significantly impacts whether skip scan is chosen over alternatives.

Cost-based decision making - PostgreSQL's query planner automatically determines when skip scan is worthwhile based on table statistics and cost estimates. You cannot manually force skip scan usage - the planner chooses the most efficient approach for each query based on estimated costs.

Understanding these limitations helps you design indexes and queries that maximize skip scan benefits while avoiding scenarios where it provides little value.

Final thoughts

In this article, we've explored skip scans in PostgreSQL 18 and demonstrated how this optimization technique improves query performance when filtering on trailing columns of composite B-tree indexes. We've covered how skip scans work through repeated index searches, when they provide the most value, and how to recognize them through the "Index Searches: N" metric in execution plans.

Skip scans represent a significant advancement in PostgreSQL's query optimization capabilities, enabling more efficient use of existing indexes without requiring manual configuration. The key to leveraging them effectively lies in understanding your query patterns and using composite indexes with low-cardinality leading columns.

This tutorial can't cover every nuance of skip scan optimization and its interaction with PostgreSQL's broader query planning ecosystem. For more comprehensive information, consult the official PostgreSQL documentation and stay current with release notes as the feature continues to evolve.

Thanks for reading, and happy querying!