Performance since PostgreSQL 7.4 / fulltext


2015/03/17 by Tomas Vondra

After discussing the pgbench and TPC-DS results, it's time to look at the last benchmark, testing performance of built-in fulltext (and GIN/GiST index implementation in general).

The one chart you should remember from this post is this one, GIN speedup between 9.3 and 9.4:

fulltext-timing-speedups.png

Interpreting this chart is a bit tricky - x-axis tracks duration on PostgreSQL 9.3 (log scale), while y-axis (linear scale) tracks relative speedup 9.4 vs. 9.3, so 1.0 means 'equal performance', and 0.5 means that 9.4 is 2x faster than 9.3.

The chart pretty much shows exponential speedup for vast majority of queries - the longer the duration on 9.3, the higher the speedup on 9.4. That's pretty awesome, IMNSHO. What exactly caused that will be discussed later (spoiler: it's thanks to GIN fastscan). Also notice that almost no queries are slower on 9.4, and those few examples are not significantly slower.


Benchmark

While both pgbench and TPC-DS are well established benchmarks, there's no such benchmark for testing fulltext performance (as far as I know). Luckily, I've had played with the fulltext features a while ago, implementing archie - an in-database mailing list archive.

It's still quite experimental and I use it for testing GIN/GiST related patches, but it's suitable for this benchmark too.

So I've taken the current archives of PostgreSQL mailing lists, containing about 1 million messages, loaded them into the database and then executed 33k real-world queries collected from postgresql.org. I can't publish those queries because of privacy concerns (there's no info on users, but still ...), but the queries look like this:

SELECT id FROM messages
 WHERE body_tsvector @@ ('optimizing & bulk & update')::tsquery
 ORDER BY ts_rank(body_tsvector, ('optimizing & bulk & update')::tsquery)
          DESC LIMIT 100;

The number of search terms varies quite a bit - the simplest queries have a single letter, the most complex ones often tens of words.

PostgreSQL config

The PostgreSQL configuration was mostly default, with only minor changes:

shared_buffers = 512MB
work_mem = 64MB
maintenance_work_mem = 128MB
checkpoint_segments = 32
effective_cache_size = 4GB

Loading the data

We have to load the data first, of course. In this case that involves a fair amount of additional logic implemented either in Python (parsing the mbox files into messages, loading them into the database), or PL/pgSQL triggers (thread detection, ...). The time needed to load all the 1M messages, producing ~6GB database, looks like this:

fulltext-load.png

Note: The chart only shows releases where the performance changed, so if only data for 8.2 and 9.4 are shown, it means that the releases up until 9.3 behave like 8.2 (more or less).

The common wisdom is that querying GIN indexes are faster than GiST, but that they are more expensive when it comes to maintenance (creation, etc).

If you look at PostgreSQL 8.2, the oldest release supporting GIN indexes, that certainly was true - the load took ~1300 seconds with GIN indexes and only ~800 seconds with GiST indexes. But 8.4 significantly impoved this, making the GIN indexes only slightly more expensive than GiST.

Of course, this is incremental load - it might look very differently if the indexes were created after all the data are loaded, for example. But I argue that the incremental performance is more important here, because that's what usually matters in actual applications.

The other argument might be that the overhead of the Python parser and PL/pgSQL triggers is overshadowing the GIN / GiST difference. That may be true, but that overhead should be about the same for both index types, so read-world applications are likely to have similar overhead.

So I believe that GIN maintenance is not significantly more expensive than GiST - at least in this particular benchmark, but probably in other applications too. I have no doubt it's possible to construct examples where GIN maintenance is much more expensive than GiST maintenance.

Query performance

The one thing that's missing in the previous section is query performance. Let's assume your workload is 90% reads, and GIN is 10x faster than GiST for the queries you do - how much you care if GIN maintenance is 10x more expensive than GiST, in that case? In most cases, you'll choose GIN indexes because that'll probably give you better performance overall. (It's more complicated, of course, but I'll ignore that here.)

So, how did the GIN and GiST performance evolved over time? GiST indexes were introduced first - in PostgreSQL 8.0 as a contrib module (aka extension in new releases), and then in core PostgreSQL 8.3. Using the 33k queries, the time to run all of them on each release is this (i.e. lower values are better):

fulltext-gist.png

Interesting. It took only ~3200 seconds on PostgreSQL 8.0 - 8.2, and then it slowed down to ~5200 seconds. That may be seen as a regression, but my understanding is that this is the cost of move into core - the contrib module was probably limited in various ways, and proper integration with the rest of the core required fixing these shortcomings.

What about GIN? This feature was introduced in PostgreSQL 8.2, directly as in-core feature (so not as contrib module first).

fulltext-gin.png

Interestingly it was gradually slowing down a bit (by about ~15% between 8.2 and 9.3) - I take it as a sign that we really need regular benchmarking as part of development. Then, on 9.4 the performance significantly improved, thanks to this change:

  • Improve speed of multi-key GIN lookups (Alexander Korotkov, Heikki Linnakangas)

also known as "GIN fastscan".

I was discussing GIN vs. GiST maincenance cost vs. query performance a few paragraphs back, so what is the performance difference between GIN and GiST?

fulltext-gist-vs-gin.png

Well, in this particular benchmark, GIN indexes are about 10x faster than GiST (would be sad otherwise, because fulltext is the primary use of GIN), and as we've seen before it was not much slower than GiST maintenance-wise.

GIN fastscan

So what is the GIN fastscan about? I'll try to explain this, although it's of course a significantly simplified explanation.

GIN indexes are used for indexing non-scalar data - for example when it comes to fulltext, each document (stored in a TEXT column as a single value) is transformed into tsvector, a list of words in the document (along with some other data, but that's irrelevant here). For example let's assume document with ID=10 contains the popular sentence

10 => "The quick brown fox jumps over the lazy dog"

This will get split into an array of words (this transformation may even remove some words, perform lemmatization):

10 => ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]

If you build GIN index on this "vector" representation, the index will effectively invert the direction of the mapping by mapping words to IDs of all the rows containing that word (each row ID is a pair of block number and offset on that block):

"The" => [(0,1), (0,10), (0,15), (2,4), ...]
"quick" => [(0,1), (0,2), (2,10), (2,15), (3,18), ...]
"brown" => [(1,10), (1,11), (1,12), ...]
...

Then, if you do a fulltext query on the document, say

SELECT * FROM documents WHERE to_tsvector(body) @@ to_tsquery('quick & fox');

it can simply fetch the lists for quick and fox and combine them, to get only IDs of the documents containing both words.

And this is exactly where GIN fastscan was applied. Until PostgreSQL 9.4, the performance of this combination step was determined by the longest list of document IDs, because it had to be walked. So if you had a query combining rare and common words (included in many documents, thus having a long lists of IDs), it was often slow.

GIN fastscan changes this, starting with the short posting lists, and combining the lists in a smart way (by using the fact that the lists of IDs are sorted), so that the duration is determined by the shortest list of IDs.

How much impact can this have? Let's see!

Compression

The fastscan is not the only improvement in 9.4 - the other significant improvement is compression of the posting lists (lists of row IDs). If you look at the previous example, you might notice that the posting list can be made quite compressible - you may sort the row IDs (first by block number, then by row offset). The block numbers will then repeat a lot, and the row offsets will be an increasing sequence.

This redundancy may be exploited by various encoding schemes - RLE, delta, ... and that's what was done in PostgreSQL 9.4. The result is that GIN indexes are often much smaller. How much smaller really depends on the dataset, but for the dataset used in this benchmark the size dropped to 50% - from ~630MB to ~330MB. Other developers reported up to 80% savings in some cases.

Relative speedup

The following chart (already presented at the beginning of this blog post) presents speedup of a random sample from the 33k queries (plotting all the queries would only make it less readable). It shows relative speedup depending on the duration on PostgreSQL 9.3, i.e. each points plots

  • x-axis (log-scale) - duration on PostgreSQL 9.3
  • y-axis (linear) - (duration on PostgreSQL 9.4) / (duration on PostgreSQL 9.3)

So if the query took 100 ms on PostgreSQL 9.3, and only takes 10 ms on PostgreSQL 9.4, this is represented by a point [100, 0.1].

fulltext-timing-speedups.png

There are a few interesting observations:

  • Only very few queries slowed down on PostgreSQL 9.4. Those queries are either very fast, taking less than 1ms, with a slow-down less than 1.6 (this may easily be a noise) or longer but with slowdown well below 10% (again, may be a noise).
  • Vast majority of queries is significantly faster than on PostgreSQL 9.3, which is clearly visible as an area with high density of the blue dots. The most interesting thing is that the higher the PostgreSQL 9.3 duration, the higher the speedup.

This is perfectly consistent with the GIN fastscan - the queries that combine frequent and rare words took time proportional to the frequent word on PostgreSQL 9.3, but thanks to fastscan the performance is determined by the rare words. Hence the exponential speedup.

Fulltext dictionaries

While I'm quite excited about the speedup, the actual performance depends on other things too - for example what dictionary you use. In this benchmark I've been using the english dictionary, based on a simple snowball stemmer - a simple algorithmic stemmer, not using any kind of dictionary.

If you're using a more complicated configuration - for example a dictionary-based stemmer, because that's necessary for your language, this may take quite a significant amount of time (especially if you're not using connection pooling and so the dictionaries need to be parsed over and over again - my shared_ispell project might be interesting in this case).

GIN indexes as bitmap indexes

PostgreSQL does not have traditional bitmap indexes, i.e. indexes serialized into simple on-disk bitmaps. There were attempts to do that feature in the past, but the gains never really outweighter the performance issues (locking and such), especially since 8.2 when bitmap index scans were implemented (i.e. construction of bitmaps from btree indexes at runtime).

But if you think about that, GIN indexes are really bitmap indexes, with different bitmap serialiation format. If you're craving for bitmap indexes (not uncommon in analytical workloads), you might try btree_gin extension which makes it possible to create GIN indexes on scalar types (by default GIN can be built only on vector-like types - tsvector and such).

Summary

  • The wisdom "GIN indexes are faster to query but more expensive to maintain" may not be true anymore, especially if the query performance is more important for you.
  • Load performance improved a lot, especially in PostgreSQL 8.2 (GiST) and 8.4 (GIN).
  • Query performance for GiST is mostly the same (at least since PostgreSQL 8.3 when GiST was included into core).
  • For GIN, the query performance was mostly the same until PostgreSQL 9.4, when the "fastscan" significantly improved performance of queries combining rare and frequent keys.




comments powered by Disqus