About a week ago, I posted a comparison of pgbench results since PostgreSQL 7.4, which was one of the benchmarks done for my pgconf.eu 2014 talk. For an explanation of the whole quest, please see the first post.
Now it's time to discuss the results of the second benchmark - TPC-DS, which models an analytical workload, i.e. queries processing large amounts of data, with aggregations, large sorts, joins of large tables, TOP-N queries etc. That is very different from pgbench, which executes queries manipulating individual rows, mostly through primary keys.
The one chart you should remember from this post is this one, illustrating how long it takes to execute 41 queries (the subset of TPC-DS queries compatible with PostgreSQL since 7.4) on a 16GB dataset (raw CSV size; after loading into the database it occupies about 5x that, because of overhead, indexes etc.).
The numbers are runtime in seconds (on the i5-2500k machine), and apparently while on PostgreSQL 8.0 it took ~5100 seconds (85 minutes), on 9.4 it takes only ~1200 seconds (20 minutes). That's a huge improvement.
Notice the 8.0 results are marked with a star - that's because on 8.0 one of the queries did not complete within an arbitrary limit of 30 minutes, so it was cancelled and counted as taking 1 hour. Based on several experiments, I believe the actual runtime would be even longer than that - in any case it was much longer than on PostgreSQL 8.1, where this particular query was significantly improved by bitmap index scans.
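To give an idea of what bitmap index scans bring, since 8.1 the planner can combine conditions on several indexes before touching the heap. The following is purely illustrative - the column names come from the TPC-DS schema, but the indexes and the plan shape are hypothetical, not taken from the actual benchmark runs:

```sql
-- Illustrative only: since 8.1, conditions matching two different indexes
-- can be combined with BitmapAnd/BitmapOr before any heap access
EXPLAIN
SELECT count(*)
  FROM catalog_sales
 WHERE cs_sold_date_sk = 2451200   -- assumes an index on cs_sold_date_sk
   AND cs_sales_price > 500;       -- assumes an index on cs_sales_price

--  QUERY PLAN (shape only, abbreviated)
--  Aggregate
--    -> Bitmap Heap Scan on catalog_sales
--         Recheck Cond: ...
--         -> BitmapAnd
--              -> Bitmap Index Scan on catalog_sales_sold_date_idx
--              -> Bitmap Index Scan on catalog_sales_price_idx
```

Before 8.1, a query like this had to pick a single index (or a sequential scan), which is exactly why some of these large analytical queries were so slow on 8.0.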
Considering that this is quite I/O intensive (the database size is ~5x the RAM), that's a huge improvement. As we'll see later, with smaller datasets (that completely fit into RAM), the speedup is even larger.
BTW I've had trouble making this work on PostgreSQL 7.4 (without making the results difficult to compare), so I'll only present results for PostgreSQL 8.0 and newer releases.
But let's talk a bit about the TPC-DS benchmark, because the brief description in the introduction is not really detailed enough. TPC-DS is another benchmark from TPC. It represents analytical workloads (reporting, data analysis, decision support systems and so on), so the queries process large amounts of data, performing aggregations (GROUP BY), various joins, etc.
It effectively extends and deprecates the TPC-H benchmark, improving it in multiple ways to make it more representative of actual workloads. Firstly, it makes the schema more complex (e.g. more tables), and uses less uniform data distributions (which makes cardinality estimation way more difficult). It also increases the number of query templates from 22 to 99, and uses modern features like CTEs, window functions and grouping sets.
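For illustration, the newer templates freely combine such constructs - this is not one of the actual templates, just a sketch in the same style, reusing column names from the TPC-DS schema:

```sql
-- Illustrative sketch (CTE + window function), not an actual template
WITH yearly_sales AS (
    SELECT d_year, sum(cs_sales_price) AS total
      FROM catalog_sales
      JOIN date_dim ON cs_sold_date_sk = d_date_sk
     GROUP BY d_year
)
SELECT d_year, total,
       rank() OVER (ORDER BY total DESC) AS sales_rank
  FROM yearly_sales;
```

Queries of this shape only became possible in PostgreSQL 8.4, which is why only a subset of the templates can run on the older releases benchmarked here.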
Of course, presenting all the 99 query templates here would be pointless, but one of the simpler ones looks like this:
select ca_zip, sum(cs_sales_price)
  from catalog_sales, customer, customer_address, date_dim
 where cs_bill_customer_sk = c_customer_sk
   and c_current_addr_sk = ca_address_sk
   and (substr(ca_zip, 1, 5) in ('85669', '86197', '88274', '83405',
                                 '86475', '85392', '85460', '80348', '81792')
        or ca_state in ('CA', 'WA', 'GA')
        or cs_sales_price > 500)
   and cs_sold_date_sk = d_date_sk
   and d_qoy = 2
   and d_year = 2000
 group by ca_zip
 order by ca_zip
 limit 100;
This particular query joins 4 tables, uses non-trivial WHERE conditions, aggregation and finally selects only results for the first 100 ZIP codes. The other templates are often way more complex.
Some of the templates are incompatible with PostgreSQL, because they rely on not-yet-implemented features (e.g. CUBE/ROLLUP). Some of the templates also seem broken, as the query generator fails on them. There are 61 queries working fine since PostgreSQL 8.4 (when CTEs and window functions were added to PostgreSQL), and 41 queries are compatible with versions since 7.4. And those 41 queries were used for this benchmark.
Note 1: Most of the remaining queries may be rewritten to make them work, but I haven't done that. Those queries were designed specifically to test those features, and the rewritten versions would benchmark the same things as the remaining queries anyway. I'm pretty sure it's possible to rewrite some of the "compatible" queries to get better performance, but I haven't done that for the same reason.
Note 2: The TPC-DS benchmark also includes tests for maintenance/management tasks (e.g. ALTER TABLE, various DDL etc.), but I have not performed this part.
As with all benchmarks, dataset sizes really matter. I've used two sizes, 1GB and 16GB, specified as the size of the CSV files generated by the TPC tooling.
small / 1GB
large / 16GB
I haven't really spent too much time optimizing the schema - I simply used the schema provided with the TPC-DS suite, and after a few rounds of benchmarking created suitable indexes. I have used the same schema for all PostgreSQL versions, which probably discriminates the newer versions, as those might benefit from tweaking indexes to be suitable for index-only scans, for example.
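As a hypothetical example of such a tweak (this index is not part of the benchmark schema): on 9.2 and newer, an index covering both the join key and the aggregated column can satisfy some queries entirely from the index, via an index-only scan:

```sql
-- Hypothetical index, not used in the benchmark - on 9.2+ queries touching
-- only these two columns can be answered by an index-only scan,
-- skipping the (much larger) heap entirely
CREATE INDEX catalog_sales_cust_price_idx
    ON catalog_sales (cs_bill_customer_sk, cs_sales_price);
```

On releases before 9.2 the same index would still be used, but every matching row would also be fetched from the heap, so the older versions gain nothing from this kind of tuning - which is why a single shared schema slightly favors them.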
The PostgreSQL configuration was mostly default, with only minimal changes. All the versions used the same configuration:
shared_buffers = 512MB
work_mem = 128MB
maintenance_work_mem = 128MB
effective_cache_size = 1GB
checkpoint_segments = 32
Similarly to pgbench, we could probably get slightly better performance by tuning the values for each version. The values used here are quite conservative already, so don't expect an effect similar to pgbench, where lowering the values resulted in a significant speedup on older versions.
If you want to review the tooling I used, it's available here. It's a bit hackish (mostly a bunch of shell scripts) and certainly not "ready to run" in various ways - it does not include the data and query generators (you can get them at tpc.org), and you'll have to modify a few paths in the scripts (to the data directory etc.), but it shouldn't be difficult. Let me know by e-mail if you run into problems.
This also includes DDL (schema including indexes), PostgreSQL config files, query templates and actual queries used for the benchmark.
The first thing you need to do is load the data into the database. The load process is very simple:
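In outline it boils down to something like the following sketch - the file paths are placeholders, and I'm assuming the '|'-delimited flat files produced by the TPC-DS data generator:

```sql
-- Sketch of the load; paths are placeholders, delimiter assumed to be '|'
COPY customer         FROM '/path/to/customer.dat'         WITH DELIMITER '|';
COPY customer_address FROM '/path/to/customer_address.dat' WITH DELIMITER '|';
-- ... repeat for the remaining TPC-DS tables ...

-- then create the indexes (per the DDL in the repository),
-- clean up and update the planner statistics
-- CREATE INDEX ...
VACUUM FULL;
ANALYZE;
```

The VACUUM FULL step appears here because it was part of the measured load, which turns out to matter for the comparison, as discussed next.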
and the results for a 1GB dataset (raw size of CSV files) look like this:
Clearly, something changed in 9.0, because the VACUUM FULL step takes much longer, and indeed - that's when VACUUM FULL was rewritten to use the same implementation as CLUSTER, i.e. completely rewriting the table. Older releases used a different approach that was more efficient for tables with only a small fraction of dead tuples (which is the case here, right after a load), but much less efficient on tables with a large portion of dead tuples (which is exactly when you'd want to use CLUSTER).
That means the VACUUM FULL step is rather misleading here, because it's used in exactly the context where you should not use it (and should instead let autovacuum do its job), so let's remove it from the chart.
Much better. While it took ~1000 seconds on 8.0, on 9.4 it only takes ~500 seconds - not bad. The main improvement happened in 8.1, 8.2 brought a minor regression, followed by small incremental improvements.
Let's see the performance with a larger (16GB) dataset:
Again, about a 2x speedup between 8.0 and 9.4 (just like with the small dataset), but the pattern is slightly different - 8.1 and 8.2 each improved performance about equally (no regression in 8.2), followed by releases with stable performance.
So far I've been talking about the "raw size" of the datasets, i.e. the size of the CSV files produced by the TPC-DS generator. But what does that mean for the database size? After loading the small (1GB) dataset, you'll get about this:
That's a ~5-6GB database - PostgreSQL 9.4 needs about 15% less space than 8.0, which is certainly nice. About 60% of that is indexes, leaving ~2.5GB for the tables. Extrapolating to the 16GB dataset, that means ~40GB on disk for the tables and an additional ~60GB for the indexes.
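If you want to check the table/index breakdown on your own system, a query along these lines gives per-table numbers (pg_indexes_size is available since 9.0, so this only works on the newer releases):

```sql
-- Per-table heap vs. index sizes (requires 9.0+ for pg_indexes_size)
SELECT tablename,
       pg_size_pretty(pg_relation_size(tablename::regclass)) AS table_size,
       pg_size_pretty(pg_indexes_size(tablename::regclass))  AS index_size
  FROM pg_tables
 WHERE schemaname = 'public'
 ORDER BY pg_relation_size(tablename::regclass) DESC;
```

In a TPC-DS database, the fact tables (catalog_sales, store_sales etc.) dominate this listing, which is where most of the index overhead comes from as well.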
And finally, the query performance. On the small dataset (which fits into memory), the 41 queries look like this on average:
and on the large dataset, you'll get this:
Clearly, PostgreSQL 8.1 was a significant improvement. It's also possible to look at duration broken down per query (so that each query gets the same color on all versions):
The regressions in 8.1 and 8.4 - 9.1 are clearly visible, though I haven't looked into them in much detail.
If you want to look into the results, a complete set (including logs, EXPLAIN, EXPLAIN ANALYZE and such) for the 16GB dataset is available here. Feel free to point out any inconsistencies or errors.
If you look into the release notes, two major features introduced in that version should catch your eye:
PostgreSQL 9.2 was another release that significantly improved performance (almost 2x compared to 9.1), most likely thanks to this feature (see release notes):
You can also notice a further improvement of about 10% in 9.4, likely thanks to the optimization of
There are of course many other performance-related improvements in all the releases since 8.0, but those affect other kinds of queries.
So, what's the conclusion?