When I gave my Performance Archaeology talk at pgconf.eu 2014 in Madrid, I mentioned I planned to do a few blog posts on the topic soon. It's been four months since the conference, so it's about time to fulfill that promise. In my defense, a lot has changed since that talk (e.g. I moved to 2ndQuadrant).
If you look at the pgconf.eu talk, you'll see it has four sections:
I've decided to keep this structure for the blog posts too, so this post is a short explanation of the benchmarking - why I did it and what to be careful about when interpreting the data. It will be followed by three more posts, each discussing the results of one of the benchmarks.
The series of blog posts may be seen as a different way to present the talk, but in case you'd like to see the actual talk (say, on a local meetup), let me know.
PostgreSQL 7.4, the first release I used in production, was released more than 10 years ago (in November 2003, to be precise). When it comes to features, it's simple (although tedious) to skim through the Release Notes and check what new features were added - but what about performance? How has performance evolved (hopefully improved) since 7.4?
The question you may be asking at this point is "Why do I care?" I personally see three main reasons why this might be interesting for you.
If you're already using an older PostgreSQL release (particularly one of the unsupported ones, i.e. anything older than 9.0), this might give you another reason to upgrade - not only will you get a properly maintained version and new features, you'll also get much better performance in many cases.
If you're considering using PostgreSQL, this might give you a sense of how careful and reliable the development community is. Sure, we make mistakes from time to time, but the performance improvements over the long run are astonishing (sorry for the spoiler).
And finally, if you're a member of the PostgreSQL community, participating in the development in some way, hopefully you'll feel a sense of accomplishment after looking at the results.
The second question you might ask is "Why do that amount of testing?" After all, we often do a fair amount of performance testing when developing new features, people (hopefully) do performance testing when deploying their applications on new PostgreSQL versions, and we have experience from running those systems in production, right?
The thing is, it's really difficult to get an idea of how performance changed over longer periods of time from those pieces. When we do performance testing during development, the scope is usually very narrow - targeted specifically at the feature being developed (and places that might be impacted by the code changes), ignoring all the remaining parts.
Sometimes we do get results from a more complex benchmark, comparing two PostgreSQL versions (usually the new one and the previous one), but even if you have a sequence of those benchmarks (e.g. comparing 9.1/9.2, 9.2/9.3 and 9.3/9.4), it's difficult to combine those results because only rarely are the systems comparable - people simply don't keep the old hardware indefinitely, update the kernel, buy new drives, etc.
And using data collected from production applications is even trickier, because not only are the systems subject to the same hardware/system upgrades, but the applications and workloads change too - the application gets improved/extended over time, the number of users and the amount of data grow, people use the systems differently, etc.
So if you want to get an idea of performance changes over longer periods of time, then doing proper benchmarking is probably the only way.
To do the testing, I used three benchmarks, each simulating a slightly different workload.
The "default" benchmark used with PostgreSQL, and actually part of the PostgreSQL source tree (often packaged as a separate contrib package). It's a very close implementation of the TPC-B benchmark, simulating a transactional workload - a lot of "small" transactions, mostly accessing data through primary keys.
The benchmark models a (very simple) bank, with branches, tellers and accounts, and performs simple operations (look up an account's status, transfer funds from one account to another, etc.). This is usually a reasonable representation of systems where users manipulate individual records - reading individual articles in a CMS, updating orders in an e-commerce site, etc.
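For illustration, a pgbench run of this workload looks roughly like this (the scale factor, client counts and duration here are arbitrary illustrative choices, not the exact values used in the benchmarks, and some of the options are missing in the oldest pgbench versions):

```shell
# initialize the pgbench tables (branches, tellers, accounts, history);
# scale factor 100 means 100 branches and ~10M account rows
pgbench -i -s 100 bench

# run the TPC-B-like workload for 5 minutes with 16 concurrent clients,
# spread over 4 pgbench worker threads
pgbench -c 16 -j 4 -T 300 bench
```

pgbench then reports the number of transactions processed and the resulting transactions per second (tps), which is the metric typically compared across versions.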
Another benchmark specified by TPC, but focused on analytical workloads - the queries process large amounts of data, performing aggregations (GROUP BY), various joins, etc. This is usually a good model for queries used in reporting, data analysis, decision support systems and so on.
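To give a flavor of this kind of query, here's a simplified aggregation in the spirit of TPC-H query Q1 (the table and column names come from the TPC-H schema; this is a stripped-down illustration, not one of the actual benchmark queries):

```shell
psql tpch <<'SQL'
-- scan a large fact table, filter on a date, and aggregate
-- by a low-cardinality column - a typical analytical pattern
SELECT l_returnflag,
       count(*)             AS num_items,
       sum(l_extendedprice) AS total_price,
       avg(l_quantity)      AS avg_qty
FROM   lineitem
WHERE  l_shipdate <= date '1998-09-02'
GROUP  BY l_returnflag
ORDER  BY l_returnflag;
SQL
```

Unlike the pgbench transactions, queries like this touch a large fraction of the table, so they stress sequential scans, sorting, aggregation and join algorithms rather than index lookups.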
It's based on archie, a simple web application for indexing and accessing mailing list archives, and the benchmarks were performed with data from the main PostgreSQL mailing lists (roughly 1M messages).
To do the benchmarking, I used two machines: an HP DL380 G5 server, and a workstation with an Intel i5-2500K CPU.
HP DL380 G5 (2007-2009)
Workstation i5 (2011-2013)
So the HP DL380 is a "proper server," but not the newest hardware - new servers certainly have more RAM, better CPUs, memory with higher throughput, etc. But maybe that makes it a bit more fair, as it's roughly "half-way" between 7.4 and 9.4 - development often reflects hardware changes, so using very old hardware would be an unfair advantage for the old PostgreSQL versions, and vice versa.
It has a traditional storage system (RAID10 on SAS drives with 10k rpm), and also an Intel S3700 SSD, connected through a cheap PCIe/SATA card (which is not ideal, that's for sure).
The i5 workstation is somewhat more modern hardware, although not server-grade. It has a more modern CPU, memory and so on (which may have a really significant impact, as we'll see later), and only has an SSD, no spinning drives.
So, that's probably enough for an introduction. I'll post the pgbench results in about a week.