Why PGXN distribution install tests fail


2014/08/06 by Tomas Vondra

In the post introducing PGXN Tester, I promised to present some stats and basic analysis of why tests of PGXN distributions fail.

Let's clarify some basic terms first. First, what is a distribution? Most of the time, it's a PostgreSQL extension in a fancy package, especially with a META.json specification containing additional information that is not available for plain extensions - description, links to git repositories, etc. Also, it may contain prerequisites - e.g. which PostgreSQL versions it's compatible with (more on this later).

It's a bit more complicated though - a distribution may pack multiple extensions (e.g. the "pgTAP" distribution used in the META.json example packs the "pgtap" and "schematap" extensions). Also, there are distributions that pack other kinds of software, not PostgreSQL extensions - for example omnipitr provides command-line utilities for PITR, and so on. Most distributions however pack a single PostgreSQL extension, and these two terms are frequently used as synonyms (after all, the X in PGXN stands for "eXtension").

The distributions are versioned (just like rpm, deb or any other packages), and each version has a "release status" with three possible values - unstable (alpha version), testing (beta version) and stable (ready for production). This is important when doing stats, because the unstable/testing versions are somewhat expected to have bugs, and what really matters are stable versions (because that's what people are supposed to install on production). More specifically, what matters is the last stable version of the distribution.

So let's see some stats on why pgxnclient install (i.e. essentially make install) fails ...


Overview

Currently, there are 126 distributions and 470 versions. By considering only the last versions for each release status (because that's what gets installed by pgxnclient install by default), and doing the tests on a range of PostgreSQL versions (8.2 - 9.4), this corresponds to 2727 tests and 1177 of those tests fail at the install stage. Per release status, it looks like this:

release status   total tests   failed tests   % failed
stable                  1812            782        43%
testing                  618            212        34%
unstable                 297            183        61%

Apparently when a user runs pgxnclient install, in ~43% of cases they get an error. BTW this is illustrated by the "install" column at pgxn-tester.org, broken down by PostgreSQL major version.

But why does that fail so often?

Tooling issues and limitations

The first issue is that the testing tooling and environment are not perfect, and PGXN itself has some limitations. Consider for example that it's impossible to determine the target OS for the extension (Is it for Windows or Linux? Or maybe some particular Linux distribution, as for example debversion?). The same goes for other dependencies - e.g. libraries required by the distribution.

None of these is listed in the META.json - for example mongo_fdw certainly requires MongoDB libraries and headers, but the META.json does not mention them at all. OTOH oracle_fdw mentions a dependency on Oracle, but how do you check this for arbitrary dependencies?

I tried hard to install all the important libraries, but a few are still missing (e.g. mongo, firebird and oracle, and probably a few more). Of course, a missing library usually means failure at compile time.

I don't have a good, simple and quick solution for this problem :-( The only possibility is to incrementally improve the tooling over time.

Top 5 issues

Now, let's talk about the top 5 issues in the distributions - in the 'stable' versions (i.e. those 782 failures mentioned before). For each failure, I'll quickly explain the cause(s) and point to one or two test results as an example. The number in parentheses is the percentage of failures caused by this issue, and the top 5 issues correspond to ~90% of the failures.

#1 - missing Makefile.global (31%)

A typical example of this issue looks like this:

INFO: best version: postgres_fdw 1.0.0
INFO: saving /var/pgxn-tester/hactar/tmp/tmpARX1nz/postgres_fdw-1.0.0.zip
INFO: unpacking: /var/hactar/tmp/tmpARX1nz/postgres_fdw-1.0.0.zip
INFO: building extension
Makefile:25: ../../src/Makefile.global: No such file or directory
Makefile:26: /contrib/contrib-global.mk: No such file or directory
gmake: *** No rule to make target `/contrib/contrib-global.mk'.  Stop.
ERROR: command returned 2: gmake PG_CONFIG=/var/pg/9.2.8/bin/pg_config all

This comes from the use of the USE_PGXS make variable, which was explained by David Wheeler in great detail, so go and read his post. In short, it's typical for extensions that used to live within the PostgreSQL source tree, where it was sometimes handy to build them against an existing installation (by using make USE_PGXS=1). So the makefile contains something like this:

ifdef USE_PGXS
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
else
subdir = contrib/postgres_fdw
top_builddir = ../..
include $(top_builddir)/src/Makefile.global
include $(top_srcdir)/contrib/contrib-global.mk
endif

For extensions distributed separately this doesn't really make much sense, so pgxnclient does not set the USE_PGXS variable, and make chooses the second branch, referencing the non-existent global Makefile. Complete examples of such failures are postgres_fdw or pg-json.

And of course, many extensions start by copying an existing extension, so this anti-pattern spreads. So much so that it's the #1 cause of failures for extensions on PGXN.

The solution is very simple, actually. Just get rid of the branching, and keep only the ifdef USE_PGXS branch. If you really need both (e.g. when you need to keep the ability to build in-tree), fix this before building the distribution for PGXN (e.g. by a simple build script, or whatever).
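As a rough illustration of the PGXS-only variant, here is a minimal sketch of such a Makefile. The extension name, the source file it implies (my_extension.c) and the SQL file name are hypothetical placeholders, not taken from any of the distributions mentioned here.

```makefile
# Hypothetical names - substitute those of your own extension.
EXTENSION = my_extension
MODULES   = my_extension
DATA      = my_extension--1.0.0.sql

# No USE_PGXS branching: always build against an installed PostgreSQL.
# A PG_CONFIG=... argument on the make command line (as seen in the
# gmake invocations in the logs) overrides this default.
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
```

With a Makefile like this the build no longer depends on whether USE_PGXS is set, so it behaves the same under pgxnclient and under a manual make.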

#2 - no such file or directory (22%)

This failure type may have many causes. The first case (e.g. in tds_fdw)

src/tds_fdw.c:52:22: error: sybfront.h: No such file or directory
src/tds_fdw.c:53:19: error: sybdb.h: No such file or directory

is a typical example of a missing library. I simply don't have the Sybase (or Microsoft SQL Server) header files installed, so the compilation of the extension that needs them naturally fails. There's nothing the extension can do to fix this - this is a problem with the environment on the machine running the tests (and I'm working on fixing this). FDW extensions for other databases (Oracle, Firebird, ...) probably suffer from the same problem.

The second case - as illustrated by twitter_fdw - seems very similar:

twitter_fdw.c:4:38: error: catalog/pg_foreign_table.h: No such file or directory
twitter_fdw.c:5:39: error: catalog/pg_foreign_server.h: No such file or directory
twitter_fdw.c:8:28: error: foreign/fdwapi.h: No such file or directory
twitter_fdw.c:9:29: error: foreign/foreign.h: No such file or directory

There's one important difference, though - these missing header files are from PostgreSQL itself, and are part of the FDW API. Notice however that the test was performed on PostgreSQL 8.3.23 - long before the FDW API was created. The actual problem lies in the META.json of the distribution - it does not specify which PostgreSQL versions it's compatible with.

This can be fixed simply by adding proper "prereqs" into the META.json.
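A minimal sketch of such a declaration, following the "prereqs" layout described in the PGXN Meta Spec, might look like the fragment below. Only the relevant key is shown (the rest of META.json is omitted), and the 9.1.0 minimum is an assumption based on the FDW API first appearing in PostgreSQL 9.1 - the distribution author should state whatever versions are actually supported.

```json
{
  "prereqs": {
    "runtime": {
      "requires": {
        "PostgreSQL": "9.1.0"
      }
    }
  }
}
```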

A third case may be illustrated by orafce:

INFO: best version: orafce 3.0.4
INFO: saving /var/pgxn-tester/hactar/tmp/tmpvoqYNi/orafce-3.0.4.zip
INFO: unpacking: /var/pgxn-tester/hactar/tmp/tmpvoqYNi/orafce-3.0.4.zip
INFO: building extension
cat orafunc-common.sql orafunc-9.2.sql > orafunc.sql.in
cat: orafunc-9.2.sql: No such file or directory
gmake: *** [orafunc.sql.in] Error 1
gmake: *** Deleting file `orafunc.sql.in'
ERROR: command returned 2: gmake PG_CONFIG=/var/pg/9.2.8/bin/pg_config all

After a brief investigation of the Makefile, the problem seems to be here:

ifndef MAJORVERSION
MAJORVERSION := $(basename $(VERSION))
endif

orafunc.sql.in:
        cat orafunc-common.sql orafunc-$(MAJORVERSION).sql > orafunc.sql.in

which essentially says "use the orafunc-X.sql matching the PostgreSQL major version," while the last version supported is 9.0 (i.e. there's no SQL file for PostgreSQL 9.2).

To fix this, either limit the supported PostgreSQL versions, or add the missing files.

#3 - missing Makefile in extension root (17%)

The failure is quite simple:

INFO: best version: pgbson 1.0.1
INFO: saving /var/pgxn-tester/hudzen-10/tmp/tmpvwE2M3/pgbson-1.0.1.zip
INFO: unpacking: /var/pgxn-tester/hudzen-10/tmp/tmpvwE2M3/pgbson-1.0.1.zip
INFO: building extension
ERROR: no Makefile found in the extension root

and similarly to the previous section, it may have one of several causes.

The first and simplest one is that there's no Makefile at all, because it's not really necessary, as e.g. for omnipitr, which is a collection of Perl scripts simplifying PITR-related tasks. It's not a PostgreSQL extension at all, and it does not need to be built or installed into the database, so why have a Makefile? It would fail in the load/check phase anyway. I'm not really sure how to properly "fix" this to prevent these "false failures."

The second possibility is that the extension uses a different build system (e.g. pgbson uses cmake), or that the Makefile is located in a subdirectory (e.g. in src). I'm not sure about the cmake, but fixing the incorrectly located Makefile is trivial - just move it to the right place.

I've also seen other failures, for example pg_top, which seems to be caused by some broken packaging (no META.json, pg_top-0.1.0 is a regular file containing META.json, and the source code is located in pg_top-1.7.0).

#4 - will not overwrite just-created file (11%)

This seems to be a quite simple problem. In short, it fails like this:

INFO: installing extension
/bin/mkdir -p '/var/pgxn-tester/hudzen-10/pg/9.3.4/share/postgresql/extension'
/bin/mkdir -p '/var/pgxn-tester/hudzen-10/pg/9.3.4/share/postgresql/extension'
/bin/mkdir -p '/var/pgxn-tester/hudzen-10/pg/9.3.4/share/doc/postgresql/extension'
/usr/bin/install -c -m 644 ./pair.control '/var/pg/9.3.4/share/postgresql/extension/'
/usr/bin/install -c -m 644 ./sql/pair--0.1.2.sql ./sql/pair--unpackaged--0.1.2.sql \
    ./sql/pair--0.1.2.sql  '/var/pg/9.3.4/share/postgresql/extension/'
/usr/bin/install: will not overwrite just-created \
    `/var/pg/9.3.4/share/postgresql/extension/pair--0.1.2.sql' with `./sql/pair--0.1.2.sql'
gmake: *** [install] Error 1

See pg_statsd, pair or json_accessors for a complete log. The cause seems to be fairly trivial - it comes from this Makefile line:

DATA = $(wildcard sql/*--*.sql) sql/$(EXTENSION)--$(EXTVERSION).sql

which evaluates to a list containing the same filename twice (and /usr/bin/install refuses to overwrite the first instance of the file). The fix is as simple as removing this particular line from the Makefile.
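If the line cannot simply be dropped (for instance because the versioned SQL file is generated during the build and may not exist yet when the wildcard is expanded - that is an assumption about why the explicit entry is there), one possible alternative is to de-duplicate the list with GNU make's sort function. A minimal sketch, assuming EXTENSION and EXTVERSION are defined earlier in the Makefile as in the line quoted above:

```makefile
# $(sort ...) orders the words and removes duplicates, so a file that is
# both matched by the wildcard and listed explicitly is installed only once.
DATA = $(sort $(wildcard sql/*--*.sql) sql/$(EXTENSION)--$(EXTVERSION).sql)
```

The order of the files in DATA changes, but that should not matter for installation.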

#5 - missing makefile target (10%)

I'm not really sure about this one, but it seems to be a mix of various causes. For example madlib fails like this:

CMake Error at CMakeLists.txt:18 (cmake_minimum_required):
  CMake 2.8.4 or higher is required.  You are running version 2.6.4

-- Configuring incomplete, errors occurred!
INFO: building extension
gmake -C build all
gmake[1]: Entering directory `/var/pgxn-tester/hudzen-10/tmp/tmpwvHTg1/madlib-1.3.0/build'
gmake[1]: *** No rule to make target `all'.  Stop.
gmake[1]: Leaving directory `/var/pgxn-tester/hudzen-10/tmp/tmpwvHTg1/madlib-1.3.0/build'
gmake: *** [all] Error 2
ERROR: command returned 2: gmake PG_CONFIG=/var/pg/9.0.17/bin/pg_config all

so this is simply another case of a broken dependency (this time on the cmake version). This needs to be fixed on the testing machine.

But for example hashtypes fails like this:

gmake: *** No rule to make target `hashtypes.so', needed by `all'.  Stop.
ERROR: command returned 2: gmake PG_CONFIG=/var/pg/8.2.23/bin/pg_config all

and what's even more important is that it builds fine on other PostgreSQL versions (8.4 or up), so I'd guess it's a missing prerequisite on the PostgreSQL version. Easy to fix - just add a prerequisite to the META.json.




