diziet: (Default)
[personal profile] diziet

Any software system has underlying design principles, and any software project has process rules. But I seem to be seeing more often, a pathological pattern where abstract and shakily-grounded broad principles, and even contrived and sophistic objections, are used to block sensible changes.

Today I will go through an example in detail, before ending with a plea:

PostgreSQL query planner, WITH [MATERIALIZED] optimisation fence

Background history

PostgreSQL has a sophisticated query planner which usually gets the right answer. For good reasons, the pgsql project has resisted providing lots of knobs to control query planning. But there are a few ways to influence the query planner, for when the programmer knows more than the planner.

One of these is the use of a WITH common table expression. In pgsql versions prior to 12, the planner would first make a plan for the WITH clause; and then, it would make a plan for the second half, counting the WITH clause's likely output as a given. So WITH acts as an "optimisation fence".

This was documented in the manual - not entirely clearly, but a careful reading of the docs reveals this behaviour:

The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards.

Users (authors of applications which use PostgreSQL) have been using this technique for a long time.

New behaviour in PostgreSQL 12

In PostgreSQL 12 upstream were able to make the query planner more sophisticated. In particular, it is now often capable of looking "into" the WITH common table expression. Much of the time this will make things better and faster.

But if WITH was being used for its side-effect as an optimisation fence, this change will break things: queries that ran very quickly in earlier versions might now run very slowly. Helpfully, pgsql 12 still has a way to specify an optimisation fence: specifying WITH ... AS MATERIALIZED in the query.

So far so good.

Upgrade path for existing users of WITH fence

But what about the upgrade path for existing users of the WITH fence behaviour? Such users will have to update their queries to add AS MATERIALIZED. This is a small change. Having to update a query like this is part of routine software maintenance and not in itself very objectionable. However, this change cannnot be made in advance because pgsql versions prior to 12 will reject the new syntax.

So the users are in a bit of a bind. The old query syntax can be unuseably slow with the new database and the new syntax is rejected by the old database. Upgrading both the database and the application, in lockstep, is a flag day upgrade, which every good sysadmin will want to avoid.

A solution to this problem

Colin Watson proposed a very simple solution: make the earlier PostgreSQL versions accept the new MATERIALIZED syntax. This is correct since the new syntax specifies precisely the actual behaviour of the old databases. It has no deleterious effect on any users of older pgsql versions. It makes it possible to add the new syntax to the application, before doing the database upgrade, decoupling the two upgrades.

Colin Watson even provided an implementation of this proposal.

The solution is rejected by upstream

Unfortunately upstream did not accept this idea. You can read the whole thread yourself if you like. But in summary, the objections were (italic indicates literal quotes):

  • New features don't gain a backpatch. This is a project policy. Of course this is not a new feature, and if it is an exception should be made. This was explained clearly in the thread.
  • I'm not sure the "we don't want to upgrade application code at the same time as the database" is really tenable. This is quite an astonishing statement, particularly given the multiple users who said they wanted to do precisely that.
  • I think we could find cases where we caused worse breaks between major versions. Paraphrasing: "We've done worse things in the past so we should do this bad thing too". WTF?
  • One disadvantage is that this will increase confusion for users, who'll get used to the behavior on 12, and then they'll get confused on older releases. This seems highly contrived. Surely the number of people who are likely to suffer this confusion is tiny. Providing the new syntax in old versions (including of course the appropriate changes to the docs everywhere) might well make such confusion less rather than more likely.
  • [Poster is concerned about] 11.6 and up allowing a syntax that 11.0-11.5 don't. People are likely to write code relying on this and then be surprised when it doesn't work on a slightly older server. And, similarly: we'll then have a lot more behavior differences between minor releases. Again this seems a contrived and unconvincing objection. As that first poster even notes: Still, is that so much different from cases where we fix a bug that prevented some statement from working? No, it isn't.
  • if we started looking we'd find many changes every year that we could justify partially or completely back-porting on similar grounds ... we'll certainly screw it up sometimes. This is a slippery slope argument. But there is no slippery slope: in particular, the proposed change does not change any of the substantive database logic, and the upstream developers will hardly have any difficulty rejecting future more risky backport proposals.
  • if you insist on using the same code with pre-12 and post-12 releases, this should be achievable (at least in most cases) by using the "offset 0" trick. What? First I had heard of it but this in fact turns out to be true! Read more about this, below...

I find these extremely unconvincing, even taken together. Many of them are very unattractive things to hear one's upstream saying.

At best they are knee-jerk and inflexible application of very general principles. The authors of these objections seem to have lost sight of the fact that these principles have a purpose. When these kind of software principles work against their purposes, they should be revised, or exceptions made.

At worst, it looks like a collective effort to find reasons - any reasons, no matter how bad - not to make this change.

The OFFSET 0 trick

One of the responses in the thread mentions OFFSET 0. As part of writing the queries in the Xen Project CI system, and preparing for our system upgrade, I had carefully read the relevant pgsql documentation. This OFFSET 0 trick was new to me.

But, now that I know the answer, it is easy to provide the right search terms and find, for example, this answer on stackmumble. Apparently adding a no-op OFFSET 0 to the subquery defeats the pgsql 12 query planner's ability to see into the subquery.

I think OFFSET 0 is the better approach since it's more obviously a hack showing that something weird is going on, and it's unlikely we'll ever change the optimiser behaviour around OFFSET 0 ... wheras hopefully CTEs will become inlineable at some point CTEs became inlineable by default in PostgreSQL 12.
So in fact there is a syntax for an optimisation fence that is accepted by both earlier and later PostgreSQL versions. It's even recommended by pgsql devs. It's just not documented, and is described by pgsql developers as a "hack". Astonishingly, the fact that it is a "hack" is given as a reason to use it!

Well, I have therefore deployed this "hack". No doubt it will stay in our codebase indefinitely.

Please don't be like that!

I could come up with a lot more examples of other projects that have exhibited similar arrogance. It is becoming a plague! But every example is contentious, and I don't really feel I need to annoy a dozen separate Free Software communities. So I won't make a laundry list of obstructiveness.

If you are an upstream software developer, or a distributor of software to users (eg, a distro maintainer), you have a lot of practical power. In theory it is Free Software so your users could just change it themselves. But for a user or downstream, carrying a patch is often an unsustainable amount of work and risk. Most of us have patches we would love to be running, but which we haven't even written because simply running a nonstandard build is too difficult, no matter how technically excellent our delta.

As an upstream, it is very easy to get into a mindset of defending your code's existing behaviour, and to turn your project's guidelines into inflexible rules. Constant exposure to users who make silly mistakes, and rudely ask for absurd changes, can lead to core project members feeling embattled.

But there is no need for an upstream to feel embattled! You have the vast majority of the power over the software, and over your project communication fora. Use that power consciously, for good.

I can't say that arrogance will hurt you in the short term. Users of software with obstructive upstreams do not have many good immediate options. But we do have longer-term choices: we can choose which software to use, and we can choose whether to try to help improve the software we use.

After reading Colin's experience, I am less likely to try to help improve the experience of other PostgreSQL users by contributing upstream. It doesn't seem like there would be any point. Indeed, instead of helping the PostgreSQL community I am now using them as an example of bad practice. I'm only half sorry about that.

(no subject)

Date: 2020-08-21 02:58 pm (UTC)
andrewducker: (Default)
From: [personal profile] andrewducker
Yeah, that's a terrible way to approach things. You never want to force a bunch of changes at the same time rather than making a phased approach possible.

Profile

diziet: (Default)
Ian Jackson

May 2025

S M T W T F S
     123
45678910
11121314151617
18192021222324
25262728293031

Most Popular Tags

Page Summary

Style Credit

Expand Cut Tags

No cut tags