David R. MacIver's Blog: On the perils of PostgreSQL rules

On the perils of PostgreSQL rules

15 March 2010

So, I had some PostgreSQL code using Rules that I’d written for a real system that was behaving in highly counter-intuitive ways. It was basically using rules to increment a counter when an insert would violate a uniqueness constraint.

I went into #postgresql to ask them for some guidance. After getting lot of responses of the form “The problem is that you’re using rules”, they eventually explained it in a manner that convinced me that it was the result of a genuine logic error on my part, not this silly blanket forbidding of rules they were suggesting: Essentially the rule would do the wrong thing when I inserted multiple uniqueness violations as part of the same insert, but in the context where this was happening I shouldn’t have been doing that anyway.

A few hours later I noticed that the data in the table was still dodgy: A lot of values were twice what they should have been. Eventually I boiled it down to the following example:

drop table if exists foo;
create table foo(id int, count int);

create or replace rule inserts_should_add_on_conflict as
on insert to foo 
where exists(select 1 from foo where id = NEW.id)
do instead 
update foo set count = count + NEW.count
where id = NEW.id;

insert into foo values(1, 1);

Now, what should be in the table at this point? The rule shouldn’t fire, right? It would only fire if there were something in the table which had the id being inserted. Right?

Well, not so much:

david=# select * from foo;
 id | count 
----+-------
  1 |     2
(1 row)

After thinking about it for a bit and some help from #postgresql, this began to make sense, but it made sense in a way that convinced me they’d been right all along: Rules are an inherently confusing feature, and probably not to be trusted. See, here’s what the instead of rule does: It rewrites the insert into two queries, one insert with the guard negated and one update with the guard.

So, we first do the insert, which sees that the guard is not satisfied so that it’s ok to insert these values. You then end up with the row id=1, count=1 in the table. We then do the update, which sees that the guard *is* satisfied, because the insert has just made it so, and updates the row with id=1 to have count = count + 1.

So side effects which affect the constraints on the rules will always have deeply surprising effects like this, and should definitely be avoided at all costs. I’m not totally sold on whether this should be said of rules in general, but it’s starting to look that way to me.