Category Archives: SQL

On the perils of PostgreSQL rules

So, I had some PostgreSQL code using Rules that I’d written for a real system that was behaving in highly counter-intuitive ways. It was basically using rules to increment a counter when an insert would violate a uniqueness constraint.

I went into #postgresql to ask them for some guidance. After getting lot of responses of the form “The problem is that you’re using rules”, they eventually explained it in a manner that convinced me that it was the result of a genuine logic error on my part, not this silly blanket forbidding of rules they were suggesting: Essentially the rule would do the wrong thing when I inserted multiple uniqueness violations as part of the same insert, but in the context where this was happening I shouldn’t have been doing that anyway.

A few hours later I noticed that the data in the table was still dodgy: A lot of values were twice what they should have been. Eventually I boiled it down to the following example:

drop table if exists foo;
create table foo(id int, count int);

create or replace rule inserts_should_add_on_conflict as
on insert to foo 
where exists(select 1 from foo where id = NEW.id)
do instead 
update foo set count = count + NEW.count
where id = NEW.id;

insert into foo values(1, 1);

Now, what should be in the table at this point? The rule shouldn’t fire, right? It would only fire if there were something in the table which had the id being inserted. Right?

Well, not so much:

david=# select * from foo;
 id | count 
----+-------
  1 |     2
(1 row)

After thinking about it for a bit and some help from #postgresql, this began to make sense, but it made sense in a way that convinced me they’d been right all along: Rules are an inherently confusing feature, and probably not to be trusted. See, here’s what the instead of rule does: It rewrites the insert into two queries, one insert with the guard negated and one update with the guard.

So, we first do the insert, which sees that the guard is not satisfied so that it’s ok to insert these values. You then end up with the row id=1, count=1 in the table. We then do the update, which sees that the guard *is* satisfied, because the insert has just made it so, and updates the row with id=1 to have count = count + 1.

So side effects which affect the constraints on the rules will always have deeply surprising effects like this, and should definitely be avoided at all costs. I’m not totally sold on whether this should be said of rules in general, but it’s starting to look that way to me.

This entry was posted in SQL on by .

Filtering deleted documents with PostgreSQL rules

I’m currently working on a Mysterious Project (coming soon to an internet near you) which involves a lot of user generated content (Yes, fine, slap a 2.0 on my name and call me “still in beta”). As such, it’s got all the usual problems with user generated content. In particular it has spam.

So, we need some sort of spam filtering in place to make sure we never show spam to users. But we don’t want to delete spam from the database – partly in case of mistakes, partly because we want to use the data for automated classification of spam.

Ok, this is easy enough to do. You add a flag “spam” to the table and don’t show the user anything flagged as spam.

The problem here is that this content gets used in all sorts of contexts, and it’s really annoying to have to add “where != spam” here.

No problem. We create a view. That’s what they’re for.

But this is slightly annoying: Basically all our access to content goes through this view, but modifications to content have to go through the original table. It would be really nice if we could have all updates and inserts going to the same thing we access the data from. “really nice” is partly aesthetic, but there’s also a boring practical reason: We’re using an ORM (ActiveRecord in fact. Sigh), and we’d like the ORM to access the filtered version, but we’d also like to be able to update the same objects.

Hang on. We’re using PostgreSQL. There’s an app… err. feature for that.

PostgreSQL has a feature called rules which allow you to change the meaning of various operations on a table (views in PostgreSQL are also tables). We can use these to make our view updateable. Let’s see how.

We’ll start with a slightly abstracted version of the problem. Instead of thinking about spam filtering we’ll concern ourselves with deleting posts. We want to retain the old posts but not show them:

david=# create sequence post_ids;
CREATE SEQUENCE
david=# create table unfiltered_posts(id int primary key default nextval('post_ids'), 
david(#                                        body text, 
david(#                                        deleted boolean not null default false);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "unfiltered_posts_pkey" for table "unfiltered_posts"
CREATE TABLE

So we first create our view that will be the posts which have not been deleted:

david=# create view posts as select * from unfiltered_posts where not deleted;
CREATE VIEW

Making sure everything’s working as expected:

david=# select * from posts;
 id | body | deleted 
----+------+---------
(0 rows)

david=# insert into unfiltered_posts(body, deleted) values('I like kittens', false);
INSERT 0 1

david=# insert into unfiltered_posts(body, deleted) values('I don''t like kittens', true);
INSERT 0 1

david=# select * from posts;
 id |      body      | deleted 
----+----------------+---------
  1 | I like kittens | f
(1 row)

david=# select * from unfiltered_posts ;
 id |         body         | deleted 
----+----------------------+---------
  1 | I like kittens       | f
  2 | I don't like kittens | t
(2 rows)

So all working as expected: unfiltered_posts marked deleted don’t show up in the view.

But of course this was the bit we already knew how to do. What doesn’t work is inserting into the view:

david=# insert into posts(body) values('I am the very model of a modern major general');
ERROR:  cannot insert into a view
HINT:  You need an unconditional ON INSERT DO INSTEAD rule.

Indeed it doesn’t work. But it does give us a nice hint of what to do next.

david=# create or replace rule insert_into_posts as on insert to posts do instead insert into unfiltered_posts(body) values(NEW.body);
CREATE RULE

So, now we can insert into the view:

david=# insert into posts(body) values('I am the very model of a modern major general');
INSERT 0 1
david=# select * from posts;
 id |                     body                      | deleted 
----+-----------------------------------------------+---------
  1 | I like kittens                                | f
  4 | I am the very model of a modern major general | f

This works, but I find it a bit ugly. The problem here is that you have to explicitly enumerate the fields in order for this to work. I couldn’t find a terribly satisfactory solution unfortunately. So if someone is reading this who knows more about postgresql than I do I’d love go get some hints.

The following does work as an alternative:

david=# create or replace rule insert_into_posts as on insert to posts do instead insert into unfiltered_posts values(NEW.*);
CREATE RULE

But the problem is that it plays badly with the defaults. If we try this we get:

david=# insert into posts(body) values('I am the very model of a modern major general');
ERROR:  null value in column "id" violates not-null constraint

The problem is that inserting null into a not-null column doesn’t replace null with the default value. It would be nice if it did as that would make this easy, but oh well (this isn’t postgresql specific behaviour. I’m not aware of any database where inserting null into a not null default blah column will work. Certainly MySQL does the same thing). You could probably make this work with a before insert or update trigger, but that’s a little gross.

An alternative version which offers slightly better functionality but still requires you to explicitly enumerate the columns in the rule is the following:

david=# create or replace rule insert_into_posts as on insert to posts do instead insert into unfiltered_posts values(coalesce(NEW.id, nextval('post_ids')), NEW.body, coalesce(NEW.deleted, false));
CREATE RULE
david=# insert into posts(body) values('I''ve information vegetable, animal, and mineral');
INSERT 0 1
david=# select * from posts;
 id |                      body                       | deleted 
----+-------------------------------------------------+---------
  1 | I like kittens                                  | f
  3 | I am the very model of a modern major general   | f
  4 | I've information vegetable, animal, and mineral | f

This requires us to duplicate the defaults as well as the columns, which is rather annoying, but at least it works satisfactorily (note: Some of you will complain that I didn’t explicitly enumerate the columns in the insert into. This is deliberate – the view will break if I change the table structure in any interesting way. If I explicitly enumerated the column names it would instead silently do the wrong thing).

So, this works. We can do the same on update:

 


david=#   create or replace rule update_to_posts 
david-#   as on update to posts 
david-#   do instead 
david-#      update unfiltered_posts 
david-#      set id = coalesce(NEW.id, OLD.id), 
david-#           body = coalesce(NEW.body, OLD.body), 
david-#           deleted = coalesce(NEW.deleted, OLD.deleted) 
david-#      where id = OLD.id;
CREATE RULE

david=# update posts set deleted = true where id = 4;
UPDATE 1
david=# select * from posts;
 id |                     body                      | deleted 
----+-----------------------------------------------+---------
  1 | I like kittens                                | f
  3 | I am the very model of a modern major general | f
(2 rows)

david=# select * from unfiltered_posts;
 id |                      body                       | deleted 
----+-------------------------------------------------+---------
  1 | I like kittens                                  | f
  2 | I don't like kittens                            | t
  3 | I am the very model of a modern major general   | f
  4 | I've information vegetable, animal, and mineral | t
(4 rows)

So now updating things in posts works. Note that if we try to update a filtered post it will not work:

david=# update posts set body = 'kittens' where id = 4;
UPDATE 0
david=# select * from unfiltered_posts ;
 id |                      body                       | deleted 
----+-------------------------------------------------+---------
  1 | I like kittens                                  | f
  2 | I don't like kittens                            | t
  3 | I am the very model of a modern major general   | f
  4 | I've information vegetable, animal, and mineral | t
(4 rows)

And, finally, we want to hook deletion into it. Obviously we don’t want deletion to delete things from the underlying table but instead to set their deleted flag to be false:

david=# create or replace rule delete_posts 
david-# as on delete to posts do instead 
david-# update unfiltered_posts 
david-# set deleted = true where id = OLD.id;
CREATE RULE
david=# select * from posts;
 id |                     body                      | deleted 
----+-----------------------------------------------+---------
  1 | I like kittens                                | f
  3 | I am the very model of a modern major general | f
(2 rows)

david=# delete from posts where id = 3;
DELETE 0
david=# select * from posts;
 id |      body      | deleted 
----+----------------+---------
  1 | I like kittens | f
(1 row)

david=# select * from unfiltered_posts;
 id |                      body                       | deleted 
----+-------------------------------------------------+---------
  1 | I like kittens                                  | f
  2 | I don't like kittens                            | t
  4 | I've information vegetable, animal, and mineral | t
  3 | I am the very model of a modern major general   | t
(4 rows)

So there we have it: A view which we can insert into, update and delete. Despite the slight annoyances around default values, this is definitely a really neat feature. I look forward to exploring its use.

If you want to have a play with this, I’ve created a gist containing the table, view and rules.

This entry was posted in Code, SQL on by .

Yet another MySQL Fail

mysql> create table stuff (name varchar(32));
Query OK, 0 rows affected (0.24 sec)

mysql> insert into stuff values (’foo’), (’1′), (’0′);
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> select * from stuff;
+——+
| name |
+——+
| foo |
| 1 |
| 0 |
+——+
3 rows in set (0.00 sec)

mysql> delete from stuff where name = 0;
Query OK, 2 rows affected (0.09 sec)

mysql> select * from stuff;
+——+
| name |
+——+
| 1 |
+——+
1 row in set (0.00 sec)

mysql> create table stuff (name varchar(32));
Query OK, 0 rows affected (0.24 sec)

mysql> insert into stuff values (’foo’), (’1′), (’0′);
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> select * from stuff;
+——+
| name |
+——+
| foo |
| 1 |
| 0 |
+——+
3 rows in set (0.00 sec)

mysql> delete from stuff where name = 0;
Query OK, 2 rows affected (0.09 sec)

mysql> select * from stuff;
+——+
| name |
+——+
| 1 |
+——+
1 row in set (0.00 sec)

mysql> WTF????
-> ;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘WTF????’ at line 1

So, what’s going on here? I said to delete everything where the name was 0, but it deleted the row ‘foo’.

The following might help:

mysql> create table more_stuff(id int);
Query OK, 0 rows affected (0.19 sec)

mysql> insert into more_stuff values(’foo’);
Query OK, 1 row affected, 1 warning (0.00 sec)

mysql> select * from more_stuff;
+——+
| id |
+——+
| 0 |
+——+
1 row in set (0.00 sec)

When you try to use a string as an integer in MySQL, it takes non numeric strings and turns them into zero. So when you test name = 0, it converts name into an integer and turns that into 0. Consequently strings which can’t be parsed as an integer result in true for this test.

At this point I would rant about how mindbogglingly stupid this behaviour is, but I don’t think I can really be bothered.

This entry was posted in programming, SQL and tagged on by .