Filtering deleted documents with PostgreSQL rules

I’m currently working on a Mysterious Project (coming soon to an internet near you) which involves a lot of user generated content (Yes, fine, slap a 2.0 on my name and call me “still in beta”). As such, it’s got all the usual problems with user generated content. In particular it has spam.

So, we need some sort of spam filtering in place to make sure we never show spam to users. But we don’t want to delete spam from the database – partly in case of mistakes, partly because we want to use the data for automated classification of spam.

Ok, this is easy enough to do. You add a flag “spam” to the table and don’t show the user anything flagged as spam.

The problem here is that this content gets used in all sorts of contexts, and it’s really annoying to have to add “where != spam” here.

No problem. We create a view. That’s what they’re for.

But this is slightly annoying: Basically all our access to content goes through this view, but modifications to content have to go through the original table. It would be really nice if we could have all updates and inserts going to the same thing we access the data from. “really nice” is partly aesthetic, but there’s also a boring practical reason: We’re using an ORM (ActiveRecord in fact. Sigh), and we’d like the ORM to access the filtered version, but we’d also like to be able to update the same objects.

Hang on. We’re using PostgreSQL. There’s an app… err. feature for that.

PostgreSQL has a feature called rules which allow you to change the meaning of various operations on a table (views in PostgreSQL are also tables). We can use these to make our view updateable. Let’s see how.

We’ll start with a slightly abstracted version of the problem. Instead of thinking about spam filtering we’ll concern ourselves with deleting posts. We want to retain the old posts but not show them:

david=# create sequence post_ids;
CREATE SEQUENCE
david=# create table unfiltered_posts(id int primary key default nextval('post_ids'), 
david(#                                        body text, 
david(#                                        deleted boolean not null default false);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "unfiltered_posts_pkey" for table "unfiltered_posts"
CREATE TABLE

So we first create our view that will be the posts which have not been deleted:

david=# create view posts as select * from unfiltered_posts where not deleted;
CREATE VIEW

Making sure everything’s working as expected:

david=# select * from posts;
 id | body | deleted 
----+------+---------
(0 rows)

david=# insert into unfiltered_posts(body, deleted) values('I like kittens', false);
INSERT 0 1

david=# insert into unfiltered_posts(body, deleted) values('I don''t like kittens', true);
INSERT 0 1

david=# select * from posts;
 id |      body      | deleted 
----+----------------+---------
  1 | I like kittens | f
(1 row)

david=# select * from unfiltered_posts ;
 id |         body         | deleted 
----+----------------------+---------
  1 | I like kittens       | f
  2 | I don't like kittens | t
(2 rows)

So all working as expected: unfiltered_posts marked deleted don’t show up in the view.

But of course this was the bit we already knew how to do. What doesn’t work is inserting into the view:

david=# insert into posts(body) values('I am the very model of a modern major general');
ERROR:  cannot insert into a view
HINT:  You need an unconditional ON INSERT DO INSTEAD rule.

Indeed it doesn’t work. But it does give us a nice hint of what to do next.

david=# create or replace rule insert_into_posts as on insert to posts do instead insert into unfiltered_posts(body) values(NEW.body);
CREATE RULE

So, now we can insert into the view:

david=# insert into posts(body) values('I am the very model of a modern major general');
INSERT 0 1
david=# select * from posts;
 id |                     body                      | deleted 
----+-----------------------------------------------+---------
  1 | I like kittens                                | f
  4 | I am the very model of a modern major general | f

This works, but I find it a bit ugly. The problem here is that you have to explicitly enumerate the fields in order for this to work. I couldn’t find a terribly satisfactory solution unfortunately. So if someone is reading this who knows more about postgresql than I do I’d love go get some hints.

The following does work as an alternative:

david=# create or replace rule insert_into_posts as on insert to posts do instead insert into unfiltered_posts values(NEW.*);
CREATE RULE

But the problem is that it plays badly with the defaults. If we try this we get:

david=# insert into posts(body) values('I am the very model of a modern major general');
ERROR:  null value in column "id" violates not-null constraint

The problem is that inserting null into a not-null column doesn’t replace null with the default value. It would be nice if it did as that would make this easy, but oh well (this isn’t postgresql specific behaviour. I’m not aware of any database where inserting null into a not null default blah column will work. Certainly MySQL does the same thing). You could probably make this work with a before insert or update trigger, but that’s a little gross.

An alternative version which offers slightly better functionality but still requires you to explicitly enumerate the columns in the rule is the following:

david=# create or replace rule insert_into_posts as on insert to posts do instead insert into unfiltered_posts values(coalesce(NEW.id, nextval('post_ids')), NEW.body, coalesce(NEW.deleted, false));
CREATE RULE
david=# insert into posts(body) values('I''ve information vegetable, animal, and mineral');
INSERT 0 1
david=# select * from posts;
 id |                      body                       | deleted 
----+-------------------------------------------------+---------
  1 | I like kittens                                  | f
  3 | I am the very model of a modern major general   | f
  4 | I've information vegetable, animal, and mineral | f

This requires us to duplicate the defaults as well as the columns, which is rather annoying, but at least it works satisfactorily (note: Some of you will complain that I didn’t explicitly enumerate the columns in the insert into. This is deliberate – the view will break if I change the table structure in any interesting way. If I explicitly enumerated the column names it would instead silently do the wrong thing).

So, this works. We can do the same on update:

 


david=#   create or replace rule update_to_posts 
david-#   as on update to posts 
david-#   do instead 
david-#      update unfiltered_posts 
david-#      set id = coalesce(NEW.id, OLD.id), 
david-#           body = coalesce(NEW.body, OLD.body), 
david-#           deleted = coalesce(NEW.deleted, OLD.deleted) 
david-#      where id = OLD.id;
CREATE RULE

david=# update posts set deleted = true where id = 4;
UPDATE 1
david=# select * from posts;
 id |                     body                      | deleted 
----+-----------------------------------------------+---------
  1 | I like kittens                                | f
  3 | I am the very model of a modern major general | f
(2 rows)

david=# select * from unfiltered_posts;
 id |                      body                       | deleted 
----+-------------------------------------------------+---------
  1 | I like kittens                                  | f
  2 | I don't like kittens                            | t
  3 | I am the very model of a modern major general   | f
  4 | I've information vegetable, animal, and mineral | t
(4 rows)

So now updating things in posts works. Note that if we try to update a filtered post it will not work:

david=# update posts set body = 'kittens' where id = 4;
UPDATE 0
david=# select * from unfiltered_posts ;
 id |                      body                       | deleted 
----+-------------------------------------------------+---------
  1 | I like kittens                                  | f
  2 | I don't like kittens                            | t
  3 | I am the very model of a modern major general   | f
  4 | I've information vegetable, animal, and mineral | t
(4 rows)

And, finally, we want to hook deletion into it. Obviously we don’t want deletion to delete things from the underlying table but instead to set their deleted flag to be false:

david=# create or replace rule delete_posts 
david-# as on delete to posts do instead 
david-# update unfiltered_posts 
david-# set deleted = true where id = OLD.id;
CREATE RULE
david=# select * from posts;
 id |                     body                      | deleted 
----+-----------------------------------------------+---------
  1 | I like kittens                                | f
  3 | I am the very model of a modern major general | f
(2 rows)

david=# delete from posts where id = 3;
DELETE 0
david=# select * from posts;
 id |      body      | deleted 
----+----------------+---------
  1 | I like kittens | f
(1 row)

david=# select * from unfiltered_posts;
 id |                      body                       | deleted 
----+-------------------------------------------------+---------
  1 | I like kittens                                  | f
  2 | I don't like kittens                            | t
  4 | I've information vegetable, animal, and mineral | t
  3 | I am the very model of a modern major general   | t
(4 rows)

So there we have it: A view which we can insert into, update and delete. Despite the slight annoyances around default values, this is definitely a really neat feature. I look forward to exploring its use.

If you want to have a play with this, I’ve created a gist containing the table, view and rules.

This entry was posted in Code, SQL on by .

2 thoughts on “Filtering deleted documents with PostgreSQL rules

  1. Pingback: Filtering deleted documents with PostgreSQL rules | DbRunas

  2. JD Maturen

    Nice! Reminded me of a conversation I had once about implementing an insert-only database. That is, the underlying `unfiltered_posts` table is self-referencing and the `posts` view simply selects the most recent version of the `unfiltered_post` [with a null body signifying "deleted"]. So if you were to look at the `unfiltered_posts` directly you’d see the complete history of each post — probably better called `post_history` or some such instead of `unfiltered_posts`. In any matter, turns out its pretty easy to do with Postgres rules: https://gist.github.com/46aaee2d29d0e87a28ea

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>