There’s a method that seems to be present in every single ruby ORM. I assume it’s reasonably common outside for ruby ORMs too, but I hadn’t noticed it before I started using Ruby.
The method is called find_or_create. Its looks something like:
class MyModel def self.find_or_create(find_params, create_params) self.find(find_params) || self.create(find_params.merge(create_params)) end end |
(although all code in this post will be ruby, the examples are using a pseudo-ORM that most closely resembles Sequel)
In fact, in many cases, this is exactly what the implementation looks like too.
The thing about this implementation is that it is completely wrong in the presence of concurrency: If you call find_or_create at the same time in two different processes with the same arguments, it will almost certainly result in duplicated creates. If there is a uniqueness constraint then this will cause one of them to error. If there is not, you’ll get two rows when you expected one.
It turns out that as far as I can tell it’s actually impossible to implement this method signature in a sensible way that satisfies the following properties:
- The method may always be called
- The method does not care about the structure of the table
- The method will never result in multiple inserts when called twice with the same arguments (in the presence of no other modifications to the database contents)
- Calling the method twice in two different processes will not error unless calling it once would error (in the presence of no other modifications to the database contents)
Here are two proposed implementations.
Just use transactions
This is the most straightforward and general implementation. You just wrap the method in a transaction. The problem is, that we only perform two queries and these absolutely cannot be interleaved, so the transaction in question must be serializable. So the code has to look something like this:
class MyModel def self.find_or_create(find_params, create_params) database.transaction(:level => :serializable) do self.find(find_params) || self.create(find_params.merge(create_params)) end end end |
But this suffers from a problem: What if we’re already inside a transaction? If the transaction is already serializable then that’s fine. If it’s not though there’s nothing we can do to fix this problem.
Additionally, support for correct handling of serializable transactions doesn’t seem to be great. Sequel, which is normally awesome, doesn’t really do the right thing with them (or rather it leaves doing the right thing up to the user and just handles setting the isolation level, which is fine really). But really we need a method that looks something like this:
def serializable_transaction if current_transaction if current_transaction.serializable? return yield else raise ArgumentError, "Cannot start serializable transaction inside a non-serializable transaction" end else loop do begin transaction(:serializable => true) do return yield end rescue FailedToSerialize end end end end |
i.e. if we’re already in a serializable transaction we just reuse it. If we’re not inside a transaction we retry the block until we don’t get a FailedToSerialize error (which may happen if you do concurrent modification inside serializable transactions). Note this is pseudo-Sequel. Most of these methods don’t exist in Sequel.
So, given correct support for serializable transactions, we can implement find_or_create as
class MyModel def self.find_or_create(find_params, create_params) database.serializable_transaction do self.find(find_params) || self.create(find_params.merge(create_params)) end end end |
What’s the problem?
Well, firstly, the inability to use this inside non-serializable transactions is rather a big deal. We like transactions, but we don’t want to be making our transactions serializable if we can possibly avoid it.
Secondly, serializable transactions are not always supported and may be really bad ideas on some databases – e..g. they might just cause a global lock. On PostgreSQL this method should behave mostly acceptably, but it’s still not ideal due to the lack of ability to nest it.
So here’s a version which imposes a different limitation:
Rely on uniqueness constraints
class MyModel def self.find_or_create(find_params, create_params) begin return self.find(find_params) || self.create(find_params.merge(create_params)) rescue UniqueConstraintViolation => e find_again = self.find(find_params) if find_again return find_again else raise e end end end end |
How does this work?
Well, we rely on there being a constraint enforced on the find params. If there is a uniqueness constraint that they satisfy then trying to do the insert if the find would have succeeded will reliably error.
If however there is some other unique constraint that might be violated – e.g. if some strict subset of the find params or some of the create params have a unique constraint on them – we may through a unique constraint violation for entirely unrelated reasons. This is why we test if the row is in the database and rethrow the unique constraint violation if it’s missing.
Note that one failure mode for this is if someone in another process inserts then deletes this row. This will result in us rethrowing a UniqueConstraintViolation where it would have succeeded if we immediately did an insert. I’m not bothered by that.
I think basically all my use cases for find_or_create key off a unique value in the find params, so this is the one I’ve been using mostly. It’s strictly less general than find_or_create normally is, but has the pleasant advantage of actually being correct.