David's Blog

TIL: ActiveRecord bulk lookup ID deduplication

If you're writing a Rails app and use ActiveRecord, this is a potential pitfall to be aware of:

Let's say you have some model, MyModel, that you want to do a bulk lookup of based on a set of IDs that come from elsewhere. You'd do something like:

  models = MyModel.where(id: some_array_of_ids)

This will work just fine. However, there is a bit of a lurking performance cliff with this approach. It turns out that, surprisingly, ActiveRecord doesn't de-duplicate the array of IDs before constructing the SQL query text. That is, if you happen to have an array with 5 IDs but they're all the same ID then you'll end up with an underlying query like:

  SELECT * FROM my_models WHERE id IN (7, 7, 7, 7, 7);

On an array of just 5 elements this isn't too much of a big deal and doesn't hurt correctness in any way. You'll still get back a either 0 or 1 results depending on if ID 7 exists in your table. However, once you're potentially looking up tens, hundreds, or thousands of IDs at once then you might hit some larger performance problems.

At least on the database I'm most familiar with running in production, MySQL, it seemingly doesn't optimize this query in the way you'd expect and it could take much longer than just looking up the single ID of 7 (or if your array is much larger and you have multiple IDs in it, all duplicated, this can get quite bad). Even if your database does optimize this query, the query you're sending over the wire is still larger than it should be.

So the safer, more performant way to query for a list of IDs if you don't already know that they are unique is to first ensure they are before your query:

models = MyModel.where(id: some_array_of_ids.uniq)