Efficiently Counting Records in Rails


Why We Don't Use all.length

Posted on Saturday, November 14, 2015 by Charles Beynon
× Charles Beynon
[Photo: Dark-haired caucasian male, mid-thirties, wearing dark-rimmed eyeglasses]

Full Stack Developer mostly working in Rails. Recovering sysadmin. I like to dabble and tend to ramble. Don't say I didn't warn you ;D

About me

As a junior Rails developer, I’ve been analyzing some of the anti-patterns I and other junior developers have fallen into. One of the most common has been the misuse of .all in ActiveRecord, particularly when counting database records.

Counting seems easy, at least for small collections, and vanilla ruby provides multiple aliases to do it:

irb(main):001:0> [1,2,3].count
 => 3
irb(main):002:0> [1,2,3].size
 => 3
irb(main):003:0> [1,2,3].length
 => 3

The problem with naively transferring this methodology to Rail is that ActiveRecord collections are backed by databases which, for large applications, can be unreasonably large to treat as simple arrays and hashes.

Counting Rails Collections, the Wrong Way

So when going from Ruby to Rails, a newcomer might say “well, lets grab all the records, and issue any of those three methods to get the count.

irb(main):001:0> posts = Post.all
Post Load (0.5ms)  SELECT "post".* FROM "posts"
 => #<ActiveRecord::Relation [#<Post id: 1, user_id: 2, text: "Hello there",
created_at: "2015-11-10 23:45:59", updated_at: "2015-11-10 23:45:59">, ...]
irb(main):002:0> posts.count
10
irb(main):003:0> posts.length
10

Notice the SQL statement that gets called.

SELECT "post".* FROM "posts"

This is asking the database to return the entire table! For a small application in is early stages, it might not seam like much (the example application I pulled this from has only ten records at the moment), but imagine how high the load on the database, application server, and network would be if every time you wanted a count of items.

A Better Approach and Rails Optimizations

If we want to avoid pulling down the entire database just to get a count of items, we can try chaining the operations together:

irb(main):001:0> Post.all.count
  (0.4ms)  SELECT COUNT(*) FROM "posts"
 => 10
irb(main):002:0> Post.all.size
  (0.4ms)  SELECT COUNT(*) FROM "posts"
 => 10
irb(main):003:0> Post.all.length
  Posts Load (0.6ms)  SELECT "posts".* FROM "posts"
 => 10

Notice that when calling .all.count and .all.size, Rails automatically optimizes the database call, to have the database itself return the COUNT. However, the same optimization is not made for .all.length.

SELECT COUNT(*) FROM "posts"

The upshot is, dont call .all.length!

To take things to a further extreme, what happens to the database call when we omit .all from the chain?

irb(main):001:0> Post.count
  (0.4ms)  SELECT COUNT(*) FROM "posts"
 => 10
irb(main):002:0> Post.size
NoMethodError: undefined method `size' for #<Class:0x007ffd0375efd8>
irb(main):003:0> Post.length
NoMethodError: undefined method `length' for #<Class:0x007ffd0375efd8>

So calling .count gives us the same optimized database call, while neither .size nor .length are defined.

Use the SQL

This simple example shows the power of using the Rails console and understanding the basics of SQL syntax; knowing exactly what your application code is doing with the database behind-the-scenes is crucial to realizing the best performance from the database. As you build more complicated database queries, it’s important—especially until you’ve gained expertise in Rails—to test them out on the console. More complicated queries should probably make use of named scopes or custom SQL queries using .where


Comments