As a junior Rails developer, I’ve been analyzing some of the anti-patterns I and
other junior developers have fallen into. One of the most common has been the
.all in ActiveRecord, particularly when counting database records.
Counting seems easy, at least for small collections, and vanilla ruby provides multiple aliases to do it:
irb(main):001:0> [1,2,3].count => 3 irb(main):002:0> [1,2,3].size => 3 irb(main):003:0> [1,2,3].length => 3
The problem with naively transferring this methodology to Rail is that ActiveRecord collections are backed by databases which, for large applications, can be unreasonably large to treat as simple arrays and hashes.
Counting Rails Collections, the Wrong Way
So when going from Ruby to Rails, a newcomer might say “well, lets grab all the records, and issue any of those three methods to get the count.
irb(main):001:0> posts = Post.all Post Load (0.5ms) SELECT "post".* FROM "posts" => #<ActiveRecord::Relation [#<Post id: 1, user_id: 2, text: "Hello there", created_at: "2015-11-10 23:45:59", updated_at: "2015-11-10 23:45:59">, ...] irb(main):002:0> posts.count 10 irb(main):003:0> posts.length 10
Notice the SQL statement that gets called.
SELECT "post".* FROM "posts"
This is asking the database to return the entire table! For a small application in is early stages, it might not seam like much (the example application I pulled this from has only ten records at the moment), but imagine how high the load on the database, application server, and network would be if every time you wanted a count of items.
A Better Approach and Rails Optimizations
If we want to avoid pulling down the entire database just to get a count of items, we can try chaining the operations together:
irb(main):001:0> Post.all.count (0.4ms) SELECT COUNT(*) FROM "posts" => 10 irb(main):002:0> Post.all.size (0.4ms) SELECT COUNT(*) FROM "posts" => 10 irb(main):003:0> Post.all.length Posts Load (0.6ms) SELECT "posts".* FROM "posts" => 10
Notice that when calling
.all.size, Rails automatically
optimizes the database call, to have the database itself return the
However, the same optimization is not made for
SELECT COUNT(*) FROM "posts"
The upshot is, dont call
To take things to a further extreme, what happens to the database call when we
.all from the chain?
irb(main):001:0> Post.count (0.4ms) SELECT COUNT(*) FROM "posts" => 10 irb(main):002:0> Post.size NoMethodError: undefined method `size' for #<Class:0x007ffd0375efd8> irb(main):003:0> Post.length NoMethodError: undefined method `length' for #<Class:0x007ffd0375efd8>
.count gives us the same optimized database call, while neither
.length are defined.
Use the SQL
This simple example shows the power of using the Rails console and understanding
the basics of SQL syntax; knowing exactly what your application code is doing
with the database behind-the-scenes is crucial to realizing the best performance
from the database. As you build more complicated database queries, it’s
important—especially until you’ve gained expertise in Rails—to test them out
on the console. More complicated queries should probably make use of named
scopes or custom SQL queries using