ActiveRecord migrations are a killer feature of Ruby on Rails.
The feature is well implemented and easy to use, and countless teams have benefited from it. Before Rails, most teams I encountered were in the habit of making incredibly error-prone ad hoc changes to each of their application’s databases. It’s one part of the framework I wish were emulated much more broadly than it has been.
While the Rails Guides do a great job teaching developers how to take advantage of migrations, there is little guidance on the habits needed to keep an application’s migrations healthy over the long term. This post will outline a few things I look for in well-maintained Rails projects.
Keep in mind that for most “application code”, it’s enough that we strive to write “maintainable” code. We don’t have to get everything right today, because we can always improve it tomorrow and redeploy. But once migrations have been initially pushed and deployed, they aren’t supposed to ever change again. That means “maintainability” per se isn’t achievable, which places the onus on the developer to get migrations right up front. (That conclusion can be hard for teams to swallow, since it runs counter to the prevailing anti-future-proof winds of Agile Best Practice™.)
A few habits of healthy db/migrate directories follow below.
Habit 1: Keep your migrations working
At any point in the life of a Rails application, it ought to be possible for a developer to run all of its migrations from a clean database. To be clear, this command ought to always succeed:
$ rake db:drop db:create db:migrate
Even though rake db:schema:load should have the same effect as running all your migrations, its success will hinge on the accuracy of your project’s schema.rb, which in turn is generated by running all of the project’s migrations. Developers can only have confidence in their schema.rb file if they’re able to regenerate it exactly by re-running the project’s migrations from scratch. If a project reaches a point where old migrations can no longer be run, the team is left to trust the veracity of a generated file that they can no longer validate.
In fact, if the schema.rb isn’t generated from scratch now and again, tiny errors and divergences will tend to accumulate as new migrations are iterated upon. To illustrate: a developer may very well write a migration, run rake db:migrate, then change some aspect of the migration, before erroneously committing changes to schema.rb that were generated before the migration was itself finalized.
When this occurs, such a team’s development and test environments will reflect whatever the schema.rb indicates, whereas their production environment’s schema, itself the product of only the sum of all deployed migrations, may differ in non-trivial and surprising ways.
While as a general rule it’s advisable never to modify a deployed migration, if the alternative is “all our migrations are broken forever”, it’s worth finding a minimally invasive fix.
Habit 2: Run the migrations often
Regularly running all of your application’s migrations from an empty database provides two immediate benefits. First, any broken migrations will be detected sooner, so they can be fixed more easily. Second, the authoritativeness of your project’s schema.rb will be regularly validated: if git detects a change in the file after running all migrations, the fix is as easy as committing those changes.
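To make the second benefit concrete, here is a minimal sketch of the comparison involved. The schema_drift helper is a hypothetical name of my own, not a Rails API, and it works on plain text so it can be tried without a Rails project. In practice, running git diff on the schema file after a clean migrate tells you the same thing.

```ruby
# Hypothetical helper (not part of Rails): compare the committed schema.rb
# text against the text regenerated by running every migration from an
# empty database, and report the lines that appear on only one side.
def schema_drift(committed, regenerated)
  old_lines = committed.lines.map(&:rstrip)
  new_lines = regenerated.lines.map(&:rstrip)
  {
    only_in_committed: old_lines - new_lines,
    only_in_regenerated: new_lines - old_lines
  }
end

# Made-up schema fragments standing in for the two versions of the file.
committed = <<~SCHEMA
  create_table "users" do |t|
    t.string "name"
  end
SCHEMA

regenerated = <<~SCHEMA
  create_table "users" do |t|
    t.string "first_name"
    t.string "last_name"
  end
SCHEMA

drift = schema_drift(committed, regenerated)
drift.each { |side, lines| puts "#{side}: #{lines.inspect}" }
```

If both lists come back empty, the committed file matches what the migrations actually produce.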
This is why I tend to run db:migrate instead of db:schema:load in any bootstrap scripts that might be distributed with the project. It might be marginally slower to reinitialize a development database, but that’s pretty easily outweighed by the aforementioned benefits.
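As a rough sketch of what such a bootstrap script might look like in Ruby: the rake tasks are the ones named above, while the bundle install step, the BOOTSTRAP_STEPS constant, and the bootstrap method are assumptions of mine for illustration. The runner is injectable so the script's control flow can be exercised without touching a real database.

```ruby
# Hypothetical bootstrap steps. The rake tasks come from this post; the
# `bundle install` step and the overall structure are assumptions.
BOOTSTRAP_STEPS = [
  %w[bundle install],
  %w[rake db:drop db:create db:migrate]
].freeze

# Runs each step in order and stops at the first failure. By default a step
# is executed with Kernel#system, which returns true only on a zero exit.
def bootstrap(steps = BOOTSTRAP_STEPS, runner: ->(cmd) { system(*cmd) })
  steps.each do |cmd|
    ok = runner.call(cmd)
    raise "bootstrap step failed: #{cmd.join(' ')}" unless ok
  end
  true
end

# A real script would end by calling `bootstrap` with no arguments.
```

Because the database is rebuilt by db:migrate rather than db:schema:load, a broken old migration fails the bootstrap immediately instead of going unnoticed.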
Habit 3: Habitually “redo” migrations
Most migrations should be reversible. If a column is added in a migration, rolling back that migration should remove it. Every time I add a new migration and see it succeed, I always make sure its down migration works too. To do this right after successfully applying a migration, just run:
$ rake db:migrate:redo
This reverts the most recent migration and then reapplies it, which will reveal any problems with the migration’s down directive. And, if no problems appear, then the database is up-to-date and you can go about your business.
This has become even more important since Rails 3.1 introduced the otherwise nifty change hook, because some operations will succeed while migrating forward and only fail when reversed. (Rails 4 corrected a number of these cases, but errors can still crop up.)
Habit 4: Don’t reference models
Suppose your application has a User model and several related migrations. Your:
- User ActiveRecord model assumes the “users” table is up-to-date
- 20120204...change_users migration assumes the “users” table is exactly as it was on February 4th, 2012
There’s a glaring impedance mismatch here, if you think about it. When a migration is run, it’s always in a context where the database schema is out-of-date. When an ActiveRecord model is loaded, however, it’s always in a context where the database schema is up-to-date.
ActiveRecord models inherit lots of their behavior by interrogating the state of the database schema when they’re first loaded. If, at model-load-time, the schema is out of alignment with what your model’s internal code expects (validations, callbacks, etc.), it’s very likely your model will raise errors in the context of a running migration. Therefore, never reference your application’s ActiveRecord models from your migrations.
Why might anyone think to do this? Because frequently, complex changes require migrations to existing data as well as to the schema, and interacting with data is much easier with ActiveRecord’s APIs than it is by way of hand-written SQL updates. (Not to mention that writing complex data migrations in raw SQL is terrifically difficult in comparison to accomplishing most other SQL tasks, yet we adopted an ORM to avoid even those cases.)
If you’ve written migrations that depend on loading your application’s actual ActiveRecord models, take comfort in knowing you’re not alone, because Rails developers seem to inadvertently do this all the time. At least some of the blame lies with Rails itself for using the same load strategy when running migrations as it uses when loading the entire application. (If it were up to me, I’d make everything under app/ off-limits to migrations.)
And while I’m perfectly content just dictating that one shouldn’t do this, it may help to have a longform illustration at hand as to how referencing models from migrations can come back to bite you.
An illustration
Suppose you start a User class:
class User < ActiveRecord::Base
end
And with it, a simple migration:
class CreateUsers < ActiveRecord::Migration
  def change
    create_table :users do |t|
      t.string :name
    end
  end
end
Later, you might decide to split the user’s name up into a first and last name. You could accomplish this with another migration, this one including a data migration:
class SplitUserNameFields < ActiveRecord::Migration
  def up
    add_column :users, :first_name, :string
    add_column :users, :last_name, :string
    User.find_each do |u|
      u.update!(
        :first_name => u.name.split(" ").first,
        :last_name => u.name.split(" ").last,
      )
    end
    remove_column :users, :name
  end

  def down
    add_column :users, :name, :string
    User.find_each do |u|
      u.update!(
        :name => "#{u.first_name} #{u.last_name}"
      )
    end
    remove_column :users, :first_name
    remove_column :users, :last_name
  end
end
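As an aside, and separate from the model-loading problem this illustration is building toward, it’s worth seeing how the split-and-join expressions in that migration behave on real-world names. The sketch below lifts the two expressions into plain Ruby methods (split_name, join_name, and round_trip are hypothetical names for illustration) so they can be run without a database.

```ruby
# Same expressions as the `up` step: the first and last
# whitespace-separated words of the name.
def split_name(name)
  parts = name.split(" ")
  { first_name: parts.first, last_name: parts.last }
end

# Same expression as the `down` step.
def join_name(first_name, last_name)
  "#{first_name} #{last_name}"
end

# Simulates migrating up and then back down for a single value.
def round_trip(name)
  split = split_name(name)
  join_name(split[:first_name], split[:last_name])
end

puts round_trip("Ada Lovelace")    # Ada Lovelace
puts round_trip("Mary Ann Smith")  # Mary Smith (middle name is lost)
puts round_trip("Cher")            # Cher Cher (single word is duplicated)
```

So the down step runs without error, but it doesn’t always restore the original data, and a row with a NULL name would raise NoMethodError during the up step. Whether that matters depends on your data; it’s the kind of thing a quick rake db:migrate:redo against realistic rows can surface.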
Later on, you might decide to change your model in some way that makes loading the class, querying for instances, or saving changes impossible from the perspective of an out-of-date schema. An easy example is to add the acts_as_paranoid gem, which adds logical deletion to models, like so:
class User < ActiveRecord::Base
  acts_as_paranoid
end
This requires the addition of a deleted_at column:
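class AddDeletedAtToUsers < ActiveRecord::Migration
  def change
    # deleted_at records when the row was soft-deleted, so it needs a full
    # timestamp (:datetime) rather than a time of day (:time)
    add_column :users, :deleted_at, :datetime
  end
end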
Running rake db:migrate and rake db:migrate:redo will work fine at this point.
However, we’ve inadvertently broken our old migration! If we were to run rake db:drop db:create db:migrate, our data migration would fail because the acts_as_paranoid gem will preclude User from being loaded when a deleted_at column doesn’t exist. Whoops!
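To see the ordering problem in isolation, here is a toy simulation in plain Ruby. It involves no ActiveRecord at all: the schema is just a hash, the “model” is a lambda that refuses to load without deleted_at, and every name in it is made up for illustration. It only mirrors the sequence of events described above.

```ruby
# Toy model of the failure. No ActiveRecord is involved and every name
# here is hypothetical. The "schema" maps a table name to its columns.
schema = { users: [] }

# Stand-in for today's app/models/user.rb, which expects deleted_at to
# exist by the time it is used.
load_user_model = lambda do
  unless schema[:users].include?(:deleted_at)
    raise "User model needs users.deleted_at"
  end
  :user_model
end

# The three migrations from the illustration, in timestamp order.
migrations = [
  ["CreateUsers", -> { schema[:users] << :name }],
  ["SplitUserNameFields", lambda {
    schema[:users].push(:first_name, :last_name)
    load_user_model.call # the data migration touches the current User model
    schema[:users].delete(:name)
  }],
  ["AddDeletedAtToUsers", -> { schema[:users] << :deleted_at }]
]

# Runs every migration in order against the empty schema and reports the
# first failure, much like a clean `rake db:migrate` would stop on an error.
def run_from_scratch(migrations)
  migrations.each do |name, step|
    begin
      step.call
    rescue StandardError => e
      return "#{name} failed: #{e.message}"
    end
  end
  "all migrations ran"
end

result = run_from_scratch(migrations)
puts result
```

The second migration fails because it depends on a column that only the third migration adds. On a database that was migrated incrementally, the same three steps never hit this, which is why the breakage goes unnoticed until someone starts from scratch.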
Luckily, there are safer ways of leveraging ActiveRecord’s APIs without loading our models under app/!
Using ActiveRecord models safely
We could fix the now-broken data migration in the previous illustration by defining a new ActiveRecord::Base subclass that’s designed to only be used for the purpose of the migration. Updated, the migration looks like this:
class SplitUserNameFields < ActiveRecord::Migration
  class MigrationUser < ActiveRecord::Base
    self.table_name = :users
  end

  def up
    add_column :users, :first_name, :string
    add_column :users, :last_name, :string
    MigrationUser.find_each do |u|
      u.update!(
        :first_name => u.name.split(" ").first,
        :last_name => u.name.split(" ").last,
      )
    end
    remove_column :users, :name
  end
  #...
end
A clean run of all our migrations will once again succeed! In fact, if our data migration required logic involving associations, there’s nothing preventing us from defining them as well with configuration like has_many :pets, :class_name => "SplitUserNameFields::Pet".
Once you’ve established these idioms in your project, it’s easy to carry them forward. Now, you can define all the data migrations you like without any risk that future changes to your application code could someday prevent your old migrations from working.