So let's say that you have a shiny new Ruby program that you wrote. But how will you ensure every environment has the right version of Ruby installed and every user knows how to install all of its dependencies? Before you answer that yourself, ask the lead maintainer of Homebrew how much fun he's having.
I think there is a simpler method to this madness. This tutorial shows how to automatically build and push container images for your project, which saves time and cuts down on tedious tasks. Doing this by hand is for people who don't yet realize how much time DevOps-style automation can save. The pattern in this post works for a variety of languages, but you'll have to adapt the language-specific parts to your own stack.
Author's Note: I think that the term "DevOps" gets used incorrectly in a lot of situations today. In my opinion, this tutorial is the most basic application of the term. I am applying automation to the code since I am familiar with both deployment strategies and application code. One foot in operations, the other in development. This was the original inspiration for this blog post, since I have spoken about automating things at internal Test Double events and Justin had not yet built any sort of workflow to perform these tasks.
Today we will be modifying a side-project of our very own Justin Searls called feed2gram. I will be able to point out specific commits, too. This isn't a hypothetical exercise; it's what we actually implemented so a Docker container on his Synology could continuously cross-post photos from his blog to his Instagram account.
In the beginning...
First, you run tests. The application has tests, right? Of course! So this entire process only runs if the tests pass.
# the default task runs rake test standard
bundle exec rake
[Justin's note: I actually didn't write any tests for this gem. Don't tell Andrew.]
Ensure that tests run in Github Actions
This is our first step to continuous integration. We will create a file .github/workflows/main.yml. The exact filename isn't important so long as it's contained in .github/workflows.
# the name can be anything you want, but try not to change it once you push it to Github
name: Ruby
on:
  # this runs on each push...
  push:
    # ...to this list of branches, which is just main
    branches:
      - main
  # also runs on each pull request
  pull_request:
jobs:
  # this is the "pipeline" that does the business
  build:
    # this is the name of the kind of runner that Github hosts
    runs-on: ubuntu-latest
    # each job has a name, too. the bits next to "matrix" means "run one job for each combination of things in the strategy.matrix structure"
    name: Ruby ${{ matrix.ruby }}
    strategy:
      matrix:
        ruby:
          # we are only testing Ruby 3.2.2, but it could be easily changed to support more versions
          - '3.2.2'
    steps:
      # you have to check out the code first, this is almost always the first step in a GHA job
      - uses: actions/checkout@v3
      # this installs Ruby, since the base runner does not have Ruby
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          # magic sauce for the matrix jobs to substitute the values in the combinations
          ruby-version: ${{ matrix.ruby }}
          # use a cache. it saves a ton of time.
          bundler-cache: true
      # this runs the tests
      - name: Run the default task
        run: bundle exec rake
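To make the matrix idea concrete, here is a minimal shell sketch of what "one job per entry" means. It only mimics how the job names are produced and is not how GitHub Actions is implemented; the function name and the second version, 3.3.0, are made up for illustration and are not part of the real workflow.

```bash
#!/usr/bin/env bash
# Sketch of matrix expansion: one job per Ruby version, named the same way
# as the workflow's `name: Ruby ${{ matrix.ruby }}`.
# usage: matrix_job_names <ruby-version>...
matrix_job_names() {
  for ruby in "$@"; do
    echo "Ruby $ruby"
  done
}

# '3.3.0' is a hypothetical second entry, not part of the real workflow
matrix_job_names 3.2.2 3.3.0
```

With two versions in the list you get two independent jobs, so a failure on one Ruby version shows up separately from the other.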
Add a Dockerfile
Docker containers are ubiquitous in the self-hosted world. It is one of the most common methods of distributing software for self-hosting. Since feed2gram is all about POSSE, self-hosting is going to be our primary design objective.
To get started with building a docker container, you need a Dockerfile:
# we make an image from the default Ruby version tag for our version of Ruby
FROM ruby:3.2.2
# this directory comes from the base image and is where we will deploy the app
# it is common practice to use a simple and non-FHS directory in the root
WORKDIR /srv
# important! we do this specifically for caching. if these files do not change,
# then later builds will re-use the cache! it can sometimes take minutes to
# install dependencies, so this is an excellent optimization
COPY Gemfile Gemfile.lock feed2gram.gemspec ./
# this app requires this file to calculate the version
COPY lib/feed2gram/version.rb lib/feed2gram/
# install dependencies
# so long as the files in the prior COPY statements do not change, then this
# step will be cached on subsequent builds and save potentially _minutes_
RUN bundle install
# just copy everything, even the files we already copied
ADD . .
# this makes every invocation of the container work like a command-line tool
# any arguments to `docker run` will be passed to the application just
# like you would on the command line
ENTRYPOINT ["/srv/exe/feed2gram"]
We have an excellent guide that deep-dives into optimizing Docker layer caching that is written by our own Mavrick Laakso.
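The reason the early COPY pays off is that Docker decides whether a COPY layer can be reused by checksumming the contents of the copied files, not by looking at their timestamps. The sketch below imitates that idea with the standard cksum tool. The function name is made up for illustration, and this is only an analogy for the cache check, not Docker's actual algorithm.

```bash
#!/usr/bin/env bash
# Sketch: fingerprint the dependency manifests by content only.
# If the fingerprint is unchanged, a content-keyed cache (like Docker's
# layer cache for COPY) can safely reuse the expensive `bundle install` step.
# usage: deps_fingerprint <file>...
deps_fingerprint() {
  # cksum reads the concatenated contents from stdin and prints "<crc> <size>"
  cat -- "$@" | cksum
}

# Example (run from the project root):
#   before="$(deps_fingerprint Gemfile Gemfile.lock feed2gram.gemspec)"
#   touch Gemfile    # changes the timestamp, not the content
#   after="$(deps_fingerprint Gemfile Gemfile.lock feed2gram.gemspec)"
#   [ "$before" = "$after" ] && echo "cache hit: skip bundle install"
```

Editing application code leaves this fingerprint alone, which is why only the final copy step has to be redone on most builds.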
Using the image
Let's take a small detour and actually use this setup now.
git clone https://github.com/searls/feed2gram
cd feed2gram
docker build -t feed2gram .
# wait for the build to finish
# --rm removes the container when it exits; -it lets it work like a CLI tool
# -v bind-mounts the config-you-have-made into the container; the host path
#    must be absolute, because a bare name is treated as a named volume
# everything after the image name (feed2gram) is passed to the entrypoint
docker run --rm -it \
  -v "$(pwd)/my-config.yml:/srv/config.yml" \
  feed2gram \
  --config /srv/config.yml
Assuming you followed the directions in the README.md to get all of the tokens ready, then you should see something work.
Building the container automatically
Now we'll modify our workflow to build the container. Because the new job requires the build job to succeed, tests must pass to build the container! While not a perfect method, it will minimize the possibility of releasing buggy software to our end users.
name: Ruby
on:
  push:
    branches:
      - main
  pull_request:
jobs:
  build:
    runs-on: ubuntu-latest
    name: Ruby ${{ matrix.ruby }}
    strategy:
      matrix:
        ruby:
          - '3.2.2'
    steps:
      - uses: actions/checkout@v3
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: ${{ matrix.ruby }}
          bundler-cache: true
      - name: Run the default task
        run: bundle exec rake
  # 👇New job is here!
  docker:
    runs-on: ubuntu-latest
    name: Build Docker Container
    # this job will only run if all of the "needs" are successful
    # hence, tests must pass!
    needs: [build]
    steps:
      - uses: actions/checkout@v4
      # we install qemu to build arm images
      - uses: docker/setup-qemu-action@v3
      # configure buildkit
      - uses: docker/setup-buildx-action@v3
      # log into github container registry if not a pull request
      - uses: docker/login-action@v3
        if: github.event_name != 'pull_request'
        with:
          registry: ghcr.io
          username: searls
          # 👇Github container registry requires a personal access token
          # Don't commit this to code, ever.
          password: ${{ secrets.CH_PAT }}
      # this does all the work for us
      - uses: docker/build-push-action@v5
        with:
          # 👇Important! you want to use a cache to save CI minutes
          cache-from: type=gha
          # mode=max means "also cache intermediate layers and not just the resulting image"
          # which is the part that makes the prior optimizations worth while
          cache-to: type=gha,mode=max
          context: .
          # build for both amd64 and arm64, for the mac/raspberrypi folks!
          platforms: linux/amd64,linux/arm64
          # 👻 we don't push the resulting image if this is a pull request
          push: ${{ github.event_name != 'pull_request' }}
And there you have it. A fully functional workflow that will automatically build and push container images for your project on each push to the default branch. You could stop here... but we're going to keep going. There are some rough edges and small improvements that will still carry us further toward our goal.
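If you want to see roughly what the docker job does without waiting on CI, the sketch below prints a comparable `docker buildx build` command. It is a local approximation under a few assumptions: the helper name is made up, the GitHub Actions cache options are left out because they only make sense inside a workflow, and the image name you pass in is up to you.

```bash
#!/usr/bin/env bash
# Sketch: print a local `docker buildx build` command that roughly mirrors
# the CI job above. It prints the command instead of running it.
# usage: buildx_command <event-name> <image>
buildx_command() {
  event="$1"
  image="$2"
  cmd="docker buildx build --platform linux/amd64,linux/arm64 --tag $image"
  # mirror the workflow: only push when the event is not a pull request
  if [ "$event" != "pull_request" ]; then
    cmd="$cmd --push"
  fi
  echo "$cmd ."
}

# To mimic `needs: [build]`, only build when the tests pass:
#   bundle exec rake && eval "$(buildx_command push ghcr.io/searls/feed2gram:latest)"
```

Printing the command first makes it easy to check what would be pushed before you run anything against a real registry.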
Perpetual processing
If you looked at how we ran the resulting container previously, it is a one-shot execution. So every time you want to sync your posts, you'll have to run the program again. This is kind of tedious and runs against our "automate all the things" goal, so let's investigate how to run this program continuously.
There are a few possible choices:
- Add the daemon gem and implement a start/stop/restart workflow with Ruby.
- Run busybox cron and then write a crontab file.
- Add a very small shell script that just runs a loop.
We're going to go with the last of those here. Let's add another file called bin/daemon that will accomplish this task for us.
#!/usr/bin/env bash
# ☝️ this is the preferred method of running bash.
# not everyone has /bin/bash
# and not every /bin/bash is the same (macos 🤢)
echo "Starting feed2gram daemon..."
# run the app once
# the "$@" will pass all of the arguments given to the container to the
# original script, so it's 💯 compatible with the old implementation!
exe/feed2gram --verbose "$@"
# sleep by default 60 seconds
# or define env var SLEEP_TIME=XXX to pick a different interval
while sleep "${SLEEP_TIME:-60}" ; do
  # run it again, with a timestamp output
  echo "[$(date -R)] Re-running feed2gram..."
  exe/feed2gram --verbose "$@"
done
This solution happily bypasses a lot of gotchas. If we used cron, then you would have to write a crontab file, and not even system administrators want to have to remember the format of those files. Also, when you use cron, you would have to make sure that the previous invocation had finished before you start the next one. You really don't want to run the script twice and have them do the same thing at the same time. The loop avoids that by construction, because the next run cannot start until the previous one has returned and the sleep has elapsed. The shell script is less code than adding the gem and wrapper, plus you can still use the container as a one-shot if you want!
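Here is a small sketch of the two mechanics the script relies on: the `${SLEEP_TIME:-60}` default and the strictly sequential loop. The function names and the iteration limit are made up so the loop can be tried safely; the real bin/daemon runs forever.

```bash
#!/usr/bin/env bash
# Sketch of the daemon loop's mechanics, with a bounded iteration count.

# prints the interval the loop would use: SLEEP_TIME if set, otherwise 60
sleep_interval() {
  echo "${SLEEP_TIME:-60}"
}

# usage: run_repeatedly <iterations> <command...>
# Runs the command, then sleeps and re-runs it until <iterations> runs are
# done. Each run starts only after the previous one has returned, so two
# runs can never overlap.
run_repeatedly() {
  remaining="$1"
  shift
  "$@"
  while [ "$remaining" -gt 1 ] && sleep "$(sleep_interval)"; do
    remaining=$((remaining - 1))
    "$@"
  done
}

# Example: three runs, one second apart
#   SLEEP_TIME=1 run_repeatedly 3 date -R
```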
Using the new image
# 🛏️ -e SLEEP_TIME=600 is how you change the sleep interval
docker run --rm -it \
  -e SLEEP_TIME=600 \
  -v "$(pwd)/my-config.yml:/srv/my-config.yml" \
  feed2gram \
  --config /srv/my-config.yml
🥳 now you have a long-running process that will continually monitor your feed and syndicate content to Instagram.
How to get the old behavior
You just change the entrypoint back to the old one!
# 👇 overriding the entrypoint is all you need
docker run --rm -it \
  --entrypoint /srv/exe/feed2gram \
  -v "$(pwd)/my-config.yml:/srv/my-config.yml" \
  feed2gram \
  --config /srv/my-config.yml
Container image optimization
By now we have a really convenient image for running our new fancy application. 🤠
andrew@potassium:~/feed2gram$ docker images feed2gram
REPOSITORY TAG IMAGE ID CREATED SIZE
feed2gram latest 881fe940ab29 45 seconds ago 1.04GB
Oh. Okay. That's ... really big for a little application like this. We can use dive to examine each layer, but maybe it's not our fault that the resulting image is so big.
andrew@potassium:~/feed2gram$ docker images ruby
REPOSITORY TAG IMAGE ID CREATED SIZE
ruby 3.2.2 e1ebac6c7119 6 days ago 988MB
Right. Looks like our application is taking up roughly 50 MB on top of the base image (1.04 GB versus 988 MB), so that's more in line with what I expected. So let's find a different base image. While we are at it, let's add a VOLUME into the Dockerfile for the configuration files. This will give the end users a clear place to mount their configuration files and be an obvious location for persistent data.
FROM ruby:3.2.2-alpine
# 👆The alpine image is much more spartan,
# but it's more than enough for our little application
WORKDIR /srv
COPY Gemfile Gemfile.lock feed2gram.gemspec ./
COPY lib/feed2gram/version.rb lib/feed2gram/
# 👇 the dependencies are different in alpine
RUN apk update && \
    apk add autoconf bash git gcc make musl-dev && \
    bundle install && \
    apk del --purge --rdepends git gcc autoconf make musl-dev
ADD . .
# 👇 this is where persistent data lives
VOLUME /config
# 👇 the volume subtly changes the default arguments
CMD ["--config", "/config/feed2gram.yml"]
ENTRYPOINT ["/srv/bin/daemon"]
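How CMD and ENTRYPOINT combine is easy to forget, so here is a minimal sketch of the rule for exec-form instructions: arguments given to `docker run` replace CMD and are appended to ENTRYPOINT. The function is made up for illustration and only models the rule; it does not call Docker.

```bash
#!/usr/bin/env bash
# Sketch: compute the command a container ends up running.
# usage: effective_command <entrypoint> <default-cmd> [run-args...]
effective_command() {
  entrypoint="$1"
  default_cmd="$2"
  shift 2
  if [ "$#" -gt 0 ]; then
    # arguments after the image name replace CMD entirely
    echo "$entrypoint $*"
  else
    # with no arguments, CMD supplies the defaults
    echo "$entrypoint $default_cmd"
  fi
}

# no arguments: the daemon reads the config from the /config volume
effective_command /srv/bin/daemon "--config /config/feed2gram.yml"
# explicit arguments: the default --config is dropped
effective_command /srv/bin/daemon "--config /config/feed2gram.yml" --config /srv/my-config.yml
```

This is why mounting a file at /config/feed2gram.yml lets users run the image with no arguments at all, while older invocations that pass --config keep working.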
The end result
andrew@potassium:~/feed2gram$ docker images feed2gram
REPOSITORY TAG IMAGE ID CREATED SIZE
feed2gram latest dc054fbec1a7 5 seconds ago 132MB
That's much better. I'm sure all of the self-hosters will appreciate the smaller download. There is one more step to go before we call it done. If we want to pull the containers, we have to know the git SHA of the revision. Making end users cross-reference the repository to find it is irritating, and removing old versions of the image will be a pain since you can't sort SHAs chronologically.
Image tags
There are a lot of conventions around version numbers and docker containers. We'll use a (new to me) action called docker/metadata-action that will generate any combination of tags for the resulting build. The action will also generate labels that will help Github associate the image with the source repository.
- On each build, it will tag with the git commit SHA
- On a PR, it will generate a tag like pr-2
- On the default (main) branch it will tag with latest
- For a tag like v1.2.3 it will tag with 1.2.3
- For a tag like v2.0.0 it will tag with 2
- For early development tags like v0.0.4, it will not tag with 0
name: Ruby
on:
  push:
    branches:
      - main
    # 👇 we run this workflow on tags that start with a "v"
    tags:
      - 'v*'
  pull_request:
jobs:
  build:
    runs-on: ubuntu-latest
    name: Ruby ${{ matrix.ruby }}
    strategy:
      matrix:
        ruby:
          - '3.2.2'
    steps:
      - uses: actions/checkout@v3
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: ${{ matrix.ruby }}
          bundler-cache: true
      - name: Run the default task
        run: bundle exec rake
  docker:
    runs-on: ubuntu-latest
    name: Build Docker Container
    needs: [build]
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-qemu-action@v3
      - uses: docker/setup-buildx-action@v3
      # 👇 here is the new action
      - uses: docker/metadata-action@v5
        id: metadata
        with:
          images: |
            ghcr.io/searls/feed2gram
          tags: |
            type=raw,value=latest,enable={{is_default_branch}}
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=semver,pattern={{major}},enable=${{ !startsWith(github.ref, 'refs/tags/v0.') }}
            type=sha,prefix=,format=long
      - uses: docker/login-action@v3
        if: github.event_name != 'pull_request'
        with:
          registry: ghcr.io
          username: searls
          password: ${{ secrets.CH_PAT }}
      - uses: docker/build-push-action@v5
        with:
          cache-from: type=gha
          cache-to: type=gha,mode=max
          context: .
          platforms: linux/amd64,linux/arm64
          push: ${{ github.event_name != 'pull_request' }}
          # 👇 here we use the outputs to magic all of this work away
          tags: ${{ steps.metadata.outputs.tags }}
          labels: ${{ steps.metadata.outputs.labels }}
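To get a feel for which tags a given ref produces, the tagging rules in this section can be approximated in a few lines of shell. This is an illustration only, not how docker/metadata-action is implemented: the function name is made up, it assumes plain vMAJOR.MINOR.PATCH tags with no pre-release suffixes, and it includes the major.minor tag that the workflow's `{{major}}.{{minor}}` pattern adds.

```bash
#!/usr/bin/env bash
# Sketch: approximate the image tags generated for a git ref.
# usage: tags_for_ref <git-ref> <commit-sha> [default-branch]
tags_for_ref() {
  ref="$1"
  sha="$2"
  default_branch="${3:-main}"
  case "$ref" in
    refs/heads/"$default_branch")
      echo "latest"
      ;;
    refs/pull/*)
      # refs/pull/2/merge -> pr-2
      number="${ref#refs/pull/}"
      echo "pr-${number%%/*}"
      ;;
    refs/tags/v*)
      version="${ref#refs/tags/v}"   # 1.2.3
      major="${version%%.*}"         # 1
      rest="${version#*.}"           # 2.3
      minor="${rest%%.*}"            # 2
      echo "$version"
      echo "${major}.${minor}"
      # skip the bare major tag for 0.x development releases
      if [ "$major" != "0" ]; then
        echo "$major"
      fi
      ;;
  esac
  # every build is also tagged with the full commit SHA
  echo "$sha"
}

# Example: tags_for_ref refs/tags/v1.2.3 "$(git rev-parse HEAD)"
```

For refs/tags/v1.2.3 this prints 1.2.3, 1.2, 1, and the SHA, while refs/tags/v0.0.4 yields only 0.0.4, 0.0, and the SHA.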
Conclusion
It seems like a ton of work, but this gets easier the more you are exposed to it. The patterns laid out here are generally applicable to a lot of different projects, but with some minor modifications. This work is a natural process for DevOps development. Take a nice thing and then automate it. Try to use it and then shave off the rough spots. Iterate and improve. At all steps in the path you need to evaluate what would benefit your end-users and optimize for those features.
- We added a Dockerfile in 6201ff5.
- We added a job to build a multi-arch container when tests pass in 66cbbfb.
- We added a tiny shell script to perpetually run the program in 9b0970c.
- We minimized the resulting size of the container image in 12f582a.
- We added conventional tags to the container image in 3fc2298.
To celebrate our automation, feed2gram has been officially blessed as version 1.0! 🎉 🥳