Using Ansible and Docker for home servers
I run two Raspberry Pi servers on my home network: one connected to hard drives to provide network-attached storage, and another to monitor services, energy, and water usage. In setting up the servers, I wanted to keep all of the configuration under version control so I could easily roll-back mistakes. Repeatability was also important in case I need to swap out one of the servers due to hardware failure. Finally, I wanted a system that was easy to pick up, as I tend to rotate between interests and might come back to the servers with no memory of how they work.
I ultimately chose Ansible for running tasks on each server, bringing in Docker for any long-running services. (I first tried and failed to use Nix for this: I could not get the SD card flasher to cross-compile a ZFS-capable Linux kernel from a Fedora laptop in any reasonable amount of time, so I gave up. It was glorious when it was working, though, and I hope to come back to it someday.) For instance, I install ZFS on the storage server with Ansible, but run a service that monitors SMART statistics using Docker. While the two somewhat-competing systems add a bit more cognitive overhead to the setup, there are significant benefits to using both.
Rationale
Ansible isn't perfect: the use of YAML for everything is a choice, and its tasks aren't idempotent in practice. It's also not a configuration language -- it's a task runner -- so changes may need to be rolled back manually if they start a service or install a package. The work people have done with it, however, has a strong sense of pragmatism, and I usually don't need to care what language it's written in. (I write, and review, enough Python at my day job, thank you very much.) Like many organically-grown software packages, it has a lot of overlapping features, so you need to use the correct subset of them to avoid pitfalls.
Additionally, Docker paired with docker-compose files has enough mindshare and momentum that most services come with some kind of Dockerfile and possibly an image pre-built on Docker Hub or GitHub Container Registry. I've found it better to use containers from the creators of the service than to rely on third-party volunteer efforts for packaging software.
The "Docker-native" approach to orchestration would be Kubernetes and Helm, but I've read that they aren't designed for a heterogeneous network with long-term storage like mine. (As I add more one-off services to my system that do small tasks, I may add k3s to my setup.) There are ways to make it work, but it feels more appropriate for a cluster environment, where each node is interchangeable. I'm trying to go with the grain for the system integration here; I've spent enough of my youth trying to make software do things it wasn't designed to do.
Ansible
That's the rationale for these choices, but how do you actually use Ansible to set up servers? The Ansible documentation has a lot of extra details that don't matter, and most of it is reference-style anyway. Actually solving problems with these systems is challenging thanks to the degrading quality of search results and the SEO blog spam that dominates buzzword-heavy technologies. (I would be amazed if this note ranks anywhere in the top dozen pages of results, unfortunately.) You'll need some familiarity with YAML for the following descriptions to make sense; apologies if this is your first exposure to the language.
Ansible needs to know what servers you want to work with, using an "inventory" file.
An inventory file named inventory.yml could have the following contents:
```yaml
group-a:
  hosts:
    host-a:
      ansible_host: 192.168.1.10
group-b:
  hosts:
    host-b:
      ansible_host: 192.168.1.11
group-c:
  children:
    group-a:
    group-b:
```
This sets up a hierarchy to group servers together, but I didn't end up needing to use that for my simple server setup.
You must provide the inventory to commands like ansible-playbook, either with the --inventory option or by adding inventory = <file> to a file named ansible.cfg in the directory Ansible runs from.
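For reference, a minimal ansible.cfg along these lines might look like this (the inventory filename is whatever you chose above):

```ini
[defaults]
inventory = inventory.yml
```

With that in place, something like `ansible all -m ping` should reach every host in the inventory without any extra flags.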
Playbooks, tasks, and roles
Ansible runs "playbooks", which are YAML files that describe actions to take on groups or hosts in your inventory using the ansible-playbook command.
There's no real naming scheme for them, but I chose to name my "set everything up and hope it's idempotent" one main.yml.
Mine looks sort of like this:
```yaml
- hosts: all
  name: Updating apt Cache
  tasks:
    - name: Updating apt Cache
      become: true
      ansible.builtin.apt:
        update_cache: true
        cache_valid_time: 3600
      when:
        - ansible_facts.os_family == "Debian"

- hosts: host-a
  name: Setting up host-a
  roles:
    - role: docker
    - role: pihole
```
Each item in the top-level array is a set of sequential tasks or roles to apply to hosts.
A task is the atomic unit of change in Ansible: it could be a command to run, an apt package to install, or a directory to synchronize with the host.
There are a lot of built-in templates for tasks, but it's apparently possible to write your own template.
I never had to do that, though, and it sounds like it could involve Python, which I'm trying to avoid here.
They have the following structure:
```yaml
- name: <some-human-readable-description-of-whats-happening>
  when: <a-condition-for-whether-the-task-should-run>
  <template-name>:
    <template-variable>: <value>
```
There are other possibilities, like become (to run the task as root), but those are the main fields in a task's top-level structure.
The when field is there because tasks are supposed to be idempotent, that is, running them multiple times doesn't duplicate an effect.
Running a task when its end result is already in place should do nothing.
This is really hard to achieve in practice but is a laudable goal.
Some of the naming schemes are odd because of this, too.
I don't use when very much but I probably should go back and do another pass on my tasks to add it.
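As a sketch of what that looks like, here's a hypothetical pair of tasks that only creates a swap file when one doesn't already exist, using a registered stat result as the when condition (the path and size are made up for illustration):

```yaml
- name: Checking for an existing swap file
  ansible.builtin.stat:
    path: /swapfile
  register: swap_file

- name: Creating the swap file
  become: true
  ansible.builtin.command: fallocate -l 1G /swapfile
  when: not swap_file.stat.exists
```

The second task is skipped on every run after the first, which is the idempotent behavior the when field is meant to encourage.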
I only used a handful of task templates (Ansible calls these "modules") to set up the servers:

- ansible.posix.synchronize: rsync; copy a directory from a src hierarchy to the host
- ansible.builtin.file: mkdir -p; create a file or directory on the host, which is necessary because most destinations in other tasks frustratingly assume the parent directory is already present
- ansible.builtin.service: systemctl; restart services
- ansible.builtin.template: scp but with variables; take a Jinja2 template, interpolate any variables, and write it to the host, useful for docker-compose YAML or configuration files
- community.docker.docker_compose: docker compose up; manage Docker containers using a docker-compose file
- ansible.builtin.user: add users or groups
- ansible.builtin.command: sh; the escape hatch to run arbitrary commands, best used sparingly
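As an example of the parent-directory frustration, here's a hypothetical pair of tasks that creates a service's directory before writing a templated file into it (the paths and template name are placeholders):

```yaml
- name: Creating the service directory
  ansible.builtin.file:
    path: ~/myservice
    state: directory
    mode: "0755"

- name: Writing the service configuration
  ansible.builtin.template:
    src: config.yml.j2
    dest: ~/myservice/config.yml
```

Without the first task, the template task fails when ~/myservice doesn't exist yet.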
While it's perfectly fine for playbooks to be just straight lists of tasks, and I started out doing that, at some point they become easier to maintain when the tasks are split into multiple files. (For an example of how this can get out of hand, see the internet-pi project, which shares many of the goals of my setup.)
The repository has a top-level tasks directory that the main.yml playbook references using ansible.builtin.import_tasks.
And every service's template file is in the templates directory.
I find it a lot nicer to colocate the same service's files under a single directory.
There's an ansible.builtin.include_tasks module that I was using to great effect, but eventually my templates were spread across different directories and the setup got hard to manage.
That's where "roles" come in: they package up related tasks and resources to achieve some result (like "get Grafana running"). My Grafana role has the following directory structure:
```
ansible/roles/grafana/tasks/main.yml
ansible/roles/grafana/files/dashboards/services.json
ansible/roles/grafana/templates/docker-compose.yml.j2
```
The templates/docker-compose.yml.j2 file is a template that can be referenced in the src field of an ansible.builtin.template task by just its filename.
tasks/main.yml holds the sequence of tasks that should be carried out by hosts that adopt this role.
And files/* holds related files that can be copied to the host or referenced in tasks.
All of the configuration and tasks that reference them are under the same directory.
Variables are passed to roles from the playbook like this:
```yaml
- hosts: host-a
  name: Setting up host-a
  roles:
    - role: pihole
      vars:
        pihole_port: 12345
```
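Inside the role, that variable can then be interpolated wherever it's needed. A hypothetical snippet from the pihole role's templates/docker-compose.yml.j2 might use it like this:

```yaml
services:
  pihole:
    image: pihole/pihole:latest
    ports:
      - "{{ pihole_port }}:80"
```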
Dealing with secrets
Ansible ships with a system called Vault that can decrypt secrets that are stored as cipher text in variables.
There's a single password for each vault that's used to encrypt other secrets.
Otherwise there's no setup for a "vault", because it only needs a password.
The ansible-vault command reads the secret on stdin and prints out the string to use as the variable.
Here's an example:
```sh
ansible-vault encrypt_string --ask-vault-pass --name <name-of-secret>
```
This produces a string starting with !vault | followed by the encrypted secret.
Ansible will automatically decrypt it and substitute it in playbooks when they're run.
I'm not sure the --name argument does anything, and it's a little annoying to need to type in the vault password each time.
Ansible even allows the vault password to be supplied by another program.
In my case I use 1Password to hold all of my secrets, so I added a script that just runs:
```sh
exec op item get 'Ansible Vault' --fields password
```
This pulls the password out of an entry called "Ansible Vault".
By setting the script as the vault_password_file in ansible.cfg where I run Ansible, I don't have to type my vault password in each time I want to run a playbook or encrypt a secret.
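Assuming the script is saved as something like vault-password.sh next to the playbooks (the filename is made up), the relevant ansible.cfg lines would be:

```ini
[defaults]
vault_password_file = ./vault-password.sh
```

The script just needs to be executable and print the password to stdout.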
Putting it all together in my primary playbook, the roles typically look like this:
```yaml
- hosts: my-pi
  name: Serving DNS
  roles:
    - role: pihole
      vars:
        pihole_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          <cipher-text>
```
This makes the pihole_password secret available to use in the role.
It's also possible to store these in a group_vars or host_vars directory at the top-level, but I didn't feel that was necessary.
Ansible tips
- If you're using a recent version of macOS and see a weird crash when Python is forking, add `env OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in front of the command. This disables an Objective-C runtime feature that tries to prevent processes from forking after they've used certain language features from multiple threads. Ansible uses PyObjC alongside Python's various concurrency schemes, which can trigger this diagnostic.
- Use `--start-at-task <name-of-task>` to skip all tasks leading up to the named task, allowing you to troubleshoot a tricky section of a playbook or role.
- Add `--limit <inventory-name>` to only run tasks on a given host or inventory group from a playbook. Any tasks that apply to other hosts are ignored.
- The `--step` option causes each task to ask for confirmation before running, which lets you cut off a playbook's execution early.
- Set `bin_ansible_callbacks = True` and `callbacks_enabled = profile_tasks,profile_roles` in your `ansible.cfg` to produce timing information about each task and a summary of the longest tasks and roles. It's a bit verbose, but nice to know how long things are taking.
- If you're as horrified as I was at how slow task execution is, I have the following in my `ansible.cfg` in a (vain) attempt to make Ansible a bit faster:

  ```ini
  # Don't wait for all hosts to finish before moving to the next task.
  strategy = free
  nocows = True
  interpreter_python = auto_silent
  gathering = smart
  fact_caching = jsonfile
  fact_caching_connection = /tmp/ansible_cache

  [ssh_connection]
  pipelining = True
  ssh_args = -o ControlMaster=auto -o ControlPersist=60s
  ```

  I don't know what half of them do, but they came highly recommended.
- Use `ansible-lint` on your YAML files to flag any common pitfalls. I still have hundreds of warnings throughout my roles, and some of them seem sensible.
- While it's possible to specify "builtin" modules without the `ansible.builtin.` prefix, doing so is discouraged: always use the prefixed version.
Docker
Most Docker guides describe the command line interface to running containers, but I exclusively used docker-compose. I'd much rather have the settings I apply to each container stored in a file than in my shell history. The compose file format is straightforward and is usually meant to bundle multiple Docker containers together in the same file, even though I typically specify a single service per file. You typically only need to do a few things with a container:
- Expose a port from the container to the host: use the `ports` section with `<outside-port>:<inside-port>` entries.
- Mount volumes from the host in the container: use the `volumes` section with a similar `<outside-path>:<inside-mount-point>[:ro]`, where the optional `:ro` suffix makes it read-only to the container.
- Connect the container to other containers' networks: add a `networks` section with a list of the named Docker networks to join. You might want to create these in Ansible, but I had a single Docker compose file "create" them as non-external networks.
- Set environment variables on the services: add an `environment` section with entries for each variable.
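To make those sections concrete, here's a minimal hypothetical docker-compose.yml exercising all four (the image, paths, and names are placeholders):

```yaml
version: "3.5"
services:
  web:
    image: nginx:1.25
    ports:
      - "8080:80"                         # host port 8080 -> container port 80
    volumes:
      - ./site:/usr/share/nginx/html:ro   # read-only bind mount from the host
    networks:
      - app_net
    environment:
      EXAMPLE_VAR: "value"
networks:
  app_net: {}
```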
It's probably best to just look at the most complex compose file I used in my home server:
```yaml
{{ ansible_managed | comment }}
version: "3.5"

volumes:
  vmagentdata: {}
  vmdata: {}

networks:
  vm_net:
    name: vm_net

services:
  vmagent:
    container_name: vmagent
    image: victoriametrics/vmagent:v1.80.0
    depends_on:
      - "<other-service>"
    ports:
      - "{{ prometheus_port }}:8429"
    extra_hosts:
      - "<this-host-name>:host-gateway"
      - "<other-host-name>:<ip>"
    volumes:
      - vmagentdata:/vmagentdata
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./file_sd:/etc/prometheus/file_sd
    command:
      - "--promscrape.config=/etc/prometheus/prometheus.yml"
      - "--remoteWrite.url=http://victoriametrics:8428/api/v1/write"
    networks:
      - vm_net
    restart: always
```
This is stored as docker-compose.yml.j2 in the templates directory of a VictoriaMetrics role.
There's a second service below this to actually run the metrics database: this snippet is just the Prometheus scraper.
When the ansible.builtin.template module puts this on the device, it replaces {{ prometheus_port }} with a variable value I've defined in my playbook.
Here are the Ansible tasks that do that:
```yaml
- name: Copying VictoriaMetrics Docker Compose
  ansible.builtin.template:
    src: docker-compose.yml.j2
    dest: ~/victoriametrics/docker-compose.yml
  become: false

- name: Starting VictoriaMetrics
  community.docker.docker_compose:
    project_src: "~/victoriametrics/"
    build: false
    restarted: true
  become: false
```
I'm not sure if using ~/victoriametrics as the main directory for the container is a good idea, but it seems to be working for now.
Docker troubleshooting
- `docker ps` lists all the containers that have been created and whether they're running. If a container is in a restarting state, there's likely something wrong with it that's preventing it from starting up.
- `docker logs <container>` prints the log messages for just the processes running in that container.
- `docker exec -it <container> <command>` runs a command in the context of the container. Most containers don't have many commands available (frustratingly, `curl` is usually replaced by `wget`), but this can help inspect the file system and can even run a shell if `/bin/sh` is passed as the command to run.
Summary
Hopefully this note helped you put together your own home server in a way that's easy to manage. I struggled for a long time with which configuration system to use, and eventually just did the simplest thing I could find that still let me check the steps into version control. If you have any thoughts, don't hesitate to reach out and email me at hello@mattwidmann.net.