Using Ansible and Docker for home servers
I run two Raspberry Pi servers on my home network: one connected to hard drives to provide network-attached storage, and another to monitor services, energy, and water usage. In setting up the servers, I wanted to keep all of the configuration under version control so I could easily roll-back mistakes. Repeatability was also important in case I need to swap out one of the servers due to hardware failure. Finally, I wanted a system that was easy to pick up, as I tend to rotate between interests and might come back to the servers with no memory of how they work.
I ultimately chose Ansible for running tasks on each server, bringing in Docker for any long-running services. (I first tried and failed to use Nix for this: I could not get the SD card flasher to cross-compile a ZFS-capable Linux kernel from a Fedora laptop in any reasonable amount of time, so I gave up. It was glorious when it was working, though, and I hope to come back to it someday.) For instance, I install ZFS on the storage server with Ansible, but run a service that monitors SMART statistics using Docker. While the two somewhat-competing systems add a bit more cognitive overhead to the setup, there are significant benefits to using both.
Rationale
Ansible isn't perfect: the use of YAML for everything is a choice, and its tasks aren't idempotent in practice. It's also not a configuration language -- it's a task runner -- so changes may need to be rolled back manually if they start a service or install a package. The work people have done with it, however, has a strong sense of pragmatism, and I usually don't need to care what language it's written in. (I write, and review, enough Python at my day job, thank you very much.) Like many organically-grown software packages, it has a lot of overlapping features, so you need to use the correct subset of them to avoid pitfalls.
Additionally, Docker paired with docker-compose files has enough mindshare and momentum that most services come with some kind of Dockerfile and possibly an image pre-built on Docker Hub or GitHub Container Registry. I've found it better to use containers from the creators of the service than to rely on third-party volunteer efforts for packaging software.
The "Docker-native" approach to orchestration would be Kubernetes and Helm, but I've read that they aren't designed for a heterogeneous network with long-term storage like mine. (As I add more one-off services to my system that do small tasks, I may add k3s to my setup.) There are ways to make it work, but it feels more appropriate for a cluster environment, where each node is interchangeable. I'm trying to go with the grain for the system integration here; I've spent enough of my youth trying to make software do things it wasn't designed to do.
Ansible
That's the rationale for these choices, but how do you actually use Ansible to set up servers? The Ansible documentation has a lot of extra details that don't matter, and most of it is reference-style anyway. Actually solving problems with these systems is challenging thanks to the degrading quality of search results and the SEO blog spam that dominates buzzword-heavy technologies. (I would be amazed if this note ranks anywhere in the top dozen pages of results, unfortunately.) You'll need some familiarity with YAML for the following descriptions to make sense; apologies if this is your first exposure to the language.
Ansible needs to know what servers you want to work with, using an "inventory" file.
An inventory file named inventory.yml could have the following contents:
```yaml
group-a:
  hosts:
    host-a:
      ansible_host: 192.168.1.10
group-b:
  hosts:
    host-b:
      ansible_host: 192.168.1.11
group-c:
  children:
    group-a:
    group-b:
```
This sets up a hierarchy to group servers together, but I didn't end up needing to use that for my simple server setup.
You must provide the inventory to commands like ansible-playbook, either with the --inventory option or by adding inventory = <file> to a file named ansible.cfg in the directory Ansible runs from.
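For reference, a minimal ansible.cfg along these lines might look like this (the inventory filename is whatever you chose above):

```ini
[defaults]
inventory = inventory.yml
```

With that in place, something like `ansible all -m ping` should reach every host in the inventory without any extra flags.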
Playbooks, tasks, and roles
Ansible runs "playbooks", which are YAML files that describe actions to take on groups or hosts in your inventory using the ansible-playbook command.
There's no real naming scheme for them, but I chose to name my "set everything up and hope it's idempotent" one main.yml.
Mine looks sort of like this:
```yaml
- hosts: all
  name: Updating apt Cache
  tasks:
    - name: Updating apt Cache
      become: true
      ansible.builtin.apt:
        update_cache: true
        cache_valid_time: 3600
      when:
        - ansible_facts.os_family == "Debian"

- hosts: host-a
  name: Setting up host-a
  roles:
    - role: docker
    - role: pihole
```
Each item in the top-level array is a set of sequential tasks or roles to apply to hosts.
A task is the atomic unit of change in Ansible: it could be a command to run, an apt package to install, or a directory to synchronize with the host.
There are a lot of built-in templates for tasks, but it's apparently possible to write your own template.
I never had to do that, though, and it sounds like it could involve Python, which I'm trying to avoid here.
They have the following structure:
```yaml
- name: <some-human-readable-description-of-whats-happening>
  when: <a-condition-for-whether-the-task-should-run>
  <template-name>:
    <template-variable>: <value>
```
There are other possibilities, like become (to run the task as root), but those are the main fields in a task's top-level structure.
The when field is there because tasks are supposed to be idempotent, that is, running them multiple times doesn't duplicate an effect.
Running a task when its end result is already in place should do nothing.
This is really hard to achieve in practice but is a laudable goal.
Some of the naming schemes are odd because of this, too.
I don't use when very much but I probably should go back and do another pass on my tasks to add it.
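As a sketch of what that looks like, here's a hypothetical pair of tasks that only creates a swap file when one doesn't already exist, using a registered stat result as the when condition (the path and size are made up for illustration):

```yaml
- name: Checking for an existing swap file
  ansible.builtin.stat:
    path: /swapfile
  register: swap_file

- name: Creating the swap file
  become: true
  ansible.builtin.command: fallocate -l 1G /swapfile
  when: not swap_file.stat.exists
```

The second task is skipped on every run after the first, which is the idempotent behavior the when field is meant to encourage.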
I only used a handful of task templates (Ansible calls these "modules") to set up the servers:

- ansible.posix.synchronize: rsync; copy a directory from a src hierarchy to the host
- ansible.builtin.file: mkdir -p; create a file or directory on the host, which is necessary because most destinations in other tasks frustratingly assume the parent directory is already present
- ansible.builtin.service: systemctl; restart services
- ansible.builtin.template: scp but with variables; take a Jinja2 template, interpolate any variables, and write it to the host, useful for docker-compose YAML or configuration files
- community.docker.docker_compose: docker compose up; manage Docker containers using a docker-compose file
- ansible.builtin.user: add users or groups
- ansible.builtin.command: sh; the escape hatch to run arbitrary commands, best used sparingly
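As an example of the parent-directory frustration, here's a hypothetical pair of tasks that creates a service's directory before writing a templated file into it (the paths and template name are placeholders):

```yaml
- name: Creating the service directory
  ansible.builtin.file:
    path: ~/myservice
    state: directory
    mode: "0755"

- name: Writing the service configuration
  ansible.builtin.template:
    src: config.yml.j2
    dest: ~/myservice/config.yml
```

Without the first task, the template task fails when ~/myservice doesn't exist yet.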
While it's perfectly fine for playbooks to be just straight lists of tasks, and I started out doing that, at some point they become easier to maintain when the tasks are split into multiple files. (For an example of how this can get out of hand, see the internet-pi project, which shares many of the goals of my setup.)
The repository has a top-level tasks directory that the main.yml playbook references using ansible.builtin.import_tasks.
And every service's template file is in the templates directory.
I find it a lot nicer to colocate the same service's files under a single directory.
There's an ansible.builtin.include_tasks module that I was using to great effect, but eventually my templates were spread across different directories and the setup got hard to manage.
That's where "roles" come in: they package up related tasks and resources to achieve some result (like "get Grafana running"). My Grafana role has the following directory structure:
```
ansible/roles/grafana/tasks/main.yml
ansible/roles/grafana/files/dashboards/services.json
ansible/roles/grafana/templates/docker-compose.yml.j2
```
The templates/docker-compose.yml.j2 file is a template that can be referenced in the src field of an ansible.builtin.template task by just its filename.
tasks/main.yml holds the sequence of tasks that should be carried out by hosts that adopt this role.
And files/* holds related files that can be copied to the host or referenced in tasks.
All of the configuration and tasks that reference them are under the same directory.
Variables are passed to roles from the playbook like this:
```yaml
- hosts: host-a
  name: Setting up host-a
  roles:
    - role: pihole
      vars:
        pihole_port: 12345
```
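Inside the role, that variable can then be interpolated wherever it's needed. A hypothetical snippet from the pihole role's templates/docker-compose.yml.j2 might use it like this:

```yaml
services:
  pihole:
    image: pihole/pihole:latest
    ports:
      - "{{ pihole_port }}:80"
```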
Dealing with secrets
Ansible ships with a system called Vault that can decrypt secrets that are stored as cipher text in variables.
There's a single password for each vault that's used to encrypt other secrets.
Otherwise there's no setup for a "vault", because it only needs a password.
The ansible-vault command reads the secret on stdin and prints out the string to use as the variable.
Here's an example:
```sh
ansible-vault encrypt_string --ask-vault-pass --name <name-of-secret>
```
This produces a string starting with !vault | followed by the encrypted secret.
Ansible will automatically decrypt it and substitute it in playbooks when they're run.
I'm not sure the --name argument does anything, and it's a little annoying to need to type in the vault password each time.
Ansible even allows the vault password to be supplied by another program.
In my case I use 1Password to hold all of my secrets, so I added a script that just runs:
```sh
exec op item get 'Ansible Vault' --fields password
```
This pulls the password out of an entry called "Ansible Vault".
By setting the script as the vault_password_file in ansible.cfg where I run Ansible, I don't have to type my vault password in each time I want to run a playbook or encrypt a secret.
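Assuming the script is saved as something like vault-password.sh next to the playbooks (the filename is made up), the relevant ansible.cfg lines would be:

```ini
[defaults]
vault_password_file = ./vault-password.sh
```

The script just needs to be executable and print the password to stdout.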
Putting it all together in my primary playbook, the roles typically look like this:
```yaml
- hosts: my-pi
  name: Serving DNS
  roles:
    - role: pihole
      vars:
        pihole_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          <cipher-text>
```
This makes the pihole_password secret available to use in the role.
It's also possible to store these in a group_vars or host_vars directory at the top-level, but I didn't feel that was necessary.
Ansible tips
- If you're using a recent version of macOS and see a weird crash when Python is forking, add `env OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` in front of the command. This disables an Objective-C runtime feature that tries to prevent processes from forking after they've used certain language features from multiple threads. Ansible uses PyObjC alongside Python's various concurrency schemes, which can trigger this diagnostic.
- Use `--start-at-task <name-of-task>` to skip all tasks leading up to the named task, allowing you to troubleshoot a tricky section of a playbook or role.
- Add `--limit <inventory-name>` to only run tasks on a given host or inventory group from a playbook. Any tasks that apply to other hosts are ignored.
- The `--step` option causes each task to ask for confirmation before running, which lets you cut off a playbook's execution early.
- Set `bin_ansible_callbacks = True` and `callbacks_enabled = profile_tasks,profile_roles` in your `ansible.cfg` to produce timing information about each task and a summary of the longest tasks and roles. It's a bit verbose, but nice to know how long things are taking.
- If you're as horrified as I was at how slow task execution is, I have the following in my `ansible.cfg` in a (vain) attempt to make Ansible a bit faster:

  ```ini
  # Don't wait for all hosts to finish before moving to the next task.
  strategy = free
  nocows = True
  interpreter_python = auto_silent
  gathering = smart
  fact_caching = jsonfile
  fact_caching_connection = /tmp/ansible_cache

  [ssh_connection]
  pipelining = True
  ssh_args = -o ControlMaster=auto -o ControlPersist=60s
  ```

  I don't know what half of them do, but they came highly recommended.
- Use `ansible-lint` on your YAML files to flag any common pitfalls. I still have hundreds of warnings throughout my roles, and some of them seem sensible.
- While it's possible to specify "builtin" modules without the `ansible.builtin.` prefix, doing so is discouraged: always use the prefixed version.
Docker
Most Docker guides describe the command line interface to running containers, but I exclusively used docker-compose. I'd much rather have the settings I apply to each container stored in a file than in my shell history. The compose file format is straightforward and is usually meant to bundle multiple Docker containers together in the same file, even though I typically specify a single service per file. You typically only need to do a few things with a container:
- Expose a port from the container to the host: use the `ports` section with `<outside-port>:<inside-port>` entries.
- Mount volumes from the host in the container: use the `volumes` section with a similar `<outside-path>:<inside-mount-point>[:ro]`, where the optional `:ro` suffix makes it read-only to the container.
- Connect the container to other containers' networks: add a `networks` section with a list of the named Docker networks to join. You might want to create these in Ansible, but I had a single Docker compose file "create" them as non-external networks.
- Set environment variables on the services: add an `environment` section with entries for each variable.
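To make those sections concrete, here's a minimal hypothetical docker-compose.yml exercising all four (the image, paths, and names are placeholders):

```yaml
version: "3.5"
services:
  web:
    image: nginx:1.25
    ports:
      - "8080:80"                         # host port 8080 -> container port 80
    volumes:
      - ./site:/usr/share/nginx/html:ro   # read-only bind mount from the host
    networks:
      - app_net
    environment:
      EXAMPLE_VAR: "value"
networks:
  app_net: {}
```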
It's probably best to just look at the most complex compose file I used in my home server:
```yaml
{{ ansible_managed | comment }}
version: "3.5"

volumes:
  vmagentdata: {}
  vmdata: {}

networks:
  vm_net:
    name: vm_net

services:
  vmagent:
    container_name: vmagent
    image: victoriametrics/vmagent:v1.80.0
    depends_on:
      - "<other-service>"
    ports:
      - "{{ prometheus_port }}:8429"
    extra_hosts:
      - "<this-host-name>:host-gateway"
      - "<other-host-name>:<ip>"
    volumes:
      - vmagentdata:/vmagentdata
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - ./file_sd:/etc/prometheus/file_sd
    command:
      - "--promscrape.config=/etc/prometheus/prometheus.yml"
      - "--remoteWrite.url=http://victoriametrics:8428/api/v1/write"
    networks:
      - vm_net
    restart: always
```
This is stored as docker-compose.yml.j2 in the templates directory of a VictoriaMetrics role.
There's a second service below this to actually run the metrics database: this snippet is just the Prometheus scraper.
When the ansible.builtin.template module puts this on the device, it replaces {{ prometheus_port }} with a variable value I've defined in my playbook.
Here are the Ansible tasks that do that:
```yaml
- name: Copying VictoriaMetrics Docker Compose
  ansible.builtin.template:
    src: docker-compose.yml.j2
    dest: ~/victoriametrics/docker-compose.yml
  become: false

- name: Starting VictoriaMetrics
  community.docker.docker_compose:
    project_src: "~/victoriametrics/"
    build: false
    restarted: true
  become: false
```
I'm not sure if using ~/victoriametrics as the main directory for the container is a good idea, but it seems to be working for now.
Docker troubleshooting
- `docker ps` lists all the containers that have been created and whether they're running. If a container is in a restarting state, there's likely something wrong with it that's preventing it from starting up.
- `docker logs <container>` prints the log messages for just the processes running in that container.
- `docker exec -it <container> <command>` runs a command in the context of the container. Most containers don't have many commands available (frustratingly, `curl` is usually replaced by `wget`), but this can help inspect the file system and can even run a shell if `/bin/sh` is passed as the command to run.
Summary
Hopefully this note helped you put together your own home server in a way that's easy to manage. I struggled for a long time with which configuration system to use, and eventually just did the simplest thing I could find that still let me check the steps into version control. If you have any thoughts, don't hesitate to reach out and email me at hello@mattwidmann.net.