Published: 2025-11-15, Revised: 2025-11-15


TL;DR: Running apt upgrade on hosts with rootless Docker services can break them. A version mismatch occurs between the running user daemon and the newly upgraded system binaries, causing containers to fail to restart. This post provides a convenient playbook that detects critical package changes and automatically restarts only the necessary rootless user daemons, preventing downtime and manual intervention.

Information

This playbook is the result of a deep dive into a specific failure mode of the rootless Docker architecture. For reference on initial setup, please see my previous post on setting up rootless Docker for services like Mastodon.

Inspiration

As explained in my rootless Docker setup guide, this architecture provides quite good security isolation. However, it has one weak point: when the system packages behind it (docker-ce-rootless-extras, containerd and its shims, and so on) are upgraded through apt, the running user-level Docker daemons are left on the old code.

This causes a version mismatch. When a container is restarted, the old daemon tries to use the new on-disk shim, which results in a fatal error such as "unsupported shim version (3): not implemented". The solution is to restart the user's Docker daemon after every relevant update, which is an ideal task for automation via Ansible.
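
For context, this is roughly how the mismatch can be spotted and fixed by hand (a minimal sketch; exact versions and output will differ on your hosts):

# Compare the running rootless daemon's version with what apt just installed
docker version --format '{{.Server.Version}}'   # version of the (old) running user daemon
dpkg-query -W -f='${Version}\n' docker-ce       # version of the freshly upgraded package

# The manual fix: restart the user's Docker daemon so it picks up the new binaries
systemctl --user restart docker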

Ansible Playbook

This playbook automates the entire update and repair process. It is designed to run daily via a cron job, remaining silent until it needs to take action.
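
One way to schedule it is a single cron entry on the management host (a sketch; the update.sh wrapper is described later in this post, and the log path is just an example):

# /etc/cron.d/ansible-daily: run the playbook wrapper every morning at 06:00
0 6 * * * root /srv/ansible/update.sh >> /var/log/ansible-daily.log 2>&1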

Playbook: apt.yaml

---
- hosts: ubuntu,debian
  become: yes
  become_method: sudo

  vars:
    ansible_pipelining: true
    ansible_ssh_common_args: '-o ControlMaster=auto -o ControlPersist=60s'
    # This forces Ansible to use /tmp for its temporary files, avoiding
    # any permission issues when becoming a non-root user.
    ansible_remote_tmp: /tmp
    # Critical packages that require a Docker restart
    critical_docker_packages:
      - docker-ce
      - docker-ce-cli
      - containerd.io
      - docker-ce-rootless-extras
      - docker-buildx-plugin
      - docker-compose-plugin
      - systemd

  tasks:
    - name: Ensure en_US.UTF-8 locale is present on target hosts
      ansible.builtin.locale_gen:
        name: en_US.UTF-8
        state: present
      # We only run this once per host, not for every user
      run_once: true

    - name: "Update cache & Full system update"
      ansible.builtin.apt:
        update_cache: true
        upgrade: dist
        cache_valid_time: 3600
        force_apt_get: true
        autoremove: true
        autoclean: true
      environment:
        # needrestart: restart services (a)utomatically instead of prompting
        NEEDRESTART_MODE: a
      register: apt_result
      changed_when: "'0 upgraded, 0 newly installed, 0 to remove' not in apt_result.stdout"
      no_log: true

    - name: Report on any packages that were kept back
      ansible.builtin.debug:
        msg: "WARNING: Packages were kept back on {{ inventory_hostname }}. Manual review may be needed. Output: {{ apt_result.stdout }}"
      when:
        - apt_result.changed
        - "'packages have been kept back' in apt_result.stdout"

    - name: Find all users with lingering enabled
      ansible.builtin.find:
        paths: /var/lib/systemd/linger
        file_type: file
      register: lingered_users_find
      when:
        - apt_result.changed
        # This checks if any of the critical package names appear in the apt output
        - critical_docker_packages | select('in', apt_result.stdout) | list | length > 0
      no_log: true

    - name: Create a list of lingered usernames
      ansible.builtin.set_fact:
        # The linger directory contains one file per user, named after the user
        lingered_usernames: "{{ lingered_users_find.files | map(attribute='path') | map('basename') | list }}"
      when: lingered_users_find.matched is defined and lingered_users_find.matched > 0
      no_log: true

    - name: Check for existence of rootless Docker service for each user
      ansible.builtin.systemd:
        name: docker
        scope: user
      become: true
      become_user: "{{ item }}"
      loop: "{{ lingered_usernames }}"
      when: lingered_usernames is defined and lingered_usernames | length > 0
      register: service_checks
      ignore_errors: true
      no_log: true

    - name: Identify which services were actually found
      ansible.builtin.set_fact:
        restart_list: "{{ service_checks.results | selectattr('status.LoadState', 'defined') | selectattr('status.LoadState', '!=', 'not-found') | map(attribute='item') | list }}"
      when: lingered_usernames is defined and lingered_usernames | length > 0
      no_log: true

    - name: Restart existing rootless Docker daemons
      ansible.builtin.systemd:
        name: docker
        state: restarted
        scope: user
      become: true
      become_user: "{{ item }}"
      loop: "{{ restart_list }}"
      when: restart_list is defined and restart_list | length > 0
      register: restart_results
      no_log: true

    - name: Report on restarted services
      ansible.builtin.debug:
        msg: "Successfully restarted rootless Docker daemon for user '{{ item.item }}'."
      loop: "{{ restart_results.results }}"
      when:
        - restart_list is defined and restart_list | length > 0
        - item.changed

    - name: Check if reboot required
      ansible.builtin.stat:
        path: /var/run/reboot-required
      register: reboot_required_file

    - name: Reboot if required
      ansible.builtin.reboot:
      when: reboot_required_file.stat.exists == true
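
Before wiring this into cron, it can be worth a dry run against a single host. --check and --limit are standard ansible-playbook flags (note that check mode will not actually upgrade packages, so the downstream restart logic will be skipped):

# Dry-run the playbook against one host only
ansible-playbook -i inventories/hosts apt.yaml --limit docker_services --check -v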

ansible.cfg

For cleaner output, update your ansible.cfg:

[defaults]
stdout_callback = yaml
display_skipped_hosts = no
display_ok_hosts = no

Summary of Tasks

The playbook is designed to be relatively intelligent and robust:

  1. Update packages: performs a full apt dist-upgrade.
  2. Check for changes: determines whether any package actually changed.
  3. Check severity: if there were changes, checks whether any of them are in the critical_docker_packages list.
  4. Find targets: only if a critical package was updated, finds the users who run rootless services via systemd-linger (see the sketch after this list).
  5. Act selectively: checks which of those users actually run a docker.service and restarts only those specific daemons.
  6. Summary report: the playbook is silent by default; it prints only a one-line report for each restarted daemon, or a warning if apt held back important packages.
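
For reference, this is roughly how lingering and the per-user daemon look on a target host (a sketch assuming a rootless setup as in my earlier post; "mastodon" is the example user from the output below):

# Lingering keeps a user's systemd instance (and its Docker daemon) running without a login session
sudo loginctl enable-linger mastodon

# Each lingered user shows up as a file named after them, which is what the playbook scans
ls /var/lib/systemd/linger

# The per-user unit the playbook restarts
sudo -u mastodon XDG_RUNTIME_DIR=/run/user/$(id -u mastodon) systemctl --user status docker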

Result

On days when there is no relevant change, the output is minimal:

PLAY RECAP *********************************************************************
docker_services            : ok=3    changed=0    unreachable=0    failed=0    skipped=8    rescued=0    ignored=0
hass_iotdocker             : ok=4    changed=0    unreachable=0    failed=0    skipped=8    rescued=0    ignored=0
...

On a day when a Docker package is updated, this is what it reports as the Docker daemons are restarted:

TASK [Report on restarted services] ********************************************
ok: [docker_services] => (item=...) => {
    "msg": "Successfully restarted rootless Docker daemon for user 'mastodon'."
}
...

It’s a small automation, but it prevents a serious failure mode (one that bit me recently).


Ansible Workflow

For security and ease of management, I run Ansible from a dedicated management VM in my local network. This host contains the Ansible installation, the playbook files, and the inventory of servers to manage.

This setup allows me to trigger multi-host updates with a single command from anywhere.
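
From the management host, a quick ad-hoc ping is a simple way to confirm that SSH access and the inventory (described below) work before running the full playbook:

# Verify that every inventory host is reachable over SSH
ansible all -i /srv/ansible/inventories/hosts -m ping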

Inventory

The heart of the setup is the inventory file (inventories/hosts), which tells Ansible which servers to target.

# /srv/ansible/inventories/hosts

[ubuntu]
# Local network VMs
hass_iotdocker ansible_host=192.168.60.15
nextcloud ansible_host=192.168.40.22
node_gitlab ansible_host=192.168.70.11
docker_services ansible_host=192.168.40.81

# A public cloud VM that requires a specific user to connect
aws_vm ansible_host=130.61.20.105 ansible_user=ubuntu

[debian]
iot_influx ansible_host=192.168.60.34
# The management host targets itself to stay updated
management ansible_host=localhost

[all:vars]
# Explicitly set the python interpreter for compatibility
ansible_python_interpreter=/usr/bin/python3
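
To double-check how the hosts are grouped, ansible-inventory can render the same file as a tree (a standard command, shown here with my paths):

# Show the ubuntu/debian grouping that the playbook's 'hosts:' pattern relies on
ansible-inventory -i /srv/ansible/inventories/hosts --graph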

Authentication via SSH agent forwarding

An important part of this workflow is authentication. I use SSH agent forwarding instead of storing my private keys on the management host.

Connecting to the management host with ssh -A allows Ansible to securely use my local machine’s SSH agent keys for the duration of the connection. Of course, this means that you have to ensure that your management host is appropriately protected and isolated.
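
A quick way to confirm forwarding works is to list the keys the agent exposes on the management host (exact output will vary):

# On the local machine: make sure the key is loaded in the agent
ssh-add -l

# Connect with agent forwarding and list the forwarded keys on the remote side
ssh -A alex@management.host.ip 'ssh-add -l'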

Running the playbook

To make this a daily one-liner, I use an alias in my local machine’s shell configuration (~/.bashrc or ~/.zshrc):

alias daily='ssh -A alex@management.host.ip "sh /srv/ansible/update.sh"'

The update.sh script on the management host is a simple wrapper that makes sure the playbook runs from the correct directory:

#!/bin/sh

# Purpose: CD into local directory of script
# and run Ansible playbook with local configuration

SCRIPT=$(readlink -f "$0")
SCRIPTPATH=$(dirname "$SCRIPT")

cd "$SCRIPTPATH" || exit
/root/.local/bin/ansible-playbook -vv "apt.yaml"

This setup means I can type daily in my local terminal, and my entire fleet of VMs (local and public) will be updated.

Here is a simple diagram of the flow:

+--------------+      +--------------------+      +-----------------+
|              |      |                    |      |                 |
|    Laptop    |----->|  Management Host   |----->|   Target VMs    |
| (SSH Agent)  |      | (Ansible + Playb.) |      |(nextcloud, etc.)|
+--------------+      +--------------------+      +-----------------+
      |                     |
      | ssh -A alex@...     | ansible-playbook ...
      +---------------------+

Changelog

2025-11-15