Docker’s HEALTHCHECK instruction tells you when a container is broken, but it does nothing about it. A PostgreSQL container that loses its WAL directory, a web app stuck in a deadlock, or a Redis instance that silently drops connections will keep running in an unhealthy state, serving errors until you notice and restart it manually.
This is the single biggest reliability gap in standalone Docker deployments. Kubernetes and Docker Swarm handle this natively, but if you run plain Docker Compose stacks in your homelab, the responsibility falls on you.
This guide covers every practical approach to container auto-healing: from the simplest one-line solution to custom watchdog scripts for edge cases. You will leave with a self-healing setup that requires zero manual intervention for common failure modes.
Docker HEALTHCHECK: The Foundation
Auto-healing starts with good health checks. Without them, no tool can distinguish a working container from a broken one.
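To make the health states concrete, here is a simplified model of how Docker moves a container between starting, healthy, and unhealthy. It assumes the documented rule that a container becomes unhealthy after `retries` consecutive failed checks, and it ignores the start period; the class name `HealthTracker` is made up for this illustration.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    """Simplified model of Docker's health state for one container."""
    retries: int = 3          # consecutive failures needed to become unhealthy
    status: str = "starting"  # Docker reports starting, healthy, or unhealthy
    failing_streak: int = 0

    def record(self, exit_code: int) -> str:
        """Feed in one health check result (0 = pass) and return the status."""
        if exit_code == 0:
            self.failing_streak = 0
            self.status = "healthy"
        else:
            self.failing_streak += 1
            if self.failing_streak >= self.retries:
                self.status = "unhealthy"
        return self.status


tracker = HealthTracker(retries=3)
# Two failures, a recovery, then three failures in a row.
history = [tracker.record(code) for code in (0, 1, 1, 0, 1, 1, 1)]
print(history[-1])  # unhealthy
```

A single passing check resets the streak, which is why short blips do not flip the status.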
HEALTHCHECK in a Dockerfile
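Embed health checks directly in images you build by adding a HEALTHCHECK instruction to the Dockerfile. The instruction takes --interval, --timeout, --start-period, and --retries options, followed by CMD and the command to run.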
For databases, use protocol-native checks, such as pg_isready for PostgreSQL or redis-cli ping for Redis.
HEALTHCHECK at Runtime
For third-party images you cannot modify, pass health check parameters when running the container:
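The runtime flags are `--health-cmd`, `--health-interval`, `--health-timeout`, `--health-retries`, and `--health-start-period`. As a rough sketch, the helper below assembles them into a `docker run` command line. The container name, image tag, and timing values are illustrative assumptions, not recommendations.

```python
import shlex


def health_flags(cmd, interval="30s", timeout="5s", retries=3, start_period="10s"):
    """Return the `docker run` health check flags as an argument list."""
    return [
        "--health-cmd", cmd,
        "--health-interval", interval,
        "--health-timeout", timeout,
        "--health-retries", str(retries),
        "--health-start-period", start_period,
    ]


# Hypothetical example: a Redis container checked with redis-cli.
argv = ["docker", "run", "-d", "--name", "cache",
        *health_flags("redis-cli ping | grep -q PONG"), "redis:7"]
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run` would start the container. Printing it first makes the command easy to review.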
HEALTHCHECK in Docker Compose
The same pattern in Compose format uses a healthcheck key on the service, with test, interval, timeout, retries, and start_period fields.
Checking Health Status
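One way to read a container's health programmatically is to parse the output of `docker inspect`, which reports the status under `State.Health.Status` when a health check is defined. The sketch below assumes the Docker CLI is on the PATH; the sample payload is trimmed to the fields used here.

```python
import json
import subprocess


def parse_health(inspect_json: str) -> str:
    """Extract the health status from `docker inspect` JSON output.

    Returns "none" when the container defines no health check.
    """
    data = json.loads(inspect_json)
    state = data[0]["State"]
    return state.get("Health", {}).get("Status", "none")


def container_health(name: str) -> str:
    """Ask the Docker CLI for a container's current health status."""
    out = subprocess.run(
        ["docker", "inspect", name],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_health(out)


# Offline demonstration with a trimmed-down inspect payload.
sample = '[{"State": {"Status": "running", "Health": {"Status": "unhealthy", "FailingStreak": 4}}}]'
print(parse_health(sample))  # unhealthy
```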
Once these health checks are in place, you can detect failures. The next step is acting on them.
docker-autoheal: The Simple Solution
willfarrell/docker-autoheal is a standalone container that polls the Docker API for containers reporting an unhealthy status and restarts them automatically. It is the closest thing to native auto-healing for standalone Docker.
Deploy with Compose
With AUTOHEAL_CONTAINER_LABEL=all, autoheal restarts every container that becomes unhealthy. For finer control, add the label autoheal=true to specific containers.
Then set AUTOHEAL_CONTAINER_LABEL=autoheal instead of all.
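The selection rule can be pictured as two Docker list filters combined: `health=unhealthy`, plus a label filter unless the setting is `all`. The sketch below illustrates that rule only; it is not taken from autoheal's source, and the function names are invented.

```python
import json


def autoheal_filters(label: str) -> dict:
    """Build Docker API list filters for unhealthy containers.

    `label` follows the article's convention: a label name, or "all".
    """
    filters = {"health": ["unhealthy"]}
    if label != "all":
        filters["label"] = [f"{label}=true"]
    return filters


def ps_command(label: str) -> list:
    """Equivalent `docker ps` invocation using the same filters."""
    cmd = ["docker", "ps", "--filter", "health=unhealthy"]
    if label != "all":
        cmd += ["--filter", f"label={label}=true"]
    return cmd + ["--format", "{{.Names}}"]


print(json.dumps(autoheal_filters("autoheal")))
print(" ".join(ps_command("all")))
```

Running the `docker ps` form by hand is a quick way to preview which containers a given label setting would cover.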
Configuration Options
| Variable | Default | Description |
|---|---|---|
| AUTOHEAL_CONTAINER_LABEL | autoheal | Label to filter containers, or all |
| AUTOHEAL_INTERVAL | 5 | Seconds between health check polls |
| AUTOHEAL_START_PERIOD | 0 | Seconds to wait before monitoring |
| AUTOHEAL_DOCKER_SOCK | unset | Path to Docker socket inside container |
Pros and Cons
Pros:
- One Docker Compose service, zero config
- Works with any container that has HEALTHCHECK
- Respects restart policies after restart
- Active project, updated regularly
Cons:
- Requires mounting the Docker socket (security consideration)
- Race condition if the container restarts itself during heal
- No alerting beyond an optional webhook call after each restart
Custom Watchdog Script with Docker API
If you want more control, or prefer not to mount the Docker socket into a third-party container, write a watchdog script that runs on the host and talks to Docker through the SDK or CLI. This approach also lets you add notifications, rate limiting, and selective restart logic.
Python Watchdog with Slack Alerts
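A minimal sketch of such a watchdog follows. It uses only the standard library and shells out to the Docker CLI rather than the Docker SDK, so it runs without extra packages. The webhook URL, polling interval, and cooldown are placeholder assumptions, and Slack incoming webhooks are assumed to accept a JSON body with a `text` field.

```python
import json
import subprocess
import sys
import time
import urllib.request

SLACK_WEBHOOK_URL = ""   # hypothetical: paste your Slack incoming-webhook URL
POLL_SECONDS = 30        # assumed polling interval
COOLDOWN_SECONDS = 300   # assumed minimum gap between restarts per container


def unhealthy_containers() -> list:
    """List names of containers Docker currently reports as unhealthy."""
    out = subprocess.run(
        ["docker", "ps", "--filter", "health=unhealthy", "--format", "{{.Names}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def restart(name: str) -> None:
    subprocess.run(["docker", "restart", name], check=True)


def notify(text: str) -> None:
    """Post a message to Slack; skipped when no webhook URL is configured."""
    if not SLACK_WEBHOOK_URL:
        return
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps({"text": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=10)


def heal_once(names, last_restart, now, restart_fn=restart, notify_fn=notify):
    """Restart each unhealthy container unless it is still in cooldown."""
    restarted = []
    for name in names:
        if now - last_restart.get(name, float("-inf")) < COOLDOWN_SECONDS:
            continue  # restarted recently: avoid a restart loop
        restart_fn(name)
        last_restart[name] = now
        notify_fn(f"Restarted unhealthy container: {name}")
        restarted.append(name)
    return restarted


def main() -> None:
    last_restart = {}
    while True:
        heal_once(unhealthy_containers(), last_restart, time.monotonic())
        time.sleep(POLL_SECONDS)


# Start the loop with: python watchdog.py run
if __name__ == "__main__" and sys.argv[1:] == ["run"]:
    main()
```

Keeping the restart decision in `heal_once`, with the restart and notify functions passed in, lets you test the cooldown logic without a Docker daemon.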
Run it as a systemd service:
Bash Watchdog (Minimal)
For a lighter alternative without Python dependencies:
systemd Docker Service Watchdog
For critical infrastructure containers (DNS, reverse proxy, VPN), bypass Docker’s health check entirely and use systemd to manage the container as a service.
Create a systemd Service for a Container
The key advantages:
- systemd restarts: if the Docker daemon fails or restarts, systemd respawns the container automatically.
- Rate limiting: StartLimitIntervalSec and StartLimitBurst prevent restart loops.
- Logging: container logs go to the systemd journal.
- Health depends on process: if the container process exits, systemd restarts it regardless of Docker restart policies.
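To see what the rate-limiting pair does, the sketch below models "at most `burst` starts per `interval_sec`" as a sliding window. This approximates the idea only; systemd's own accounting differs in detail, and the values 60 and 3 are arbitrary.

```python
from collections import deque


class StartLimiter:
    """Allow at most `burst` starts within any `interval_sec` window."""

    def __init__(self, interval_sec: float, burst: int):
        self.interval_sec = interval_sec
        self.burst = burst
        self.starts = deque()

    def allow(self, now: float) -> bool:
        # Forget starts that have aged out of the window.
        while self.starts and now - self.starts[0] >= self.interval_sec:
            self.starts.popleft()
        if len(self.starts) >= self.burst:
            return False  # limit hit: a further start is refused
        self.starts.append(now)
        return True


limiter = StartLimiter(interval_sec=60, burst=3)
decisions = [limiter.allow(t) for t in (0, 5, 10, 15, 70)]
print(decisions)  # [True, True, True, False, True]
```

The fourth start at t=15 is refused because three starts already fall inside the window. By t=70 those have aged out.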
systemd Health Check Extension
You can combine systemd with a timer unit that checks the container health and restarts the service if unhealthy:
Service file:
Uptime Kuma Webhook Restart Pattern
Uptime Kuma monitors HTTP endpoints and can trigger webhooks on failure. Pair it with a lightweight webhook receiver that restarts the failing container.
Deploy the Webhook Receiver
Create a simple webhook handler with a shell script:
Or use a dedicated webhook service like webhook with a minimal config:
In Uptime Kuma, configure the monitor’s Notification to send a POST webhook to http://your-watchdog:8080/hooks/restart-container with the container name in the payload.
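As an illustration of the receiving side, here is a small Python receiver built on the standard library's `http.server`. The payload shape (`{"container": "<name>"}`) and the allowlist contents are assumptions for this sketch. Uptime Kuma's actual webhook body has its own structure, so you would need to map from it or use a custom body.

```python
import json
import subprocess
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

ALLOWED = {"web", "api"}  # hypothetical allowlist of restartable containers


def container_from_payload(body: bytes):
    """Return the requested container name if it is on the allowlist."""
    try:
        name = json.loads(body).get("container")
    except (ValueError, AttributeError):
        return None  # not JSON, or JSON that is not an object
    if isinstance(name, str) and name in ALLOWED:
        return name
    return None


class RestartHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        name = container_from_payload(self.rfile.read(length))
        if self.path != "/hooks/restart-container" or name is None:
            self.send_response(400)
            self.end_headers()
            return
        # Only allowlisted names reach this point.
        subprocess.run(["docker", "restart", name], check=False)
        self.send_response(200)
        self.end_headers()


# Start the server with: python receiver.py serve
if __name__ == "__main__" and sys.argv[1:] == ["serve"]:
    HTTPServer(("0.0.0.0", 8080), RestartHandler).serve_forever()
```

The allowlist matters because anything that can reach this port can trigger restarts. Keep the receiver on a private network, or add authentication.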
Self-Healing Docker Compose Stack: Complete Example
Here is a real-world auto-healing stack combining all the techniques above:
Restart Policy Deep Dive
Docker provides four restart policies. Understanding their interaction with auto-healing is critical:
| Policy | Behavior | Auto-Heal Required |
|---|---|---|
| no | Never restart | Yes |
| on-failure[:max-retries] | Restart on non-zero exit | Yes (unhealthy ≠ exit) |
| unless-stopped | Restart unless manually stopped | Yes |
| always | Always restart | Yes |
Key insight: none of these trigger on health status. Restart policies react only to the container's main process exiting, so a container whose process stays alive but hangs (a deadlock, or an idle loop after its real work has stopped) will never restart unless you add a HEALTHCHECK that fails and an auto-heal mechanism that acts on it.
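The point can be written as a small decision function. It is a simplified model: it ignores the max-retries suffix and manual stops. Note that health status is not among its inputs.

```python
def docker_restarts(policy: str, exited: bool, exit_code: int = 0) -> bool:
    """Would Docker's restart policy restart this container?

    Health status is deliberately absent from the parameters:
    restart policies react to process exit only.
    """
    if not exited:
        return False  # a hung but running process never triggers a policy
    if policy == "on-failure":
        return exit_code != 0
    return policy in ("always", "unless-stopped")


hung = docker_restarts("always", exited=False)
crashed = docker_restarts("on-failure", exited=True, exit_code=1)
clean = docker_restarts("on-failure", exited=True, exit_code=0)
print(hung, crashed, clean)  # False True False
```

The first case is the gap auto-healing fills: the most aggressive policy still does nothing for a process that has hung without exiting.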
For containers that crash frequently, combine restart policies with auto-heal:
This handles both crash loops (Docker restart on exit) and stuck processes (autoheal restart on unhealthy).
Handling System Restarts and Daemon Failures
Auto-healing containers do nothing if the Docker daemon itself crashes or the host reboots. For a fully self-healing setup:
1. Enable Docker Restart on Boot
2. Use restart: unless-stopped on All Services
This ensures containers restart after a daemon restart or host reboot, except those you stopped manually.
3. Use systemd for Critical Infrastructure
As described above, systemd services survive Docker daemon failures better than pure Docker restart policies:
4. Monitor the Auto-Heal Daemon Itself
The auto-heal container has no one to heal it. Use a simple cron job or systemd timer:
Or run the watchdog script as a systemd service so systemd restarts it if it crashes.
Notification and Observability
When auto-heal fires, you want to know about it. Wire up notifications to catch patterns before they become chronic:
docker-autoheal with Webhook
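docker-autoheal can call a webhook after each restart. Set the webhook URL environment variable on the autoheal service (WEBHOOK_URL in the project README, without the AUTOHEAL_ prefix; confirm the name against the image version you run).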
Log Aggregation
Forward Docker events to your logging stack:
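One possible shape for this is a small forwarder that follows `docker events --filter event=health_status --format '{{json .}}'` and emits one compact JSON line per health change for your log shipper to collect. The field names read below (`Action`, `Actor.Attributes.name`) reflect the CLI's JSON output as commonly seen; verify them against your Docker version.

```python
import json
import subprocess
import sys


def parse_health_event(line: str):
    """Turn one JSON event line into (container_name, health) or None."""
    event = json.loads(line)
    action = event.get("Action", "")
    if not action.startswith("health_status"):
        return None
    name = event.get("Actor", {}).get("Attributes", {}).get("name", "unknown")
    # Action looks like "health_status: unhealthy"; keep the part after ":".
    return name, action.partition(":")[2].strip()


def stream_events():
    """Yield parsed health events from the Docker CLI as they happen."""
    proc = subprocess.Popen(
        ["docker", "events", "--filter", "event=health_status",
         "--format", "{{json .}}"],
        stdout=subprocess.PIPE, text=True,
    )
    for line in proc.stdout:
        parsed = parse_health_event(line)
        if parsed:
            yield parsed


# Start following events with: python events.py follow
if __name__ == "__main__" and sys.argv[1:] == ["follow"]:
    for name, health in stream_events():
        print(json.dumps({"container": name, "health": health}), flush=True)
```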
Prometheus Metrics
Expose container health as Prometheus metrics using prometheus-health-exporter or a custom exporter:
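If you go the custom route, the core of an exporter is rendering the Prometheus text exposition format. The sketch below shows that step only. The metric name `container_health_status` is invented for this example, and the input dict would come from whatever health lookup you already use.

```python
def render_metrics(health_by_container: dict) -> str:
    """Render container health in the Prometheus text exposition format."""
    lines = [
        "# HELP container_health_status 1 if the container is healthy, else 0.",
        "# TYPE container_health_status gauge",
    ]
    for name in sorted(health_by_container):
        value = 1 if health_by_container[name] == "healthy" else 0
        lines.append(f'container_health_status{{container="{name}"}} {value}')
    return "\n".join(lines) + "\n"


output = render_metrics({"web": "healthy", "db": "unhealthy"})
print(output, end="")
```

Serving this string from a `/metrics` HTTP endpoint is enough for Prometheus to scrape it. An alert on the value staying at 0 then covers containers that auto-heal fails to recover.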
When Not to Auto-Heal
Auto-healing is not always the right answer. Consider the following before enabling it on every container:
- Stateful databases: restarting a corrupted database container may make things worse. Prefer manual investigation.
- One-shot jobs: containers that run to completion should not be auto-healed.
- Rate-limited services: if a container is unhealthy due to API rate limiting, restarting resets the timer but does not solve the root cause.
- Config errors: a container that fails at startup due to a bad config file will fail again after restart. Fix the config.
Use labels selectively. Apply autoheal=true only to containers where automatic recovery is safe and desirable.
Summary
Docker provides excellent health detection but no built-in recovery. For a truly self-healing homelab:
- Add HEALTHCHECK to every service — in Dockerfiles, Compose files, or runtime flags.
- Deploy docker-autoheal for zero-config auto-restart of unhealthy labeled containers.
- Use systemd services for critical infrastructure that must survive Docker daemon failures.
- Add notifications — Slack, webhook, or Prometheus — so you know when auto-heal fires.
- Label selectively — not every container should auto-heal.
The five minutes it takes to add health checks and auto-heal to your Compose file pays for itself the first time a container fails at 3 AM.