Docker Compose starts all services in a stack simultaneously by default. If your application depends on PostgreSQL or Redis being fully ready, you get connection errors, retry loops, and startup race conditions.
The fix is healthchecks. Not the Dockerfile-level HEALTHCHECK
instruction — the full Compose healthcheck block with
depends_on + condition: service_healthy. This post covers
everything from basic setup to real-world examples for common
homelab services, custom scripts, debugging, and the common
pitfalls.
How Docker Healthchecks Actually Work
A healthcheck is a command that runs inside the container
every interval seconds. Exit code 0 means healthy. Exit code 1
means unhealthy. If the check fails retries times in a row,
Docker marks the container unhealthy.
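A minimal sketch of a Compose healthcheck block. It assumes a hypothetical image, example/app, that contains curl and serves an HTTP endpoint at /health on port 8080. Adjust all three to match your service.

```yaml
services:
  app:
    image: example/app:latest   # hypothetical image
    healthcheck:
      # Exec form: Docker runs curl directly, with no shell involved.
      # -f makes curl exit non-zero on HTTP 4xx/5xx responses.
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s      # run the check every 30 seconds
      timeout: 5s        # a single check taking longer than this counts as a failure
      retries: 3         # 3 consecutive failures mark the container unhealthy
      start_period: 20s  # failures during the first 20s are not counted
```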
Each field has a specific purpose:
| Field | Default | What it controls |
|---|---|---|
| test | None (inherits the image’s HEALTHCHECK, if any) | Command to run (array form preferred) |
| interval | 30s | How often the check runs |
| timeout | 30s | Max time for one check before it counts as a failure |
| retries | 3 | Consecutive failures before the container is marked unhealthy |
| start_period | 0s | Grace period during which failed checks are not counted toward retries |
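The start_period is critical for databases and apps with long
initialization. Checks still run during this period, but failures
do not count toward retries, so a slow-booting service is not marked
unhealthy before it has had time to start. The first successful check
marks the container healthy right away, even if the grace period has
not ended.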
depends_on with condition: service_healthy
Compose’s depends_on has three conditions:
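- service_started: the dependency’s container has been started
- service_healthy: the dependency’s healthcheck reports healthy
- service_completed_successfully: the dependency ran to completion and exited with code 0, which suits one-off jobs such as migrations
service_started is the default and is practically useless for
ordering: it only means the container has been created and started,
not that the process inside is listening.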
For real dependency ordering, always use service_healthy with
a properly configured healthcheck on the dependency.
Full Example — Web App with PostgreSQL
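A sketch of such a stack using the official postgres image. The web image, the credentials, and the DATABASE_URL variable are placeholders for your own app. The $$ in the test string escapes Compose’s own variable interpolation, so the shell inside the container expands the variables instead.

```yaml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: change-me   # placeholder; use a secret in practice
      POSTGRES_DB: app
    healthcheck:
      # $$ becomes a literal $, expanded by /bin/sh inside the container
      test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

  web:
    image: example/web:latest   # hypothetical application image
    environment:
      DATABASE_URL: postgres://app:change-me@db:5432/app   # hypothetical variable name
    depends_on:
      db:
        condition: service_healthy   # wait for the db healthcheck, not just container start
```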
The web app will not start until PostgreSQL is accepting connections. No retry loops, no connection refused errors.
Healthcheck Examples for Common Homelab Services
PostgreSQL
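A standalone PostgreSQL healthcheck, sketched for the official image. Passing -h 127.0.0.1 makes pg_isready check over TCP rather than the Unix socket. This matters because the temporary server the official image runs during first-time initialization typically listens only on the socket, so a socket-only check can pass before initialization has finished.

```yaml
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: change-me   # placeholder
    healthcheck:
      # Check over TCP on the loopback interface
      test: ["CMD-SHELL", "pg_isready -h 127.0.0.1 -U $${POSTGRES_USER}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
```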
pg_isready ships with PostgreSQL and returns 0 when the server
is accepting connections. It hits the unix socket or TCP port
directly — no extra tools needed.
Redis
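A sketch for the official redis image. The commented variant assumes a hypothetical REDIS_PASSWORD environment variable set inside the container. It pipes the reply through grep so that only an actual PONG counts as healthy.

```yaml
services:
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      # With requirepass enabled, authenticate and check the reply explicitly:
      # test: ["CMD-SHELL", "redis-cli -a \"$$REDIS_PASSWORD\" ping | grep -q PONG"]
      interval: 10s
      timeout: 3s
      retries: 5
```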
redis-cli ping returns PONG when the server is ready. Works
with or without a password (add -a $REDIS_PASSWORD if needed).
Nginx / Caddy
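A sketch for a web server that serves its root page on port 80, assuming curl is present in the image. A Caddy service would use the same shape, pointed at whichever port and path it serves.

```yaml
services:
  nginx:
    image: nginx:alpine
    healthcheck:
      # Fails on connection errors and on HTTP 4xx/5xx responses
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 5s
      retries: 3
```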
Requires curl in the image. For Alpine-based images, install
it in your Dockerfile with RUN apk add --no-cache curl.
MariaDB / MySQL
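A sketch for the official MySQL image. Passing -h 127.0.0.1 forces a TCP connection. Like PostgreSQL, the image’s first-run initialization server is not expected to be reachable over TCP, so this avoids reporting healthy too early. The root password is a placeholder.

```yaml
services:
  mysql:
    image: mysql:8.4
    environment:
      MYSQL_ROOT_PASSWORD: change-me   # placeholder
    healthcheck:
      # Exits 0 once the server answers on 127.0.0.1
      test: ["CMD", "mysqladmin", "ping", "-h", "127.0.0.1"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
```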
mysqladmin ping returns 0 when the server is alive, even if it
rejects the login (an Access denied error still proves the server is
running). MariaDB official images also ship a healthcheck.sh script:
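A sketch using the script bundled with the official mariadb image. --connect checks that a client connection succeeds, and --innodb_initialized checks that the InnoDB storage engine has finished starting. The root password is a placeholder.

```yaml
services:
  mariadb:
    image: mariadb:11
    environment:
      MARIADB_ROOT_PASSWORD: change-me   # placeholder
    healthcheck:
      test: ["CMD", "healthcheck.sh", "--connect", "--innodb_initialized"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
```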
RabbitMQ
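A sketch for the official RabbitMQ image. The generous start_period is an assumption based on RabbitMQ nodes often taking tens of seconds to boot. Tune it to what you observe.

```yaml
services:
  rabbitmq:
    image: rabbitmq:3-management
    healthcheck:
      # -q suppresses informational output; the exit code carries the result
      test: ["CMD", "rabbitmq-diagnostics", "-q", "ping"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
```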
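RabbitMQ ships rabbitmq-diagnostics, which queries the node itself
rather than just checking that a port is open. ping is the lightest
check; stricter commands such as check_port_connectivity are available
if you need them.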
MinIO
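A sketch using mc ready. It assumes a recent minio/minio image where mc is present and its default local alias points at the server on localhost:9000. Older images may lack the ready subcommand, so test the command with docker exec first.

```yaml
services:
  minio:
    image: minio/minio:latest
    command: server /data
    healthcheck:
      # Exits 0 once the deployment behind the "local" alias reports ready
      test: ["CMD", "mc", "ready", "local"]
      interval: 30s
      timeout: 5s
      retries: 3
```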
Requires the MinIO client (mc) inside the container. The MinIO
official image includes it.
Vaultwarden (Bitwarden)
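A sketch that reuses the script shipped in the image. It assumes /healthcheck.sh exists, as it does in recent vaultwarden/server images, which also use it as their own image-level HEALTHCHECK. Confirm with docker exec before relying on it. The Compose block then only makes the timing explicit.

```yaml
services:
  vaultwarden:
    image: vaultwarden/server:latest
    healthcheck:
      test: ["CMD", "/healthcheck.sh"]   # assumed to be bundled in the image
      interval: 60s
      timeout: 10s
      retries: 3
```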
Custom Healthcheck Scripts
When a single command is not enough, mount a script or use the
CMD-SHELL form to chain commands.
Example — Multi-Condition Check for a Web App
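A sketch of chaining conditions with CMD-SHELL. It assumes a hypothetical /health endpoint on port 8080 and a hypothetical /tmp/app-ready marker file that the app writes once it has finished warming up. The trailing || exit 1 normalizes any failure to exit code 1.

```yaml
services:
  app:
    image: example/app:latest   # hypothetical
    healthcheck:
      # CMD-SHELL runs the string with /bin/sh -c inside the container.
      # Healthy only if the endpoint answers AND the marker file exists.
      test: ["CMD-SHELL", "curl -fsS http://localhost:8080/health > /dev/null && test -f /tmp/app-ready || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 3
```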
Mounted Script Pattern
For complex checks that need branching logic, put the logic in a
shell script that exits 0 when the service is healthy and 1 otherwise.
Mount it into the container and reference it:
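A sketch assuming the script lives next to the compose file as a hypothetical ./healthcheck.sh. Invoking it through sh means the file does not need the executable bit set on the host.

```yaml
services:
  app:
    image: example/app:latest   # hypothetical
    volumes:
      # :ro mounts the script read-only inside the container
      - ./healthcheck.sh:/usr/local/bin/healthcheck.sh:ro
    healthcheck:
      test: ["CMD", "sh", "/usr/local/bin/healthcheck.sh"]
      interval: 30s
      timeout: 10s
      retries: 3
```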
Dockerfile-Level vs Compose-Level Healthchecks
You can define healthchecks in two places, and they work differently.
Dockerfile HEALTHCHECK
Pros:
- Bundled with the image — every container gets it automatically
- Works without Compose (raw docker run)
Cons:
- On its own it only reports status; ordering with depends_on and condition: service_healthy still requires Compose (which does honor an image-level check)
- Less flexible per instance
Compose-Level healthcheck
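A sketch of per-service flexibility. It assumes a hypothetical image that already defines its own HEALTHCHECK and a hypothetical /ready endpoint. One service replaces the image’s check with different timing, and the other switches it off with disable: true.

```yaml
services:
  app:
    image: example/app:latest   # hypothetical image with its own HEALTHCHECK
    healthcheck:
      # Replaces the image-level check for this service only
      test: ["CMD", "curl", "-f", "http://localhost:8080/ready"]
      interval: 15s
      timeout: 5s
      retries: 5
      start_period: 40s

  worker:
    image: example/app:latest
    healthcheck:
      disable: true   # turn off the image-level check for this service
```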
When to use which:
- Dockerfile HEALTHCHECK: base images that always need the same check (e.g., your custom application base image)
- Compose healthcheck: environment-specific overrides, services you don’t control the image for, and stacks where you need depends_on with condition: service_healthy
Compose healthcheck overrides the Dockerfile one when both are defined. This is useful when a database config difference changes the healthcheck endpoint.
The curl Problem — Healthchecks Without curl
Many minimal images do not ship curl. Alpine has only BusyBox’s wget,
and distroless and scratch images have neither (and often no shell at
all). Installing curl just for a healthcheck defeats the purpose of a
small image. Alternatives:
Use wget Instead
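A sketch for a hypothetical Alpine-based image serving /health on port 8080. The URL uses 127.0.0.1 rather than localhost. If localhost resolves to the IPv6 address ::1 and the app listens only on IPv4, a localhost check would fail.

```yaml
services:
  app:
    image: example/app:alpine   # hypothetical Alpine-based image
    healthcheck:
      # --spider checks the URL without saving the body; -q silences output
      test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1:8080/health"]
      interval: 30s
      timeout: 5s
      retries: 3
```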
On Alpine, BusyBox includes a wget applet, so wget is usually available even when curl is not.
Use the Application’s Built-in Check
PostgreSQL has pg_isready. Redis has redis-cli ping.
MySQL has mysqladmin ping. Prefer the service’s native tool
when it exists.
Use /dev/tcp (Bash Built-in)
For pure-shell environments without curl or wget:
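A sketch assuming a hypothetical image that includes bash and a service listening on TCP port 8080. The command is a bare input redirection, so bash only opens and closes the connection. It exits non-zero if the connection fails.

```yaml
services:
  app:
    image: example/app:debian   # hypothetical image that includes bash
    healthcheck:
      # Succeeds only if something accepts a TCP connection on 127.0.0.1:8080
      test: ["CMD", "bash", "-c", "</dev/tcp/127.0.0.1/8080"]
      interval: 30s
      timeout: 5s
      retries: 3
```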
This opens a TCP connection through bash’s /dev/tcp pseudo-device,
which bash handles internally (no such file exists on disk). It works
on any image with bash, but not with Alpine’s default BusyBox sh, so
Alpine images need bash installed.
Use a Dedicated Health HTTP Server
For custom apps, embed a tiny health server:
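A sketch assuming a hypothetical shell-less image that ships a small /healthz binary, which exits 0 when the app is healthy and 1 otherwise. The exec form is required here because CMD-SHELL needs /bin/sh.

```yaml
services:
  app:
    image: example/app:distroless   # hypothetical image with no shell
    healthcheck:
      # Exec form: Docker runs the binary directly
      test: ["CMD", "/healthz"]
      interval: 30s
      timeout: 5s
      retries: 3
```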
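Here /healthz is a small compiled binary (or a script, if the image
has a shell) that checks whatever the app needs (TCP ports, an HTTP
endpoint, file descriptors) and exits 0 or 1. A compiled binary needs
no shell, curl, or wget, so this pattern also works on distroless and
scratch images. A short Go program using the standard library’s
net/http package is enough to build one.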
Debugging Healthcheck Failures
Healthchecks fail silently unless you know where to look. Here is the diagnostic workflow.
1. Check Container Health State
Run docker compose ps (or docker ps). The STATUS column shows the
health state: (healthy), (unhealthy), or (health: starting) until the
first check passes or the container is marked unhealthy.
2. Inspect Health Log
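Run docker inspect --format '{{json .State.Health}}' <container>. This shows the current status, the failing streak, and the most recent checks (Docker keeps the last five), each with its exit code and output.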
The Output field is gold — it tells you exactly why the check
failed.
3. Run the Check Command Manually
Run docker exec <container> sh -c '<test command>', substituting the command from your healthcheck. This runs the check inside the container much as Docker does. If it fails here, the check is correctly detecting a real problem. If it succeeds but the healthcheck still fails, check:
- Did start_period expire before the service finished booting?
- Is the command in $PATH inside the container?
- Does the command use a shell feature not available in sh?
4. Common Failures and Fixes
| Symptom | Likely cause | Fix |
|---|---|---|
| curl: not found | No curl in minimal image | Use a native tool or wget, or install curl |
| Connection refused | Service not listening on localhost | Check that it binds to 127.0.0.1 or 0.0.0.0 |
| pg_isready: command not found | Wrong user or PATH | Call pg_isready by its full path |
| Marked unhealthy before boot finishes | start_period too short | Increase start_period |
| unhealthy after booting fine | Checks time out during GC pauses or load spikes | Raise timeout and retries, or lengthen interval to 30-60s |
Advanced Pattern — Intra-Stack Dependency Chains
In complex stacks, you need to chain dependencies through multiple levels.
Layered Services Example
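A sketch of the layered stack described below. The api and monitoring images, the /health endpoint, and the password are hypothetical placeholders. Note that nginx can wait on service_healthy only because api defines its own healthcheck.

```yaml
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: change-me   # placeholder
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5

  api:
    image: example/api:latest   # hypothetical
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 15s
      timeout: 5s
      retries: 3
      start_period: 20s

  nginx:
    image: nginx:alpine
    depends_on:
      api:
        condition: service_healthy   # requires the api healthcheck above

  monitoring:
    image: example/monitoring:latest   # hypothetical
    depends_on:
      postgres:
        condition: service_healthy
```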
Compose resolves this dependency graph at startup. PostgreSQL and Redis start first. API waits for both. Nginx waits for API. Monitoring waits only for PostgreSQL.
Healthcheck Propagation in Compose Watch
Compose Watch (v2.27+) can wait for healthchecks before restarting dependent services during development.
When api source files change, Compose Watch syncs them and
waits for the API healthcheck to pass before signaling nginx
that the dependency is ready.
Avoiding the Retry Spiral — Graceful Degradation
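Even with perfect healthchecks, services crash. The container
runtime restarts them according to the restart policy, and the
healthcheck detects when they recover. Note that restart policies
react to the process exiting, not to an unhealthy status: Docker
Engine on its own (outside Swarm mode) does not restart a container
just because its healthcheck fails. Transient failures during the
restart window can still cascade to dependent services.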
Set restart: unless-stopped on all services and configure
your application code to retry connections with exponential
backoff:
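The Compose half of that advice might look like the sketch below. Image names and the password are placeholders. The exponential backoff itself belongs in your application’s connection code, which depends on your language and database driver and is not shown here.

```yaml
services:
  db:
    image: postgres:16
    restart: unless-stopped   # restart whenever the process exits, unless you stopped it
    environment:
      POSTGRES_PASSWORD: change-me   # placeholder
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

  api:
    image: example/api:latest   # hypothetical; retries its DB connection with backoff
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
```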
The combination of Docker healthchecks on the infrastructure side and application-level retry logic on the app side lets the stack recover from most transient failures without manual intervention.
Summary
Docker healthchecks are the difference between a stack that seems to work sometimes and one that boots reliably every time.
The four rules:
- Always use depends_on with condition: service_healthy for databases and services your app requires at startup
- Set start_period to match your service’s boot time; underestimate it and the container can be marked unhealthy before it finishes starting
- Prefer native tools (pg_isready, redis-cli ping) over curl for database healthchecks
- Debug with docker inspect; the health log output tells you exactly why the check failed
Copy the examples for PostgreSQL, Redis, Nginx, MariaDB, and others into your compose files and your applications will start in the right order, every time.