Most docker-compose.yml files you find in tutorials are minimal. They
work for docker compose up -d on a fresh system, but they don’t handle
service startup order, container crashes, secrets, or configuration
reuse — all things that matter when your stack runs for months.
This post covers the patterns that turn a throwaway compose file into
something you can deploy, forget, and trust. Every pattern includes a
real example you can drop into your existing stack.
Note: Some examples are partial Compose snippets meant to demonstrate
one pattern at a time. When copying them into a real compose.yml, make
sure referenced services, secrets, volumes, and networks are also defined.
## 1. Healthchecks — Know When a Container Is Actually Ready
The default Docker healthcheck is: “is the process running?” That’s not
the same as “is the service accepting traffic?”
A PostgreSQL container can start with the process running while it’s
still recovering the WAL. A Traefik container can bind port 80 while
loading its configuration. Healthchecks solve this.
### Database Healthcheck

```yaml
services:
  postgres:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
```
- `start_period`: gives the container a 30s grace period before failed healthchecks count against it. PostgreSQL needs this on first start, since it initializes the data directory before accepting connections.
- `interval`: check every 10 seconds.
- `retries`: mark unhealthy after 5 consecutive failures. With these values, a dead service is flagged unhealthy after roughly 40 to 50 seconds of consecutive failures.
### Web Service Healthcheck

```yaml
services:
  nginx:
    image: nginx:alpine
    healthcheck:
      # nginx -t validates the config; for a stronger readiness signal,
      # curl a local endpoint instead
      test: ["CMD", "nginx", "-t"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
```
For HTTP services, use curl:

```yaml
services:
  app:
    image: myapp:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 15s
      timeout: 3s
      retries: 3
      start_period: 15s
```
The container must have curl installed. Use a curl-based image or
install it in your Dockerfile:
```dockerfile
FROM alpine:3.20
RUN apk add --no-cache curl
```
### View Health Status

```shell
# Show health status for all containers
docker compose ps

# Filter to unhealthy containers only
docker ps --filter "health=unhealthy"

# See healthcheck log output
docker inspect --format='{{json .State.Health}}' <container_name>
```
A container marked (unhealthy) blocks anything that depends on it via `depends_on: condition: service_healthy` from starting. Note that `depends_on` only gates startup ordering; it does not stop traffic to an already-running container.
## 2. Depends On with Conditions — Real Startup Ordering

The short-form `depends_on` only waits for a container to start, not for it to be ready. Compose v2 supports the long form with conditions, and when combined with healthchecks, you get real ordering.
### Service-Healthy Dependencies

```yaml
secrets:
  db_app_password:
    file: ./secrets/db_app_password.txt

services:
  postgres:
    image: postgres:16-alpine
    secrets:
      - db_app_password
    environment:
      POSTGRES_USER: appuser
      POSTGRES_DB: appdb
      POSTGRES_PASSWORD_FILE: /run/secrets/db_app_password
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U appuser -d appdb"]
      interval: 5s
      timeout: 3s
      retries: 5
      start_period: 20s

  api:
    image: myapi:latest
    depends_on:
      postgres:
        condition: service_healthy
    secrets:
      - db_app_password
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 10s
    environment:
      DATABASE_HOST: postgres
      DATABASE_PORT: "5432"
      DATABASE_NAME: appdb
      DATABASE_USER: appuser
      DATABASE_PASSWORD_FILE: /run/secrets/db_app_password

  nginx:
    image: nginx:alpine
    depends_on:
      api:
        condition: service_healthy
    ports:
      - "80:80"
```
This creates a chain: nginx waits for api, and api waits for postgres. Each service starts only after its dependency reports healthy.
### Service-Started (Default) vs Service-Healthy

```yaml
services:
  # Only needs to start after the redis process starts (default behavior)
  cache-worker:
    image: myworker:latest
    depends_on:
      - redis

  # Must wait until postgres accepts connections
  app:
    image: myapp:latest
    depends_on:
      postgres:
        condition: service_healthy
```
Use service_started (the default) for fast dependencies like Redis or
Memcached. Use service_healthy for databases, message queues, and
anything with startup initialization.
## 3. Restart Policies — Survive Crashes Gracefully

Don't leave containers set to `restart: unless-stopped` without understanding the failure modes.
```yaml
services:
  # Core services that must stay up
  traefik:
    image: traefik:v3.3
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"

  # One-shot or batch jobs — never restart
  db-migrate:
    image: myapp:latest
    restart: "no"  # quoted: bare `no` parses as YAML boolean false
    command: ["npm", "run", "migrate"]

  # Services that can crash briefly but should retry fast
  monitoring-agent:
    image: grafana/agent:latest
    restart: on-failure:5  # max 5 retries, then give up
```
### Restart Strategy Decision Table

| Restart Policy | Use Case | Behavior |
| --- | --- | --- |
| `no` | Batch jobs, migrations, cron containers | Never restarts |
| `always` | Critical infrastructure, reverse proxies | Restarts regardless of exit code |
| `unless-stopped` | Long-running services you might manually stop | Restarts unless manually stopped |
| `on-failure:N` | Services with occasional transient failures | Restarts only on non-zero exit, up to N times |
### Resource Limits Prevent Restart Loops

A container that OOM-kills and restarts in a loop will keep getting
killed unless you constrain memory:

```yaml
services:
  buggy-app:
    image: myapp:latest
    restart: on-failure:3
    deploy:
      resources:
        limits:
          memory: 256M
        reservations:
          memory: 128M
```
With memory limits, the container is killed for exceeding 256 MB rather
than exhausting host memory and triggering the OOM killer. The on-failure:3
cap prevents infinite restart loops if the app keeps crashing.
## 4. Profiles — Optional Services Without Separate Compose Files
Profiles let you define services that only start when explicitly
requested. Perfect for debugging tools, admin panels, or services you
only need occasionally.
### Define Profiles

```yaml
services:
  # Main stack — always starts
  postgres:
    image: postgres:16-alpine
    # no profiles = always included

  api:
    image: myapi:latest
    depends_on:
      - postgres

  # Debug console — only with --profile debug
  pgadmin:
    image: dpage/pgadmin4:latest
    profiles:
      - debug
    depends_on:
      - postgres
    environment:
      PGADMIN_DEFAULT_EMAIL: [email protected]
      PGADMIN_DEFAULT_PASSWORD_FILE: /run/secrets/pgadmin_pass
    secrets:
      - pgadmin_pass

  # Performance testing tools — k6 belongs to both profiles
  k6:
    image: grafana/k6:latest
    profiles:
      - loadtest
      - debug
```
### Usage

```shell
# Start normal stack
docker compose up -d

# Start with debug tools
docker compose --profile debug up -d

# Start with load testing tools
docker compose --profile loadtest up -d

# Start everything
docker compose --profile "*" up -d
```
### Real Homelab Use Case: Monitoring vs Debug

```yaml
services:
  # Always running
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus:/etc/prometheus

  grafana:
    image: grafana/grafana:latest
    depends_on:
      - prometheus

  # On-demand metrics exporter for PostgreSQL
  postgres-exporter:
    image: prometheuscommunity/postgres-exporter:latest
    profiles:
      - diagnostics
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      DATA_SOURCE_NAME: "postgresql://exporter:pass@postgres:5432/appdb?sslmode=disable"

  # On-demand query performance dashboard
  pghero:
    image: ankane/pghero:latest
    profiles:
      - diagnostics
      - debug
    depends_on:
      - postgres
```
Start your stack normally. When you need to troubleshoot database performance, run `docker compose --profile diagnostics up -d` and pgHero + postgres-exporter appear in your monitoring stack. Shut them down with `docker compose stop postgres-exporter pghero` when done; be careful with `docker compose --profile diagnostics down`, which also takes down the always-on services, since services without profiles are enabled in every run.
## 5. Secrets Management — Stop Putting Passwords in Compose Files

Docker Compose has built-in secret support. Secrets are files mounted into `/run/secrets/` inside containers, which keeps credentials out of `docker inspect` output and the process environment. Outside Swarm mode they are implemented as bind mounts, so protect the source files on the host with restrictive permissions.
### Define Secrets

```yaml
secrets:
  db_app_password:
    file: ./secrets/db_app_password.txt
  api_key:
    file: ./secrets/api_key.txt
  tls_cert:
    file: ./certs/homelab.crt
  tls_key:
    file: ./certs/homelab.key
```
### Use Secrets in Services

```yaml
services:
  postgres:
    image: postgres:16-alpine
    secrets:
      - db_app_password
    environment:
      POSTGRES_USER: appuser
      POSTGRES_DB: appdb
      POSTGRES_PASSWORD_FILE: /run/secrets/db_app_password
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U appuser -d appdb"]
    # Official postgres image creates appuser/appdb on first init
    # and reads POSTGRES_PASSWORD_FILE for appuser's password.

  api:
    image: myapi:latest
    secrets:
      - db_app_password
      - api_key
    environment:
      DATABASE_HOST: postgres
      DATABASE_PORT: "5432"
      DATABASE_NAME: appdb
      DATABASE_USER: appuser
      DATABASE_PASSWORD_FILE: /run/secrets/db_app_password
      # This requires app support for *_PASSWORD_FILE.
      # If the app only accepts DATABASE_URL, use an entrypoint wrapper
      # that reads /run/secrets/db_app_password and exports DATABASE_URL.
      API_KEY_FILE: /run/secrets/api_key
```
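The entrypoint-wrapper approach mentioned in the comments can be sketched in a few lines of POSIX sh. This is an illustration, not a drop-in script: the user, host, and database names are assumptions carried over from the example, a temp file stands in for the real secret mount, and a real entrypoint would end with `exec "$@"` rather than printing the URL.

```shell
#!/bin/sh
# Sketch: build DATABASE_URL from a file-based secret.
# A temp file stands in for /run/secrets/db_app_password so the snippet
# is self-contained; in a container, read the real mount and finish
# with: exec "$@"
set -eu

SECRET_FILE="$(mktemp)"            # stand-in for /run/secrets/db_app_password
printf 'hunter2' > "$SECRET_FILE"  # demo value only

# Passwords with special characters may need URL-encoding here.
DB_PASS="$(cat "$SECRET_FILE")"
export DATABASE_URL="postgresql://appuser:${DB_PASS}@postgres:5432/appdb"

echo "$DATABASE_URL"
```

Because the password is read at container start, rotating it is a matter of replacing the secret file and restarting the service.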
### Service-Specific Secret Access
Only mount secrets to services that need them:
```yaml
secrets:
  db_app_password:
    file: ./secrets/db_app_password.txt
  pgadmin_pass:
    file: ./secrets/pgadmin_pass.txt
  grafana_admin_pass:
    file: ./secrets/grafana_admin.txt

services:
  postgres:
    secrets:
      - db_app_password

  pgadmin:
    profiles:
      - debug
    secrets:
      - pgadmin_pass
    # pgadmin cannot read db_app_password or grafana_admin_pass

  grafana:
    secrets:
      - grafana_admin_pass
    # grafana cannot read db_app_password or pgadmin_pass
```
This is least-privilege in practice. A compromised pgadmin container
cannot read your database password or Grafana admin credentials.
### Directory Layout for Secrets

```text
docker/
├── compose.yml
├── secrets/
│   ├── db_app_password.txt
│   ├── api_key.txt
│   ├── pgadmin_pass.txt
│   └── grafana_admin.txt
├── certs/
│   ├── homelab.crt
│   └── homelab.key
└── config/
    ├── traefik.yml
    └── prometheus.yml
```

Gitignore secrets but keep the structure:

```gitignore
# .gitignore
secrets/*.txt
certs/*.key
```

Commit `.gitkeep` files to preserve the directory structure:

```shell
find secrets certs -type d -exec touch {}/.gitkeep \;
```
## 6. Extension Fields — DRY Your Compose Files

Extension fields use the `x-` prefix and let you define reusable blocks that YAML anchors can reference.
### Common Labels

```yaml
x-common-labels: &common-labels
  labels:
    - "monitoring=prometheus"
    - "backup=daily"
    - "owner=homelab"

x-logging: &logging
  logging:
    driver: "json-file"
    options:
      max-size: "10m"
      max-file: "3"

x-base: &base
  <<: [*common-labels, *logging]

services:
  postgres:
    image: postgres:16-alpine
    <<: *base

  api:
    image: myapi:latest
    <<: *base
    depends_on:
      postgres:
        condition: service_healthy
```
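Merge keys can be hard to read back. Running `docker compose config` prints the fully resolved file; for the postgres service above, the merge expands to roughly:

```yaml
services:
  postgres:
    image: postgres:16-alpine
    labels:
      - "monitoring=prometheus"
      - "backup=daily"
      - "owner=homelab"
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
```

If a service defines its own `labels:` or `logging:` alongside `<<: *base`, the explicit value wins, because YAML merge keys only fill in keys that are not already present.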
### Reusable Healthcheck Definitions

```yaml
x-db-healthcheck: &db-healthcheck
  healthcheck:
    test: ["CMD-SHELL", "pg_isready -U postgres"]
    interval: 10s
    timeout: 5s
    retries: 5
    start_period: 30s

x-http-healthcheck: &http-healthcheck
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost/health"]
    interval: 15s
    timeout: 3s
    retries: 3
    start_period: 10s

services:
  postgres:
    image: postgres:16-alpine
    <<: *db-healthcheck

  api:
    image: myapi:latest
    <<: *http-healthcheck
```
### Network and Volume Definitions

```yaml
x-networks: &networks
  networks:
    internal:
    proxy:

services:
  app:
    image: myapp:latest
    <<: *networks
    # app is on both internal and proxy

  postgres:
    image: postgres:16-alpine
    networks:
      - internal
    # postgres is internal-only — not exposed to proxy
```
## 7. Multiple Compose Files — Split by Concern

Instead of one monster compose file, split by domain and use `-f`:

```shell
docker compose \
  -f compose.base.yml \
  -f compose.monitoring.yml \
  -f compose.media.yml \
  up -d
```
### base.yml — Shared Infrastructure

```yaml
# compose.base.yml
services:
  traefik:
    image: traefik:v3.3
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - ./config/traefik:/etc/traefik
      - ./certs:/certs

networks:
  proxy:
    external: true
    name: proxy
```
### monitoring.yml — Monitoring Stack

```yaml
# compose.monitoring.yml
services:
  prometheus:
    image: prom/prometheus:latest
    restart: unless-stopped
    volumes:
      - ./config/prometheus:/etc/prometheus
      - prometheus_data:/prometheus
    networks:
      - proxy
      - monitoring

  grafana:
    image: grafana/grafana:latest
    restart: unless-stopped
    volumes:
      - grafana_data:/var/lib/grafana
    secrets:
      - grafana_admin_pass
    networks:
      - proxy

  node-exporter:
    image: prom/node-exporter:latest
    restart: unless-stopped
    networks:
      - monitoring
    pid: "host"

volumes:
  prometheus_data:
  grafana_data:

networks:
  monitoring:
    internal: true
```
The `-f` approach means you can update monitoring (`docker compose -f compose.monitoring.yml pull && docker compose -f compose.monitoring.yml up -d`) without touching the media stack.
## 8. Restart Limits and Backoff

Docker's default restart behavior has an exponential backoff: 100 ms, 200 ms, 400 ms, doubling between attempts up to a cap (one minute in Moby's restart manager), and resetting once a container has run successfully for at least 10 seconds. This is sensible, but you should understand it.
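To get a feel for the schedule, the doubling sequence can be sketched in shell (the 100 ms starting delay matches Docker's documented behavior; the cap value used here is illustrative):

```shell
# Sketch of exponential restart backoff: the delay doubles on each
# attempt, starting at 100 ms, until it hits the cap.
delay=100    # milliseconds
cap=60000    # illustrative cap
attempt=1
while [ "$attempt" -le 10 ]; do
  printf 'attempt %2d: wait %dms\n' "$attempt" "$delay"
  delay=$((delay * 2))
  if [ "$delay" -gt "$cap" ]; then delay=$cap; fi
  attempt=$((attempt + 1))
done
```

The practical consequence: a container that crashes instantly on start burns through its first several restarts in under a minute, then settles into slow retries at the cap.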
Monitor restart count:
```shell
# Check how many times a container has restarted
docker inspect --format '{{.RestartCount}}' <container_name>

# Watch restart events in real time
docker events --filter 'type=container' --filter 'event=restart'
```
### Prevent Crash Loops with Deregistration
A service that keeps crashing and restarting is still registered in
service discovery (Traefik, Consul, etc.). If your reverse proxy sees
the container, it will route traffic to it during its brief healthy
window.
```yaml
services:
  app:
    image: myapp:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 5s
      retries: 2
      start_period: 5s
    restart: on-failure:3
```
After 3 crash-restart cycles, Docker stops trying. The aggressive healthcheck (5s interval, 2 retries) means a proxy that honors container health, such as Traefik, marks the container unhealthy and stops routing to it within seconds of each crash, instead of sending traffic during its brief healthy windows.
## 9. Complete Production Compose File

Here's a real `compose.yml` that combines all the patterns:
```yaml
# compose.yml — Production homelab stack

# Combined base config for restart policy + logging
x-base: &base
  restart: unless-stopped
  logging:
    driver: "json-file"
    options:
      max-size: "10m"
      max-file: "3"

secrets:
  db_app_password:
    file: ./secrets/db_app_password.txt
  api_key:
    file: ./secrets/api_key.txt

services:
  postgres:
    image: postgres:16-alpine
    <<: *base
    volumes:
      - pgdata:/var/lib/postgresql/data
    secrets:
      - db_app_password
    environment:
      POSTGRES_USER: appuser
      POSTGRES_DB: appdb
      POSTGRES_PASSWORD_FILE: /run/secrets/db_app_password
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U appuser -d appdb"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    networks:
      - internal

  app:
    image: myapp:latest
    <<: *base
    depends_on:
      postgres:
        condition: service_healthy
    secrets:
      - db_app_password
      - api_key
    environment:
      DATABASE_HOST: postgres
      DATABASE_PORT: "5432"
      DATABASE_NAME: appdb
      DATABASE_USER: appuser
      DATABASE_PASSWORD_FILE: /run/secrets/db_app_password
      API_KEY_FILE: /run/secrets/api_key
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 15s
      timeout: 3s
      retries: 3
      start_period: 15s
    deploy:
      resources:
        limits:
          memory: 256M
    networks:
      - internal
      - proxy

  traefik:
    image: traefik:v3.3
    <<: *base
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./config/traefik:/etc/traefik
    networks:
      - proxy
    depends_on:
      app:
        condition: service_healthy

volumes:
  pgdata:

networks:
  internal:
    internal: true
  proxy:
    external: true
    name: proxy
```
## Summary

| Pattern | Key Takeaway |
| --- | --- |
| Healthchecks | Poll a real readiness endpoint, not "process is running" |
| `depends_on` condition | Use `condition: service_healthy` for databases and API servers |
| Restart policies | `unless-stopped` for long-lived, `on-failure:N` with a cap for crashable services |
| Profiles | Keep debug/admin tools behind `--profile debug`, not in a separate compose file |
| Secrets | Use `secrets:` with `file:` instead of environment variables for passwords and keys |
| Extension fields | `x-*` blocks with YAML anchors keep repetitive configs DRY |
| Multi-file compose | Split by domain (infra, monitoring, media) with the `-f` flag |
| Resource limits | Always set `deploy.resources.limits.memory` to prevent OOM crash loops |
The difference between a homelab stack that runs for 3 days and one
that runs for 3 years is mostly these patterns — startup ordering,
failure handling, and secret hygiene. Apply them one at a time, and
your compose files will survive reboots, crashes, and the inevitable
“what happens when this container runs out of memory?” question.