The OOM Problem on Docker Homelab Hosts
Your homelab host runs Docker containers. Maybe a dozen. Maybe thirty. Postgres for Immich, Redis or Valkey for caching, Prometheus scraping metrics every 15 seconds, Ollama loading a 7B model into RAM. The host has 16 GB total, or maybe 8 GB if you’re running on a repurposed thin client. And you’ve just rebuilt your media library: now Immich is transcoding, Postgres is indexing, and Prometheus is scraping.
Then everything freezes. SSH stops responding. The web UI disappears. Thirty seconds later, things come back, but Postgres is gone. Or worse, Prometheus is gone, and now you’ve lost your metrics. The kernel’s OOM killer picked a victim. It almost never picks the right one.
The Linux Out-Of-Memory (OOM) killer doesn’t understand Docker container boundaries or service priority. Under host-wide memory pressure it calculates an oom_score for every process and kills the highest scorer. Because the score is driven by memory footprint, a 1 GiB Postgres container that’s been running for months scores higher than a memory-leaking sidecar that started five minutes ago and has only leaked a few hundred megabytes so far. The kernel doesn’t know which container is critical to your homelab; it just sees big processes using memory.
This guide covers every layer of OOM defense for a Docker homelab host, from Docker Compose resource limits through kernel sysctl tuning. Apply all five layers and your critical services are far more likely to stay up under memory pressure.
Docker Compose Resource Limits Are Your First Line of Defense
The single most important thing you can do for Docker host stability is set resource limits on every container. Without limits, a single container can consume all host memory and force the kernel to make kill decisions.
Docker Compose v3 supports deploy-level resource constraints:
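A minimal sketch of such a constraint block. The service name, image tag, and sizes are hypothetical placeholders, not values from a tested setup:

```yaml
services:
  postgres:                  # hypothetical service name
    image: postgres:16       # assumed image tag
    deploy:
      resources:
        limits:
          memory: 1G         # hard cap enforced through the container's cgroup
          cpus: "1.5"
        reservations:
          memory: 256M       # soft guarantee, not a cap
```

The limits entry is what turns a runaway container into a contained failure; reservations is optional.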
With docker compose up (not swarm), current Compose releases apply these limits directly through the container’s cgroup. When a container reaches its memory limit, the kernel’s cgroup-scoped OOM killer kills a process inside that container only: contained, predictable, and logged (docker inspect reports OOMKilled for the container). Without limits, the kernel kills host-wide.
For docker run, the equivalent flags are --memory and --cpus, for example --memory=1g --cpus=1.5.
Key detail: when only --memory is set, Docker defaults --memory-swap to twice --memory, giving each container swap access equal to its RAM limit. To disable swap per-container, set --memory-swap equal to --memory, for example --memory=1g --memory-swap=1g.
Every container in your homelab should have at least a memory limit. No exceptions. Even monitoring tools need limits — a Prometheus tsdb compaction spike can eat 2 GB before you notice.
systemd-oomd — Userspace OOM Prevention at the Cgroup Level
systemd 247 and later ship systemd-oomd, a userspace OOM manager that monitors memory pressure at the cgroup level and kills offending cgroups before the kernel OOM killer has to step in. It’s a clean fit for Docker hosts that use the systemd cgroup driver, because their containers already live in the systemd cgroup hierarchy.
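Enable it on Ubuntu 22.04+ and Debian Bookworm+ by installing the systemd-oomd package and starting the service with systemctl enable --now systemd-oomd.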
Configure thresholds in /etc/systemd/oomd.conf:
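A minimal sketch of the file, using the 60% / 30 second / 90% thresholds this section describes. The option names are the ones documented in oomd.conf(5); treat the values as a starting point, not tuned recommendations:

```ini
[OOM]
# Act when a monitored cgroup stays above 60% memory pressure...
DefaultMemoryPressureLimit=60%
# ...for at least 30 seconds
DefaultMemoryPressureDurationSec=30s
# Also act once 90% of swap is in use
SwapUsedLimit=90%
```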
This tells oomd to kill the cgroup with the highest memory pressure when:
- 60%+ memory pressure sustained for 30 seconds, or
- 90%+ swap is used
To make Docker’s cgroup eligible for systemd-oomd management, add ManagedOOMMemoryPressure=kill to the Docker daemon’s service override. Open the override with systemctl edit docker.service.
Add:
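A minimal sketch of the drop-in, assuming the directive names from systemd.resource-control(5). Whether it covers the containers themselves depends on the cgroup driver: with the systemd driver, containers run in their own docker-<id>.scope units under system.slice rather than inside docker.service, so check where your containers sit with systemd-cgls before relying on it:

```ini
[Service]
# Let systemd-oomd kill cgroups below this unit under sustained pressure
ManagedOOMMemoryPressure=kill
# Optional per-unit override of the global pressure threshold
ManagedOOMMemoryPressureLimit=60%
```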
Then reload systemd and restart Docker: systemctl daemon-reload followed by systemctl restart docker.
Check oomd activity with journalctl -u systemd-oomd, or list the cgroups it is monitoring with oomctl.
You should see entries like:
systemd-oomd[XXX]: Performing kill action for cgroup /system.slice/docker-<id>.scope (memory pressure critical)
The key advantage over the kernel OOM killer: systemd-oomd kills the entire cgroup (all processes in the Docker container) rather than picking one process and leaving orphaned children. For Docker hosts, this is exactly what you want.
Earlyoom — Simple OOM Prevention Daemon
Earlyoom is a lightweight userspace OOM killer that monitors available memory and free swap, and preemptively kills the process with the highest oom_score before the kernel OOM killer activates. It’s simpler than systemd-oomd and works on any systemd-capable distribution.
Installation: on Debian and Ubuntu, the package is named earlyoom (apt install earlyoom).
Earlyoom’s default configuration is conservative: it acts when available memory and free swap both drop below 10%. The recommended config for Docker hosts is more aggressive.
Edit /etc/default/earlyoom:
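EARLYOOM_ARGS="-m 5 -s 10,5 -r 3600 --avoid '(^|/)(postgres|redis-server|prometheus|nginx|systemd|sshd)$'"
Breaking this down:
- -m 5: send SIGTERM when available memory drops below 5% (SIGKILL follows at half that value by default)
- -s 10,5: send SIGTERM when free swap drops below 10%, SIGKILL below 5%
- -r 3600: report memory stats to the log every hour
- --avoid: regex of process names earlyoom should avoid killing (your critical services, plus systemd and sshd, the things you need to recover)
- --prefer (not used here): the opposite of --avoid, a regex of process names to kill first
The regexes are ordinary extended regular expressions matched against process names; there is no negation syntax. Earlyoom acts only when both the memory and the swap threshold are crossed.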
Enable the daemon with systemctl enable --now earlyoom.
Test it with a memory stress, for example stress-ng --vm 2 --vm-bytes 90% --timeout 60s.
Watch earlyoom in action with journalctl -u earlyoom -f.
Expected output:
earlyoom[XXX]: mem avail: 387 of 7855 MiB ( 4.93%), swap free: 156 of 2048 MiB ( 7.62%)
earlyoom[XXX]: sending SIGTERM to process 12345 (stress-ng-vm):
Earlyoom sends SIGTERM first (graceful shutdown) and escalates to SIGKILL only if memory keeps falling and crosses the lower kill threshold. This is far more civilized than the kernel OOM killer’s instant SIGKILL.
Setting oom_score_adj for Critical Services
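The kernel derives an oom_score for every process from its memory footprint: resident memory plus swap plus page tables, scaled against total RAM and swap. CPU time and process age are not part of the calculation on current kernels. You can bias the result with oom_score_adj, which is added to the score: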
| Value | Effect |
|---|---|
| -1000 | Process is exempt from the OOM killer (OOM_SCORE_ADJ_MIN) |
| -500 | Very unlikely to be killed |
| -200 | Less likely |
| 0 | Default |
| +500 | More likely |
| +1000 | Highest priority target |
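To see how the adjustment shifts the ranking, here is a rough back-of-the-envelope estimate in shell. It is an approximation of the heuristic (footprint as a share of RAM plus swap, scaled to 1000, plus the adjustment), not the exact value the kernel reports in /proc/<pid>/oom_score; the function name is made up for illustration:

```shell
#!/usr/bin/env bash
# Approximate badness: footprint share of (RAM + swap) scaled to 1000, plus oom_score_adj.
# usage: estimate_badness <footprint_mib> <ram_plus_swap_mib> <oom_score_adj>
estimate_badness() {
  local footprint_mib=$1 total_mib=$2 adj=$3
  local score=$(( footprint_mib * 1000 / total_mib + adj ))
  # The score is floored at zero for this estimate
  if (( score < 0 )); then
    score=0
  fi
  echo "$score"
}

# 1 GiB Postgres on a 16 GiB host without swap, unadjusted
estimate_badness 1024 16384 0      # prints 62
# 200 MiB leaky sidecar, unadjusted: ranks below Postgres
estimate_badness 200 16384 0       # prints 12
# Same sidecar with oom_score_adj=+500: now the preferred victim
estimate_badness 200 16384 500     # prints 512
```

With no adjustment, the large database is the likelier victim; a +500 on the sidecar or a -500 on Postgres reverses the order.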
Docker sets oom_score_adj per container through the --oom-score-adj flag of docker run, for example --oom-score-adj=-500.
Under the Compose Specification, Docker Compose accepts an oom_score_adj key per service; tooling that only understands the older v3 file format may ignore it, in which case you can use wrapper scripts or docker run directly for critical services. Alternatively, set it post-start with a script run from a systemd timer:
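A minimal sketch of such a script. The container names and values are hypothetical, and the function names are made up for illustration; it relies only on docker inspect and the /proc/<pid>/oom_score_adj file:

```shell
#!/usr/bin/env bash
# Sketch: apply oom_score_adj to the main process of running containers.

# Accept only integers in the kernel's valid range, -1000..1000.
valid_adj() {
  [[ $1 =~ ^-?[0-9]+$ ]] && (( 10#${1#-} <= 1000 ))
}

# Write the adjustment for one container (lowering the value needs root).
protect_container() {
  local name=$1 adj=$2 pid
  valid_adj "$adj" || { echo "invalid oom_score_adj: $adj" >&2; return 1; }
  pid=$(docker inspect --format '{{.State.Pid}}' "$name") || return 1
  [[ $pid -gt 0 ]] || { echo "$name is not running" >&2; return 1; }
  echo "$adj" > "/proc/$pid/oom_score_adj"
}

# Only act when called with --apply, e.g. from a systemd timer or cron.
if [[ ${1:-} == "--apply" ]]; then
  protect_container postgres -500   # hypothetical container names
  protect_container nginx -300
fi
```

This only touches the container's main process. Children that were forked earlier keep the value they inherited, so setting the value at container start is the more reliable route where available.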
Make it executable with chmod +x and run it on boot via a systemd unit or an @reboot cron entry.
Recommended oom_score_adj values for a typical homelab:
- Databases (Postgres, MariaDB, Redis): -500 to -800
- Web/Reverse proxies (Nginx, Traefik, Caddy): -200 to -400
- Monitoring (Prometheus, Grafana, Loki): 0 (neutral)
- Batch/Transient (build jobs, backup scripts): +250 to +500
- System critical (sshd, systemd-journald, Docker daemon): Already protected by systemd service configuration
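Where your docker compose version accepts the Compose Specification's oom_score_adj service key (worth verifying with docker compose config), the values above could be expressed declaratively. This is a sketch with hypothetical service names and images:

```yaml
services:
  postgres:
    image: postgres:16        # assumed image tag
    oom_score_adj: -500       # database: protect strongly
  caddy:
    image: caddy:2            # assumed image tag
    oom_score_adj: -300       # reverse proxy: protect moderately
  backup:
    image: restic/restic      # hypothetical batch job
    oom_score_adj: 300        # transient work: preferred victim
```

Monitoring services are simply left at the default of 0.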
Swap Configuration: To Have or Not to Have
A common recommendation for production container hosts is to disable swap entirely. The reasoning: when a host swaps, performance tanks for all containers, and the OOM killer may trigger too late to be useful.
For homelab hosts, a middle ground works better:
1. Low swappiness: set vm.swappiness to 1 or 10 so swap is used only as an emergency buffer, for example with sysctl -w vm.swappiness=10, persisted in a file under /etc/sysctl.d/.
2. Disable swap per-container — Prevent individual containers from swapping by setting --memory-swap equal to --memory as shown earlier. This keeps the host-level swap available for non-container processes.
3. ZRAM as compressed swap: for hosts with limited RAM (8-16 GB), compressed swap in RAM is a net win.
Configure it persistently via /etc/systemd/zram-generator.conf:
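A minimal sketch, assuming zram-generator 1.0 or newer (older releases use zram-fraction instead of zram-size). The sizing and algorithm are illustrative choices:

```ini
[zram0]
# Size the compressed swap device at half of physical RAM
zram-size = ram / 2
# zstd trades a little CPU for a better compression ratio
compression-algorithm = zstd
# Prefer zram over any disk-backed swap
swap-priority = 100
```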
Install systemd-zram-generator on Ubuntu 24.04+ with apt install systemd-zram-generator; the device is created on the next boot.
ZRAM compresses idle memory pages, which often lets the same RAM hold roughly twice as much swapped-out data (the ratio depends on the workload), without the latency penalty of disk-backed swap.
Kernel and Sysctl Memory Hardening
The kernel’s virtual memory subsystem has several knobs that affect OOM behavior. These settings harden the host against runaway memory consumption:
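A sketch of what such a sysctl drop-in could look like, for example in a hypothetical /etc/sysctl.d/90-oom-hardening.conf. All values are illustrative assumptions to adapt to your RAM size, not tested recommendations:

```ini
# Strict overcommit: refuse allocations beyond the commit limit
vm.overcommit_memory = 2
# Share of RAM (percent) counted toward the commit limit; the default is 50
vm.overcommit_ratio = 80
# Keep a reserve of free memory for the kernel (here 128 MiB)
vm.min_free_kbytes = 131072
# Use swap only as an emergency buffer
vm.swappiness = 10
```

Apply the file with sysctl --system.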
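vm.overcommit_memory=2 is the most impactful. When set to 2 (strict overcommit), the kernel refuses allocations that would push total committed memory past RAM × overcommit_ratio / 100 + swap. Applications that ask for more get a clean ENOMEM error instead of dragging the host into swap death and an eventual OOM kill. Databases generally handle ENOMEM gracefully; they do not survive a random OOM kill. Test this setting before relying on it, because programs that reserve large amounts of virtual memory up front can fail under strict overcommit.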
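The commit limit is easy to work out by hand. The following sketch computes it in KiB, ignoring huge pages; the function names are made up for illustration:

```shell
#!/usr/bin/env bash
# Commit limit (KiB) under vm.overcommit_memory=2, ignoring huge pages:
#   RAM * overcommit_ratio / 100 + swap
commit_limit_kib() {
  local ram_kib=$1 swap_kib=$2 ratio=$3
  echo $(( ram_kib * ratio / 100 + swap_kib ))
}

# Read one field (in KiB) from /proc/meminfo, e.g. MemTotal or SwapTotal.
meminfo_kib() {
  awk -v key="$1:" '$1 == key { print $2 }' /proc/meminfo
}

# 16 GiB RAM, 2 GiB swap, default ratio of 50
commit_limit_kib 16777216 2097152 50   # prints 10485760 (10 GiB)
# Same host with the ratio raised to 80
commit_limit_kib 16777216 2097152 80   # prints 15518924 (about 14.8 GiB)
```

On a live host, commit_limit_kib "$(meminfo_kib MemTotal)" "$(meminfo_kib SwapTotal)" "$(cat /proc/sys/vm/overcommit_ratio)" should land close to the CommitLimit line in /proc/meminfo. Note how the default ratio of 50 leaves a 16 GiB host with only 10 GiB of committable memory, which is why the ratio usually needs raising alongside strict mode.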
Verification — Confirm All Protections Are Active
After applying everything, verify your setup:
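One way to do that is a small read-only check script. This is a sketch with made-up function names; it assumes docker, systemctl, sysctl, and swapon are on the PATH and changes nothing on the host:

```shell
#!/usr/bin/env bash
# Print the names of containers that have no memory limit.
# Input: lines of "<name> <HostConfig.Memory in bytes>"; 0 means unlimited.
unlimited_containers() {
  awk '$2 == 0 { print $1 }'
}

# Report the state of each layer.
verify_oom_layers() {
  echo "Containers without a memory limit:"
  docker ps -q \
    | xargs -r docker inspect --format '{{.Name}} {{.HostConfig.Memory}}' \
    | unlimited_containers
  echo "OOM daemons:"
  systemctl is-active systemd-oomd earlyoom
  echo "Kernel settings:"
  sysctl vm.overcommit_memory vm.overcommit_ratio vm.swappiness
  echo "Swap devices:"
  swapon --show
}
```

Calling verify_oom_layers should print an empty container list once every container has a limit; any name that shows up still needs one.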
For Prometheus monitoring, add the node_vmstat_oom_kill counter (exposed by node_exporter’s vmstat collector) to your Grafana dashboard. Any increase means you need to tighten limits or add RAM.
Summary — Five Layers of OOM Defense
| Layer | Tool | What It Does |
|---|---|---|
| 1 | Docker deploy.resources.limits.memory | Caps per-container memory, confines OOM kills to the offending container |
| 2 | systemd-oomd | Kills entire cgroups at 60%+ memory pressure |
| 3 | earlyoom | SIGTERM-first preemptive kill below 5% free memory |
| 4 | oom_score_adj | Biases kernel choices toward non-critical containers |
| 5 | Kernel sysctls | Prevents overcommit, reserves memory, limits swap eagerness |
A Docker homelab host without memory limits is one stress-ng away from an unrecoverable freeze. Apply these configurations once, and your Postgres databases, Prometheus instances, and reverse proxies stay running even when a container goes rogue.
The kernel OOM killer is a last resort, not a memory management strategy.