Disk I/O is the most common bottleneck in a homelab. A misconfigured IO scheduler can add 5–10 ms of latency to every write. Wrong mount options silently double metadata writes. And one noisy container can starve every other service on the box.
Most Linux distributions ship conservative defaults that work on anything from a Raspberry Pi to a 48-bay storage server. That means they are tuned for nobody. Your homelab deserves better.
This guide covers the three layers of disk I/O tuning that matter:
- IO schedulers: which one for NVMe, SATA SSD, and spinning rust
- Mount options: noatime, commit, discard, and nobarrier
- cgroup IO throttling: taming container I/O with cgroup v2
Every command and config here works on Debian 12, Ubuntu 24.04, Proxmox VE 9.x, and the current Linux 7.x kernel series.
Step 1 — Understanding Your IO Scheduler
The IO scheduler sits between the block layer and the storage driver. Its job: reorder, merge, and dispatch I/O requests to maximize performance. The right choice depends entirely on your hardware.
Check What You Are Running
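A minimal sketch of that check: every block device exposes its scheduler list in sysfs, with the active entry in brackets. The helper name `active_scheduler` is made up for this example.

```shell
# Extract the bracketed (active) entry from a scheduler line such as
# "none [mq-deadline] kyber bfq".
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

# Print every block device with its full scheduler line and the active one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}; dev=${dev%%/*}
    line=$(cat "$f")
    printf '%-10s %-35s active=%s\n' "$dev" "$line" "$(active_scheduler "$line")"
done
```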
The brackets mark the active scheduler. Modern Linux on NVMe defaults
to none (the multi-queue equivalent of the old noop), which is correct.
SATA drives often default to mq-deadline. Both are sane, but not always optimal.
Which Scheduler When
| Hardware | Recommended Scheduler | Why |
|---|---|---|
| NVMe SSD (datacenter grade) | none | NVMe controllers have native command queuing. Adding a scheduler just burns CPU. Leave it alone. |
| NVMe SSD (consumer / DRAM-less) | mq-deadline | These benefit from write-merging. Test both. |
| SATA SSD (Samsung 870, MX500, etc.) | mq-deadline or kyber | Deadline gives latency guarantees. Kyber is leaner. |
| SATA HDD (spinning disks, bulk storage) | bfq | BFQ provides fair I/O bandwidth sharing. Crucial for multi-VM hosts sharing one HDD. |
| RAID controller (HBA, LSI, etc.) | none | The controller handles ordering. Schedulers add overhead. |
Changing the Scheduler
Temporary change (live, no reboot):
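One way to do this, sketched as a small helper around the sysfs write. The function name and the `SYSFS_ROOT` override are inventions for this example; the override exists only so the helper can be tried against a scratch directory.

```shell
# Switch the scheduler for one device until the next reboot.
# Usage: set_scheduler sda mq-deadline   (needs root on a real host)
# SYSFS_ROOT is a hypothetical override; leave it unset in normal use.
set_scheduler() {
    file="${SYSFS_ROOT:-/sys}/block/$1/queue/scheduler"
    [ -w "$file" ] || { echo "cannot write $file (run as root?)" >&2; return 1; }
    echo "$2" > "$file"
    cat "$file"    # on real sysfs this shows the new scheduler in brackets
}
```

The equivalent one-liner is `echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler`, with `sda` standing in for your device.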
Permanent change via udev rule:
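A sketch of such a rule set, following the scheduler table above. The file name `60-ioschedulers.rules` and the helper name are arbitrary choices, and the match patterns assume the usual `nvme*` and `sd*` device naming.

```shell
# Print a udev rules file that picks a scheduler per device class.
ioscheduler_rules() {
    cat <<'EOF'
# NVMe: no scheduler
ACTION=="add|change", KERNEL=="nvme[0-9]*n[0-9]*", ATTR{queue/scheduler}="none"
# SATA/SAS SSD (non-rotational): mq-deadline
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"
# HDD (rotational): bfq
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
EOF
}

# Install and apply (not run here):
#   ioscheduler_rules | sudo tee /etc/udev/rules.d/60-ioschedulers.rules
#   sudo udevadm control --reload
#   sudo udevadm trigger --subsystem-match=block
```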
The queue/rotational attribute is the kernel’s way to distinguish
HDDs (rotational=1) from SSDs (rotational=0). The udev rule applies
on boot and on hotplug.
Step 2 — Tuning Block Device Queue Parameters
Beyond the scheduler, every device exposes tunables under
/sys/block/<dev>/queue/. These four make the biggest difference:
nr_requests — Queue Depth
Controls how many I/O requests the block layer queues before throttling the application. Higher values improve throughput under load but increase per-request latency.
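A small read-or-write helper for queue tunables, as a sketch. The function name, the `SYSFS_ROOT` override, and the value 256 in the usage comment are illustrative assumptions, not recommendations.

```shell
# Read or set a queue tunable for one device.
#   queue_param sda nr_requests        prints the current value
#   queue_param sda nr_requests 256    writes 256 (needs root), then prints it
# SYSFS_ROOT is a hypothetical override for dry runs; leave it unset normally.
queue_param() {
    file="${SYSFS_ROOT:-/sys}/block/$1/queue/$2"
    [ -r "$file" ] || { echo "no such tunable: $file" >&2; return 1; }
    if [ $# -ge 3 ]; then
        echo "$3" > "$file" || return 1
    fi
    cat "$file"
}
```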
read_ahead_kb — Prefetch Window
How many kilobytes the kernel reads ahead on sequential access. Default 128 KB is conservative. For media storage and backup targets, increase it.
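The setting can be changed through sysfs or through `blockdev`, but the two use different units, which is easy to trip over. A sketch of the conversion, with `/dev/sdb` and 4096 KB as placeholder values:

```shell
# sysfs read_ahead_kb is in kilobytes; blockdev --setra/--getra use
# 512-byte sectors, so the numbers differ by a factor of two.
ra_kb_to_sectors() { echo $(( $1 * 2 )); }
ra_sectors_to_kb() { echo $(( $1 / 2 )); }

# Two ways to apply the same setting (not run here):
#   echo 4096 | sudo tee /sys/block/sdb/queue/read_ahead_kb
#   sudo blockdev --setra "$(ra_kb_to_sectors 4096)" /dev/sdb
```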
max_sectors_kb — Maximum I/O Size
Largest single I/O request the block layer will issue, in kilobytes. Modern NVMe drives can handle 2 MB+ IOs. The default is often 1280 KB (2560 sectors of 512 bytes), and the hardware ceiling is reported in max_hw_sectors_kb. Raise it for sequential workloads.
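The kernel rejects values above the device's `max_hw_sectors_kb`, so it helps to clamp the target first. A sketch, with hypothetical helper names and the same illustrative `SYSFS_ROOT` override for dry runs:

```shell
# Print the smaller of the wanted value and the hardware maximum.
clamp_kb() {  # clamp_kb <wanted_kb> <hw_max_kb>
    if [ "$1" -le "$2" ]; then echo "$1"; else echo "$2"; fi
}

# Raise max_sectors_kb towards a target without exceeding the hardware limit.
raise_max_sectors() {  # raise_max_sectors <dev> <wanted_kb>   (needs root)
    q="${SYSFS_ROOT:-/sys}/block/$1/queue"
    hw=$(cat "$q/max_hw_sectors_kb") || return 1
    clamp_kb "$2" "$hw" > "$q/max_sectors_kb" || return 1
    cat "$q/max_sectors_kb"
}
```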
nomerges — Disable I/O Merging
For NVMe drives where the native controller does merging better than the kernel, you can skip block-layer merging to save CPU.
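The tunable takes three values. A sketch that reports the current setting for each NVMe device; the helper name is made up, and `nvme0n1` in the comment is a placeholder.

```shell
# Translate a nomerges value into what the block layer does with it.
nomerges_meaning() {
    case "$1" in
        0) echo "all merges enabled (default)" ;;
        1) echo "only simple one-hit merges" ;;
        2) echo "no merging at all" ;;
        *) echo "invalid value" ;;
    esac
}

for f in /sys/block/nvme*/queue/nomerges; do
    [ -r "$f" ] || continue
    printf '%s: %s\n' "$f" "$(nomerges_meaning "$(cat "$f")")"
done

# To disable merging on one drive (needs root):
#   echo 2 | sudo tee /sys/block/nvme0n1/queue/nomerges
```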
Persist these through the same udev rule, or use a systemd tmpfiles.d entry:
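A sketch of the tmpfiles.d variant. The device names, the values, and the file name `io-tuning.conf` are placeholders. Note that tmpfiles.d runs once at boot, so unlike the udev rule it does not cover hotplugged disks.

```shell
# Print tmpfiles.d lines that write queue tunables at boot.
queue_tmpfiles() {
    cat <<'EOF'
# Type  Path                                   Mode UID GID Age Argument
w       /sys/block/sdb/queue/read_ahead_kb     -    -   -   -   4096
w       /sys/block/nvme0n1/queue/nomerges      -    -   -   -   2
EOF
}

# Install and apply immediately (not run here):
#   queue_tmpfiles | sudo tee /etc/tmpfiles.d/io-tuning.conf
#   sudo systemd-tmpfiles --create /etc/tmpfiles.d/io-tuning.conf
```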
Step 3 — Filesystem Mount Options That Matter
Mount options are the highest-impact, lowest-effort tuning you can
do. These apply to ext4, xfs, btrfs, and zfs (in different
ways).
noatime — Skip Access Time Updates
Every file read used to write an atime update — a metadata write for
every read. noatime eliminates this. Safe to use everywhere unless
you rely on atime for mail spools or backup tools.
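For databases or container storage, combine with nobarrier only if
the storage has a battery-backed or otherwise power-loss-protected
write cache. nobarrier is an ext4 option; XFS dropped it in kernel 4.19,
and ZFS has no such mount option.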
commit= — Dirty Writeback Interval
Controls how often the filesystem commits its journal (ext4) or transaction (btrfs) to disk. The default is 5 seconds on ext4 and 30 seconds on btrfs; XFS has no commit= option. Lower = better crash safety. Higher = better sequential throughput.
discard — Online TRIM
Enable for SSD/NVMe filesystems. Modern kernels support
discard=async on btrfs, which queues TRIM commands without blocking
writes; ext4 and XFS take a plain discard option.
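Before enabling it, it is worth confirming that the device advertises TRIM at all. A sketch that reads the kernel's `discard_max_bytes` value; the helper name and the `SYSFS_ROOT` override are assumptions for this example.

```shell
# Exit status 0 if the device advertises discard (TRIM) support.
supports_discard() {  # supports_discard <dev>
    f="${SYSFS_ROOT:-/sys}/block/$1/queue/discard_max_bytes"
    [ -r "$f" ] && [ "$(cat "$f")" -gt 0 ]
}

# Usage (placeholder device):
#   supports_discard nvme0n1 && echo "TRIM available"
# lsblk --discard shows the same information as DISC-GRAN and DISC-MAX columns.
```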
relatime — The Compromise
If some tool still needs access times, use relatime. It updates
atime only when the previous atime is older than the last
modification or change time, or more than 24 hours old. That means
vastly fewer writes than strict atime updates.
Every current Linux distribution defaults to relatime. If your
fstab uses defaults, check with mount | grep relatime — you
might already have it.
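A slightly more thorough version of that check, as a sketch: it classifies each mounted ext4, XFS, or btrfs filesystem by its atime mode. The helper name is made up, and it expects options as the kernel prints them in /proc/mounts, where the absence of both flags means strict atime updates.

```shell
# Classify a /proc/mounts option string by atime behaviour.
atime_mode() {  # atime_mode <comma-separated mount options>
    case ",$1," in
        *,noatime,*)  echo noatime ;;
        *,relatime,*) echo relatime ;;
        *)            echo strictatime ;;
    esac
}

# Report the mode for every mounted ext4/xfs/btrfs filesystem.
if [ -r /proc/mounts ]; then
    while read -r _dev mnt fstype opts _rest; do
        case "$fstype" in
            ext4|xfs|btrfs)
                printf '%-25s %-6s %s\n' "$mnt" "$fstype" "$(atime_mode "$opts")" ;;
        esac
    done < /proc/mounts
fi
```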
Putting It Together
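As an illustration only, here is what a combined fstab might look like for an ext4 root on an SSD and an XFS media disk. The UUIDs, mount points, and the `commit=15` value are placeholders, not recommendations from a specific setup.

```shell
# Print example fstab entries that combine the options discussed above.
fstab_example() {
    cat <<'EOF'
# <device>            <mount>      <type> <options>                  <dump> <pass>
UUID=ROOT-UUID-HERE   /            ext4   noatime,commit=15,discard  0      1
UUID=MEDIA-UUID-HERE  /mnt/media   xfs    noatime                    0      2
EOF
}

fstab_example
```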
Apply without reboot:
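A sketch of the remount, wrapped in a helper whose name is invented for this example. Some options cannot be changed on a live remount, so check the output to see what actually took effect.

```shell
# Re-apply the options from /etc/fstab to an already mounted filesystem.
remount_from_fstab() {  # remount_from_fstab <mountpoint>
    # systemd turns fstab into mount units; reload so it sees the edit.
    sudo systemctl daemon-reload || return 1
    # With only a mountpoint given, mount(8) takes the options from fstab.
    sudo mount -o remount "$1" || return 1
    # Show the options now in effect.
    findmnt -no OPTIONS "$1"
}

# Usage (placeholder mountpoint):
#   remount_from_fstab /mnt/media
```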
Step 4 — Monitoring I/O in Real Time
You cannot tune what you do not measure. These four tools are enough to diagnose most homelab I/O problems.
iostat — Aggregate Device Stats
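The usual invocation is `iostat -xz 2` (extended stats, idle devices hidden, two-second interval). If sysstat is not installed, a rough since-boot approximation of its await columns can be derived from /proc/diskstats. The sketch below does that; the function name is made up, and the result is a lifetime average rather than a per-interval figure.

```shell
# Average read/write latency per device since boot, from /proc/diskstats.
# Fields used: $3 device name, $4 reads completed, $7 ms spent reading,
#              $8 writes completed, $11 ms spent writing.
avg_latency_ms() {
    awk '$4 + $8 > 0 {
        r = ($4 > 0) ? $7 / $4 : 0
        w = ($8 > 0) ? $11 / $8 : 0
        printf "%s r_await=%.2f w_await=%.2f\n", $3, r, w
    }'
}

# Usage:
#   avg_latency_ms < /proc/diskstats
```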
ioping — Direct Latency Probing
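With -D (direct I/O), ioping bypasses the page cache; pointed at a block device instead of a directory, it also bypasses the filesystem and measures raw device latency. For example, sudo ioping -c 10 -D /dev/nvme0n1 issues ten direct reads against that device (the device name is a placeholder).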
Target latencies:
- NVMe direct: 50–150 µs
- SATA SSD direct: 100–400 µs
- SATA HDD direct: 4–12 ms
- NFS over 1GbE: 200–600 µs
iotop — Per-Process I/O
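Run sudo iotop -oPa to show only processes that are actually doing I/O (-o), per process rather than per thread (-P), with totals accumulated since iotop started (-a).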
blktrace — Deep Block-Level Tracing
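For when you need to trace a specific I/O request through the block layer. Heavyweight, but irreplaceable for debugging. A typical live trace pipes blktrace into its parser: sudo blktrace -d /dev/sda -o - | blkparse -i - (the device name is a placeholder).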
Step 5 — Container I/O Throttling with cgroup v2
In a homelab running Docker, your databases, media servers, download clients, and monitoring containers all share the same disks. Without throttling, a runaway transcoder can tank database queries.
Checking cgroup Version
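A common way to check is to look at the filesystem type mounted at /sys/fs/cgroup. A sketch, with a made-up helper name:

```shell
# Map the filesystem type of /sys/fs/cgroup to a cgroup version.
cgroup_version() {  # cgroup_version <fstype>
    case "$1" in
        cgroup2fs) echo v2 ;;
        tmpfs)     echo "v1 (or hybrid)" ;;
        *)         echo unknown ;;
    esac
}

fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo none)
echo "cgroup hierarchy: $(cgroup_version "$fstype")"
```

On a Docker host, `docker info` also reports a "Cgroup Version" line.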
Docker Engine has supported cgroup v2 since 20.10 and uses it automatically when the host boots with the unified hierarchy. Proxmox 9.x also uses cgroup v2 by default.
Throttle by Bandwidth (Bytes Per Second)
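A sketch of a bandwidth-capped container start using Docker's `--device-read-bps` and `--device-write-bps` flags. The wrapper name, container name, device, limits, and image are all placeholders. The limits apply to a whole block device, so pass the disk rather than a partition.

```shell
# Start a container with read/write bandwidth caps on one block device.
# Usage: run_throttled <name> <device> <read-limit> <write-limit> <image> [args...]
run_throttled() {
    name=$1 dev=$2 rd=$3 wr=$4; shift 4
    docker run -d --name "$name" \
        --device-read-bps "$dev:$rd" \
        --device-write-bps "$dev:$wr" \
        "$@"
}

# Example (not run here):
#   run_throttled transcoder /dev/sdb 100mb 50mb some/image:tag
```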
Throttle by IOPS (Operations Per Second)
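For SSDs where latency matters more than sequential throughput, cap by IOPS instead: --device-read-iops and --device-write-iops take a device:rate pair such as /dev/nvme0n1:1000 (the device and rate are placeholders).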
Verify Throttling Is Active
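On cgroup v2 the limits end up in the container's `io.max` file, expressed in bytes. A sketch for reading it; the cgroup path assumes Docker's systemd cgroup driver, and the helper names and `CGROUP_ROOT` override are inventions for this example.

```shell
# Docker's "mb" suffix is binary, so 50mb shows up as 52428800 in io.max.
mb_to_bytes() { echo $(( $1 * 1024 * 1024 )); }

# Print the io.max file of a running container.
# Assumes the systemd cgroup driver (system.slice/docker-<id>.scope).
container_io_max() {  # container_io_max <container name or id>
    id=$(docker inspect --format '{{.Id}}' "$1") || return 1
    cat "${CGROUP_ROOT:-/sys/fs/cgroup}/system.slice/docker-$id.scope/io.max"
}

# Usage (placeholder container name):
#   container_io_max transcoder
# Each line has the form: MAJ:MIN rbps=... wbps=... riops=... wiops=...
```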
Use docker stats and watch how fast the cumulative BLOCK I/O column grows to confirm actual IO stays below the limit.
A Practical Throttling Strategy
For a typical homelab with one NVMe root drive and one HDD for storage:
Step 6 — Proxmox-Specific I/O Tuning
If you run Proxmox, the host handles I/O for every VM and container. Three settings matter most:
ZFS Recordsize
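For VM storage on ZFS, set recordsize=64K (not the default 128K).
VM disk images use 4K–64K blocks. Larger recordsizes waste ARC memory
and amplify write latency. recordsize applies to datasets that hold
image files; zvol-backed disks are governed by volblocksize instead.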
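A sketch of the change using the standard `zfs set` and `zfs get` commands. The wrapper name and the dataset `tank/vmdata` are placeholders. A new recordsize only affects blocks written after the change, so existing images keep their old layout until rewritten.

```shell
# Set recordsize on a dataset and print the value now in effect.
set_recordsize() {  # set_recordsize <dataset> <size>
    sudo zfs set recordsize="$2" "$1" || return 1
    zfs get -H -o value recordsize "$1"
}

# Usage (not run here):
#   set_recordsize tank/vmdata 64K
```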
KVM IO Threads
Pin IO threads to dedicated CPU cores to prevent vCPU scheduling from blocking disk operations:
LXC IO Limits
LXC containers support disk I/O throttling natively. Set it in the container resource tab or via CLI:
Verification Checklist
Run these benchmarks before and after tuning to confirm improvements:
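One possible set of fio runs, sketched as a wrapper so the same parameters are used before and after. The wrapper name, test file path, size, queue depth, and runtime are assumptions; point the file at the filesystem you are tuning, never at a device holding data you care about.

```shell
# bench <job-name> <rw-mode> <block-size> <test-file>
bench() {
    fio --name="$1" --rw="$2" --bs="$3" --filename="$4" \
        --size=1G --direct=1 --ioengine=libaio --iodepth=32 \
        --runtime=30 --time_based --group_reporting
}

# Examples (not run here):
#   bench seqread   read      1M /mnt/test/fio.dat   # sequential read throughput
#   bench randwrite randwrite 4k /mnt/test/fio.dat   # random 4K write IOPS
```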
Target numbers for a well-tuned NVMe on a modern Linux kernel:
- Sequential read: 3000–7000 MB/s (depends on drive)
- Random 4K write: 50K–200K IOPS
- Direct latency (ioping): <150 µs P99
Summary
Disk I/O tuning in a homelab is a three-act play:
- Pick the right IO scheduler for each device: none for NVMe, mq-deadline for SATA SSD, bfq for HDDs, and persist it with a udev rule.
- Set mount options: noatime everywhere, commit= to control writeback cadence, discard for SSDs (discard=async on btrfs).
- Throttle containers with cgroup v2: --device-*-bps and --device-*-iops keep one noisy service from tanking the whole host.
The monitoring commands — iostat -x, ioping, iotop -oPa —
will tell you when something is wrong. The knobs in this post will
let you fix it.