ZFS on Proxmox is not set-and-forget. The defaults work, but they
optimize for nobody. A pool you built with zpool create tank /dev/sdb /dev/sdc is likely running in a suboptimal configuration
— wrong ashift, default recordsize, uncompressed data, and an
ARC sized for a desktop, not a hypervisor.
This post covers every layer of ZFS tuning that matters for a Proxmox homelab: pool topology, creation-time parameters, dataset properties, ARC sizing, and the SLOG/L2ARC decision. Every command works on Proxmox VE 9.x (Debian 13) with OpenZFS 2.3+.
Step 1 — Pool Topology: Mirrors vs RAID-Z
This is the single most impactful decision you will make. It determines your IOPS, your capacity, your rebuild speed, and your resilience profile. There is no universally correct choice.
Mirror VDEVs — IOPS King
A mirror vdev group (two or more disks, each storing a full copy) gives you the best random IO performance. Reads can come from any member. Writes go to all members simultaneously and complete when the slowest finishes.
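As a rough sketch of how a striped-mirror pool is laid out, the helper below only prints a `zpool create` command that pairs devices into two-way mirrors. The function name, pool name, and device paths are hypothetical placeholders, and nothing is executed against real disks.

```shell
# Print (do not run) a zpool create command that pairs devices into 2-way mirrors.
# Pool name and device paths are placeholders, not values from this post.
mirror_pool_cmd() {
    pool=$1; shift
    cmd="zpool create $pool"
    while [ "$#" -ge 2 ]; do
        cmd="$cmd mirror $1 $2"
        shift 2
    done
    printf '%s\n' "$cmd"
}

mirror_pool_cmd tank /dev/disk/by-id/nvme-A /dev/disk/by-id/nvme-B \
                     /dev/disk/by-id/nvme-C /dev/disk/by-id/nvme-D
```

Each `mirror` keyword starts a new vdev, so four disks become two mirrors that ZFS stripes across.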
Good for:
- VM disk storage (random 4K IOPS matter here)
- Database workloads (PostgreSQL, MySQL)
- Anything latency-sensitive
Bad for:
- Capacity efficiency (you lose 50% of raw space)
- Very large pools on a budget
Rebuild speed: Fast. ZFS copies data from the surviving mirror member. A 4 TB drive resilvers in 2–4 hours on a loaded host.
RAID-Z VDEVs — Capacity Efficient
RAID-Z distributes data and parity across all disks in a single vdev. RAID-Z2 (double parity) is the sweet spot for homelabs.
Good for:
- Bulk media storage, ISO libraries, backups
- Sequential write workloads
- Maximizing usable space per dollar
Bad for:
- Random IOPS (each RAID-Z vdev delivers roughly the random IOPS of a single disk)
- Mixed VM workloads (latency spikes under contention)
Rebuild speed: Slow. Every block must be reconstructed from parity across all surviving disks. A 4 TB drive in a 6-wide RAID-Z2 can take 12–24+ hours.
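To make the capacity trade-off concrete, here is a small arithmetic sketch. The `usable_tb` helper is hypothetical and deliberately rough: it ignores RAID-Z padding, metadata, and the free space ZFS wants to keep in reserve, so real numbers will be somewhat lower.

```shell
# Rough usable capacity in TB, ignoring padding, metadata, and free-space reserve.
# layout: mirror (2-way), raidz1, raidz2, raidz3
usable_tb() {
    layout=$1; disks=$2; size_tb=$3
    case $layout in
        mirror) echo $(( disks / 2 * size_tb )) ;;
        raidz1) echo $(( (disks - 1) * size_tb )) ;;
        raidz2) echo $(( (disks - 2) * size_tb )) ;;
        raidz3) echo $(( (disks - 3) * size_tb )) ;;
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}

usable_tb mirror 6 4   # three 2-way mirrors of 4 TB disks
usable_tb raidz2 6 4   # one 6-wide RAID-Z2 of 4 TB disks
```

With six 4 TB disks this prints 12 for mirrors and 16 for RAID-Z2, which is the gap you are trading against IOPS and rebuild time.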
The Hybrid Approach
Many Proxmox homelabs run two pools:
Register each as a separate storage in Proxmox (Datacenter → Storage → Add → ZFS). This lets you pin each VM and CT to the
right tier.
Step 2 — ashift: The Most Common Mistake
ashift controls ZFS’s logical sector size. The default is 0
(auto-detect), which reads the physical sector size reported by
the drive. The problem: almost every SSD and many modern HDDs
lie and report 512 bytes when their real sector size is 4 KB
or 8 KB.
Running ashift=9 (512 bytes) on a 4 KB-native drive lets ZFS issue writes smaller than a physical sector. The drive then has to read, modify, and rewrite a full 4 KB sector for each of them, which can cost 20–50% of random write performance.
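ashift is simply the base-2 logarithm of the sector size in bytes. The helper below (a hypothetical name, written only for illustration) makes that mapping explicit.

```shell
# ashift is log2 of the sector size in bytes: 512 -> 9, 4096 -> 12, 8192 -> 13.
ashift_for() {
    bytes=$1; shift_val=0
    while [ "$bytes" -gt 1 ]; do
        bytes=$(( bytes / 2 ))
        shift_val=$(( shift_val + 1 ))
    done
    echo "$shift_val"
}

ashift_for 4096
# On a real host, the sector sizes the kernel sees can be listed with:
#   lsblk -o NAME,PHY-SEC,LOG-SEC
```

Keep in mind that `lsblk` shows what the drive reports, which is exactly the value that may be wrong, so treat it as a starting point rather than proof.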
What ashift Value to Use
| Drive type | Recommended ashift | Notes |
|---|---|---|
| Modern NVMe (Samsung PM9A3, Kioxia CD8, etc.) | 13 (8 KB) | Most enterprise NVMes use 8 KB pages |
| Consumer NVMe (Samsung 990 Pro, WD SN850X) | 12 (4 KB) or 13 | Test both — check with nvme id-ns |
| SATA SSD (870 Evo, MX500) | 12 (4 KB) | Universally correct |
| SATA HDD, Advanced Format (≥2011) | 12 (4 KB) | All modern HDDs use 4 KB sectors |
| Legacy HDD (pre-2010, 512 byte native) | 12 (4 KB) still safe | Minor overhead, enables future drive swaps |
Set ashift at pool creation time. It is fixed per vdev and cannot be changed afterward.
Verify After Creation
If you built a pool without setting ashift and suspect it is wrong, the only fix is to destroy and recreate it: back up your data, destroy the pool, recreate it with the right ashift, and restore.
Step 3 — Compression: Always Enable lz4 (or zstd)
Compression in ZFS is essentially free on modern CPUs. LZ4 processes data at multiple GB/s per core (see the table below). Enabling it reduces storage usage, lowers write amplification on SSDs, and frequently improves read latency because less data is fetched from disk.
lz4 vs zstd
| Algorithm | Speed | Ratio | When to use |
|---|---|---|---|
| lz4 | 3–4 GB/s per core | 2–3× on text/logs | Default everywhere |
| zstd-1..3 | 1–2 GB/s per core | 2–4× | Media metadata, container images |
| zstd-6..9 | 200–500 MB/s per core | 3–5× | Cold storage, backups (save space, accept CPU cost) |
| zstd-10..19 | <100 MB/s per core | Up to 6× | Archival only |
| gzip-9 | Slow | Comparable to zstd-3 | Legacy — use zstd instead |
For a homelab running VMs, databases, and media:
Real-World Compression Ratios
Typical numbers:
- VM disk images (qemu raw/img): 1.1–1.3×
- ISO backups: 1.0× (already compressed)
- Docker overlay2 data: 1.8–2.5×
- PostgreSQL databases: 1.5–2.0×
- Logs and config files: 3–5×
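To see where your own datasets land against these numbers, you can divide logical bytes by physical bytes. The sketch below uses a hypothetical `compress_ratio` helper and made-up byte counts; on a real host the two inputs would come from `zfs get`, and ZFS also reports the same idea directly through the `compressratio` property.

```shell
# Ratio of logical (uncompressed) to physical bytes, e.g. from:
#   zfs get -Hp -o value logicalused,used tank/data
compress_ratio() {
    awk -v l="$1" -v u="$2" \
        'BEGIN { if (u > 0) printf "%.2f\n", l / u; else print "n/a" }'
}

compress_ratio 150000000000 100000000000
```

Note that `used` includes snapshots and child datasets, so this is a whole-subtree figure rather than a per-file one.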
Step 4 — recordsize and volblocksize
ZFS issues IOs in chunks called records (for datasets) and blocks (for zvols). The size of these chunks has a massive impact on performance.
Dataset recordsize (for file storage)
Use this for SMB/NFS shares, backup directories, and media stores.
zvol volblocksize (for VM block storage)
This is the size ZFS uses when the VM issues a write. Proxmox VM disks are zvols by default. Mismatching this to the guest workload is the #2 ZFS performance mistake (after ashift).
Set it at zvol creation. volblocksize is fixed once the zvol exists. Dataset recordsize, by contrast, can be changed later, although the new value only applies to newly written data.
To check the volblocksize of an existing VM disk:
If you want to change it, the practical approach in Proxmox is:
- Shut down the VM
- Create a new zvol with the desired volblocksize
- dd the old zvol to the new one (or use qemu-img convert)
- Detach the old disk, attach the new one
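The steps above can be sketched as a dry run. The helper below only prints the commands; the VM ID, zvol names, size, and block size are hypothetical example values, and the final detach/attach step is left as a comment because it depends on how the disk is wired into the VM config.

```shell
# Print the migration steps for one VM disk; nothing here is executed.
# VM ID, pool path, size, and block size are hypothetical example values.
volblock_migration_plan() {
    vmid=$1; old=$2; new=$3; size=$4; vbs=$5
    cat <<EOF
qm shutdown $vmid
zfs create -V $size -o volblocksize=$vbs $new
dd if=/dev/zvol/$old of=/dev/zvol/$new bs=1M status=progress
# then detach the old disk and attach the new one in the VM's hardware settings
EOF
}

volblock_migration_plan 100 rpool/data/vm-100-disk-0 rpool/data/vm-100-disk-1 32G 16K
```

Keep the old zvol until the VM has booted cleanly from the new one.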
Step 5 — ARC Sizing
ZFS uses Adaptive Replacement Cache (ARC) as an in-memory read cache. It competes with your VMs and containers for RAM.
How the Default Works
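Upstream OpenZFS on Linux has traditionally let the ARC grow to 50% of system RAM. On a 64 GB host, that is up to 32 GB held by ZFS, leaving only 32 GB for VMs and the OS. Proxmox installers since 8.1 write a lower limit (10% of RAM, capped at 16 GiB) to /etc/modprobe.d/zfs.conf when you install onto ZFS, so check what your host is actually using before you tune.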
Tuning ARC max
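The `zfs_arc_max` module parameter takes a value in bytes, so the first step is converting your GiB budget. A minimal sketch, with hypothetical helper names and an 8 GiB cap chosen purely as an example:

```shell
# Convert a GiB budget into the byte value zfs_arc_max expects.
arc_max_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Line for /etc/modprobe.d/zfs.conf (applied when the zfs module loads).
arc_modprobe_line() {
    echo "options zfs zfs_arc_max=$(arc_max_bytes "$1")"
}

arc_modprobe_line 8
# To apply without rebooting, the same byte value can be written to
#   /sys/module/zfs/parameters/zfs_arc_max
# If the root filesystem is on ZFS, refresh the initramfs after editing
# the modprobe file:  update-initramfs -u -k all
```

For 8 GiB this prints `options zfs zfs_arc_max=8589934592`.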
How Much ARC Is Enough?
General guidance for Proxmox:
| System RAM | Suggested ARC max | Notes |
|---|---|---|
| 8 GB | 1 GB | Bare minimum — monitor arcstats |
| 16 GB | 2–4 GB | Good for 2–3 light VMs |
| 32 GB | 4–8 GB | Good for 4–6 VM/CT workloads |
| 64 GB | 8–16 GB | Typical Proxmox single node |
| 128 GB | 16–32 GB | Heavy VM density |
| 256 GB+ | 32–64 GB | ARC will grow but VMs matter more |
Monitor ARC Effectiveness
Install and run arcstat:
If hit% is below 85% and you have free RAM, increase ARC.
If dhit% is above 95%, your ARC is doing its job. Decreasing it
may free RAM for VMs without a significant performance penalty.
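If you want a single number without watching arcstat scroll, the overall hit ratio can be derived from the raw kernel counters. This is a rough sketch with a hypothetical helper name; the counters are cumulative since the module loaded, so the result is a long-run average rather than the per-interval figure arcstat shows.

```shell
# Overall ARC hit percentage from the kernel's raw counters.
# File format: "name type data" per line, e.g. "hits 4 123456".
arc_hit_pct() {
    file=${1:-/proc/spl/kstat/zfs/arcstats}
    awk '$1 == "hits"   { h = $3 }
         $1 == "misses" { m = $3 }
         END { if (h + m > 0) printf "%.1f\n", 100 * h / (h + m); else print "n/a" }' "$file"
}

# Usage on a ZFS host:
#   arc_hit_pct
```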
Step 6 — SLOG and L2ARC: When and Whether
SLOG (Separate ZFS Intent Log)
A SLOG is a dedicated NVMe device for synchronous write
operations. It does not cache reads. It only absorbs the ZIL
(ZFS Intent Log) to accelerate fsync() and O_SYNC writes
from databases and journaling filesystems.
You need a SLOG if:
- Your main pool is HDD-based and you run databases inside VMs
- You see high zil_commit latency in zpool iostat -l 1
- Your pool has NVMe but you want to isolate ZIL traffic
You do NOT need a SLOG if:
- Your pool is all-NVMe (native NVMe latency for ZIL is already <100 µs)
- You run no sync-heavy workloads (media servers, file shares)
Hardware requirements for SLOG:
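- Power-loss protection (PLP): mandatory. Many consumer NVMe drives acknowledge a flush before the data is actually on stable media. If the SLOG does that and power fails, the most recent sync writes are lost, which can corrupt the databases and guest filesystems that relied on them.
- Use enterprise NVMe: Intel Optane data-center drives (P4800X, P4801X, P1600X; avoid the hybrid "Optane Memory H10/H20" modules, which pair a small Optane cache with QLC flash), Samsung PM9A3, Kioxia CD6, or any NVMe with capacitor-backed power-loss protection.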
- Mirror it. A single SLOG is a single point of failure.
SLOG sizing: The ZIL is a small ring buffer. 10–20 GB per SLOG device is more than enough for any homelab.
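A common rule of thumb, used here as an assumption rather than a hard rule, is that the SLOG only has to hold the sync writes that can arrive during a couple of transaction group intervals (5 seconds each by default). The hypothetical helper below turns an ingest rate in Gbit/s into a size estimate.

```shell
# Rule-of-thumb SLOG size in GB: ingest rate x txg interval x 2 outstanding txgs.
# $1 = max sync write ingest in Gbit/s, $2 = txg interval in seconds (default 5).
slog_size_gb() {
    awk -v gbps="$1" -v txg="${2:-5}" \
        'BEGIN { printf "%.1f\n", gbps / 8 * txg * 2 }'
}

slog_size_gb 10
# Attaching a mirrored log vdev looks like this (placeholder device names):
#   zpool add tank log mirror <dev1> <dev2>
```

Even a saturated 10 GbE link works out to 12.5 GB, which is why partitioning a small slice of the device is usually plenty.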
L2ARC (Level 2 ARC)
L2ARC is a read cache on a secondary device (usually cheaper SSD/NVMe). Data evicted from ARC is written to L2ARC.
When it helps:
- You have a large HDD pool with a small ARC (e.g., 8 GB ARC for 40 TB of HDD storage)
- Your working data set is larger than ARC but fits on a single SSD
When it does NOT help:
- Your ARC hit rate is already >90%
- Your pool is all-NVMe (reads that miss ARC already land on fast storage)
- Your workload is write-heavy (L2ARC only caches reads)
Opinion: Most homelabs do not benefit from L2ARC. The ARC hit rate on a 16 GB ARC serving a few VMs is usually above 95%. Add more RAM instead. L2ARC only makes sense when you have a large pool with insufficient ARC and cannot add RAM.
Step 7 — Dataset Properties for Common Homelab Workloads
This table summarizes the recommended zfs set properties for
different use cases:
| Workload | recordsize | compression | atime | special notes |
|---|---|---|---|---|
| VM disks (zvol) | 8K (volblocksize) | lz4 | off | zfs set primarycache=metadata for DB VMs |
| Docker overlay2 | 64K | lz4 | off | xattr=sa for extended attr perf |
| Samba/NFS media | 128K | lz4 | off | aclmode=passthrough for NFSv4 ACLs |
| Backup target | 1M | zstd-3 | off | Sequential streaming benefits from large records |
| ISOs/Templates | 128K | lz4 | off | Read mostly, cache trivial |
| PostgreSQL data dir | 8K (zvol) | lz4 | off | logbias=throughput to write sync data to the main pool instead of the SLOG |
| Proxmox container root | 64K | lz4 | off | Container filesystems use small blocks |
| Samba Time Machine | 128K | lz4 | off | dnodesize=legacy for macOS compat |
Set these immediately after creating the dataset or zvol:
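As a sketch of how a few rows of the table translate into commands, the hypothetical helper below prints a `zfs create` line for three of the workloads. Dataset names are placeholders, and the output is printed rather than executed so you can review it first.

```shell
# Print a zfs create command using the table's properties for a given workload.
# $1 = workload (docker | media | backup), $2 = dataset name (placeholder).
zfs_create_cmd() {
    dataset=$2
    case $1 in
        docker) opts="-o recordsize=64K -o compression=lz4 -o atime=off -o xattr=sa" ;;
        media)  opts="-o recordsize=128K -o compression=lz4 -o atime=off" ;;
        backup) opts="-o recordsize=1M -o compression=zstd-3 -o atime=off" ;;
        *) echo "unknown workload: $1" >&2; return 1 ;;
    esac
    echo "zfs create $opts $dataset"
}

zfs_create_cmd backup tank/backup
```

Passing the properties with `-o` at creation time means no data is ever written with the defaults.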
Step 8 — Monitoring ZFS Health and Performance
ZFS ships excellent tools. Use them weekly.
Pool Health
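`zpool status -x` prints the single line `all pools are healthy` when nothing needs attention, which makes it easy to wrap in a check. A minimal sketch, with a hypothetical helper name, that takes the command's output as an argument:

```shell
# Succeeds only when the given `zpool status -x` output reports no problems.
pools_healthy() {
    [ "$1" = "all pools are healthy" ]
}

# Usage on a ZFS host (for example from a cron job):
#   pools_healthy "$(zpool status -x)" || echo "ZFS needs attention"
```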
IO Statistics
Scrubs — Schedule Them
A scrub reads every block and verifies its checksum, repairing from redundancy where it can. It is the only way to catch silent corruption in data you rarely read.
Proxmox schedules a monthly scrub by default. Verify:
If not present, add a weekly systemd timer:
arcstat Quick Reference
Key columns:
- read: total ARC reads per second
- hits: reads satisfied from ARC
- miss: reads that went to disk
- hit%: overall hit ratio
- dhit%: demand data hit ratio (ignore prefetch)
- l2hits / l2miss: L2ARC hits and misses (useful only if you have L2ARC)
Putting It All Together: A Complete Example
Here is a real configuration for a 64 GB Proxmox host with 4 × 1 TB NVMe (VM pool) and 6 × 4 TB HDD (bulk storage):
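A minimal sketch of a layout for that hardware, assembled from the steps in this post, might look like the following. Pool names, dataset names, and the `NVME1`/`HDD1` device placeholders are all hypothetical, the ashift values follow the checklist in this post, and the 16 GiB ARC cap is just the upper end of the 64 GB row in the ARC table. The function only prints the plan.

```shell
# Print a candidate plan for 4 x NVMe (VM pool) + 6 x HDD (bulk pool).
# Device names are placeholders; replace them with /dev/disk/by-id paths.
print_homelab_plan() {
    cat <<'EOF'
zpool create -o ashift=13 -O compression=lz4 -O atime=off vmpool mirror NVME1 NVME2 mirror NVME3 NVME4
zpool create -o ashift=12 -O compression=lz4 -O atime=off bulk raidz2 HDD1 HDD2 HDD3 HDD4 HDD5 HDD6
zfs create -o recordsize=1M -o compression=zstd-3 bulk/backup
zfs create -o recordsize=128K bulk/media
# /etc/modprobe.d/zfs.conf:  options zfs zfs_arc_max=17179869184
EOF
}

print_homelab_plan
```

Review every line against your own drives before running anything, since `zpool create` is destructive.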
Summary
ZFS is the best storage filesystem for Proxmox, but only if you configure it intentionally. The checklist from this post:
- Choose topology — mirrors for IOPS, RAID-Z for capacity
- Set ashift at pool creation — 12 for SATA, 13 for NVMe
- Enable compression — lz4 everywhere, zstd on backups
- Match recordsize to workload — 8K for VMs, 64K for Docker, 128K for media, 1M for backups
- Size the ARC — 1 GB per 8 TB of pool, or whatever your VM RAM budget allows
- Skip SLOG on all-NVMe pools — add one only for HDD pools with sync-heavy VMs
- Monitor weekly: zpool status, arcstat, zpool iostat
A properly tuned ZFS pool will outlast the hardware it runs on, migrate seamlessly between Proxmox nodes, and never silently corrupt a byte of data. That is the point of running ZFS.