Your Proxmox host has NVMe SSDs, a ZFS pool tuned to the gills, and plenty of RAM. But your VMs still feel sluggish under disk-heavy workloads — database commits take too long, file transfers stall, and iowait climbs uncomfortably high.
The default Proxmox VM configuration is safe but leaves significant performance on the table. A few targeted tweaks to the disk controller, iothread configuration, cache settings, and async IO engine can double or triple your VM’s disk throughput without spending a cent on hardware.
This guide walks through each tuning knob in order of impact: controller selection → iothreads → cache mode → discard/trim → async IO → benchmarking. Apply the ones that match your storage layout, measure results, and keep what works.
Choosing the Right Disk Controller — virtio-blk vs virtio-scsi
Proxmox offers two virtio-based disk controller families for Linux guests. The choice directly affects queue depth, parallelism, and overall throughput.
virtio-blk is the simpler option. It has been part of the Linux kernel since 2.6.24, requires zero extra drivers, and introduces minimal overhead. By default it exposes a single I/O queue, which limits parallelism when multiple processes or threads hit the disk simultaneously. Starting with QEMU 6.0 and Proxmox 8, virtio-blk supports multi-queue via the num-queues property, which is typically set to match the number of vCPUs assigned to the VM.
virtio-scsi is the more modern choice. It supports multiple I/O queues out of the box, a default queue depth of 256 (versus 128 for virtio-blk), and better integration with SCSI-3 persistent reservations. The virtio-scsi-single controller model gives every disk its own controller, so each disk with iothread=1 gets a dedicated IOThread, which suits workloads with concurrent access patterns.
When to use each:
| Criterion | virtio-blk | virtio-scsi |
|---|---|---|
| Overhead | Lowest | Very low |
| Default queue depth | 128 | 256 |
| Multi-queue | Manual (num-queues=N) | Built-in |
| Best for | Single-process, low-concurrency workloads | Databases, file servers, multi-threaded apps |
| Pass-through features | Limited | Full SCSI support, TRIM pass-through |
For most homelab VMs running Docker, databases, or file storage, virtio-scsi-single is the better default. For minimal-overhead single-user VMs (e.g., a build runner), virtio-blk with multi-queue is fine.
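A minimal sketch of switching a VM to that controller, assuming a hypothetical VM ID of 100. The snippet only prints the command so it can be reviewed first; the controller applies to disks attached as scsiN, and the change takes effect after a full stop and start of the VM.

```shell
VMID=100                                          # hypothetical VM ID, replace with yours
CMD="qm set $VMID --scsihw virtio-scsi-single"    # one controller per SCSI disk
echo "$CMD"                                       # review, then run the printed command on the Proxmox host
```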
Check your current controller with:
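One way to do this, sketched under the assumption that the VM ID is 100, is to filter the output of `qm config` down to the controller and disk lines. The `disk_lines` helper is a name made up for this example, and the config excerpt in the demonstration is invented.

```shell
# Keep only the controller line and the disk lines of a VM config.
disk_lines() {
  grep -E '^(scsihw|scsi[0-9]+|virtio[0-9]+|sata[0-9]+|ide[0-9]+):'
}

# On the Proxmox host (100 is a placeholder VM ID):
#   qm config 100 | disk_lines

# Demonstration on an invented config excerpt:
printf '%s\n' \
  'cores: 4' \
  'scsihw: virtio-scsi-single' \
  'scsi0: local-zfs:vm-100-disk-0,iothread=1,size=32G' | disk_lines
```

A disk listed as scsiN uses the controller named in scsihw; a disk listed as virtioN uses virtio-blk.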
IOThreads — Dedicated I/O Processing Pins
By default, QEMU processes disk I/O in its main event loop thread, alongside the emulation of every other device. When a heavy write operation keeps that thread busy, the rest of the VM waits behind it. IOThreads carve out dedicated threads for I/O processing, isolating storage latency from CPU-bound workloads.
Enable iothreads per disk:
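The sketch below assumes VM ID 100 and a disk whose current spec is `local-zfs:vm-100-disk-0,size=32G` (both hypothetical; copy the real spec from `qm config`). Because `qm set --scsi0` replaces the whole option string, the made-up `set_drive_opt` helper merges the new option into the existing spec instead of retyping it. On SCSI disks, iothread=1 needs the virtio-scsi-single controller.

```shell
# Return a drive spec with key=value set, replacing any existing value for key.
set_drive_opt() {
  spec=$1; key=$2; val=$3
  kept=$(printf '%s\n' "$spec" | tr ',' '\n' | grep -v "^${key}=" | paste -sd, -)
  printf '%s,%s=%s\n' "$kept" "$key" "$val"
}

VMID=100                                    # hypothetical VM ID
SPEC='local-zfs:vm-100-disk-0,size=32G'     # hypothetical; take yours from: qm config $VMID
NEW_SPEC=$(set_drive_opt "$SPEC" iothread 1)
echo "qm set $VMID --scsi0 $NEW_SPEC"       # review, then run on the Proxmox host
```

The new thread only appears after the VM has been fully stopped and started again.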
Then verify the iothread is running on the host:
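A sketch of that check, assuming VM ID 100 and assuming QEMU exposes thread names to `ps` (the exact names vary by version, so the filter just looks for "iothread"). The `iothread_rows` helper and the sample rows are invented for illustration.

```shell
# Keep only IOThread rows from per-thread ps output (thread ID, CPU, thread name).
iothread_rows() {
  grep -i 'iothread'
}

# On the Proxmox host (100 is a placeholder VM ID):
#   ps -T -p "$(cat /run/qemu-server/100.pid)" -o spid=,psr=,comm= | iothread_rows

# Demonstration on invented ps output:
printf '%s\n' \
  '2101 0 kvm' \
  '2107 4 IO iothread-vir' \
  '2110 1 CPU 0/KVM' | iothread_rows
```

An empty result usually means the VM has not been restarted since iothread=1 was set.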
For maximum benefit, pin iothreads to physical cores that are not shared with vCPUs. On a 6-core Proxmox host, a sensible layout keeps the vCPUs on cores 0-3 and reserves cores 4-5 for iothreads.
Edit /etc/pve/nodes/<node>/qemu-server/<VMID>.conf and add after the iothread lines:
cpuunits: 1024
cpulimit: 4
affinity: 0-3
Then pin iothread processes via taskset:
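As a hedged sketch, the helper below turns per-thread `ps` output into `taskset -cp` commands for cores 4-5. The `pin_cmds` name, VM ID 100, and the sample thread rows are assumptions made for this example; the commands are printed rather than executed so the thread IDs can be checked first.

```shell
# Turn "TID NAME" rows into taskset commands that pin IOThreads to the given cores.
pin_cmds() {
  cores=$1
  awk -v cores="$cores" 'tolower($0) ~ /iothread/ { print "taskset -cp " cores " " $1 }'
}

# On the Proxmox host (100 is a placeholder VM ID):
#   ps -T -p "$(cat /run/qemu-server/100.pid)" -o spid=,comm= | pin_cmds 4-5

# Demonstration on invented thread rows:
printf '%s\n' '2101 kvm' '2107 IO iothread-vir' '2110 CPU 0/KVM' | pin_cmds 4-5
```

Piping the printed lines to `sh` applies them once they look right.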
Thread IDs change every time the VM starts, so reapply the pinning after each start, for example from a systemd service or a crontab @reboot script. In Proxmox 8.1+, CPU pinning for iothreads can also be set through the GUI under the VM’s hardware options.
Disk Cache Mode Comparison
The disk cache mode controls how QEMU interacts with the host page cache and how it reports write completion to the guest. Proxmox offers five modes — in a homelab with ZFS, only two matter:
cache=none — the Proxmox default since version 2.x. QEMU opens the disk with O_DIRECT, bypassing the host page cache entirely. The guest sees a writeback cache (so it sends flushes when needed), and ZFS handles all caching through its ARC. This avoids double-caching and is the recommended mode when the backing storage is ZFS.
cache=writeback — data lands in the host page cache before being written to disk. Writes complete faster from the guest’s perspective, but the host page cache adds another layer. On hardware RAID with battery-backed write cache (BBU), this is fine. On ZFS without a UPS, a power loss can lose data sitting in the host page cache.
ZFS recommendation: Use cache=none. ZFS ARC already caches frequently accessed data in RAM. Adding the host page cache on top wastes memory that ARC could put to better use.
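A small sketch for checking and setting the mode, assuming VM ID 100 and a hypothetical volume name. The `cache_mode` helper is invented here; it reports none when a drive line carries no cache= option, since that is the Proxmox default.

```shell
# Print the cache mode of a drive line; a missing cache= option means the default, none.
cache_mode() {
  mode=$(printf '%s\n' "$1" | tr ',' '\n' | sed -n 's/^cache=//p')
  echo "${mode:-none}"
}

VMID=100                                  # hypothetical VM ID
VOLUME="local-zfs:vm-${VMID}-disk-0"      # hypothetical volume name
cache_mode "scsi0: ${VOLUME},cache=writeback,size=32G"

# Command to set it explicitly (append any other options the disk already has):
CMD="qm set $VMID --scsi0 ${VOLUME},cache=none"
echo "$CMD"
```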
Enabling Discard / TRIM for Thin-Provisioned Storage
If you use thin provisioning on ZFS datasets or qcow2 images, discarded blocks in the guest should be forwarded to the host so ZFS can reclaim space. Without discard, deleted files inside the VM leave allocated blocks on the host forever.
Enable discard on the Proxmox disk:
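A minimal sketch, assuming VM ID 100 and a hypothetical volume name. The command is printed for review because `qm set --scsi0` rewrites the whole drive spec, so any options the disk already carries (iothread, cache, and so on) need to be appended as well.

```shell
VMID=100                                  # hypothetical VM ID
VOLUME="local-zfs:vm-${VMID}-disk-0"      # hypothetical volume name
CMD="qm set $VMID --scsi0 ${VOLUME},discard=on"
echo "$CMD"                               # review, then run on the Proxmox host
```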
Inside the guest (Linux), run an immediate trim and enable the periodic timer:
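In the guest, `fstrim -av` trims every mounted filesystem that supports it and `systemctl enable --now fstrim.timer` switches on the periodic run; both need root. The sketch below adds a small helper, `trimmed_bytes` (a name invented for this example), that totals what `fstrim -av` reports so the number can be compared with the space reclaimed on the host. The sample output lines are made up.

```shell
# Sum the byte counts reported by `fstrim -av`.
trimmed_bytes() {
  sed -n 's/.*(\([0-9]*\) bytes) trimmed.*/\1/p' |
    awk '{ s += $1 } END { printf "%.0f\n", s }'
}

# Inside the guest, as root:
#   fstrim -av | trimmed_bytes
#   systemctl enable --now fstrim.timer

# Demonstration on invented fstrim output:
printf '%s\n' \
  '/: 1.5 GiB (1610612736 bytes) trimmed on /dev/sda1' \
  '/boot: 0 B (0 bytes) trimmed on /dev/sda2' | trimmed_bytes
```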
For Debian/Ubuntu guests, the fstrim.timer runs weekly by default. On Proxmox hosts using ZFS, verify that discard is freeing space:
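One way to read that value in a script, sketched under the assumption that the pool is called rpool: `zpool get -Hp -o value freeing rpool` prints the pool's freeing property as a plain byte count. The `freeing_status` helper is a name made up for this example.

```shell
# Interpret the pool's "freeing" property (bytes still queued for release).
freeing_status() {
  if [ "$1" -gt 0 ]; then
    echo "reclaiming: $1 bytes still to free"
  else
    echo "idle: nothing queued for freeing"
  fi
}

# On the Proxmox host (rpool is a placeholder pool name):
#   freeing_status "$(zpool get -Hp -o value freeing rpool)"

# Demonstration with an invented value:
freeing_status 4096
```

The value drops back to zero once the release finishes, so check it shortly after running fstrim in the guest.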
A non-zero freeing value means blocks are actively being reclaimed.
Async I/O Engine Selection — threads vs native
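QEMU's two classic engines for dispatching disk I/O are:
- threads (QEMU's own default): hands requests to a pool of worker threads. Simple and compatible with all storage backends, but context-switching overhead shows at high IOPS.
- native: uses Linux kernel AIO (io_submit/io_getevents). Lower overhead per I/O, better for high-queue-depth workloads.
Proxmox VE 7 and later pick a third engine, io_uring, for most storage types when aio is not set, so an unset aio option does not necessarily mean threads.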
Switch to native AIO on ZFS or raw block devices. Native AIO requires O_DIRECT, so combine it with cache=none or cache=directsync:
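A hedged sketch of the switch, assuming VM ID 100 and a hypothetical volume name. QEMU only accepts aio=native on disks opened with O_DIRECT, which in Proxmox terms means cache=none or cache=directsync, so the invented `native_aio_allowed` helper checks the cache mode before the command is printed.

```shell
# native AIO needs O_DIRECT, i.e. cache=none or cache=directsync.
native_aio_allowed() {
  case "$1" in
    none|directsync) return 0 ;;
    *) return 1 ;;
  esac
}

VMID=100                                  # hypothetical VM ID
VOLUME="local-zfs:vm-${VMID}-disk-0"      # hypothetical volume name
CACHE=none                                # cache mode the disk will use

if native_aio_allowed "$CACHE"; then
  CMD="qm set $VMID --scsi0 ${VOLUME},aio=native,cache=${CACHE}"
  echo "$CMD"                             # review, then run on the Proxmox host
else
  echo "cache=$CACHE goes through the page cache; keep a different aio engine" >&2
fi
```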
Caveat: Some older Linux kernels (pre-5.4) and Ceph RBD backends have issues with aio=native. If your host kernel is 5.15+ (standard on Proxmox 8.x and 9.x), native AIO is safe and recommended.
Benchmarking Disk Performance — fio and qemu-img
Never apply tuning blindly. Measure before and after with real workloads.
Inside the guest — fio is the standard benchmark tool:
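The job below is one reasonable starting point, not a canonical profile: 4K blocks, queue depth 32, four jobs, direct I/O, 60 seconds. The `fio_cmd` helper and the test file path /root/fio-test.bin are assumptions made for this sketch; the helper prints the command line so the same parameters can be reused for reads and writes.

```shell
# Print an fio command line for a 4K test of the given type against a test file.
fio_cmd() {
  rw=$1; file=$2
  echo "fio --name=$rw --filename=$file --size=1G --rw=$rw --bs=4k" \
       "--iodepth=32 --numjobs=4 --ioengine=libaio --direct=1" \
       "--runtime=60 --time_based --group_reporting"
}

fio_cmd randread  /root/fio-test.bin      # hypothetical test file inside the guest
fio_cmd randwrite /root/fio-test.bin
```

Point --filename at a scratch file, never at a disk that holds data, since the write test overwrites its target.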
On the host — benchmark the raw block device through QEMU’s block layer:
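A sketch using `qemu-img bench`, assuming the VM disk is a zvol at /dev/zvol/rpool/data/vm-100-disk-0 (a hypothetical path; adjust pool and VM ID). As written it issues 4 KiB reads at queue depth 32 with cache mode none and native AIO, and it does not write because the -w flag is left out. Running it while the VM is stopped keeps the numbers clean.

```shell
DEV=/dev/zvol/rpool/data/vm-100-disk-0    # hypothetical zvol path
# -c request count, -d queue depth, -s request size in bytes,
# -t cache mode, -n native AIO, -f image format
CMD="qemu-img bench -f raw -t none -n -c 200000 -d 32 -s 4096 $DEV"
echo "$CMD"                               # review, then run on the Proxmox host
```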
Key metrics to track:
- 4K random IOPS — the single best indicator of real-world VM performance
- 99th percentile latency — matters more than average for databases
- Sequential throughput — important for file transfers and backups
A tuned VM on consumer NVMe (e.g., Samsung 990 Pro) through Proxmox with ZFS should hit 150k-200k+ random read IOPS with sub-millisecond p99 latency.
Putting It All Together
For a production homelab VM on ZFS-backed storage, here is the complete tuning command:
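Assembled from the settings discussed above, a sketch of the combined command might look like this. VM ID 100 and the volume name are hypothetical, and the command is printed for review instead of being executed.

```shell
VMID=100                                  # hypothetical VM ID
VOLUME="local-zfs:vm-${VMID}-disk-0"      # hypothetical volume name
OPTS="iothread=1,cache=none,discard=on,aio=native"
CMD="qm set $VMID --scsihw virtio-scsi-single --scsi0 ${VOLUME},${OPTS}"
echo "$CMD"                               # review, then run on the Proxmox host
```

Stop and start the VM afterwards so the controller and iothread changes take effect.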
| Setting | ZFS (recommended) | Raw NVMe | Ceph/RBD |
|---|---|---|---|
| Controller | virtio-scsi-single | virtio-blk (multi-queue) | virtio-scsi-single |
| Cache mode | none | none (aio=native requires O_DIRECT) | none |
| IOThread | yes | yes | yes |
| Discard | on | on | off |
| aio | native | native | threads |
Test each change independently on your actual workloads. Storage topology, CPU count, and RAM all interact with these settings. There is no universal “fastest” config — only the best config for your specific workload on your specific hardware.
For deeper reading, see the Proxmox VE Performance Tweaks Wiki and the QEMU block layer documentation.