If you started your homelab on a single Proxmox host, you already know the pain: planned maintenance means shutting everything down, a hardware failure takes out every VM and container, and there is no live migration without a second node.
Adding a second Proxmox host is the natural next step, but a two-node cluster comes with its own challenges — especially around quorum and split-brain prevention. This guide walks through building a Proxmox VE cluster from scratch with high availability, using a QDevice as the tiebreaker vote to solve the two-node quorum problem, all while keeping the setup homelab-practical.
Why Cluster Proxmox — Live Migration and HA Failover
Clustering Proxmox nodes unlocks three major capabilities you cannot get on standalone hosts:
- Live migration — move running VMs between nodes with zero downtime
- Unified management — single web UI across all nodes with cluster-wide resource views
- High availability — automatic VM failover when a node goes offline
Proxmox clustering uses corosync for membership and quorum communication, and pmxcfs (the Proxmox Cluster File System) to synchronise configuration across all nodes in real time.
Prerequisites and Planning
Before creating the cluster, make sure you have:
- Two or more Proxmox VE 9.x hosts — all running the same major version
- Shared or replicated storage: NFS, iSCSI, or Ceph, or ZFS storage replication between the nodes, so a VM's disks are reachable from whichever node runs it
- Dedicated cluster network — a separate VLAN or physical link for corosync traffic (recommended)
- A QDevice host — a lightweight Debian machine (RPi, LXC container, or tiny VM) that will act as the quorum tiebreaker
- Root SSH access to all nodes and the QDevice host
For the rest of this guide, I will use these hostnames and IPs:
| Host | Role | IP |
|---|---|---|
| proxmox01 | Cluster node 1 | 10.0.20.30 |
| proxmox02 | Cluster node 2 | 10.0.20.31 |
| pve-qdevice | QDevice (LXC) | 10.0.20.10 |
Creating the Proxmox Cluster
On the first node, initialise the cluster:
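A minimal sketch of this step, assuming a cluster name of homelab (a placeholder; the guide does not name the cluster). The `command -v` check only keeps the snippet from doing anything on a machine without the Proxmox tooling.

```shell
# Run as root on proxmox01 (10.0.20.30).
CLUSTER_NAME="homelab"   # placeholder name chosen for this example

if command -v pvecm >/dev/null 2>&1; then
    # To pin corosync to a dedicated network, append: --link0 <address>
    pvecm create "$CLUSTER_NAME"
else
    echo "pvecm not found: run this on a Proxmox VE node."
fi
```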
This creates a corosync configuration, generates the necessary certificates, and starts the cluster services. Verify it worked by running pvecm status on the same node.
On the second node, join the cluster with pvecm add, passing the IP of the first node. You will be prompted for the first node's root password.
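A minimal sketch of the join, using the first node's address from the table earlier in this guide. The variable name is only for illustration.

```shell
# Run as root on proxmox02 (10.0.20.31).
FIRST_NODE_IP="10.0.20.30"   # proxmox01

if command -v pvecm >/dev/null 2>&1; then
    # Asks for proxmox01's root password and to confirm its certificate fingerprint.
    pvecm add "$FIRST_NODE_IP"
else
    echo "pvecm not found: run this on the joining Proxmox VE node."
fi
```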
pvecm add also accepts a --force flag, which stops it from aborting when the node already exists in the cluster configuration. If you set unique hostnames during installation (which you should), you can omit it.
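After both nodes are joined, confirm the cluster status with pvecm status and list the members with pvecm nodes. Both nodes should appear, and the cluster should report that it is quorate.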
Cluster Networking Best Practices
Corosync traffic should be isolated from regular VM and storage traffic. If your hosts have multiple NICs, give each node a static address on a dedicated cluster network in /etc/network/interfaces.
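Then make sure corosync uses that network. On current Proxmox releases (corosync 3), each node's cluster address is the ring0_addr entry in the nodelist section of /etc/pve/corosync.conf; bindnetaddr only applies to older corosync 2 setups. Increment config_version whenever you edit the file.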
Verify corosync link health with corosync-cfgtool -s. Every peer should be listed as connected (older corosync 2 releases print “no faults” instead). Any fault here indicates network issues that will cause HA failures later.
Setting Up Shared Storage
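Shared storage keeps live migration fast and lets HA restart a VM on another node, because the disks are reachable from every host. With local-only disks, Proxmox has to copy them during a migration, and HA cannot recover a VM whose disk sat on the failed node. The simplest approach for a homelab is an NFS export from a NAS or a dedicated VM.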
Add the NFS storage via the web UI (Datacenter → Storage → Add → NFS) or /etc/pve/storage.cfg:
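The same storage can also be defined from the command line with pvesm, which writes the entry to /etc/pve/storage.cfg for you. This is a sketch only: the storage ID, NAS address, and export path below are hypothetical placeholders, not values from this guide.

```shell
# Run once on any cluster node; the definition is shared with every node.
STORAGE_ID="nas-vmstore"        # hypothetical storage name
NFS_SERVER="10.0.20.50"         # hypothetical NAS address
NFS_EXPORT="/export/proxmox"    # hypothetical export path

if command -v pvesm >/dev/null 2>&1; then
    pvesm add nfs "$STORAGE_ID" \
        --server "$NFS_SERVER" \
        --export "$NFS_EXPORT" \
        --content images,rootdir
    pvesm status    # the new storage should be listed as active
else
    echo "pvesm not found: run this on a Proxmox VE node."
fi
```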
For homelabs without dedicated NAS hardware, ZFS storage replication is a solid alternative: configure a ZFS-backed storage with the same ID on each node and use Proxmox’s built-in replication scheduler, which ships snapshots between the nodes over SSH. The trade-off is that replication is not real-time; typical intervals are 5–15 minutes, so a failover can lose the changes made since the last sync.
Solving the Two-Node Quorum Problem with QDevice
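A two-node cluster has a fundamental problem: quorum requires a strict majority of votes, so if one node fails, the surviving node holds only 1 out of 2 votes and loses quorum. Without quorum, /etc/pve becomes read-only and VMs cannot be started or recovered. If HA is active on the survivor, its watchdog will also fence (reset) it, taking down every VM it was still running.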
The fix is an external QDevice — a lightweight third vote that does not run any VMs. It lives on a separate host and votes alongside the cluster nodes.
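The vote arithmetic behind this can be sketched in a few lines of shell. quorum_needed is a hypothetical helper written for this example, not a Proxmox command, and it assumes one vote per node plus one for the QDevice.

```shell
# Votes required for quorum: a strict majority of the expected votes.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

quorum_needed 2   # two nodes, no QDevice: both votes are required
quorum_needed 3   # two nodes plus QDevice: any two of the three votes suffice
```

With two expected votes, losing either node drops the cluster below the threshold. With three, one node plus the QDevice still reach it.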
Step 1 — Prepare the QDevice Host
I run my QDevice as a Debian LXC container on a third Proxmox host (or the same hardware, on a different physical link). Install the corosync-qnetd package on the QDevice host and the corosync-qdevice package on every cluster node.
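A sketch of the package installation, assuming the QDevice host's short hostname is pve-qdevice as in the table earlier in this guide. The hostname and /etc/pve checks are only there so the same snippet picks the right package on each machine.

```shell
# corosync-qnetd is the vote daemon; corosync-qdevice is the client
# that runs on each cluster node.
QDEVICE_HOST="pve-qdevice"

if [ "$(hostname -s 2>/dev/null)" = "$QDEVICE_HOST" ]; then
    apt update && apt install -y corosync-qnetd      # on the QDevice host
elif [ -d /etc/pve ]; then
    apt update && apt install -y corosync-qdevice    # on proxmox01 and proxmox02
else
    echo "Run this on $QDEVICE_HOST or on a cluster node."
fi
```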
The QDevice host needs SSH root access from the cluster nodes. Generate a key on the first cluster node with ssh-keygen (skip this if root already has one) and copy it over with ssh-copy-id root@10.0.20.10.
Step 2 — Add the QDevice to the Cluster
From any cluster node, run:
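A minimal sketch, using the QDevice address from the table earlier in this guide:

```shell
QDEVICE_IP="10.0.20.10"   # pve-qdevice

if command -v pvecm >/dev/null 2>&1; then
    pvecm qdevice setup "$QDEVICE_IP"
else
    echo "pvecm not found: run this on a cluster node."
fi
```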
This exchanges certificates with the qnetd daemon on the QDevice host, configures corosync-qdevice on every cluster node, and starts the service.
Step 3 — Verify Quorum
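Run pvecm status on either node. With both nodes and the QDevice online, you should see three expected votes and a quorum threshold of two:
Votequorum information
Expected votes: 3
Total votes: 3
Quorum: 2
Flags: Quorate Qdevice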
The QDevice does not appear in the Proxmox web UI, but pvecm status and corosync-quorumtool -s confirm it is active. The cluster now tolerates a single node going offline — 2 out of 3 votes still form quorum.
Enabling High Availability
With quorum solved, enable the Proxmox HA stack:
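A sketch of that step, assuming the standard Proxmox service names pve-ha-lrm (local resource manager) and pve-ha-crm (cluster resource manager). On a default installation they are usually enabled already, in which case this changes nothing.

```shell
HA_SERVICES="pve-ha-lrm pve-ha-crm"

if command -v ha-manager >/dev/null 2>&1; then
    # Left unquoted on purpose so the two service names are passed separately.
    systemctl enable --now $HA_SERVICES
    systemctl is-active $HA_SERVICES
else
    echo "ha-manager not found: run this on a Proxmox VE node."
fi
```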
Run this on every node in the cluster.
Fencing
Fencing (STONITH) ensures that a non-responsive node is forcefully rebooted before the cluster tries to recover its VMs. Proxmox uses watchdog timers for self-fencing.
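On each node, check that a watchdog device is present with ls -l /dev/watchdog* and that the watchdog multiplexer is running with systemctl status watchdog-mux.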
If no hardware watchdog is configured, Proxmox falls back to the softdog kernel module, which provides a software watchdog and is normally loaded automatically.
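A small sketch for checking whether softdog is present and loading it by hand if it is not. softdog_loaded is a hypothetical helper written for this example; it assumes softdog is built as a loadable module and therefore shows up in /proc/modules.

```shell
# Succeeds if softdog appears in the kernel's module list
# (default: /proc/modules; another file can be passed for testing).
softdog_loaded() {
    grep -q '^softdog ' "${1:-/proc/modules}" 2>/dev/null
}

# Show any watchdog device nodes that exist on this host.
ls -l /dev/watchdog* 2>/dev/null || echo "no watchdog device node found"

if softdog_loaded; then
    echo "softdog is loaded"
elif [ -d /etc/pve ] && [ "$(id -u)" -eq 0 ]; then
    modprobe softdog    # load the software watchdog by hand on a Proxmox node
else
    echo "softdog is not loaded; run as root on a Proxmox node to load it"
fi
```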
With fencing in place, a failed node will be fenced within 60 seconds, and the surviving node will restart the impacted VMs.
Creating HA Groups and Resources
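HA groups control the order and preferred nodes for VM placement. Recent Proxmox VE 9.x releases are moving from groups to HA rules, so on those versions you may find the equivalent setting under node affinity rules instead.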
Create an HA Group
Via the CLI:
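A sketch of a group definition, assuming a hypothetical group name of prefer-node1 and a release that still accepts ha-manager groupadd (releases that have switched to HA rules may reject it). Node preference is written as node:priority pairs.

```shell
HA_GROUP="prefer-node1"               # hypothetical group name
HA_NODES="proxmox01:2,proxmox02:1"    # node:priority, higher number is preferred

if command -v ha-manager >/dev/null 2>&1; then
    ha-manager groupadd "$HA_GROUP" --nodes "$HA_NODES" --nofailback 0
    ha-manager groupconfig    # list the configured groups
else
    echo "ha-manager not found: run this on a Proxmox VE node."
fi
```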
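- --nodes proxmox01:2,proxmox02:1: set a per-node priority, where the higher number marks the preferred node (ha-manager groups have no ordered type; placement order comes from these priorities)
- --nofailback 0: allow failback when the preferred node returns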
Add VMs to HA
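A minimal sketch for putting a VM under HA management, using VM ID 100 as in the failover test later in this guide. The variable name is only for illustration.

```shell
VM_SID="vm:100"   # HA service ID: the type prefix plus the VM ID

if command -v ha-manager >/dev/null 2>&1; then
    # --state started asks the HA stack to keep the VM running.
    ha-manager add "$VM_SID" --state started
else
    echo "ha-manager not found: run this on a Proxmox VE node."
fi
```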
For containers, use the ct: prefix instead of vm:, for example ha-manager add ct:200 --state started.
Monitor HA Status
From the CLI, ha-manager status lists every managed resource with its current node and state.
From the web UI: navigate to Datacenter → HA → Groups to manage groups visually, or Datacenter → HA → Resources to add and watch managed VMs.
Testing HA Failover
Do not skip this step. A cluster is only as trustworthy as the testing you put it through.
Graceful Failover Test
Stop corosync on the active node with systemctl stop corosync.
On proxmox02, the HA manager will detect that proxmox01 has left the cluster, wait for the watchdog to fence it, then start the VMs. Follow the progress with watch ha-manager status.
Within 60–120 seconds, you should see:
VM 100 proxmox02 started
CT 200 proxmox02 started
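To check a single resource from a script instead of watching the screen, the status text can be filtered. This is a sketch: service_state is a hypothetical helper, and it assumes ha-manager status prints one line per resource shaped like `service vm:100 (proxmox02, started)`, which may differ between releases.

```shell
# Print "<node> <state>" for one HA service ID, reading status text on stdin.
service_state() {
    sed -n "s/^service $1 (\([^,]*\), \([^)]*\))\$/\1 \2/p"
}

if command -v ha-manager >/dev/null 2>&1; then
    ha-manager status | service_state vm:100
fi
```

On a cluster node, the last command prints the node and state for vm:100 once the resource is managed by HA.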
If the watchdog reset the node, corosync starts again on boot. Otherwise, bring the node back with systemctl start corosync.
The node rejoins automatically. If nofailback is enabled, VMs stay on proxmox02 until you manually migrate them back.
Hard Fencing Test
For a more realistic test, pull the power (or unplug the network) on one node. The other node should fence and recover VMs within the watchdog timeout window.
Production Considerations for Homelab Clusters
A few things to keep in mind as you move beyond the basic setup:
- Three nodes eliminate the QDevice need entirely. With three Proxmox hosts, you can run Ceph as native shared storage and have natural quorum (2/3 votes). This is the ideal homelab topology if you have the hardware.
- Run the QDevice as an LXC on a third Proxmox host to avoid adding a separate physical machine. Create a Debian 13 unprivileged container with nesting enabled and install corosync-qnetd inside it (corosync-qdevice is the client package for the cluster nodes). This is what I do, and it works flawlessly.
- Backups are still mandatory. High availability handles node failure, not filesystem corruption, accidental rm -rf, or data loss. Pair your HA cluster with Proxmox Backup Server for scheduled backups.
- Cluster filesystem replication. pmxcfs synchronises /etc/pve/ across all nodes, including corosync.conf, storage configs, user permissions, and VM configs. Any change on one node propagates instantly.
Conclusion
Building a Proxmox VE cluster with high availability in your homelab is not as complicated as it sounds. The core stack — corosync for membership, QDevice for quorum, watchdog for fencing, and the HA manager for recovery — has been battle-tested in production environments for years. With two modest Proxmox hosts, a cheap Raspberry Pi (or a spare LXC), and shared storage, you get enterprise-grade HA at homelab prices.
Start with one cluster, enable HA on your most critical VMs, run the failover test, and then expand from there.