Proxmox 9 - How To Set Up High Availability and Automatic Rebalancing
In the video below, we show how to set up HA and automatic rebalancing for Proxmox version 9
One of the major benefits of hypervisors like Proxmox VE is they can provide High Availability
Now it does require a cluster and the nodes need access to some form of shared storage, but this automation can significantly reduce the downtime of guests
Typically it’s used to guard against hardware failures, but there is some form of software monitoring as well
Now I covered this in a previous video, but since then version 9.0 has brought affinity rules and as of version 9.2 you can take advantage of dynamic load balancing
Useful links:
https://pve.proxmox.com/wiki/High_Availability
Assumptions:
Now because this video is specifically about High Availability, I have to make some assumptions otherwise the video would be much longer
First, I’m going to assume you already have a Proxmox cluster, or you at least know how to set one up
If not then I do have another video that covers that
Secondly, I’m going to assume the nodes can all access some form of shared storage
If not then I do have another video that covers NFS storage
In addition, I have another video covering ZFS replication which is really useful if your cluster only has two nodes
Resources:
The first thing we need to do is to let the cluster know what guests we want to provide HA to
To do that, navigate to Datacenter | HA
Then under the Resources section click Add
From the drop-down menu labelled VM, select the resource you want to add
Despite the name, this can be a virtual machine or a Linux container
TIP: You can type in the field to filter guests out
Now for some that might be enough and you can just click Add and be done, but there are other settings here to know about
In addition, be aware that if a VM is currently offline and you add it as a resource to HA, you’ll shortly see HA power it on
Now typically HA is used to make sure a VM for instance is up and running, although there are other possible states you can set
So I’ll explain these other settings based on the expectation you want the VM or container to be running
First we have Max. Restart which refers to how many attempts HA should make to restart a guest on its existing node
For example, let’s say a VM is assigned to node 1 and it’s up and running, but it then goes offline
HA will see a state of stopped and based on the default setting of 1, it will make one attempt to start the VM up on node 1 before giving up and migrating it to another node
In other words, maybe a software exception caused the outage and a power up will fix it. If not, maybe it’s a hardware problem so we’ll migrate it
Max. Relocate refers to how many times HA should attempt to migrate a guest to another node
For example, let’s say a VM is assigned to node 1 and it’s up and running, but then node 1 goes offline
HA will detect this and based on the default setting of 1, it will make one attempt to migrate the VM to say node 2 and power it up on that server
But if the process fails, it won’t try and migrate the VM to say node 3
What’s crucial here though, is both node 1 and node 2 must have access to that VM’s hard drive(s) otherwise node 2 won’t be able to run that VM
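To make the interplay between Max. Restart and Max. Relocate a bit more concrete, here is a rough Python sketch of the behaviour described above. It is a toy model only, not Proxmox code: the function name recover and the can_start callback are hypothetical and exist purely for illustration.

```python
def recover(current_node, other_nodes, can_start, max_restart=1, max_relocate=1):
    """Return the node the guest ends up running on, or None if recovery fails.

    can_start(node) is a hypothetical callback that reports whether a start
    attempt on that node succeeds. This is a simplified model, not the real
    HA manager logic.
    """
    # First, retry on the node the guest is already assigned to.
    for _ in range(max_restart):
        if can_start(current_node):
            return current_node
    # Then try relocating, using one candidate node per attempt.
    for node in other_nodes[:max_relocate]:
        if can_start(node):
            return node
    # Both budgets are used up, so the guest stays down.
    return None


# Only node3 can run the guest in this made-up scenario.
only_node3 = lambda node: node == "node3"
print(recover("node1", ["node2", "node3"], only_node3))
print(recover("node1", ["node2", "node3"], only_node3, max_relocate=2))
```

With the defaults of 1 and 1, the first call gives up after trying node1 and node2, whereas raising max_relocate to 2 lets the guest land on node3.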
Next up we have a setting called Failback
Now in previous versions this was called Nofailback, so this newer wording is easier to understand
By default this is enabled, and it can actually cause you more problems if you use what used to be called HA Groups, which as you’ll see later are now known as Node Affinity Rules
For example, let’s say node 1 is the preferred node to run your VM on and node 2 is the other choice
Well if node 1 goes offline, then HA will detect this and migrate the VM to node 2
It turns out node 1 has an intermittent power problem and without intervention it comes back online
If Failback is enabled, HA will migrate your VM back to node 1, only for node 1 to reset shortly after and the cycle will repeat
That is a really bad situation to be in because users of the service on that VM will experience intermittent outages
Personally I prefer to de-select this feature and manually migrate VMs back to where they should be once the real problem is resolved because it means users will only experience that brief initial outage
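The flapping scenario above can be sketched in a few lines of Python. This is only a simplified model with a hypothetical function name, and it assumes node2 stays healthy throughout, but it shows why Failback multiplies the number of migrations when the preferred node is unstable.

```python
def count_migrations(preferred_online, failback):
    """Count migrations of one guest while its preferred node (node1) flaps.

    preferred_online is a list of booleans, one per check, saying whether
    node1 is up at that moment. node2 is assumed to stay online throughout.
    """
    location = "node1"
    migrations = 0
    for online in preferred_online:
        if location == "node1" and not online:
            location = "node2"  # node1 failed, so the guest is recovered on node2
            migrations += 1
        elif location == "node2" and online and failback:
            location = "node1"  # node1 is back, so failback moves the guest home
            migrations += 1
    return migrations


# node1 has an intermittent power problem: up, down, up, down, up, down.
flapping = [True, False, True, False, True, False]
print(count_migrations(flapping, failback=True))   # 5
print(count_migrations(flapping, failback=False))  # 1
```

In this made-up timeline the guest is moved five times with Failback enabled, but only once with it de-selected, which matches the single brief outage mentioned above.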
A new option available is Auto-Rebalance and this relates to the load balancing made fully available in version 9.2 which we’ll cover later
It’s enabled by default and when it is it allows HA to include the guest in automated load balancing decisions
That might sound great at first but bear in mind even live migrations can be disruptive so you may want to de-select this option, especially for critical services like databases
Typically, as I hinted before, the goal of HA is to keep a guest up and running and so by default Request State is set to started
TIP: If a guest is currently turned off and you don’t want it immediately started you’ll probably want to set this to stopped or ignored because with a setting of started, HA will attempt to power it on shortly after you click Add
The stopped state is to tell HA that a guest should be turned off, and if necessary relocated
The ignored state is more for maintenance windows when you don’t want HA to interfere, even if a failure is later detected
Whereas the disabled state is to tell HA that a guest should be turned off, but not relocated. This is also the only way to move a resource out of the error state
Finally we have a comment field which can actually be useful and populates the Description column in the GUI
For one thing the name of a guest may not have meaning, but in any case, the GUI usually shortens the name column whereas the Description column is the final column and tends to have more room to display its contents
To create the resource, the last thing to do is to click Add
Node Affinity Rules:
In order for HA to protect guests against node failures, it requires rules to be defined
These let HA decide which node in the cluster a guest should be migrated to
Interestingly enough there’s one already in place, even if you do nothing else
And this default rule is to pick the node with the least number of active guests
For example, if a VM is running on node1 and HA is monitoring it, then if node1 goes offline, HA will take action by migrating and starting that VM on another node in the cluster
The only problem is it may not be a node you want it to be running on and so you can set your own rules to sway the decision
In earlier versions of Proxmox VE we had HA Groups for this
Since version 9.0, affinity rules have taken their place
If you still want tight control over which nodes a guest can be run on then you’ll want to use node affinity rules which are basically a renaming of HA Groups
These types of rules make a lot more sense in small clusters and they give you full control over the decision making
They still have their place in large clusters but when you have a lot of guests and/or nodes this sort of decision making can get very complicated
In any case, to create node affinity rules you’ll want to navigate to Datacenter | HA | Affinity Rules
Then under the HA Node Affinity Rules section click Add
By default the rule will be enabled, so you may want to de-select this if you want to prevent HA automatically moving guests shortly after the rule is created
From the HA Resources drop-down menu you can select from the resources you created earlier and you can choose multiple resources that this rule should apply to
Next you’ll need to decide which nodes this rule applies to and what priority each node should be given
For example, let’s say you have node1, node2 and node3 in your cluster and you’d prefer a VM to run on node1, but if that’s not available then on node3
In that case you would tick the boxes next to node1 and node3, to select them for the rule, and assign a priority of say 20 to node1 and 10 to node3
In other words, node1 has a higher priority, so is the preferred node to run this VM on
One thing to bear in mind is that there is an option called Strict and by default it’s not enabled
This means that even though the rule has only node1 and node3 selected, node2 could still be used as a last resort
But if Strict is enabled, node2 cannot be used by HA at all
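As a rough illustration of how priorities and the Strict option interact, here is a small Python sketch using the node1/node3 example above. It is not how Proxmox implements this internally. The function name is hypothetical, and the non-strict fallback simply picks the first remaining node alphabetically, which is purely an assumption made for the example.

```python
def pick_node(online_nodes, rule_priorities, strict=False):
    """Pick a node for a guest under a single node affinity rule.

    rule_priorities maps node name -> priority, where a higher number wins.
    Returns None when no permitted node is online.
    """
    # Prefer online nodes that are part of the rule, highest priority first.
    candidates = [n for n in online_nodes if n in rule_priorities]
    if candidates:
        return max(candidates, key=lambda n: rule_priorities[n])
    # A strict rule never allows nodes outside of it.
    if strict:
        return None
    # Non-strict: fall back to any other online node as a last resort.
    # (Alphabetical order is an arbitrary choice for this sketch.)
    return sorted(online_nodes)[0] if online_nodes else None


rule = {"node1": 20, "node3": 10}
print(pick_node(["node1", "node2", "node3"], rule))  # node1
print(pick_node(["node2", "node3"], rule))           # node3
print(pick_node(["node2"], rule))                    # node2
print(pick_node(["node2"], rule, strict=True))       # None
```

The last two calls show the difference: with only node2 left online, a non-strict rule still finds a home for the VM, whereas a strict rule leaves it with nowhere to run.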
Finally we have a comment field
To create the rule, the last thing to do is to click Add
Resource Affinity Rules:
Resource affinity rules bring Proxmox VE more in line with hypervisors like VMware’s ESXi
Instead of focusing on which nodes guests should run on, these rules are about keeping guests together or apart
And in large clusters in particular this approach can make much more sense
To create resource affinity rules you’ll want to navigate to Datacenter | HA | Affinity Rules
Then under the HA Resource Affinity Rules section click Add
By default the rule will be enabled, so you may want to de-select this if you want to prevent HA automatically moving guests shortly after the rule is created
From the HA Resources drop-down menu you can select from the resources you created earlier and you can choose multiple resources that this rule should apply to
NOTE: You cannot mix and match affinity rules. For example, you can’t apply node affinity AND resource affinity rules to the same guest. You can only use one or the other. So while the GUI may let you pick a guest, if it already has a rule applied to it, you’ll get an error message when you try to add the rule
Next you’ll need to choose an Affinity of either Keep Together or Keep Separate
As an aside, I really like this wording over the more traditional use of positive affinity and negative affinity as it’s much easier to understand the purpose
In any case, let’s say you have a service made up of a web server, application server and database and these are run in separate VMs
For maximum performance you’ll want to run these on the same node, so you would create a rule containing these three VMs and set the Affinity to Keep Together
There is no choice over which node in the cluster they will run on, but HA will aim to keep them together. So they will be migrated as if they were a group
In contrast, let’s say you have a pair of virtual firewalls and these have their own HA solution
It still makes sense to protect them against a node failure, so you would create a rule containing these two VMs and set the Affinity to Keep Separate
Again, there is no choice over which nodes in the cluster these will run on, but they will be kept apart, meaning a node failure won’t take out both firewalls in one go
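If it helps to think of these two rule types as constraints on where guests are placed, the following Python sketch checks a placement against one rule. The function name, the guest names and the "together"/"separate" strings are hypothetical and used only for illustration. They are not Proxmox identifiers.

```python
def placement_ok(placement, guests, affinity):
    """Check a placement against one resource affinity rule.

    placement maps guest name -> node name.
    affinity is "together" (Keep Together) or "separate" (Keep Separate).
    """
    nodes = [placement[g] for g in guests]
    if affinity == "together":
        # Every guest in the rule must share a single node.
        return len(set(nodes)) == 1
    if affinity == "separate":
        # No two guests in the rule may share a node.
        return len(set(nodes)) == len(nodes)
    raise ValueError(f"unknown affinity: {affinity}")


placement = {
    "web": "node1", "app": "node1", "db": "node1",
    "fw1": "node2", "fw2": "node3",
}
print(placement_ok(placement, ["web", "app", "db"], "together"))  # True
print(placement_ok(placement, ["fw1", "fw2"], "separate"))        # True
print(placement_ok(placement, ["fw1", "fw2"], "together"))        # False
```

Notice that neither check cares which node is used, only whether the guests end up on the same node or on different ones.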
Finally we have a comment field
To create the rule, the last thing to do is to click Add
Cluster Resource Scheduler:
The cluster resource scheduler or CRS has actually been available for quite a while but it was classed as a technology preview
As of version 9.2 we can now even take advantage of dynamic load balancing of resources and this is all fully supported
Now it’s not immediately obvious that this is available so you’ll need to navigate to Datacenter | HA
In the top right hand corner you’ll find a button labelled CRS Settings and you’ll want to click that to set things up
By default the Scheduling Mode is set to Default (basic) also shown as Basic (Resource Count) in the drop-down menu
This mode is what I referred to earlier where HA will choose a node with the least number of active guests if a node fails for instance and a decision is needed as to where to migrate guests to
A more intelligent choice would be Static Load which looks at the hardware configuration of guests to make a more accurate decision based on the CPU and RAM allocations
If you want to really push the boundaries though you can select Dynamic Load which actively monitors those CPU and RAM loads
Bear in mind, CRS still has to take into account any affinity rules you’ve created and that could restrict the choices available
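To show why the scheduling mode matters, here is a small Python comparison between counting guests and looking at their CPU and RAM allocations. This is only a sketch. The scoring used for the static-style choice (the larger of the allocated CPU share and allocated RAM share) is an assumption made for illustration and is not the formula Proxmox uses, and all names and figures are made up.

```python
def pick_basic(guests_per_node):
    """Basic-style choice: the node running the fewest active guests."""
    return min(guests_per_node, key=lambda n: len(guests_per_node[n]))


def allocated_share(guests, capacity):
    """Largest allocated fraction of a node's cores or memory (assumed metric)."""
    cores = sum(c for c, _ in guests)
    mem = sum(m for _, m in guests)
    return max(cores / capacity[0], mem / capacity[1])


def pick_static(guests_per_node, capacity):
    """Static-style choice: the node with the lowest allocated share."""
    return min(
        guests_per_node,
        key=lambda n: allocated_share(guests_per_node[n], capacity[n]),
    )


# Each guest is (cores, memory in GiB); each node has 32 cores and 128 GiB.
guests_per_node = {
    "node2": [(16, 64)],                # one large VM
    "node3": [(1, 2), (1, 2), (1, 2)],  # three small containers
}
capacity = {"node2": (32, 128), "node3": (32, 128)}

print(pick_basic(guests_per_node))             # node2
print(pick_static(guests_per_node, capacity))  # node3
```

Counting guests sends the next VM to node2 because it only has one guest, even though that guest has half the node allocated to it. Looking at allocations picks node3 instead.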
Now one really interesting feature here, to me at least, is Rebalance on Start
Enabling this allows CRS to migrate a VM to another node before starting it
For someone like myself, this is ideal because it can be tricky to use the API to create new VMs using Ansible for instance
The simplest thing you can do is to target a single node in the cluster otherwise you get errors if Ansible loops through them all
By enabling this, you can still create all of the VMs on a single node, but the cluster will then decide which node to start them on
And that alone is a major benefit to me
Automatic Rebalance:
Now the pièce de résistance here is automatic rebalancing of resources, if you feel brave enough to use it that is
NOTE: At the time of recording, this only applies to resources that HA has been enabled for
This brings Proxmox VE much closer to what VMware’s ESXi is capable of in terms of HA
You’ll first have to set the Scheduling Mode to either Static Load or Dynamic Load
Only then can you take advantage of Automatic Rebalance
With this enabled, CRS will monitor the cluster for resource utilisation according to the mode you’ve chosen
And it may take action to live migrate guests to better balance the resources depending on a number of settings
The Imbalance Threshold has a default value of 30% and it refers to how imbalanced the load from active guests needs to be across the cluster to warrant migrating any of them
TIP: Once enabled you’ll see a value for the mode and load imbalance in the Status section. This is in the line referencing the master node
The Rebalancing Method has a choice of Bruteforce and TOPSIS
The default option is Bruteforce where it weighs up CPU and RAM as equally important
TOPSIS on the other hand prioritises RAM over CPU
The Hold Duration has a default value of 3 and this refers to the number of HA manager rounds, each of which lasts about 10 seconds, meaning this is roughly a 30 second window
If the imbalance threshold is exceeded for an amount of time higher than this hold duration, CRS will look to rebalance the cluster’s active guests
This is important because it might prevent an unnecessary migration during a brief CPU usage spike for instance
The Minimum Imbalance Improvement has a default value of 10%
This allows you to set a target point to see if load balancing is worthwhile
If CRS calculates that moving active guests won’t bring about an improvement by this amount or better, then guests won’t be migrated because it wouldn’t be worthwhile
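Putting the three settings together, the decision can be sketched in Python as follows. This is my own simplified reading rather than the actual CRS algorithm. In particular, it assumes the threshold must be exceeded for at least the hold duration in consecutive rounds, and it treats the minimum improvement as relative to the current imbalance. Both are assumptions, as are the function and parameter names.

```python
ROUND_SECONDS = 10  # approximate length of one HA manager round


def should_rebalance(history, predicted, threshold=0.30,
                     hold_rounds=3, min_improvement=0.10):
    """Decide whether a rebalance looks worthwhile (simplified model).

    history   -- imbalance measured in each round, newest last (0.0 to 1.0)
    predicted -- estimated imbalance after the proposed migrations
    """
    # Not enough rounds observed yet to fill the hold window.
    if len(history) < hold_rounds:
        return False
    # Every round in the hold window must exceed the threshold,
    # so a brief spike does not trigger a migration.
    recent = history[-hold_rounds:]
    if any(value <= threshold for value in recent):
        return False
    # Assumption: improvement is measured relative to the current imbalance.
    current = history[-1]
    improvement = (current - predicted) / current
    return improvement >= min_improvement


print(3 * ROUND_SECONDS)                             # 30 second hold window
print(should_rebalance([0.10, 0.50, 0.20], 0.10))    # False: only a spike
print(should_rebalance([0.40, 0.45, 0.50], 0.25))    # True: sustained and worthwhile
print(should_rebalance([0.40, 0.45, 0.50], 0.48))    # False: too little gain
```

The second call is the brief CPU spike case, while the last one shows a cluster that really is imbalanced but where migrating guests would barely help.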
Now every cluster is unique and if you ever want to change these settings, then you’ll need to experiment with them
But always test this sort of thing in a lab first otherwise it could introduce problems for users
As before, any affinity rules you’ve created will also be taken into account during a migration decision process
Maintenance Mode:
One more really useful feature we get in Proxmox VE 9 is the ability to Arm and Disarm HA
Before this was introduced, I would manually go through each resource and change its HA status to ignored
Then I would carry out my maintenance and finally put those states back to started
Now we can put HA into maintenance mode
To make use of this, navigate to Datacenter | HA
Just above the Status section you will find Arm HA and Disarm HA options
Arm HA speaks for itself, but Disarm HA gives you two choices: Freeze and Ignore
I must admit I struggle to understand the need for two choices, but this is how I see it
Ignore is basically the one click button for what I was doing before. It puts all resources into an ignored state and HA will not react to any changes that affect them
Freeze still stops HA reacting and keeps resources in their current state, but it imposes restrictions
As an example, if you have two VMs kept together with an affinity rule, Ignore will let you migrate one of them to another node. Freeze on the other hand will refuse the migration because of the affinity rule
So my own interpretation is, use Freeze for external maintenance that may affect the cluster. For example, a network change might result in a server losing network access for a while
And use Ignore for internal changes within the cluster such as software upgrades for the servers themselves
In any case, once the work is done, click Arm HA to restore monitoring
Either way, this makes maintenance so much easier within the GUI