Hyper-V Failover Cluster Best Practices (footnotes)
Some of you may be familiar with this handy best practices checklist published by Microsoft. It currently comes in three varieties (I haven’t seen an updated one for 2016 yet):
It’s a pretty good resource. In my many years deploying and supporting Hyper-V (especially failover clusters), I’ve learned that there are some things that should absolutely be done as part of every implementation. Granted, the scope of this article applies mostly to SMB / branch office deployments, where you have a fairly simple two-node cluster with shared storage (e.g. SAS or iSCSI).
I recommend placing Hyper-V computer objects into a separate OU and blocking inheritance on it. If you have policies, for example, that restrict which accounts are granted the “Log on as a service” right, or policies that distribute software or apply other special settings, it is very likely you could see adverse effects in your failover cluster. The easiest way to avoid these types of issues is to block inheritance and apply only the policies that you explicitly want to the Hyper-V systems.
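If you manage AD with PowerShell, the OU and the inheritance block take only a couple of lines. A minimal sketch, assuming the ActiveDirectory and GroupPolicy RSAT modules are installed, and using a hypothetical contoso.com domain and OU name:

```powershell
Import-Module ActiveDirectory
Import-Module GroupPolicy

# Create a dedicated OU for the Hyper-V hosts (OU name and domain are examples)
New-ADOrganizationalUnit -Name "Hyper-V Hosts" -Path "DC=contoso,DC=com"

# Block GPO inheritance so only policies linked directly to this OU apply
Set-GPInheritance -Target "OU=Hyper-V Hosts,DC=contoso,DC=com" -IsBlocked Yes
```

After that, move the host computer objects into the new OU and link only the policies you actually want.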
In the olden days, hotfixes were only applied if someone from Microsoft support explicitly recommended it to you based on an issue that you were experiencing. These days, it is recommended to apply certain hotfixes proactively. Note: You are supposed to read through the recommended list and apply the ones that pertain to your situation. For example, there are a number of hotfixes that refer to shared VHDX files, and if you don’t use them, then you don’t have to apply the corresponding updates.
Some hotfixes are eventually included in update rollups, so you might have some applied already. But, it is worth your time to attempt installing the applicable ones regardless–if you already have one of them, you’ll just be prompted with a message like, “This update is not applicable” or similar. Additionally, I have found this resource to be very helpful in “narrowing down” the list: Thank you Hans @ Hyper-V.nu for your helpful articles & scripts.
You can also configure Cluster Aware Updating to apply hotfixes, but this is still somewhat a manual process, as you have to point it to a network location containing the hotfixes.
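For reference, pointing the CAU hotfix plug-in at that network location looks something like the following; the cluster name and share path here are placeholders, and this assumes the share has been prepared per the CAU hotfix plug-in requirements:

```powershell
# Run a Cluster-Aware Updating pass using the hotfix plug-in,
# pulling updates from a prepared file share (names/paths are examples)
Invoke-CauRun -ClusterName "HVCLUSTER01" `
    -CauPluginName "Microsoft.HotfixPlugin" `
    -CauPluginArguments @{ HotfixRootFolderPath = "\\fileserver\hotfixes" } `
    -Force
```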
VMQ – disable it!
Do yourself a favor: take a minute to disable VMQ on any adapter that is merely a 1 Gbps link. There is no need for VMQ unless you are pushing a lot of traffic through something like a 10 Gbps card. And frankly, leaving it enabled can put you at risk. Granted, this is mostly with regard to specific Intel/Broadcom NICs and driver issues that are often resolved through an update, but it’s just as easy to keep it disabled–small businesses typically don’t need it anyway.
Get-NetAdapter | Where-Object { $_.LinkSpeed -eq "1 Gbps" } | Disable-NetAdapterVMQ
In 2008 R2, you were required to isolate your network adapters to specific functions (e.g. CSV, Live Migration, and Management each map directly to a physical NIC). In 2012 and 2012 R2, we can optionally use NIC teaming and QoS instead, but this is often applied too liberally in the SMB space. Therefore:
- For most SMB deployments, I recommend choosing Switch Independent / Hyper-V Port as the load-balancing algorithm. This is optimized for Hyper-V workloads, and I have seen issues with certain types of virtualized applications (e.g. phone servers) when using a different choice such as Address Hash.
- Do NOT use teaming with iSCSI–use MPIO instead, per the best practices checklist I cited above.
- Unless you’re teaming 10 Gbps cards and applying QoS policies to the individual virtual network adapters using PowerShell, it’s best if you just keep your Cluster & Live Migration traffic isolated. It’s the easiest way to guarantee you won’t accidentally choke other workloads on weaker 1 Gbps links. Accomplish this with separate VLANs on your physical switches, or in a simple two-node cluster, with crossover cables strung between the hosts.
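Creating a team with the settings recommended above is a one-liner with the built-in LBFO teaming in 2012/2012 R2. A sketch, assuming two physical adapters named “NIC1” and “NIC2” (adapter and team names are examples):

```powershell
# Switch Independent / Hyper-V Port team for the VM-facing adapters
New-NetLbfoTeam -Name "VM-Team" -TeamMembers "NIC1","NIC2" `
    -TeamingMode SwitchIndependent `
    -LoadBalancingAlgorithm HyperVPort
```

You would then bind the Hyper-V virtual switch to the resulting team interface, leaving the Cluster and Live Migration NICs out of the team entirely.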
Virtual Domain Controllers
Since 2012, you can virtualize your domain controllers on a Hyper-V cluster. One of the key features in 2012 that allows this is “cluster bootstrapping.” This means that the cluster can start itself even if the domain is not available right away (cluster services have a strong dependency on domain services). In 2008 R2, you could have serious issues if you made all your DCs virtual (this wasn’t technically supported anyway). The first time you experience a power outage, and both nodes go down at the same time, you’ll find it very difficult to get things back up and running again.
Unfortunately, I have still seen situations in 2012 where virtual DCs going offline can become an issue, even with cluster bootstrapping. There are other things at play in those situations, but nevertheless it is a possible risk that is easy to avoid.
Therefore, you should place each virtual domain controller on local storage (not shared storage)–one per node in a two-node cluster. These can be started independently, outside of the cluster. Be sure you set the automatic start options on your domain controllers as well: always start the virtual machine automatically, and make sure they come online before other services in the environment, for obvious reasons.
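The automatic start behavior can be set from PowerShell as well. A sketch, assuming a locally stored, non-clustered DC VM named “DC1” (the name is an example):

```powershell
# Always start this VM when the host boots, with no added start delay
Set-VM -Name "DC1" -AutomaticStartAction Start -AutomaticStartDelay 0
```

If other VMs depend on the DC, you can stagger them by giving them a larger -AutomaticStartDelay value.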
Domain controllers are poor choices for time authority on the network, since when virtualized they can be affected by the Hyper-V time synchronization integration service. Therefore, be very deliberate about your time synchronization settings. I have seen terrible issues caused by this time and again. If you are leaving the default integration settings in place, then make sure your host server hardware clocks are set to synchronize with a reliable time authority such as time.nist.gov or those available from pool.ntp.org.
Personally, I like setting up my WatchGuard firewall to be the NTP server on the network, and then allowing it to sync with an outside authority. The reason is, my firewall is usually online at the same IP address at all times, and will reply to requests internally even if the WAN connection is down on the outside. This feature may not be available on all firewalls.
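Whatever upstream source you choose, pointing the hosts at it is done with w32tm. A sketch, assuming time.nist.gov as the external source:

```powershell
# Point the host's Windows Time service at an external NTP source,
# then restart the service and force a sync
w32tm /config /manualpeerlist:"time.nist.gov" /syncfromflags:manual /update
Restart-Service w32time
w32tm /resync
```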
Don’t over-provision your storage (at the SAN level or at the hypervisor level). By default, VHDX disks are provisioned as dynamically expanding, but this isn’t even supported for certain workloads, such as Exchange. Granted, most SMBs won’t have a lot of complex workloads like Exchange anymore (they would more likely use Office 365). But they also don’t have an admin who is watching disk growth and managing it proactively.
In most cases, I don’t think SMB organizations see much benefit from deploying dynamic disks–certainly not enough to outweigh the accompanying risks on the other side of the equation. The vast majority of them will have less than 1 TB of shared data on their file server, for example. Just use fixed disks for the ease of management and long-term stability–that’s my advice.
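When you create the disks yourself, you can be explicit about this instead of taking the default. A sketch with an example path and size:

```powershell
# Create a fixed-size VHDX up front (path and size are examples)
New-VHD -Path "C:\ClusterStorage\Volume1\FS01\data.vhdx" -SizeBytes 500GB -Fixed
```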
This concludes my own footnotes to the best practices. Remember that they are not exhaustive; they are merely field notes from my own experience, meant to supplement the references mentioned earlier.