My name is John Marlin and I am with the High Availability and Storage Team. Today I want to talk about Failover Clustering and networking. Networking is fundamental to Failover Clustering and is sometimes overlooked, yet it can be the difference between success and failure. In this blog, I will cover the basics, tweaks, multi-site/stretch, and Storage Spaces Direct.
In Failover Clustering, all networking aspects are provided by our Network Fault Tolerant (NetFT) adapter. Our NetFT adapter is a virtual adapter that is created when the Cluster is created. There is no configuration necessary as it is self-configuring. When it is created, it generates its MAC Address from a hash of the MAC Address of the first physical network card. It has conflict detection and resolution built in. For the IP Address scheme, it assigns itself an APIPA IPv4 (169.254.*) and IPv6 (fe80::*) address for communication.
Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Microsoft Failover Cluster Virtual Adapter
Physical Address. . . . . . . . . : 02-B8-FA-7F-A5-F3
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
Link-local IPv6 Address . . . . . : fe80::80ac:e638:2e8d:9c09%4(Preferred)
IPv4 Address. . . . . . . . . . . : 169.254.1.143(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.0.0
Default Gateway . . . . . . . . . :
DHCPv6 IAID . . . . . . . . . . . : 67287290
DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-26-6B-52-A5-00-15-5D-31-8E-86
NetBIOS over Tcpip. . . . . . . . : Enabled
The NetFT adapter provides the communications between all nodes in the cluster from the Cluster Service. To do this, it discovers multiple communication paths between nodes and whether the routes are on the same subnet or cross subnet. It does this by sending “heartbeats” through all network adapters enabled for Cluster use to all other nodes. Heartbeats answer several questions.
- Is this a viable route between the nodes?
- Is this route currently up?
- Is the node being connected to up?
There is more to heartbeats, but I will defer to my other blog, No Such Thing as a Heartbeat Network, for more details.
For Cluster communication and heartbeats, there are several considerations that must be taken into account.
- Traffic uses port 3343. Ensure any firewall rules have this port open for both TCP and UDP.
- Most Cluster traffic is lightweight.
- Communication is sensitive to latency and packet loss. Latency delays could mean performance issues, including removal of nodes from membership.
- Bandwidth is not as important as quality of service.
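As a quick sanity check of the firewall consideration above, a TCP reachability test against port 3343 can be sketched in Python. This is only an illustration and not a cluster tool: the function name `tcp_port_open` and the host name in the usage note are hypothetical, and a successful TCP connect says nothing about whether UDP 3343 is also allowed.

```python
import socket

CLUSTER_PORT = 3343  # port named in the considerations above


def tcp_port_open(host: str, port: int = CLUSTER_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and name resolution failures
        return False
```

For example, `tcp_port_open("node2.contoso.local")` (a made-up node name) returns False if a firewall between the nodes drops TCP 3343.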
Cluster communication between nodes is crucial so that all nodes stay in sync. Cluster communication is constantly going on as things progress. The NetFT adapter will dynamically switch intra-cluster traffic to another available Cluster network if the current network goes down or isn’t responding.
The communication from the Cluster Service to other nodes through the NetFT adapter looks like this.
- Cluster Service establishes TCP connection over NetFT adapter using the private NetFT IP address (source port 3343)
- NetFT wraps the TCP connection inside of a UDP packet (source port 3343)
- NetFT sends this UDP packet over one of the cluster-enabled physical NIC adapters to the destination node, addressed to that node’s NetFT adapter
- Destination node’s NetFT adapter receives the UDP packet and then sends the TCP connection to the destination node’s Cluster Service
Heartbeats are always traversing all Cluster enabled adapters and networks. However, Cluster communication will only go through one network at a time. The network it will use is determined by the role of the network and the priority (metric).
There are three roles a Cluster has for networks.
Disabled for Cluster Communications – Role 0 – This is a network that Cluster will not use for anything.
Enabled for Cluster Communication only – Role 1 – Internal Cluster Communication and Cluster Shared Volume traffic (more later) use this type of network as a priority.
Enabled for client and cluster communication – Role 3 – This network is used for all client access and Cluster communications. This includes items like talking to a domain controller, DNS, and DHCP (if enabled) when Network Names and IP Addresses come online. Cluster communication and Cluster Shared Volume traffic could use this network if all Role 1 networks are down.
Based on the roles, the NetFT adapter will create metrics for priority. The metric Failover Cluster uses is not the same as the network card metrics that TCP/IP assigns. Networks are given a “cost” (Metric) to define priority. A lower metric value means a higher priority while a higher metric value means a lower priority.
These metrics are automatically configured based on Cluster network role setting.
Cluster Network Role of 1 = 40,000 starting value
Cluster Network Role of 3 = 80,000 starting value
Things such as link speed, RDMA, and RSS capabilities will reduce the metric value. For example, let’s say I have two networks in my Cluster, with one set for Cluster communications only and one for both Cluster/Client. I can run the following to see the metrics.
PS > Get-ClusterNetwork | ft Name, Metric
Cluster Network 1 70240
Cluster Network 2 30240
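The "lowest metric wins" rule, together with the dynamic switch to another network when the current one goes down, can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: the function name `preferred_network` and the `up` set are hypothetical, and the real NetFT selection logic is internal to the Cluster Service.

```python
from typing import Dict, Optional, Set


def preferred_network(metrics: Dict[str, int], up: Set[str]) -> Optional[str]:
    """Pick the reachable network with the lowest metric (highest priority)."""
    # Only networks that are currently up can carry cluster communication
    candidates = {name: metric for name, metric in metrics.items() if name in up}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Metrics taken from the Get-ClusterNetwork output above
metrics = {"Cluster Network 1": 70240, "Cluster Network 2": 30240}

print(preferred_network(metrics, up={"Cluster Network 1", "Cluster Network 2"}))
print(preferred_network(metrics, up={"Cluster Network 1"}))
```

With both networks up this prints Cluster Network 2. If that network drops out, the same call falls back to Cluster Network 1.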
The NetFT adapter is also capable of taking advantage of SMB Multichannel and load balancing across the networks. For NetFT to take advantage of it, the metrics need to be fewer than 16 metric values apart. In the example above, SMB Multichannel would not be used. But if there were additional cards in the machines and it looked like this:
PS > Get-ClusterNetwork | ft Name, Metric
Cluster Network 1 70240
Cluster Network 2 30240
Cluster Network 3 30241
Cluster Network 4 30245
Cluster Network 5 30265
In a configuration such as this, SMB Multichannel would be used over Cluster Networks 2, 3 and 4. From a Cluster communication and heartbeat standpoint, multichannel really isn’t a big deal. However, when a Cluster is using Cluster Shared Volumes or is a Storage Spaces Direct Cluster, storage traffic is going to need higher bandwidth. SMB Multichannel would fit nicely here so an additional network card or higher speed network cards are certainly a consideration.
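The "fewer than 16 apart" rule can be checked against the listing above with a short Python sketch. One assumption is made loudly here: the distance is measured from the lowest (best) metric, which is an interpretation that matches the example but is not spelled out in the text. The function name `multichannel_networks` is hypothetical.

```python
from typing import Dict, List


def multichannel_networks(metrics: Dict[str, int], window: int = 16) -> List[str]:
    """Return networks whose metric is fewer than `window` above the lowest metric."""
    if not metrics:
        return []
    best = min(metrics.values())
    # Assumption: closeness is measured against the best (lowest) metric
    return sorted(name for name, metric in metrics.items() if metric - best < window)


# Metrics taken from the five-network listing above
metrics = {
    "Cluster Network 1": 70240,
    "Cluster Network 2": 30240,
    "Cluster Network 3": 30241,
    "Cluster Network 4": 30245,
    "Cluster Network 5": 30265,
}

print(multichannel_networks(metrics))
```

This prints Cluster Networks 2, 3 and 4. Network 5 is 25 away from the best metric and Network 1 is far outside the window, so neither qualifies.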
In the beginning of the blog, I mentioned latency and packet loss. If heartbeats cannot get through in a timely fashion, node removals can happen. Heartbeats can be tuned in the case of higher latency networks. The following are default settings for tuning the Cluster networks.
Windows 2012 R2
For more information on these settings, please refer to the Tuning Failover Cluster Network Thresholds blog.
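To get a feel for what tuning does, it helps to think of each setting pair as a delay between heartbeats and a threshold of missed heartbeats. The following Python sketch multiplies the two to estimate how long a node can be silent before it is considered down. The function name is hypothetical and the numbers in the usage note are illustrative only, not the defaults for any Windows version.

```python
def seconds_until_removal(delay_ms: int, threshold: int) -> float:
    """Estimate the silence tolerated before a node is considered down.

    delay_ms:  time between heartbeats, in milliseconds
    threshold: number of consecutive missed heartbeats tolerated
    """
    if delay_ms <= 0 or threshold <= 0:
        raise ValueError("delay_ms and threshold must be positive")
    return delay_ms * threshold / 1000
```

For instance, a 1000 ms delay with a threshold of 10 tolerates roughly 10 seconds of missed heartbeats, while a 2000 ms delay with a threshold of 20 tolerates roughly 40 seconds. Longer tolerances suit higher latency links but delay the detection of a genuine failure.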
Planning networks for Failover Clustering depends on how the Cluster will be used. Let’s take a look at some of the common types of network traffic a Cluster would have.
If this were a Hyper-V Cluster running virtual machines and Cluster Shared Volumes, Live Migration is going to occur. Clients are also connecting to the virtual machines.
Cluster communications and heartbeating will always be on the wire. If you are using Cluster Shared Volumes (CSV), there will be some redirection traffic.
If this were a Cluster that used iSCSI for its storage, you would have that as a network.
If this were stretched (nodes in multiple sites), you may need an additional network for replication traffic (such as Storage Replica).
If this is a Storage Spaces Direct Cluster, additional Storage Bus Layer (SBL) traffic needs to be considered.
As you can see, there are many different network traffic requirements depending on the type of Cluster and the roles running. Obviously, you cannot always have a dedicated network or network card for each, as that just isn’t always possible.
We do have a blog that will help with isolating Live Migration traffic or limiting the bandwidth it uses. The blog Optimizing Hyper-V Live Migrations on an Hyperconverged Infrastructure goes over some tips for setting this up.
The last thing I wanted to talk about is stretch/multisite Failover Clusters. I have already mentioned the Cluster-specific networking considerations, but now I want to talk about how the virtual machines react in this type of environment.
Let’s say we have two datacenters and a four-node Failover Cluster with 2 nodes in each datacenter. As with most datacenters, they are in their own subnet. So it would be similar to this:
The first thing you want to consider is if you want security between the cluster nodes on the wire. As a default, all Cluster communication is signed. That may be fine for some, but for others, they wish to have that extra level of security. We can set the Cluster to encrypt all traffic between the nodes. It is simply a PowerShell command to change it. Once you change it, the Cluster as a whole needs to be restarted.
(Get-Cluster).SecurityLevel = 2
0 = Clear Text
1 = Signed (default)
2 = Encrypt (slight performance decrease)
Here is a virtual machine (VM1) that has an IP Address on the 184.108.40.206/8 network and clients are connecting to it. If the virtual machine moves over to Site2 that is a different network (220.127.116.11/16), there will not be any connectivity as it stands.
To get around this, there are basically a couple options.
To prevent the virtual machine from moving because of a Cluster-initiated move (i.e. drain, node shutdown, etc), consider using sites. When you create sites, Cluster now has site awareness. This means that any Cluster-initiated move will always keep resources in the same site. Setting a preferred site will also keep it in the same site. If the virtual machine were ever to move to the second site, it would be due to a user-initiated move (i.e. Move-ClusterGroup, etc) or a site failure.
But you still have the IP Address of the virtual machine issue to deal with. During a migration of the virtual machine, one of the very last things is to register the name and IP Address with DNS. If you are using a static IP Address for the virtual machine, a script would need to be manually run to change the IP Address to the local site it is on. If you are using DHCP, with DHCP servers in each site, the virtual machine will obtain a new address for the local site and register it. You then have to deal with DNS replication and TTL records a client may have. Instead of waiting for the timeout periods, a forced replication and TTL clearing on the client side would allow them to connect again.
If you do not wish to go that route, a virtual LAN (VLAN) could be set up across the routers/switches to be a single IP Address scheme. Doing this removes the need to change the IP Address of the virtual machine, as it will always remain the same. However, stretching a VLAN is not always easy to do and the Networking Group within your company may not want to do this for various reasons.
Another consideration is implementing a network device on the network that has a third IP Address that clients connect to. It holds the actual IP Address of the virtual machine so it will route clients appropriately.
For our example, we have a network device that has the IP Address of the virtual machine as 18.104.22.168. It will register this with all DNS and will keep the same IP Address no matter which site it is on. Your Networking Group would need to be involved with this and would need to control it. Whether they are willing to do it, and whether it can even be done within your network, is also something to consider.
We talked about virtual machines, but what about other resources, say, a file server? Unlike virtual machine roles, roles such as a file server have a Network Name and IP Address resource in the Cluster. In Windows 2008 Failover Cluster, we added the concept of “or” dependencies. Meaning, we can depend on this or that.
In the case of the scenario above, your Network Name could be dependent on 22.214.171.124 “or” 126.96.36.199. As long as one of the IP Address resources is online, the name is online, and that address is what is published in DNS. To go a step further for the stretch scenario, we have two parameters that can be used.
RegisterAllProvidersIP: (default = 0 for FALSE)
- Determines if all IP Addresses for a Network Name will be registered by DNS
- TRUE (1): IP Addresses can be online or offline and will still be registered
- Ensure application is set to try all IP Addresses, so clients can connect quicker
- Not supported by all applications, check with application vendor
- Supported by SQL Server starting with SQL Server 2012
HostRecordTTL: (default = 1200 seconds)
- Controls time the DNS record lives on client for a cluster network name
- Shorter TTL: DNS records for clients updated sooner
- Disclaimer: This does not speed up DNS replication
By manipulating these parameters, clients will be able to connect more quickly. For example, say I want to register all the IP Addresses with DNS and I want the TTL to be 5 minutes (300 seconds). I would run the commands:
Get-ClusterResource FSNetworkName | Set-ClusterParameter RegisterAllProvidersIP 1
Get-ClusterResource FSNetworkName | Set-ClusterParameter HostRecordTTL 300
When setting the parameters, recycling (offline/online) of the resources is needed.
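To make the interplay between the “or” dependency and RegisterAllProvidersIP concrete, here is a small Python model. It is a sketch of the behavior described above, not of how the Cluster Service implements it. The function names are hypothetical, and the addresses are documentation-range placeholders standing in for one IP Address resource per site.

```python
from typing import Dict, List


def name_is_online(ip_online: Dict[str, bool]) -> bool:
    """With an "or" dependency, the Network Name is online if any IP Address is online."""
    return any(ip_online.values())


def registered_addresses(ip_online: Dict[str, bool], register_all: bool) -> List[str]:
    """Return the addresses published in DNS for the Network Name."""
    if register_all:
        # RegisterAllProvidersIP = 1: online and offline addresses are all registered
        return list(ip_online)
    # RegisterAllProvidersIP = 0 (default): only online addresses are registered
    return [ip for ip, online in ip_online.items() if online]


# Placeholder addresses: Site1 address online, Site2 address offline
ip_online = {"192.0.2.50": True, "198.51.100.50": False}

print(name_is_online(ip_online))
print(registered_addresses(ip_online, register_all=False))
print(registered_addresses(ip_online, register_all=True))
```

With the default setting only the Site1 address is published, so after a failover clients wait on DNS updates and TTL expiry. With RegisterAllProvidersIP set to 1 both addresses are already in DNS, which is why the client application must be able to try each address in turn.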
There is more I could go into on this subject, but I need to sign off for now. I hope this gives you some basics to consider when designing your Clusters and thinking through the networking aspects. Networking designs and considerations must be carefully thought out.
Happy Clustering !!
Senior Program Manager
High Availability and Storage
Follow me on Twitter: @johnmarlin_msft