Create a SQL FCI with a Tertiary AG Replica

Procedure 

The purpose of this guide is to provide the steps needed to deploy a 2-node Windows Server Failover (WSFC) hosting a clustered instance of SQL Server 2019.  A 3rd Windows server running on a separate subnet will also be deployed to host a standalone instance of SQL Server.  An availability group will be created between the clustered instance of SQL Server and the standalone instance of SQL Server spanning two subnets.  For simplicity, this guide uses default instances of SQL Server and default ports for connectivity. 

Create a Windows Server Failover Cluster 

The first step is to create a Windows Server Failover (WSFC).  Think of this as the container for you clustered instance of SQL Server. 

Complete the steps in this document up to and including the section “Create the failover ” https://docs.microsoft.com/en-us/windows-server/failover-clustering/create-failover-cluster

When you're done, you should have something that looks like this on the Nodes page in Failover Cluster Manager (FCM):

JoshuaLent_0-1586198850488.jpeg

Install a Clustered Instance of SQL Server 

Once you have a WSFC, you can now deploy the clustered instance of SQL (SQL FCI) into the WSFC. 

Follow the steps here: https://docs.microsoft.com/en-us/sql/sql-server/failover-clusters/install/sql-server-failover-cluster-installation?view=sql-server-ver15#-failover-cluster-installation-options 

On step 30, you'll finish with Node 2 of the SQL FCI.  Do not perform this step on the 3rd server that will host the AG. 

You should now have something that looks like this under the Roles page in the FCM: 

SKYNET.jpg

Add the Third Node 

The next step is to add the tertiary server. 

If you haven't already, deploy a Windows Server in your DR subnet. 

Install a standalone instance of SQL Server onto Node 3.  A default instance name is easiest to work with. 

Install the Failover Clustering feature.

JoshuaLent_2-1586198850489.jpeg

Next

  1. Verify network connectivity from Node 3 to both Nodes 1 and 2.
  2. Verify network connectivity from Nodes 1 and 2 to Node 3.
  3. Connect to the cluster you built in step 1 using the FCM on Node 3. 

Now, use the Add Node wizard in the Actions pane of the FCM. 

JoshuaLent_3-1586198850490.png

Validate node and perform all tests.

JoshuaLent_4-1586198850490.jpeg

Upon completion of the wizard, your node 3 should now be part of the cluster.

JoshuaLent_5-1586198850491.jpeg

Configure Always On High Availability

Before continuing, ensure the following TCP ports are open (in both directions) between the clustered instance of SQL Server and the standalone instance of SQL Server

TCP 1433, for the default instance or the port(s) in use for named instances. 

TCP 5022, for the default Availability Group endpoint port. 

TCP 1433, or other custom port for the Listener port. 

UDP 1434, for interaction with the SQL Browser service 

Now it's time to configure Always On High Availability. 

From the clustered instance of SQL Server, open the SQL Server Configuration Manager and right-click > properties of the SQL Server service:

JoshuaLent_6-1586198850515.jpeg

Next, click on the Always On Availability Groups tab.  Confirm the WSFC value is correct and check the Enable Always On Availability Groups check box.  This requires SQL Server to recycle.

JoshuaLent_7-1586198850516.jpeg

Recycle the clustered instance of SQL Server using the FCM using stop/start role as shown:

JoshuaLent_8-1586198850492.jpeg

Follow steps a and b above on the DR node using the SQL Server Configuration Manager to recycle the SQL Server service this time. 

Create the Availability Group 

Before you begin, run the following commands on both instances to enable permissions for the systems to manage availability groups:

grant alter any availability group to [NT AuthoritySYSTEM]
grant connect sql to [NT AuthoritySYSTEM]
grant view server state to [NT AuthoritySYSTEM] 

Starting with the clustered instance of SQL Server, create an empty database – the name is arbitrary. 

Ensure the database is in full-recovery mode. 

Create a full backup of the database. 

Create a transaction log backup of the database that is appended to the full backup. 

Copy your backup file to the DR node and restore the backup With No Recovery leaving the database in a restoring state.

JoshuaLent_9-1586198850493.jpeg
JoshuaLent_11-1586198850494.jpeg

Return to the clustered instance of SQL Server and start the new Availability Group Wizard. 

AG1.jpg

Provide a meaningful name for your availability group.

JoshuaLent_13-1586198850495.jpeg

Select the database.  Meets prerequisites should be displayed under the Status field.

JoshuaLent_14-1586198850496.jpeg

Click the Add  button and add in your DR node.  Leave the Availability Mode as Asynchronous commit.  Whether you use a Readable Secondary is up to you.  Click Next.  We'll create the Listener in a later step.

AG4.jpg

Select Join Only for your data synchronization option and click Next.

JoshuaLent_17-1586198850525.jpeg

Review the validation tests and resolve any errors.  You can ignore the listener warning.

JoshuaLent_18-1586198850516.jpeg

Review the summary screen, script out if you like, and click Finish.

AG-Summary.jpeg

The wizard should complete successfully:

AG11.jpg

Connect your object explorer in SSMS to both instances.  Review the availability group and the availability database.  Your clustered instance of SQL Server should be the Primary and the DR node should be Secondary.

OE.jpg

You can also launch the availability group dashboard to review the health of your AG:

JoshuaLent_22-1586198850500.jpeg

From the dashboard, everything should be green and healthy:

Healthy.jpg

Create the Listener

With the availability group created and healthy, it's now time to create the listener. 

Most likely it will be necessary to stage the listener objects in and DNS.  Perform the following tasks:

In , create a computer object with the name of the listener you would like.  Grant Full Control permissions to the computer object for the Cluster computer object (SQL19CL in this example) and the standalone SQL Server (SQL19N3 in this example).  The computer object should be created in the same container as the rest of the computer objects created during this exercise:

JoshuaLent_24-1586198850526.jpeg

Next, you'll need to stage the Host (A) records in .  Create a Host (A) record for the listener in each subnet, one for the clustered instance of SQL Server, and one for the standalone instance of SQL Server.  Grant full control to the entries for the failover cluster and standalone SQL Server similar to what you did in the previous step:

dns.jpg

Back on the primary node of the clustered instance of SQL Server, run the following to refresh your resolver cache: 
ipconfig /flushdns 

Use ping , to ensure the host entries resolve properly.

Perform the task on the standalone SQL Server.  

From the primary in object explorer, right-click on the Availability Group Listeners folder and select Add Listener…

L1.jpg

Enter the DNS name for your listener.  Choose a port to access the listener on.  Then change the Network Mode from DHCP to Static IP.  Click the Add… button and then add the two IP addresses, 1 for the network your clustered instance of SQL Server is on and one for the network your standalone instance of SQL Server is on.  These are the IP addresses you configured in step 1b.

JoshuaLent_27-1586198850522.jpeg

When you're finished, click OK to create the listener

JoshuaLent_28-1586198850502.jpeg

Your listener should now be present in object explorer under the Availability Group Listeners folder.

JoshuaLent_29-1586198850503.jpeg

You should also be able to connect to the listener from object explorer.  Running select @@servername should return the name of the clustered instance of SQL Server. 

connect.jpg

FCM Review 

At this point, you've verified everything from the SQL Server perspective.  Navigate back to the Failover Cluster Manager and let's see what's happened. 

From the FCM, navigate to your cluster > Roles.  You should see your clustered instance of SQL Server as 1 role.  You should also see your availability group as a second role.  Select the availability group role and notice the information in the summary tab.  You'll see the current owner, the Client Access Name which is your listener and the IP addresses for it:

JoshuaLent_31-1586198850504.jpeg

Performing a Failover

The configuration does not allow for automatic failover.  You are able to configure either Synchronous or Asynchronous Availability Modes.  In general, Asynchronous commit mode should be suitable for your DR node; however, depending on network latency and other needs such as reporting, you may want to enable Synchronous commit mode. Understand the implications prior to deciding to run with either commit mode. 

While in Asynchronous commit mode, only forced failovers are possible.  This can make testing difficult due to the need to re-seed the availability group to return to ‘normal'.  It this situation, and again if your network latency permits, switch the commit mode to Synchronous mode prior to a failover test as shown below.

Connect to the primary using object explorer.  Navigate to the availability group and right-click > properties.

JoshuaLent_32-1586198850505.jpeg

Change the Availability Mode to Synchronous commit and click OK.

synch.jpg

Notice that in the availability group dashboard the synchronization state should have changed from Synchronizing to Synchronized.  Depending on how far behind the secondary is, this might take a moment. 

synced.jpg

Once you've transitioned into the Synchronizing state from the Secondary, launch the failover wizard:

JoshuaLent_35-1586198850506.jpeg

Review the messages and confirm No Data Loss under the Failover Readiness field.  Click Next. 

fo2.jpg

Review the Summary and click Finish.  Confirm Success. Click Close.

JoshuaLent_37-1586198850508.jpeg

From object explorer, run your select @@servername statement again using your listener connection and notice that you're now connected to the DR node.

fo4.jpg

Repeat steps 4 and 5 to return the availability group back to the clustered instance of SQL Server.

Once you've failed back, change the commit mode back to Asynchronous, if desired/necessary. 

At this point, it's important to note that failovers of the Availability Group should only be performed via the SSMS Object Explorer.  Failovers of the clustered instance of SQL Server should only be performed using the Failover Cluster Manager.  A failover of the clustered instance of SQL Server simply changes ownership of the clustered instance of SQL Server from one node to the other and does not change the primary/secondary configuration of the availability group. 

Addendums

Some customer environments may make it necessary to adjust the NodeWeight of the DR Node (eliminate voting rights in the WSFC).  This may be necessary if you encounter situations where the FCI on Nodes 1 and 2 are unexpectedly failing over. 

Adjust Quorum Vote

Here we're adjusting the vote for the DR node because we do not want it to be able to cause a failover of the clustered instance of SQL Server.

Next, change the quorum node weight for Node 3.

From an administrative PowerShell prompt run the following commands:

import-module failoverclusters
$nodes = get-clusternode -cluster 
$nodes | ft -property nodename, state, nodeweight 

You should get some output that looks like this.  Notice that your DR node has a weight of 1.  We want to change this to 0 to remove it's ability to failover the cluster.

JoshuaLent_39-1586198850508.png

Continue by running the following commands:

$node = “”
(Get-ClusterNode $node).NodeWeight = 0
$cluster = (Get-ClusterNode $node).Cluster
$nodes = Get-ClusterNode -Cluster $cluster
$nodes | ft -property NodeName, State, NodeWeight 

You should now see that Node 3 has a weight of 0 as shown below:

JoshuaLent_40-1586198850509.png

This will also be reflected in the FCM as shown here:

JoshuaLent_41-1586198850509.jpeg

Multi-subnet Failover 

In the event of needing to failover your availability group you'll want to make some configuration changes to support the infrastructure you've just set up.  In an Always On environment where one or more secondary replicas are on a separate subnet, a DNS (A) record will exist for the listener on each subnet.  For the cluster to initialize the listener on the secondary subnet after a failover both the cluster and client application need to be configured correctly to utilize the multi-subnet.

Verify Current Configuration 

Login to one of the nodes in your cluster and open PowerShell as an administrator and run the following commands one at a time:

Import-Module FailoverClusters
Get-ClusterResource | Select Name, ResourceType 

Identify the listener name as shown below:

psl1.jpg

Next, get the HostRecordTTL value.  This will show how long the listener name will be cached in DNS, in seconds.

Get-ClusterResource “YourListenerNameAbove” | Get-ClusterParameter HostRecordTTL 
JoshuaLent_43-1586198850511.png
Change the Configuration 

Depending on the results of steps 1 and 2 above, you may need to change either value.  If you're persisting both DNS records and client connections strings are using “MultiSubnetFailover=True” then you'll need to set the value of RegisterAllProvidersIP to 1.  Conversely, if you have clients connecting using ADO.NET with .NET 3.5SP1 or SNAC11, you may need to set the RegisterAllProvidersIP value to 0.  See references below for more information.  In some cases, if your TTL is too high, clients won't refresh their DNS cache soon enough and may have connection problems after a failover.

Change the RegisterAllProvidersIP value to 1:

Get-ClusterResource “YourListenerName” | Set-ClusterParameter RegisterAllProvidersIP 1

Change the TTL value to 300 (or lower/higher depending on your needs):
Get-ClusterResource “YourListenerName” | Set-ClusterParameter HostRecordTTL 300

Confirm your changes:
Get-ClusterResource “YourListenerName” | Get-ClusterParameter HostRecordTTL, RegisterAllProvidersIP 
JoshuaLent_44-1586198850511.png

References 

 

This article was originally published by Microsoft's ITOps Talk Blog. You can find the original article here.