User initiated manual failover of SQL Managed Instance

High availability is a fundamental part of SQL Managed Instance (MI) platform that works transparently for your database applications.  As such, failovers from primary to secondary nodes in case of a fault or node degradation, or during our regular monthly software patches are an expected occurrence for all applications using SQL Managed Instance as a database server in Azure.

In August 2020 have released a new feature user-initiated manual failover that can help you manually trigger a failover on a managed instance by executing PowerShell and CLI commands, or by making an API call.

This capability will allow you to manually initiate a failover on a managed instance that will be the same as automated failover for high availability and software patches on the service. This will help you test end to end application fault resiliency on automatic failovers in case of planned or unplanned events before deploying to production. In addition to testing how failover impacts existing database sessions, you can also verify if it changes the end-to-end performance due to changes in latency. Because the restart operation is intrusive and a large number of them could stress the platform, only one user-initiated manual failover call is allowed every 30 minutes for each managed instance (Update: Engineering team is looking to reduce this value to 15 minutes due to customer feedback in the next few months).

Ensuring that your applications are failover resilient prior to deploying to production will help mitigate the risk of application faults in production, This will also contribute to application availability for your customers.

How is High Availability (HA) implemented on a Managed Instance?

Managed Instance (MI) is offered in two service tiers, one is Business Critical (BC) and the other one is GP (General Purpose). Both service tiers offer High Availability (HA), with different technical implementations, as follows:

  • HA for Managed Instance BC service tier was built based on AlwaysOn Availability Groups (AG) technology, resulting in such MI consisting of the total of 4 nodes – one primary and three secondary R/O replicas. In case of a failover, one of the secondary replicas becomes primary. This type failover typically takes only a few short seconds.
  • HA for GP service tiers was based on multiple redundancy of the storage layer and it is based on a single primary node. In case of a failover, a new node is taken from the pool of standby nodes, and the is re-attached from the old to the new primary node. This type of failover typically takes under a minute.

Therefore, using the user-initiated manual failover functionality, manually initiating a failover on MI BC service tier will result in a failover of the primary node to one of the three secondary nodes. Manually initiating a failover on MI GP service tier will result in de-allocation of the primary node, and allocation of a new node from the pool of available nodes, and re-attachment of the from the old to the new node.

As secondary read-only nodes on the MI BC service tier can be used for read scale-out from a single node (out of three read-only secondary nodes), the user initiated manual failover capability allows also a failover of read-only . This means that users can manually failover the read scale-out from the current to one of the two other available read-only secondary nodes.

initiate a manual failover on SQL Managed Instance?

Using PowerShell

The minimum version of Az.Sql needs to be v2.9.0 (download link), or please use Azure Cloud Shell from the Azure portal which always has the latest version available.

Use PS command Invoke-AzSqlInstanceFailover with the following example to initiate failover of the primary node, applicable to both BC and GP service tier:

PowerShell
Invoke-AzSqlInstanceFailover -ResourceGroupName “ResourceGroup01” -Name “ManagedInstance01”

Use the following PS command to failover read secondary node, applicable to BC service tier only:

PowerShell
Invoke-AzSqlInstanceFailover -ResourceGroupName “ResourceGroup01” -Name “ManagedInstance01” -ReadableSecondary

Using CLI

Please ensure you have the latest CLI installed.

Use az sql mi failover CLI command with the following example to initiate failover of the primary node, applicable to both BC and GP service tier:

CLI
az sql mi failover -g myresourcegroup -n myinstancename

Use the following CLI command to failover read secondary node, applicable to BC service tier only:

CLI
az sql mi failover -g myresourcegroup -n myinstancename –-type ReadableSecondary

Using Rest API

For advanced users who would perhaps like to automate failovers of their SQL Managed Instances for purposes of implementing continuous testing pipeline, or automated performance mitigators, this can be accomplished through intiating failover through an API call.

To initiate failover using REST API call you will first need to generate Auth Token. One way to do that is to use a Postman client. Initiating the API call from any other client should generally work as well. This token is used as Authorization property in the header of API request and it is mandatory.

The following is an example of the API URI to call:

API URI
https://management.azure.com/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.Sql/managedInstances/{managedInstanceName}/failover?replicaType=Primary&api-version=2019-06-01-preview

The following are API call properties that can be passed in the call:

API property Parameter
subscriptionId Subscription ID to which managed instance is deployed
resourceGroupName Resource group that contains managed instance
managedInstanceName Name of managed instance
replicaType (Optional) (Primary|ReadableSecondary)

This is the type of  to be failed over: primary or readable secondary.
If not specified, failover will be initiated on the primary replica by default.

api-version Static value and currently needs to be “2019-06-01-preview”

API response will be one of the following two:

  1. 202 Accepted
  2. One of the 400 request errors.

Track the operation status

Monitoring the failover

  • Once the qpi.nodes view is created, you can execute the following query to see the current node configuration on an MI:
T-SQL
SELECT replication_endpoint_url, primary_node = avg(cast (is_primary_replica as float)) FROM qpi.nodes GROUP BY replication_endpoint_url

The output will be something as follows:

Image 02.PNG

  • In this example we see primary and three secondary replicas on MI BC node, each assigned to an internal IP address starting with 10.0.0.X.
  • Upon execution of a failover, running this query again would need to indicate a change of the primary node.

Functional limitations

  • Throttle mechanism is implemented to guard from potentially too many failovers. As such, you can initiate one on the same MI every 30 minutes.
  • For BC instances you must have of replicas even after failover. That means failover can be done only if all replicas are healthy.
  • You cannot specify which readable secondary replica you want to initiate failover on. This is because MI automatically allows only for a single read only replica to be available to customers.

Disclaimer

Please note that products and options presented in this article are subject to change. This article reflects the user-initiated manual failover option available for Azure SQL Managed Instance in August, 2020.

Closing remarks

To share this article, you can use this short link: https://aka.ms/mifailover-techblog

 

This article was originally published by Microsoft's ITOps Talk Blog. You can find the original article here.