Azure SQL MI Failover Group Pre-Requisites

In this blog post, you will learn the prerequisites for successfully creating a failover group in Azure SQL Managed Instance.

If you are designing a Disaster Recovery (DR) strategy for Azure SQL Managed Instance (SQL MI), you have likely settled on Auto-Failover Groups as your solution. It’s the gold standard: it gives you a read-write listener endpoint (so you don’t have to change connection strings) and handles the heavy lifting of replicating your data to a secondary region.


But here is the reality check: You cannot just click “Create Failover Group” and hope for the best.

Unlike standard Azure SQL Databases (Singletons), SQL Managed Instance is effectively a virtual cluster injected into your Virtual Network (VNet). This means the networking prerequisites are significantly stricter. If you miss a single port or overlook an overlapping IP range, the deployment will fail, usually after you’ve waited 45 minutes for it to try.

As a content writer for Cloud Nerchuko, I’ve dug through the documentation to bring you the definitive checklist. Here are the mandatory prerequisites you must clear before setting up a Failover Group.

1. The Golden Rule of Networking: No Overlapping IPs

This is the number one reason failover group deployments fail.

Because SQL MI sits inside a VNet (Virtual Network), the Primary and Secondary instances must be able to “talk” to each other via their private IP addresses.

  • The Requirement: The address ranges of the VNets hosting the Primary and Secondary instances, and therefore of their subnets, must not overlap.
  • The Scenario: If your Primary VNet is 10.0.0.0/16 and your Secondary VNet is also 10.0.0.0/16, you are dead in the water. You cannot peer them, and therefore, you cannot create a failover group.
  • The Fix: Ensure your DR region VNet uses a completely different address space (e.g., 10.1.0.0/16).
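
The overlap rule above can be checked before any deployment. Here is a minimal sketch using Python's standard `ipaddress` module; the function name `ranges_overlap` is my own, and the CIDR ranges are the ones from the scenario above plus one extra illustrative case.

```python
import ipaddress


def ranges_overlap(primary_cidr: str, secondary_cidr: str) -> bool:
    """Return True if the two CIDR ranges share at least one address."""
    primary = ipaddress.ip_network(primary_cidr, strict=True)
    secondary = ipaddress.ip_network(secondary_cidr, strict=True)
    return primary.overlaps(secondary)


# Identical address spaces overlap, so peering is not possible.
print(ranges_overlap("10.0.0.0/16", "10.0.0.0/16"))
# A distinct address space for the DR region does not overlap.
print(ranges_overlap("10.0.0.0/16", "10.1.0.0/16"))
# A smaller range nested inside a larger one also counts as an overlap.
print(ranges_overlap("10.0.0.0/16", "10.0.5.0/24"))
```

The third case is the easy one to miss: a /24 carved out of the other side's /16 is still an overlap.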

2. The Connection: Global VNet Peering

Since your instances are likely in different Azure regions (e.g., East US and West US), they aren’t on the same physical network. You need a bridge.

  • The Requirement: You must establish Global VNet Peering between the Primary VNet and the Secondary VNet.
  • Why not VPN? While you can use an Azure VPN Gateway to connect them, Global VNet Peering is Microsoft’s recommendation because it uses the Azure backbone, offering lower latency and higher bandwidth for data replication.
  • Paired vs. non-paired regions: Where possible, place the secondary in the Azure paired region of the primary. Paired regions generally offer lower latency, while non-paired regions that are farther apart tend to have higher latency, which slows replication.

3. The “Bouncer”: Network Security Groups (NSG) & Ports

This is where things get technical. Your Network Security Group (NSG) acts as a firewall for the subnet. By default, it blocks most unsolicited traffic. You must explicitly open specific ports to allow the two instances to synchronize data.

You need to create Inbound and Outbound rules on both the Primary and Secondary NSGs for the following ports:

  • Port 5022: This is the standard port for data replication traffic.
  • Ports 11000-11999: This range is critical for the “Redirect” connection policy, which SQL MI uses for performance. If you block this, replication may struggle or fail.

The Rule Logic:

  • Primary NSG: Allow Inbound from, and Outbound to, the Secondary Subnet IP Range on 5022 + 11000-11999.
  • Secondary NSG: Allow Inbound from, and Outbound to, the Primary Subnet IP Range on 5022 + 11000-11999.
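
To make the rule logic concrete, here is a small sketch that lists the rules each side needs. It only models the rules as plain Python data and does not call any Azure API. The `NsgRule` class, its field names, the rule names, and the two subnet ranges are all hypothetical and chosen for illustration; the TCP protocol is my assumption, so map the result onto whatever tooling you use to manage NSGs.

```python
from dataclasses import dataclass

# Ports named in this section: replication traffic plus the redirect range.
REPLICATION_PORTS = ("5022", "11000-11999")


@dataclass(frozen=True)
class NsgRule:
    """Illustrative rule shape; not the Azure resource schema."""
    name: str
    direction: str  # "Inbound" or "Outbound"
    source: str
    destination: str
    ports: tuple
    protocol: str = "Tcp"  # assumption: the replication ports use TCP
    access: str = "Allow"


def failover_group_rules(local_subnet: str, remote_subnet: str) -> list:
    """Build the inbound and outbound rules for one side's NSG."""
    return [
        NsgRule("allow-fog-inbound", "Inbound",
                remote_subnet, local_subnet, REPLICATION_PORTS),
        NsgRule("allow-fog-outbound", "Outbound",
                local_subnet, remote_subnet, REPLICATION_PORTS),
    ]


# Hypothetical subnet ranges for the two instances.
primary_subnet = "10.0.1.0/24"
secondary_subnet = "10.1.1.0/24"

sides = {
    "primary": failover_group_rules(primary_subnet, secondary_subnet),
    "secondary": failover_group_rules(secondary_subnet, primary_subnet),
}
for side, rules in sides.items():
    for rule in rules:
        print(side, rule.direction, rule.source, "->", rule.destination,
              ",".join(rule.ports))
```

Each NSG ends up with a mirrored pair of rules, which is why a rule added on only one side is a common cause of failed replication.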

Note: If you are using a Hub-and-Spoke topology with a firewall appliance (like Azure Firewall) in the middle, you must ensure routes are configured so that ports 5022 and 11000-11999 are not blocked or subjected to deep packet inspection that breaks the SQL traffic flow.

4. Instance Compatibility (The “Mirror” Concept)

You cannot pair a Ferrari with a minivan. To ensure that your secondary region can handle the workload if a disaster strikes, the instances must match.

  • Service Tier Match: You cannot pair a General Purpose instance with a Business Critical instance. They must be the same tier.
  • Hardware Generation: While not strictly blocked, it is highly recommended to use the same hardware generation (e.g., Standard-series Gen5) on both sides to guarantee performance.
  • Storage Space: The secondary instance must have the same storage size as the primary, with enough free space to receive the primary’s databases.
  • Update Policy: Both instances must use the same update policy (e.g., SQL Server 2022).
  • Secondary Databases: The secondary instance must not contain any user databases. If any exist, delete them before configuring the failover group.
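
The compatibility rules above lend themselves to a quick pre-flight check. The sketch below compares two instance descriptions that you would fill in by hand or from your own inventory. The `InstanceSpec` class, its fields, and the sample values are hypothetical and do not correspond to an Azure SDK type.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceSpec:
    """Hand-filled description of one managed instance (illustrative)."""
    service_tier: str
    storage_gb: int
    update_policy: str
    hardware: str
    user_databases: list = field(default_factory=list)


def compatibility_issues(primary: InstanceSpec, secondary: InstanceSpec) -> list:
    """Return human-readable findings; an empty list means no issues found."""
    issues = []
    if primary.service_tier != secondary.service_tier:
        issues.append("error: service tiers differ")
    if primary.storage_gb != secondary.storage_gb:
        issues.append("error: storage sizes differ")
    if primary.update_policy != secondary.update_policy:
        issues.append("error: update policies differ")
    if secondary.user_databases:
        issues.append("error: secondary already contains user databases")
    if primary.hardware != secondary.hardware:
        # Recommended rather than required, so report it as a warning.
        issues.append("warning: hardware generations differ")
    return issues


primary = InstanceSpec("GeneralPurpose", 512, "SQL Server 2022", "Gen5")
secondary = InstanceSpec("BusinessCritical", 512, "SQL Server 2022", "Gen5",
                         user_databases=["scratch_db"])
for issue in compatibility_issues(primary, secondary):
    print(issue)
```

Separating errors from warnings mirrors the list above: tier, storage, update policy, and an empty secondary are hard requirements, while matching hardware is a recommendation.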

5. DNS Resolution (For Custom DNS Users)

If your VNets are using default Azure DNS, you can skip this. But if you are using Custom DNS Servers, you have an extra step.

  • The Requirement: The Primary instance must be able to resolve the FQDN (Fully Qualified Domain Name) of the Secondary instance, and vice-versa.
  • The Fix: You may need to add DNS forwarders or specific records to your custom DNS servers to ensure the two instances can “find” each other by name, not just by IP.
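
One way to verify name resolution is a short script run from a machine that uses the same custom DNS servers as the VNet. The sketch below relies on the standard library's `socket.getaddrinfo`. The host names are placeholders rather than real instance FQDNs, and the demonstration uses a stub resolver so that the example runs anywhere.

```python
import socket


def unresolved_names(fqdns, resolver=socket.getaddrinfo):
    """Return the names from `fqdns` that the resolver cannot resolve."""
    failed = []
    for name in fqdns:
        try:
            resolver(name, None)
        except socket.gaierror:
            failed.append(name)
    return failed


def stub_resolver(name, port):
    """Demonstration-only resolver that knows just the primary's name."""
    known = {"sqlmi-primary.example.internal"}
    if name not in known:
        raise socket.gaierror(f"cannot resolve {name}")
    return [(name, port)]


# Placeholder names; substitute the FQDNs of your own instances.
names = ["sqlmi-primary.example.internal", "sqlmi-secondary.example.internal"]
print(unresolved_names(names, resolver=stub_resolver))
```

To query real DNS, call `unresolved_names(names)` without the `resolver` argument. An empty list means every name resolved from that machine's point of view, so repeat the check from both sides.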

6. The Seeding Reality

Once you click “Create,” the process initiates Initial Seeding.

  • What happens: Azure takes a physical copy of your database and restores it on the secondary.
  • The Warning: If you have 10TB of data, this will not happen instantly. It depends on your available bandwidth. During this time, the failover group is being created, but it is not yet “protected.”
  • Important: Do not manually restore the database on the secondary instance beforehand. The Failover Group logic expects to push the database itself. If a database with the same name already exists on the secondary, the setup will fail.
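
For a rough sense of how long seeding might take, divide the data size by the sustained throughput. The sketch below does that arithmetic. The throughput values are assumptions picked for illustration, not figures published by Azure, so measure your own environment before planning around them.

```python
def estimated_seeding_hours(data_size_tb: float, throughput_mb_per_s: float) -> float:
    """Estimate seeding time in hours from data size and sustained throughput."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    size_mb = data_size_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = size_mb / throughput_mb_per_s
    return seconds / 3600


# Hypothetical sustained throughputs in MB/s for the 10 TB example above.
for rate in (100, 200, 400):
    hours = estimated_seeding_hours(10, rate)
    print(f"10 TB at {rate} MB/s: about {hours:.1f} hours")
```

Even at the most optimistic rate in this sketch, 10 TB takes several hours, which is the window during which the failover group exists but is not yet protecting you.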

Summary Checklist

Before you start your deployment, verify these five items:

  1. IP Ranges: Do my Primary and Secondary subnets overlap? (Answer must be No)
  2. Connectivity: Is Global VNet Peering status “Connected”?
  3. Firewall: Are ports 5022 and 11000-11999 open in both directions?
  4. Tiers: Are both instances on the same Service Tier (e.g., both General Purpose)?
  5. Clean Slate: Is the database name absent from the secondary instance?


Disclaimer: This content is AI-written and then human-refined for accuracy and functionality.
