Check Replication status between SQL MI Primary and Secondary

Ensuring that SQL Managed Instance (SQL MI) replication is properly synchronized between the primary and secondary instances is critical for maintaining high availability, disaster recovery, and data integrity. Replication allows your databases to stay consistent across environments, but if synchronization fails, it can lead to downtime or data loss.

The actual article where Microsoft published is available at How-to test failover group connectivity between primary and secondary SQL Managed Instances.

The earlier steps in this article were outdated and could throw errors during execution. I’ve made the necessary changes to the commands, and you can now run the updated instructions below without issues.

1. Steps to check replication from Primary to Secondary SQL MI

1. Connect to the secondary instance of SQL Managed Instance using SSMS or any other way (ex: DevOps pipeline or Azure data studio or vscode)

2. Copy the following script and run it on the secondary instance. You will get 3 output declare parameters which includes node, port and server. Copy the values to paste in the next step. These are secondary endpoint details and we will check connectivity against secondary from primary instance.

-- Retrieves the DnsRecordName, constructs the Fully Qualified Domain Name (FQDN) 
-- for the node, and retrieves the HadrPort configuration.

SELECT 'DECLARE @serverName NVARCHAR(512) = N''' + value + ''''
FROM sys.dm_hadr_fabric_config_parameters
WHERE parameter_name = 'DnsRecordName'

UNION

SELECT 'DECLARE @node NVARCHAR(512) = N''' + NodeName + '.' + Cluster + ''''
FROM
(
    SELECT 
        SUBSTRING(replica_address, 0, CHARINDEX('\', replica_address)) as NodeName, 
        RIGHT(service_name, CHARINDEX('/', REVERSE(service_name))-1) as AppName, 
        JoinCol = 1
    FROM sys.dm_hadr_fabric_partitions fp
    JOIN sys.dm_hadr_fabric_replicas fr ON fp.partition_id = fr.partition_id
    JOIN sys.dm_hadr_fabric_nodes fn ON fr.node_name = fn.node_name
    WHERE service_name LIKE '%ManagedServer%' AND replica_role = 2
) t1
LEFT JOIN
(
    SELECT value as Cluster, JoinCol = 1
    FROM sys.dm_hadr_fabric_config_parameters
    WHERE parameter_name = 'ClusterName'
) t2 ON (t1.JoinCol = t2.JoinCol)
INNER JOIN
(
    SELECT [value] AS AppName
    FROM sys.dm_hadr_fabric_config_parameters
    WHERE section_name = 'SQL' AND parameter_name = 'InstanceName'
) t3 ON (t1.AppName = t3.AppName)

UNION

SELECT 'DECLARE @port NVARCHAR(512) = N''' + value + ''''
FROM sys.dm_hadr_fabric_config_parameters
WHERE parameter_name = 'HadrPort';

3. Now, connect to the Primary SQL Managed Instance and in the below script, paste the values that you copied from the above step in the below corresponding parameters.

DECLARE @node NVARCHAR(512) = N”
DECLARE @port NVARCHAR(512) = N”
DECLARE @serverName NVARCHAR(512) = N”

-- Parameters section
DECLARE @node NVARCHAR(512) = N''
DECLARE @port NVARCHAR(512) = N''
DECLARE @serverName NVARCHAR(512) = N''

-- Script section
IF EXISTS (SELECT job_id FROM msdb.dbo.sysjobs_view WHERE name = N'TestFoGConnection')
    EXEC msdb.dbo.sp_delete_job @job_name = N'TestFoGConnection', @delete_unused_schedule=1

DECLARE @jobId BINARY(16), @cmd NVARCHAR(MAX)

EXEC msdb.dbo.sp_add_job @job_name=N'TestFoGConnection', @enabled=1, @job_id = @jobId OUTPUT

-- Step 1: Test port 5022
SET @cmd = (N'tnc ' + @serverName + N' -port 5022 | select ComputerName, RemoteAddress, TcpTestSucceeded | Format-List')
EXEC msdb.dbo.sp_add_jobstep @job_id = @jobId, @step_name = N'Test Port 5022', 
    @step_id = 1, @cmdexec_success_code = 0, @on_success_action = 3, @on_fail_action = 3, 
    @subsystem = N'PowerShell', @command = @cmd, @database_name = N'master'

-- Step 2: Test HADR Port
SET @cmd = (N'tnc ' + @node + N' -port ' + @port + N' | select ComputerName, RemoteAddress, TcpTestSucceeded | Format-List')
EXEC msdb.dbo.sp_add_jobstep @job_id = @jobId, @step_name = N'Test HADR Port', 
    @step_id = 2, @cmdexec_success_code = 0, @subsystem = N'PowerShell', @command = @cmd, @database_name = N'master'

EXEC msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(Local)'
EXEC msdb.dbo.sp_start_job @job_name = N'TestFoGConnection'

-- Check status every 5 seconds
DECLARE @RunStatus INT
SET @RunStatus = 10
WHILE (@RunStatus >= 4)
BEGIN
    SELECT DISTINCT @RunStatus = run_status
    FROM [msdb].[dbo].[sysjobhistory] JH 
    JOIN [msdb].[dbo].[sysjobs] J ON JH.job_id = J.job_id
    WHERE J.name = N'TestFoGConnection' AND step_id = 0
    WAITFOR DELAY '00:00:05';
END

-- Get Logs once job completes
SELECT [step_name],
    SUBSTRING([message], CHARINDEX('TcpTestSucceeded', [message]), CHARINDEX('Process Exit', [message])-CHARINDEX('TcpTestSucceeded', [message])) as TcpTestResult,
    SUBSTRING([message], CHARINDEX('RemoteAddress', [message]), CHARINDEX('TcpTestSucceeded', [message])-CHARINDEX('RemoteAddress', [message])) as RemoteAddressResult,
    [run_status], [run_duration], [message]
FROM [msdb].[dbo].[sysjobhistory] JH 
JOIN [msdb].[dbo].[sysjobs] J ON JH.job_id = J.job_id
WHERE J.name = N'TestFoGConnection' AND step_id <> 0

4. Now, run the above script on the Primary instance. This will create an agent job with a PowerShell script to test the connectivity between your Primary and Secondary on ports 5022 and HADR port. It will take 1 minute or mostly less than 2 min to get the output.

In the output, you will need to check the TcpTestSucceeded column, and the value should be TcpTestSucceeded : True for both the records.

It means the primary to secondary replication is working as expected and no issue. If you see False there, there are issues in your setup itself. All the reasons are available in this article How-to test failover group connectivity between primary and secondary SQL Managed Instances

2. Steps to check replication from Secondary to Primary SQL MI

This is exact opposite of what we did in step 1.

1. Connect to Primary instance and execute the first script on top. Copy the declare parameters of your primary instance details.

2. Connect to Secondary instance and replace the values in the second script that I provided above. Run the script on secondary instance and wait for few minutes to see the result.

3. If your output is having TcpTestSucceeded : True for both the records, then the replication is active and working between secondary and primary region SQL MI.

By using the scripts, you can test the network connectivity or failover or replication check between the two SQL Managed instances. Let me know in the comment section if you are facing issues while running the scripts.

You can also read my other blogposts,

Azure SQL MI Failover Group Pre-Requisites

Azure SQL MI Standby replica and Read replica functionality

How to connect to Azure SQL MI with SSMS using Entra ID

For more technical blogposts, please visit CloudNerchuko.in

Leave a Comment