I successfully performed a disaster recovery (DR) failover and failback of an application's infrastructure in Azure. Below are the key highlights and learnings from the exercise.

Overview

During the DR exercise, I moved critical application resources from the East US region (source) to the West US region (destination) and then initiated the failback process.

Resources Involved

  • Virtual Machines (VMs)
  • SQL Databases
  • Storage Accounts
  • Azure Private DNS
  • Internal and External DNS
  • Application Gateway
  • Private Endpoints
  • Azure Firewall
  • Azure Site Recovery (ASR)
  • TLS Certificates

Important Notes

  • Before initiating a failover or failback, DR protection must be enabled to ensure data synchronization between source and destination resources.
  • The following cannot be replicated using ASR:
      1. Azure Resource Groups
      2. RBAC permissions
      3. Private Endpoints
      4. Application Gateway
      5. Azure Firewall
      6. SQL logical server

These components should be provisioned separately to maintain isolation between Production and DR environments.
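
As a rough illustration, the split between what ASR cannot replicate and everything else can be sketched in Python. This is only a planning aid under my own assumptions: the helper name `split_inventory` and the sample inventory are hypothetical, and the set simply mirrors the list above rather than querying Azure.

```python
# Components this article lists as not replicable with ASR.
NOT_REPLICATED_BY_ASR = {
    "azure resource groups",
    "rbac permissions",
    "private endpoints",
    "application gateway",
    "azure firewall",
    "sql logical server",
}


def split_inventory(inventory):
    """Split resource names into (provision_separately, handle_otherwise).

    Hypothetical helper: matching is case-insensitive on the names used in
    this article. Nothing here talks to Azure.
    """
    provision_separately = []
    handle_otherwise = []
    for name in inventory:
        if name.strip().lower() in NOT_REPLICATED_BY_ASR:
            provision_separately.append(name)
        else:
            handle_otherwise.append(name)
    return provision_separately, handle_otherwise


if __name__ == "__main__":
    # Sample inventory (assumed for illustration only).
    separate, other = split_inventory(
        ["Virtual Machines (VMs)", "Application Gateway", "Azure Firewall", "SQL Databases"]
    )
    print("Provision separately in the DR region:", separate)
    print("Replicate or fail over by other means:", other)
```

The second list is not the same as "replicated by ASR": SQL Databases and Storage Accounts, for example, have their own failover mechanisms.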

Key Learnings – Failover

  • After failing over a VM, re-protect it to enable a seamless failback process.
  • Stop the source region VM before initiating failover to avoid conflicts.
  • To rejoin a VM to the domain, first delete the corresponding computer account from Active Directory.
  • For ASR to work smoothly, disable “Soft Delete for Blobs” and “Soft Delete for Containers” on the Storage Account used by ASR (the cache storage account).
  • Check for any Azure Policies that might block VM creation in the destination region. If applicable, add temporary exceptions.
  • After SQL Database failover, ensure the SQL listener endpoint (read-write) is functioning and reachable.
  • Both planned and unplanned failovers are supported for Storage Accounts if replication is set to GRS or RA-GRS.
  • After an unplanned Storage Account failover, replication downgrades to LRS. You must manually switch it back to GRS or RA-GRS before failback.
  • Virtual Networks (VNets) and Private Endpoints are region-specific and must be recreated in the target region.
  • Firewall rules must be updated to reflect new resource IPs in the destination region.
  • Update the DNS records so they point to the Application Gateway in the destination region.
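
The firewall step in the list above is easy to get wrong by hand, so here is a minimal sketch of how the rule update could be planned offline. It assumes rules are exported as plain dictionaries with `name` and `destination` keys and that you maintain your own old-to-new IP map; both the rule shape and the function names are hypothetical and are not the Azure Firewall API.

```python
from ipaddress import ip_address


def remap_rule_destinations(rules, ip_map):
    """Return new rule dicts with destination IPs swapped according to ip_map.

    Rules whose destination is not in ip_map are copied unchanged.
    """
    remapped = []
    for rule in rules:
        new_rule = dict(rule)  # copy so the original rule set is left untouched
        new_ip = ip_map.get(rule["destination"])
        if new_ip is not None:
            ip_address(new_ip)  # raises ValueError if the new address is malformed
            new_rule["destination"] = new_ip
        remapped.append(new_rule)
    return remapped


def rules_without_mapping(rules, ip_map):
    """Names of rules whose destination has no destination-region mapping yet."""
    return [rule["name"] for rule in rules if rule["destination"] not in ip_map]


if __name__ == "__main__":
    # Hypothetical rules and addresses, for illustration only.
    rules = [
        {"name": "allow-app-to-sql", "destination": "10.0.1.4", "port": 1433},
        {"name": "allow-web", "destination": "10.0.2.5", "port": 443},
    ]
    ip_map = {"10.0.1.4": "10.1.1.4"}  # source-region IP -> destination-region IP
    for rule in remap_rule_destinations(rules, ip_map):
        print(rule)
    print("Still need a mapping:", rules_without_mapping(rules, ip_map))
```

Listing the rules that have no mapping is a quick way to spot resources whose new IPs have not been recorded yet.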

Key Learnings – Failback

  • Delete the existing VM in the source region before initiating failback. If it is not deleted, the operation is blocked.
  • ASR applies a resource lock on the VM during protection. This lock must be removed before you can complete the failback.
  • Many of the failover steps listed earlier must be repeated for failback as well.
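
The failback preconditions can be written down as a simple pre-check. The sketch below is my own illustration, not an ASR feature: the function name `failback_blockers` is hypothetical, the inputs are values you would gather yourself (for example from the portal), and the storage check reflects the note above about switching replication back to GRS or RA-GRS, using the `Standard_GRS` and `Standard_RAGRS` SKU names.

```python
# Storage SKUs treated as geo-redundant for the purpose of this check.
GEO_REDUNDANT_SKUS = {"Standard_GRS", "Standard_RAGRS"}


def failback_blockers(source_vm_exists, vm_locks, storage_sku):
    """Return a list of actions that must be completed before failback.

    source_vm_exists: True if the old VM is still present in the source region.
    vm_locks: names of resource locks currently on the protected VM.
    storage_sku: current replication SKU of the Storage Account.
    """
    blockers = []
    if source_vm_exists:
        blockers.append("delete the existing VM in the source region")
    if vm_locks:
        blockers.append("remove resource lock(s): " + ", ".join(sorted(vm_locks)))
    if storage_sku not in GEO_REDUNDANT_SKUS:
        blockers.append(
            f"switch storage replication from {storage_sku} back to GRS or RA-GRS"
        )
    return blockers


if __name__ == "__main__":
    # Hypothetical state right after a failover; the lock name is made up.
    for item in failback_blockers(True, ["asr-lock"], "Standard_LRS"):
        print("TODO:", item)
```

An empty list means none of these three conditions is blocking the failback; it does not cover the other failover steps that need repeating.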

Please like and comment if you found this article interesting and informative. I am happy to answer any questions you have.