Disaster Recovery Exercise – Failover and Failback in Azure
I successfully performed a disaster recovery (DR) failover and failback of an application's infrastructure. Below are the key highlights and learnings from the exercise.
Overview
During the DR exercise, I moved critical application resources from the East US region (source) to the West US region (destination) and then initiated the failback process.
Resources Involved
- Virtual Machines (VMs)
- SQL Databases
- Storage Accounts
- Azure Private DNS
- Internal and External DNS
- Application Gateway
- Private Endpoints
- Azure Firewall
- Azure Site Recovery (ASR)
- TLS Certificates
Important Notes
- Before initiating a failover or failback, DR protection must be enabled to ensure data synchronization between source and destination resources.
- The following cannot be replicated using ASR:
  - Azure Resource Groups
  - RBAC permissions
  - Private Endpoints
  - Application Gateway
  - Azure Firewall
  - SQL logical server
These components should be provisioned separately to maintain isolation between Production and DR environments.
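The list above can be turned into a quick inventory check. The following is a minimal Python sketch, not something used in the exercise itself: the function name `needs_separate_provisioning`, the resource names, and the type labels are all hypothetical, and the set simply mirrors the items listed in this article.

```python
# Resource types that, per the notes above, ASR does not replicate.
# These labels are illustrative strings, not Azure resource-type identifiers.
NOT_REPLICATED_BY_ASR = {
    "resource group",
    "rbac permissions",
    "private endpoint",
    "application gateway",
    "azure firewall",
    "sql logical server",
}


def needs_separate_provisioning(resources):
    """Return the names of resources that must be provisioned in the DR region by hand.

    `resources` is a list of (name, type_label) pairs; matching is
    case-insensitive against NOT_REPLICATED_BY_ASR.
    """
    return [
        name
        for name, type_label in resources
        if type_label.strip().lower() in NOT_REPLICATED_BY_ASR
    ]


# Hypothetical inventory for illustration only.
inventory = [
    ("app-vm-01", "virtual machine"),
    ("app-gw", "Application Gateway"),
    ("sql-srv", "SQL logical server"),
]
separate = needs_separate_provisioning(inventory)
print(separate)  # ['app-gw', 'sql-srv']
```

A resource that is not flagged here is not automatically covered by ASR; the check only highlights what this article lists as needing separate provisioning.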
Key Learnings – Failover
- After failing over a VM, re-protect it to enable a seamless failback process.
- Stop the source region VM before initiating failover to avoid conflicts.
- To rejoin a VM to the domain, first delete the corresponding computer account from Active Directory.
- For ASR to work smoothly, disable “Soft Delete for Blobs” and “Soft Delete for Containers” in the Storage Account.
- Check for any Azure Policies that might block VM creation in the destination region. If applicable, add temporary exceptions.
- After the SQL Database failover, confirm that the read-write SQL listener endpoint is functioning and reachable from the application.
- Both planned and unplanned failovers are supported for Storage Accounts if replication is set to GRS or RA-GRS.
- After an unplanned Storage Account failover, replication is downgraded to LRS. You must manually switch it back to GRS or RA-GRS before failback.
- Virtual Networks (VNets) and Private Endpoints are region-specific and must be recreated in the target region.
- Firewall rules must be updated to reflect new resource IPs in the destination region.
- Update the DNS records that point to the Application Gateway so they resolve to the instance in the destination region.
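Several of the failover learnings above are checks to make before the failover starts, so they can be collected into a pre-flight routine. This is a minimal Python sketch under stated assumptions: the `FailoverState` class, its fields, and `preflight_issues` are hypothetical names, and the state is assumed to be gathered by hand or by a separate script rather than queried from Azure here.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverState:
    """Hypothetical snapshot of the environment before failover."""
    source_vm_running: bool
    blob_soft_delete_enabled: bool
    container_soft_delete_enabled: bool
    storage_replication: str  # e.g. "LRS", "GRS", "RA-GRS"
    blocking_policies: list = field(default_factory=list)


def preflight_issues(state):
    """Return the actions still required before starting the failover."""
    issues = []
    if state.source_vm_running:
        issues.append("Stop the source-region VM")
    if state.blob_soft_delete_enabled or state.container_soft_delete_enabled:
        issues.append("Disable blob and container soft delete on the storage account")
    if state.storage_replication.upper() not in {"GRS", "RA-GRS"}:
        issues.append("Set storage replication to GRS or RA-GRS")
    for policy in state.blocking_policies:
        issues.append(f"Add a temporary exemption for policy: {policy}")
    return issues


# Illustrative values only; the policy name is made up.
state = FailoverState(
    source_vm_running=True,
    blob_soft_delete_enabled=False,
    container_soft_delete_enabled=True,
    storage_replication="LRS",
    blocking_policies=["allowed-locations"],
)
for issue in preflight_issues(state):
    print("-", issue)
```

An empty list means none of the checks from this article are outstanding; it does not replace the health checks that ASR itself performs.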
Key Learnings – Failback
- Delete the existing VM in the source region before initiating failback. Otherwise, the operation is blocked.
- ASR applies a resource lock on the VM during protection. This lock must be removed before you can complete the failback.
- Many of the failover steps above, such as re-protection, firewall rule updates, and DNS changes, must be repeated for failback.
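The failback blockers described above can be listed the same way. The sketch below is illustrative Python only: `failback_blockers`, its parameters, and the lock name are hypothetical, and the inputs are assumed to be collected manually.

```python
def failback_blockers(source_vm_exists, vm_lock_names, storage_replication):
    """Return the actions that must be completed before failback can proceed."""
    blockers = []
    if source_vm_exists:
        blockers.append("Delete the existing VM in the source region")
    for lock in vm_lock_names:
        blockers.append(f"Remove resource lock: {lock}")
    if storage_replication.upper() == "LRS":
        blockers.append("Switch storage replication back to GRS or RA-GRS")
    return blockers


# Illustrative inputs; the lock name is made up.
for blocker in failback_blockers(True, ["asr-lock"], "LRS"):
    print("-", blocker)
```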
Please like and comment if you found this article interesting and useful. I would be happy to answer any questions you have.