Azure Answered – A modern approach to DR with Azure
The Microsoft Cloud Adoption Framework for Azure provides solid guidance on running your infrastructure in Azure. One of the challenges organisations face is designing and implementing a Disaster Recovery (DR) strategy. Addressing this properly is a sign of a mature business and an insurance policy for when things don’t go according to plan. Having Business Continuity Processes (BCP) in place that involve people and third parties as well as technology? Even better!
A thoughtful IT leader will be discovering assets and dependencies, and assessing the technology solutions and Disaster Recovery as a Service (DRaaS) providers currently on the market; sometimes this leads to “analysis paralysis”. High Availability (HA), Fault Tolerance, Infrastructure Resilience, Geo-Redundancy, compliance and regulation: the list of concerns can be overwhelming. What are my Recovery Point Objective (RPO) and Recovery Time Objective (RTO) metrics? Can I peer review my approach with other businesses in the same vertical? Do we have any Single Points of Failure (SPOF)? Companies listed on the stock exchange may even find that they are required to have a DR strategy and DR plan in place. So where to start and what to do? There’s a lot to consider, so let’s break it all down…
What does a modern DR approach look like today? The trigger may no longer be just a temporary or permanent physical incident at the data centre where your infrastructure runs, as was once the common assumption. What about a malicious actor, whether through a cyber-attack or ransomware, who has compromised your infrastructure for an extended period? You fail over to another location, only to find that location is already compromised too. New and evolving risks like these shape today’s DR approach.
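To make the “already compromised” scenario concrete, here is a minimal Python sketch of one way to think about it: rather than always restoring the newest recovery point, pick the newest one taken before the compromise began. The function name, the timestamps and the idea that you know when the compromise started are all assumptions made for illustration; this is not how any particular Azure service selects recovery points.

```python
from datetime import datetime
from typing import Optional, Sequence


def latest_clean_recovery_point(
    recovery_points: Sequence[datetime],
    compromised_since: datetime,
) -> Optional[datetime]:
    """Return the newest recovery point taken strictly before the compromise.

    Returns None when every available recovery point is suspect.
    """
    clean = [rp for rp in recovery_points if rp < compromised_since]
    return max(clean) if clean else None


# Hypothetical daily recovery points and a hypothetical compromise time.
points = [
    datetime(2020, 6, 1, 0, 0),
    datetime(2020, 6, 2, 0, 0),
    datetime(2020, 6, 3, 0, 0),
]
compromised_since = datetime(2020, 6, 2, 12, 0)

print(latest_clean_recovery_point(points, compromised_since))
# 2020-06-02 00:00:00
```

The practical takeaway is that retention matters as much as replication: if the oldest recovery point you keep is newer than the compromise, the function has nothing clean to return.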
Where to start
Security should therefore always be front of mind when thinking about DR and its associated risks. The cloud operates on a shared responsibility model: the vendor takes care of physical security, but managing secrets and providing appropriate access in a secure manner is ‘your’ responsibility. Azure offers a plethora of white papers in this regard, such as “Azure security best practices and patterns” and “Enabling Data Residency and Data Protection in Microsoft Azure Regions”.
DR is often associated with one-way migration, aka “lift-and-shift”. But would it matter if your DR strategy doesn’t effectively cover “fail-back”? IDC published a deep-dive white paper on how “Azure Site Recovery and Azure Backup is Helping Improve Business Operations”, which is a fantastic resource. With Azure Site Recovery (ASR), RPO for application-consistent workloads can be as low as 30 min. RTO depends on many factors, such as the maturity of your DevOps practice, but can be as low as 5 min and fully automated. ASR can do Azure-to-Azure cross-region replication too, and is often selected as the more ‘cost effective’ service (or platform) of choice when running natively in Azure.
If your business requires an even better RPO, you might consider tools like Zerto, now in version 8 and a very mature product. Zerto can also do one-to-many replication (multiple DR sites), where the source and target can each be on-prem, Azure, AWS and/or GCP. You can, for example, replicate from one Azure region to two other Azure regions, with greater control and more features supporting fail-back.
There are multiple considerations to take into account when implementing DR. If you have PaaS or even SaaS components as part of your solution, it gets complicated very quickly. However, most PaaS providers offer guidance for implementing a DR strategy, which largely involves more DevOps effort!
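As a rough illustration of what an RPO figure means in practice, the Python sketch below estimates worst-case data loss as the largest gap between consecutive recovery points and compares it with a target. The function names and timestamps are hypothetical, and the calculation is a simplification: it ignores the time elapsed since the most recent recovery point and says nothing about how any specific replication product reports RPO.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence


def worst_case_data_loss(recovery_points: Sequence[datetime]) -> Optional[timedelta]:
    """Largest gap between consecutive recovery points (None if fewer than two)."""
    ordered = sorted(recovery_points)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return max(gaps) if gaps else None


def meets_rpo(recovery_points: Sequence[datetime], rpo_target: timedelta) -> bool:
    """True when the observed worst-case gap is within the RPO target."""
    worst = worst_case_data_loss(recovery_points)
    return worst is not None and worst <= rpo_target


# Hypothetical recovery point timestamps for one workload.
observed = [
    datetime(2020, 6, 1, 9, 0),
    datetime(2020, 6, 1, 9, 30),
    datetime(2020, 6, 1, 10, 15),
]

print(worst_case_data_loss(observed))               # 0:45:00
print(meets_rpo(observed, timedelta(minutes=30)))   # False
```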
Typically, you’d start with a DR plan for IaaS, and there are some areas that need to be addressed upfront here:
Active Directory Domain Controllers.
One of the first questions I get asked all the time in my travels is “What about Domain Controllers / DNS?”. The answer is: if you have more than one Domain Controller running in sync, then you need a “pilot light” Domain Controller in your DR site syncing with the primary, so that when you fail back you don’t have to deal with sorting out your Active Directory, and no one wants that.
SQL Server on VMs.
More often than not, an organisation’s RPOs and RTOs are not the same across the different workloads and components of its line-of-business applications. SQL Server stands out, and Azure provides an excellent “Overview of business continuity with Azure SQL Database” resource that covers not just PaaS SQL but IaaS as well. SQL Server licences are not exactly cheap, so using ASR for SQL, which replicates only storage so that you are not paying for compute, is a “dream come true” for orgs that are not too demanding on RPO/RTO. Be mindful that there are many smart and creative ways to do DR for SQL:
- Failover clustering
- Always On availability groups
- Database mirroring
- Log shipping
- Active geo-replication
- Auto-failover groups
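Because RPO and RTO targets differ per workload, it can help to write them down and check DR test results against them. The Python sketch below is one hypothetical way to do that; the workload names, target values and drill results are invented for illustration and are not recommendations for any of the options listed above.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Dict, List


@dataclass(frozen=True)
class DrTarget:
    rpo: timedelta  # maximum acceptable data loss
    rto: timedelta  # maximum acceptable time to recover


@dataclass(frozen=True)
class DrillResult:
    data_loss: timedelta      # data loss observed in the DR test
    recovery_time: timedelta  # time taken to recover in the DR test


def failed_targets(
    targets: Dict[str, DrTarget],
    results: Dict[str, DrillResult],
) -> Dict[str, List[str]]:
    """Map each workload to the targets it missed; untested workloads are flagged."""
    failures: Dict[str, List[str]] = {}
    for workload, target in targets.items():
        result = results.get(workload)
        if result is None:
            failures[workload] = ["not tested"]
            continue
        missed = []
        if result.data_loss > target.rpo:
            missed.append("RPO")
        if result.recovery_time > target.rto:
            missed.append("RTO")
        if missed:
            failures[workload] = missed
    return failures


# Hypothetical targets and DR test results.
targets = {
    "sql": DrTarget(rpo=timedelta(minutes=15), rto=timedelta(hours=1)),
    "file-server": DrTarget(rpo=timedelta(hours=4), rto=timedelta(hours=8)),
    "domain-controller": DrTarget(rpo=timedelta(hours=1), rto=timedelta(hours=2)),
}
results = {
    "sql": DrillResult(data_loss=timedelta(minutes=30), recovery_time=timedelta(minutes=45)),
    "file-server": DrillResult(data_loss=timedelta(hours=1), recovery_time=timedelta(hours=2)),
}

print(failed_targets(targets, results))
# {'sql': ['RPO'], 'domain-controller': ['not tested']}
```

A workload that misses its RPO under one approach is a hint to look at a tighter option for that workload only, rather than upgrading everything.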
Forget about DFS and DFS-R, the same way Exchange Servers are a thing of the past. Azure File Sync, a solution to centralise your organisation’s file shares, is modern and solid, and you won’t believe how simple it is to set up. By implementing Azure File Sync you’ll have solid HA/DR for your file servers, with backups, snapshots, governance and lots of other nice features.
Macquarie’s Azure practice does this every day. We’re always up for a chat, and we’ve crafted some key engagements designed to tease out what’s relevant for you and YOUR modern DR approach.