Last week, I discussed cloud infrastructure design with a client. I had prepared the To-Be architecture based on the Well-Architected Framework. Interestingly, most of the discussion centered on how to host the application in the cloud for better uptime and business continuity, which led to the question of multiple availability zones (AZs) versus multiple cloud regions. In this blog, I elaborate on that comparison.
Let’s start with multi-AZ, which is used for high availability and for an uptime SLA of 99.99% or higher. Multi-AZ also protects against data center outages, or “local disasters”, which happen more often than major disasters. Multiple regions, in contrast, protect against a regional cloud service outage or a major disaster. The table below compares Multi-Region and Multi-AZ.
Attributes | Multi-Region | Multi-AZ |
Active-Active HA | No | Yes |
Active-Passive HA | No | Yes |
Protection against Data Center Outage | Yes | Yes |
Protection against Regional Cloud Service Outage | Yes | No |
Protection against a Local Disaster | Yes | Yes |
Recovery from a Major Disaster | Yes | No |
Latency | 10–100 ms | <1 ms |
Data Replication | Asynchronous | Synchronous |
RPO | Non-Zero | Zero |
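To put the 99.99% uptime figure mentioned earlier into perspective, it helps to convert an SLA percentage into a downtime budget. The sketch below is only arithmetic: it assumes a 365-day year, and the function name `allowed_downtime_minutes` is made up for this illustration, not part of any Azure tooling.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(sla_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime budget (in minutes) implied by an uptime SLA."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return period_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    for sla in (99.9, 99.95, 99.99):
        budget = allowed_downtime_minutes(sla)
        print(f"{sla}% uptime -> {budget:.1f} minutes of downtime per year")
```

At 99.99%, the budget works out to roughly 52.6 minutes per year, which is why a single data center outage can consume the whole allowance.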
Azure Regions and Availability Zones
An Azure region is a geographic area containing multiple data centers. Microsoft operates regions around the world to provide low latency, high availability, and compliance options for customers.
An Availability Zone (AZ) is a physically separate location within an Azure region. Each zone has independent power, cooling, and networking, ensuring resilience against data center failures.
Each Azure region that supports availability zones has three or more of them (a few Azure regions do not yet have AZs). Each availability zone consists of one or more discrete data centers housed in separate facilities, each with redundant power, networking, and connectivity. Because the zones are physically separate, a local disaster such as a fire or flood should affect only one AZ.
Although availability zones within a region are geographically isolated from each other, they have direct low-latency network connectivity between them, making them the more practical choice for synchronous data replication.
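A rough way to see why latency matters for synchronous replication: every synchronous commit has to wait for at least one network round trip to the replica before it can be acknowledged. The sketch below turns a round-trip time into an upper bound on back-to-back commits over a single connection. The round-trip values are illustrative assumptions, and the model deliberately ignores disk writes, batching, and pipelining, so real systems will behave differently.

```python
def max_serial_commits_per_second(round_trip_ms: float) -> float:
    """Upper bound on sequential synchronous commits on one connection.

    Simplifying assumption: each commit costs exactly one network round
    trip to the replica and nothing else.
    """
    if round_trip_ms <= 0:
        raise ValueError("round_trip_ms must be positive")
    return 1000.0 / round_trip_ms


if __name__ == "__main__":
    # Illustrative round-trip times (assumptions, not measurements).
    scenarios = {
        "between zones": 1.0,
        "between nearby regions": 10.0,
        "between distant regions": 100.0,
    }
    for name, rtt in scenarios.items():
        rate = max_serial_commits_per_second(rtt)
        print(f"{name}: {rtt} ms round trip -> at most {rate:.0f} commits/s")
```

Under these assumptions, a 1 ms round trip allows up to 1,000 sequential commits per second, while a 100 ms round trip caps the same workload at 10, which is why cross-region replication is usually asynchronous.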
High Availability and Fault Tolerance
A highly available architecture ensures that the application service continues running with minimal or no downtime when any of its parts is taken offline for maintenance or fails unexpectedly. Keeping the services up through planned maintenance events is achievable through redundancy or failover of resources located in the same data center or availability zone. However, spreading the resources across availability zones is highly desirable for better fault tolerance.
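The benefit of spreading resources across zones can be estimated with a simple redundancy formula. The sketch below assumes that zones fail independently and that the service stays up as long as at least one zone is healthy. Both are simplifications, and the 99.9% per-zone figure is a hypothetical input, not a published Azure SLA.

```python
def combined_availability(per_zone_availability: float, zones: int) -> float:
    """Availability of a service that survives if any one zone is up.

    Assumes independent zone failures, which is optimistic: shared
    dependencies such as a regional control plane can fail all zones at once.
    """
    if not 0.0 <= per_zone_availability <= 1.0:
        raise ValueError("per_zone_availability must be between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    # The service is down only when every zone is down at the same time.
    return 1.0 - (1.0 - per_zone_availability) ** zones


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} zone(s): {combined_availability(0.999, n):.9f}")
```

The independence assumption is the weak point of this estimate, which is one reason a multi-region layer is still needed for regional outages.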
Achieving zero RPO
A zero recovery point objective (RPO), meaning that no acknowledged data is lost when a failure occurs, is one of the foremost advantages of a multi-AZ configuration. A well-designed multi-AZ architecture protects against most infrastructure failure types, and its inter-zone latency is low enough for synchronous replication or data mirroring.
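The link between replication mode and RPO can be shown with a toy model. In the sketch below, a synchronous store copies each write to the replica before acknowledging it, while an asynchronous store acknowledges first and ships the write later. The class `ReplicatedStore` and its methods are hypothetical names invented for this example and do not correspond to any Azure service or SDK.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplicatedStore:
    """Toy primary/replica pair; not a real database or Azure API."""
    synchronous: bool
    primary: List[str] = field(default_factory=list)
    replica: List[str] = field(default_factory=list)
    pending: List[str] = field(default_factory=list)  # acknowledged, not yet shipped

    def write(self, record: str) -> None:
        """Accept a write and acknowledge it to the client."""
        self.primary.append(record)
        if self.synchronous:
            self.replica.append(record)  # replica has the record before the ack
        else:
            self.pending.append(record)  # ack now, replicate later

    def ship_pending(self) -> None:
        """Deliver queued writes to the replica (asynchronous mode only)."""
        self.replica.extend(self.pending)
        self.pending.clear()

    def lost_on_primary_failure(self) -> List[str]:
        """Acknowledged writes that the replica would be missing after failover."""
        # The replica always holds a prefix of the primary's write history.
        return self.primary[len(self.replica):]


def lost_writes_after_failover(synchronous: bool) -> List[str]:
    store = ReplicatedStore(synchronous)
    store.write("order-1")
    store.write("order-2")
    store.ship_pending()
    store.write("order-3")  # the primary fails before the next shipment
    return store.lost_on_primary_failure()


if __name__ == "__main__":
    print("synchronous :", lost_writes_after_failover(True))
    print("asynchronous:", lost_writes_after_failover(False))
```

In this model the synchronous store loses nothing on failover (zero RPO), while the asynchronous store loses whatever was acknowledged since the last shipment (non-zero RPO).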
Conclusion
Availability zones and multi-AZ configurations in Azure provide resiliency and fault tolerance at a much lower cost than traditional self-managed “on-premises” data centers, and they can be used to achieve a better uptime SLA. Multi-AZ should be the default approach to high availability, fault tolerance, and local disaster protection when architecting mission-critical application stacks. Multi-region architectures and DR strategies can be layered on top of multi-AZ, depending on specific business requirements and objectives.
A Multi-Region configuration extends resiliency and fault tolerance to the regional level. Both Multi-AZ and Multi-Region help keep the application available: one at the zone level, the other at the regional level.