Building Resilient ExpressRoute Connectivity for Business Continuity and Disaster Recovery

As more and more organizations adopt Azure for their business-critical workloads, the connectivity between their on-premises networks and Microsoft becomes crucial. ExpressRoute provides private connectivity between on-premises networks and Microsoft. By default, an ExpressRoute circuit provides redundant connections to the Microsoft backbone and is designed for carrier-grade high availability. However, the high availability of a connection is only as good as the robustness of the weakest link in its end-to-end path. Therefore, it is imperative that the customer and service provider segments of ExpressRoute connectivity are also architected for high availability.
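The weakest-link point can be made concrete with a little arithmetic. When every segment of a path must be up for traffic to flow, the end-to-end availability is the product of the segment availabilities, so it always falls below the least available segment. The Python sketch below is illustrative only: the segment names and availability figures are hypothetical (not Microsoft SLA values), and it assumes segment failures are independent.

```python
from math import prod


def serial_availability(segments: dict[str, float]) -> float:
    """End-to-end availability of segments that must all be up (in series).

    Assumes independent failures; correlated failures change the result.
    """
    return prod(segments.values())


# Hypothetical per-segment availabilities, for illustration only.
path = {
    "customer_edge": 0.999,
    "provider_first_mile": 0.995,
    "expressroute_circuit": 0.9995,
    "vnet_gateway": 0.9995,
}

end_to_end = serial_availability(path)
weakest = min(path, key=path.get)
print(f"end-to-end availability: {end_to_end:.4f}")
print(f"weakest link: {weakest} ({path[weakest]})")
```

With these made-up numbers the end-to-end figure (about 0.9930) is dragged down mostly by the first mile, which is why the customer and provider segments deserve the same attention as the circuit itself.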

Designing for high availability with ExpressRoute addresses these design considerations and explains how to architect robust end-to-end ExpressRoute connectivity between a customer's on-premises network and the Microsoft network core. The document covers how to maximize the high availability of ExpressRoute in general, as well as components specific to private peering and to Microsoft peering.

Private Peering High Availability

Building each component of ExpressRoute connectivity for high availability is key: the first mile from on-premises to the peering location, multiple circuits to the same virtual network (VNet), and the virtual network gateway within the VNet.
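Redundancy pays off because a redundant set of components fails only when every member fails at the same time. The minimal Python sketch below estimates this for parallel paths, such as multiple circuits to the same VNet. It assumes independent failures and uses hypothetical availability figures; circuits that share a building, conduit, or provider can fail together, which this simple model does not capture.

```python
def parallel_availability(availabilities: list[float]) -> float:
    """Availability of redundant components where any one being up is enough.

    Assumes failures are independent: the set is down only if all members
    are down, so P(down) is the product of each member's P(down).
    """
    all_down = 1.0
    for a in availabilities:
        all_down *= 1.0 - a
    return 1.0 - all_down


# Hypothetical: one circuit vs. two circuits into the same VNet.
single = parallel_availability([0.995])
dual = parallel_availability([0.995, 0.995])
print(f"single circuit: {single:.6f}")
print(f"two circuits:   {dual:.6f}")
```

Under these assumptions a second independent circuit turns roughly 0.995 into roughly 0.999975.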

To improve the availability of the ExpressRoute virtual network gateway, Azure offers zone-redundant virtual network gateways that use Availability Zones. ExpressRoute also supports Bidirectional Forwarding Detection (BFD) to expedite link failure detection, thereby significantly improving Mean Time To Recover (MTTR) following a link failure.
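The MTTR gain from BFD comes mostly from faster failure detection. Without BFD, a BGP session is declared down only when its hold timer expires; with BFD, a peer is declared down after a small number of missed sub-second hellos. The Python sketch below compares the two. The timer values (BGP keepalive 60 s with hold time 180 s, BFD interval 300 ms with multiplier 3) are assumptions used for illustration; check the current ExpressRoute documentation and your own router configuration. The negotiation step is also simplified: it just takes the slower of the two sides' intervals.

```python
def detection_time_ms(interval_ms: int, multiplier: int) -> int:
    """Approximate time to declare a peer down: interval x missed-message count."""
    return interval_ms * multiplier


def bfd_negotiated_interval_ms(local_ms: int, remote_ms: int) -> int:
    """Simplified BFD negotiation: the slower side sets the pace."""
    return max(local_ms, remote_ms)


# Assumed timer values for illustration; verify against current docs.
bgp_ms = detection_time_ms(interval_ms=60_000, multiplier=3)  # keepalive 60 s, hold 180 s
bfd_ms = detection_time_ms(
    interval_ms=bfd_negotiated_interval_ms(local_ms=300, remote_ms=300),
    multiplier=3,
)
print(f"BGP hold-timer detection: {bgp_ms / 1000:.1f} s")
print(f"BFD detection:            {bfd_ms / 1000:.1f} s")
print(f"speed-up: {bgp_ms // bfd_ms}x")
```

Because the slower side sets the interval, enabling BFD only pays off if the customer or provider edge is also configured with aggressive BFD timers.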

Microsoft Peering High Availability

Where and how you implement Network Address Translation (NAT) affects the MTTR of Microsoft PaaS services (including Office 365) consumed over Microsoft peering following a connection failure. Path selection between the Internet and ExpressRoute on Microsoft peering is also critical to ensuring a highly reliable and scalable architecture.
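Path selection between the Internet and ExpressRoute matters because the same public service prefixes can be reachable both ways, and IP forwarding prefers the most specific matching prefix regardless of which path it was learned from. The Python sketch below shows longest-prefix match on a hypothetical edge-router table. The prefixes are documentation ranges, not real Microsoft prefixes, and real path selection also applies BGP attributes when prefix lengths are equal, which this sketch ignores.

```python
import ipaddress

# Hypothetical routing table on a customer edge router: a service range is
# reachable over the Internet (a /24) and, more specifically, over
# ExpressRoute Microsoft peering (a /25). Documentation ranges only.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "internet"),
    (ipaddress.ip_network("203.0.113.0/24"), "internet"),
    (ipaddress.ip_network("203.0.113.128/25"), "expressroute"),
]


def select_path(destination: str) -> str:
    """Pick the egress path by longest-prefix match, as IP forwarding does."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, path) for net, path in ROUTES if addr in net]
    _, path = max(matches, key=lambda m: m[0].prefixlen)
    return path


print(select_path("203.0.113.200"))  # inside the /25
print(select_path("203.0.113.10"))   # only the /24 and the default match
print(select_path("198.51.100.1"))   # default route only
```

The flip side is also true: if a more specific route for a service is learned over the Internet, traffic bypasses ExpressRoute even while the circuit is healthy.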


ExpressRoute Disaster Recovery Strategy

How about architecting ExpressRoute connectivity for disaster recovery and business continuity? Is it possible to optimize ExpressRoute circuits in different regions both for local connectivity and to act as a backup for another region's ExpressRoute failure? In an architecture with circuits in multiple regions, how do you ensure symmetrical cross-regional traffic flow, either via the Microsoft backbone or via the organization's global connectivity (outside Microsoft)? Designing for disaster recovery with ExpressRoute private peering addresses these concerns and explains how to architect for disaster recovery using ExpressRoute private peering.
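The "local first, remote as backup" idea can be stated as a simple decision rule: use a healthy circuit in the VNet's own region if there is one, otherwise fall back to a healthy circuit in another region. The Python sketch below models only that rule. The circuit names and regions are hypothetical, and in a real deployment this choice is driven by BGP route preferences rather than application code.

```python
from typing import Optional

# Hypothetical circuits: each region has a local circuit that also acts
# as the backup path for the other region's VNets.
CIRCUITS = {
    "er-east": {"region": "east", "up": True},
    "er-west": {"region": "west", "up": True},
}


def choose_circuit(vnet_region: str, circuits: dict) -> Optional[str]:
    """Prefer a healthy circuit in the VNet's own region, else a healthy remote one."""
    healthy = [name for name, c in circuits.items() if c["up"]]
    local = [name for name in healthy if circuits[name]["region"] == vnet_region]
    if local:
        return local[0]
    return healthy[0] if healthy else None


print(choose_circuit("east", CIRCUITS))  # local circuit
east_down = {**CIRCUITS, "er-east": {"region": "east", "up": False}}
print(choose_circuit("east", east_down))  # fails over to er-west
```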

To build a robust ExpressRoute circuit, end-to-end ExpressRoute connectivity should be architected for high availability, maximizing redundancy and minimizing MTTR following a failure. A robust ExpressRoute circuit can withstand many single-point failures. However, to safeguard against disasters that affect an entire peering location, your disaster recovery plans should include geo-redundant ExpressRoute circuits. Failing over to geo-redundant ExpressRoute circuits presents challenges, including asymmetrical routing. The documents Designing for high availability with ExpressRoute and Designing for disaster recovery with ExpressRoute private peering help you architect highly available ExpressRoute circuits and design for disaster recovery using geo-redundant ExpressRoute circuits.
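Asymmetrical routing arises because each direction is chosen by a different router: the on-premises edge picks the path toward Azure, and the Azure side picks the path back. The Python sketch below assumes the on-premises edge decides by BGP local preference and the Azure side decides by AS path length (which on-premises can influence with AS path prepending), and it ignores every other BGP tie-breaker. When the Azure side sees equal-cost paths it may use all of them, so return traffic can leave through a circuit the forward traffic never used. Circuit names and attribute values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Route:
    circuit: str
    local_pref: int = 100  # applied on-premises to routes learned from Azure
    as_path_len: int = 1   # seen by Azure on routes advertised from on-premises


def best(routes: list[Route], key) -> set[str]:
    """Return every circuit tied for the best key value (equal-cost paths)."""
    top = max(key(r) for r in routes)
    return {r.circuit for r in routes if key(r) == top}


def onprem_to_azure(routes: list[Route]) -> set[str]:
    # On-premises edge prefers the highest local preference.
    return best(routes, key=lambda r: r.local_pref)


def azure_to_onprem(routes: list[Route]) -> set[str]:
    # Azure side prefers the shortest AS path (negated so that max is best).
    return best(routes, key=lambda r: -r.as_path_len)


def is_symmetric(routes: list[Route]) -> bool:
    fwd, ret = onprem_to_azure(routes), azure_to_onprem(routes)
    return len(fwd) == 1 and fwd == ret


# Traffic to a VNet in the west region; er-west is the intended path.
before = [Route("er-east"), Route("er-west", local_pref=200)]
after = [Route("er-east", as_path_len=3), Route("er-west", local_pref=200)]

for name, routes in (("before", before), ("after", after)):
    print(name, sorted(onprem_to_azure(routes)),
          sorted(azure_to_onprem(routes)), is_symmetric(routes))
```

In this model, raising local preference on-premises fixes only the forward direction; prepending the AS path on the backup circuit's advertisements is what pins the return direction to the same circuit.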


This article was originally published on Microsoft's Azure Blog.