Users often need to host their Kubernetes workloads in specific locations, geographies, or on-premises to meet data locality or low-latency requirements. Amazon Elastic Kubernetes Service (Amazon EKS) offers a broad range of deployment options, from the cloud to customer-managed on-premises hardware with Amazon EKS Anywhere. To extend AWS infrastructure and APIs to their on-premises sites, users can deploy AWS Outposts.
With Amazon EKS on Outposts, you can choose to run extended or local clusters. With extended clusters, the Kubernetes control plane runs in an AWS Region and the nodes run on Outposts. With local clusters, both the control plane and the nodes run on Outposts. You can also choose to run nodes in AWS Local Zones.
There are multiple approaches we can take to design resiliency and Disaster Recovery (DR) for Amazon EKS workloads running on Outposts:
Use multiple physical Outposts for DR: To increase resilience or expand capacity, users might deploy the Amazon EKS data plane across multiple physical Outposts (see the intra-VPC AWS Containers post for more information). However, this is not always a viable option, for example because of data center constraints.
Use Local Zones for DR: In this scenario, users could consider a Local Zone if it is available in a suitable location, as it brings AWS services closer to the end users with on-demand scaling and pay-as-you-go pricing. Where there is a suitable AWS Region, we recommend deploying there.
In this post, we describe how Local Zones can be used as a DR option for Amazon EKS workloads running on Outposts.
Solution overview
In this solution, we deploy separate Amazon EKS control planes for the Outposts and Local Zone environments. This separation allows for independent operations and failure isolation between the two sites. The Outpost is paired with an AWS Region that is different from the Local Zone's parent Region to provide geographical redundancy and reduce the risk of a Regional failure affecting both sites simultaneously. Local Zones serve as a cost-effective DR site because they share consistent AWS APIs and operational models with Outposts.
This solution adopts a GitOps-based methodology for managing and synchronizing workloads across the Outposts and Local Zone environments. GitOps provides a declarative approach to infrastructure and application management, enabling version control, automated deployments, and consistent configuration across multiple environments. This approach simplifies operational workflows and helps keep deployments reliable and repeatable.
To facilitate failover, Amazon Route 53 health checks continuously monitor the availability and responsiveness of the workloads running on the Outposts. In the event of a failure or degraded performance, Route 53 automatically redirects traffic to the failover site in the Local Zone so that the service remains available.
Note that this design focuses on stateless workload failover and does not address stateful data replication or persistent storage considerations.
The following diagram shows a high-level design pattern for this solution:
Walkthrough
In this post, we demonstrate this solution by deploying a game application through GitOps on two separate EKS clusters: one for Outposts and another for the Local Zone. Worker nodes for each Kubernetes cluster run on an Outpost and in a Local Zone, respectively. Then, we fail over the game application from the primary EKS cluster to the secondary.
The high-level steps of the solution are as follows:
- Set up an AWS CodeCommit repository
- Configure Flux CD and deploy a sample application
- Fail over the workload from the Outpost to the Local Zone using Route 53
Prerequisites
This walkthrough has the following prerequisites:
- A Command Line Interface (CLI) with the following tools installed: Git, kubectl, helm, awscli, fluxcli. The AWS Command Line Interface (AWS CLI) is used as part of the solution, but you can use your preferred infrastructure as code tool instead.
- An Outpost for your primary EKS cluster
- A Local Zone enabled in the Region where you would like to deploy your secondary EKS cluster. For more information, see the getting started with Local Zones documentation. In addition, the Local Zones feature page in the AWS documentation outlines which services are available in which Local Zones.
- Two EKS clusters created with IAM Roles for Service Accounts (IRSA) enabled:
  - One dedicated to the Outpost deployment and the other to the Local Zone.
  - Worker nodes deployed on the Outpost and in the Local Zone, respectively.
Step A: Set up a CodeCommit repository and variables
In this solution, a CodeCommit repository stores the Flux CD configuration files, which include the game2048 application code and Kubernetes objects. Follow Step 1 to Step 3 of the Setup for HTTPS users using Git credentials prerequisite guide to configure Git credentials for CodeCommit.
Populate environment variables for Outposts. We need values for the outposts_subnet (subnet where the Application Load Balancer (ALB) is going to reside), and the outposts_eks_cluster_name.
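A minimal sketch of setting these two variables follows. The subnet ID and cluster name are placeholders, not values from this walkthrough, so replace them with your own.

```shell
# Placeholder values (assumptions): replace with your own subnet ID and cluster name.
export outposts_subnet="subnet-0123456789abcdef0"
export outposts_eks_cluster_name="my-outposts-eks-cluster"

# Print the values so you can confirm them before continuing.
echo "ALB subnet:  ${outposts_subnet}"
echo "EKS cluster: ${outposts_eks_cluster_name}"
```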
Run the following command to create a CodeCommit repository. Replace the repository name, AWS Region, and description with your own values:
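One way to issue this call is sketched here as a small shell function around `aws codecommit create-repository`. The function name and the example values are hypothetical.

```shell
# create_repo NAME REGION DESCRIPTION
# Thin wrapper around the AWS CLI; the function name is hypothetical.
create_repo() {
  aws codecommit create-repository \
    --repository-name "$1" \
    --repository-description "$3" \
    --region "$2"
}

# Example with placeholder values:
# create_repo eks-dr-gitops us-west-2 "Flux CD configuration for Outposts and Local Zone"
```

The command output includes the HTTPS clone URL of the new repository.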
Next, replace <code-commit-git-repo-url> with your own Git repository URL and run the command to clone the repository:
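As a sketch, the helper below prints the HTTPS clone URL format that CodeCommit uses, which you can pass to `git clone`. The helper name, Region, and repository name are assumptions.

```shell
# codecommit_https_url REGION REPO_NAME
# Prints the HTTPS clone URL for a CodeCommit repository (helper name is hypothetical).
codecommit_https_url() {
  printf 'https://git-codecommit.%s.amazonaws.com/v1/repos/%s\n' "$1" "$2"
}

# Example with placeholder Region and repository name:
# git clone "$(codecommit_https_url us-west-2 eks-dr-gitops)"
# cd eks-dr-gitops
```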
Step B: Configure Flux CD and deploy a sample application
Now that your CodeCommit repository is set up, deploy the Kubernetes application backend and its resources. Flux CD is used to automate the deployment and management of applications and associated resources on the two EKS clusters across Outposts and Local Zones.
Run the following commands to create the Flux directory structure:
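The layout below is one possible structure, with shared base manifests and one overlay per site. Only `clusters/<site>/flux-system` is implied by the `--path` value used later in this post; the other directory names are illustrative assumptions.

```shell
# Assumed layout: run this from the root of the cloned repository.
for site in outposts localzone; do
  mkdir -p "clusters/${site}/flux-system" \
           "infrastructure/${site}" \
           "apps/${site}"
done
mkdir -p infrastructure/base/aws-load-balancer-controller apps/base/game-2048

# List the directories that were created.
find clusters infrastructure apps -type d | sort
```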
Configure and upload aws-load-balancer-controller
Create the manifest to download the AWS Load Balancer Controller application through the Helm chart:
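A sketch of such a manifest follows, written with a heredoc. The file name `helm.yaml`, the intervals, and the placeholder `clusterName` are assumptions. The API versions match recent Flux releases; older releases use beta versions, so check what your cluster serves. Setting `serviceAccount.create` to `false` assumes the service account is created separately with its IAM role.

```shell
mkdir -p infrastructure/base/aws-load-balancer-controller

# Quoted heredoc delimiter: the YAML is written exactly as shown.
cat > infrastructure/base/aws-load-balancer-controller/helm.yaml <<'EOF'
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: eks-charts
spec:
  interval: 10m
  url: https://aws.github.io/eks-charts
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: aws-load-balancer-controller
spec:
  interval: 10m
  chart:
    spec:
      chart: aws-load-balancer-controller
      sourceRef:
        kind: HelmRepository
        name: eks-charts
  values:
    # Placeholder: overridden per site by a patch.
    clusterName: placeholder
    serviceAccount:
      # Assumes the service account already exists with its IAM role attached.
      create: false
      name: aws-load-balancer-controller
EOF
```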
Set up a kustomization file that sets the namespace to kube-system for the AWS Load Balancer Controller:
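A minimal sketch of this kustomization file follows. The resource file name `helm.yaml` is an assumption and must match the manifest file you created for the Helm chart.

```shell
mkdir -p infrastructure/base/aws-load-balancer-controller

cat > infrastructure/base/aws-load-balancer-controller/kustomization.yaml <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# Every resource listed below is placed in kube-system.
namespace: kube-system
resources:
  - helm.yaml
EOF
```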
Configure and upload game2048 application
Create a game-2048.yaml manifest to configure the game2048 application from the public repository. Configure the Service and Ingress so that the AWS Load Balancer Controller exposes the application outside the Kubernetes cluster:
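The sketch below is modeled on the 2048 example published with the AWS Load Balancer Controller. The resource names, replica count, image reference, and directory are assumptions, so verify them against the public example before use. A small kustomization file is written next to the manifest so that site overlays can reference this directory.

```shell
mkdir -p apps/base/game-2048

cat > apps/base/game-2048/game-2048.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: game-2048
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: game-2048
  name: deployment-2048
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: app-2048
  template:
    metadata:
      labels:
        app.kubernetes.io/name: app-2048
    spec:
      containers:
        - name: app-2048
          image: public.ecr.aws/l6m2t8p7/docker-2048:latest
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  namespace: game-2048
  name: service-2048
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: app-2048
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: game-2048
  name: ingress-2048
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: service-2048
                port:
                  number: 80
EOF

# Lets overlays reference this directory as a kustomize base.
cat > apps/base/game-2048/kustomization.yaml <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - game-2048.yaml
EOF
```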
Configure and upload outposts configuration
Let’s create a helm-patch.yaml file that overrides the values of the AWS Load Balancer Controller HelmRelease with the Outposts-specific configuration.
We need to create an AWS Identity and Access Management (IAM) policy and role to associate with the Kubernetes service account used by the AWS Load Balancer Controller. Follow Step 1: Create IAM Role using eksctl in the AWS Load Balancer Controller guide and then execute the following command.
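As a sketch, the patch below overrides only `clusterName` and leaves the other chart values untouched. The directory, the default cluster name, and the `apiVersion` are assumptions; the `apiVersion`, name, and namespace must match the HelmRelease being patched.

```shell
# Falls back to a placeholder name if the variable is not already set.
: "${outposts_eks_cluster_name:=my-outposts-eks-cluster}"
mkdir -p infrastructure/outposts

# Unquoted heredoc delimiter so the shell expands the cluster name.
cat > infrastructure/outposts/helm-patch.yaml <<EOF
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: aws-load-balancer-controller
  namespace: kube-system
spec:
  values:
    clusterName: ${outposts_eks_cluster_name}
EOF
```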
Create a kustomization file to configure the deployment of the AWS Load Balancer Controller Helm release by applying a patch:
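A sketch of this overlay follows. It pulls in the shared base and applies `helm-patch.yaml` to the HelmRelease; the relative path to the base assumes an `infrastructure/base` and `infrastructure/outposts` layout.

```shell
mkdir -p infrastructure/outposts

cat > infrastructure/outposts/kustomization.yaml <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base/aws-load-balancer-controller
patches:
  - path: helm-patch.yaml
    target:
      kind: HelmRelease
      name: aws-load-balancer-controller
EOF
```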
Here we create a patch ingress file to deploy Ingress for game2048 in the subnet dedicated for the ALB:
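One way to write this patch is sketched below. It adds the `alb.ingress.kubernetes.io/subnets` annotation to the Ingress; the Ingress name, namespace, directory, and default subnet ID are assumptions that must match your base manifest and environment.

```shell
# Falls back to a placeholder subnet ID if the variable is not already set.
: "${outposts_subnet:=subnet-0123456789abcdef0}"
mkdir -p apps/outposts

# Unquoted heredoc delimiter so the shell expands the subnet ID.
cat > apps/outposts/ingress-patch.yaml <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-2048
  namespace: game-2048
  annotations:
    alb.ingress.kubernetes.io/subnets: ${outposts_subnet}
EOF
```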
Deploy the kustomization file so the base game2048 doesn’t need modifications. This follows GitOps principles by keeping manifests generic and changes declarative. Flux applies the changes automatically on each reconcile:
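A sketch of the application overlay follows. The base directory stays unchanged and only the Ingress receives the site-specific patch; the paths and the Ingress name are assumptions carried over from an `apps/base` and `apps/outposts` layout.

```shell
mkdir -p apps/outposts

cat > apps/outposts/kustomization.yaml <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base/game-2048
patches:
  - path: ingress-patch.yaml
    target:
      kind: Ingress
      name: ingress-2048
EOF
```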
The last step is to add kustomization files to the flux-system namespace, so that Flux deploys the AWS Load Balancer Controller and the game2048 application.
Add the following manifest file to deploy AWS Load Balancer Controller:
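The sketch below writes two Flux Kustomization objects: one for the controller and one for the application, which waits for the controller through `dependsOn`. The object names, file names, intervals, paths, and the GitRepository name `flux-system` are assumptions; the GitRepository name must match the source you register with Flux.

```shell
mkdir -p clusters/outposts/flux-system

cat > clusters/outposts/flux-system/infrastructure.yaml <<'EOF'
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 10m
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./infrastructure/outposts
EOF

cat > clusters/outposts/flux-system/apps.yaml <<'EOF'
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  prune: true
  # Apply the application only after the controller kustomization is ready.
  dependsOn:
    - name: infrastructure
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./apps/outposts
EOF
```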
Configure and upload Local Zones configuration
The Local Zones steps are going to be almost identical to Outposts except for the patch values for the AWS Load Balancer Controller and the game2048 application.
Switch context and connect to the Local Zones cluster:
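Switching clusters can be sketched with `aws eks update-kubeconfig`, which adds the cluster to your kubeconfig and makes it the current context. The function name and example values are hypothetical.

```shell
# use_cluster CLUSTER_NAME REGION
# Points kubectl at the given EKS cluster and prints the active context.
use_cluster() {
  aws eks update-kubeconfig --name "$1" --region "$2"
  kubectl config current-context
}

# Example with placeholder values:
# use_cluster my-localzone-eks-cluster us-west-2
```

The same function can be reused later in the post whenever you switch between the Outposts and Local Zones clusters.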
Follow the same AWS Load Balancer Controller policy and role creation guide (Step 1) as we did for Outposts.
Copy the Outposts files and use the sed command to replace the Outposts-specific values with Local Zones values.
Adjust helm-patch.yaml to point to the Local Zone cluster:
Adjust ingress-patch.yaml and insert the Local Zone subnets:
The last step is to update the path of the manifest file:
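The copy-and-replace steps above can be sketched with one small helper that writes a modified copy of a file. The helper name, the file paths, and the `localzone_*` variables in the examples are assumptions.

```shell
# retarget SRC DST OLD NEW
# Copies SRC to DST, replacing every occurrence of OLD with NEW.
# OLD is treated as a sed pattern; neither OLD nor NEW may contain "|".
retarget() {
  mkdir -p "$(dirname "$2")"
  sed "s|$3|$4|g" "$1" > "$2"
}

# Examples with assumed paths and variables:
# retarget infrastructure/outposts/helm-patch.yaml infrastructure/localzone/helm-patch.yaml \
#   "$outposts_eks_cluster_name" "$localzone_eks_cluster_name"
# retarget apps/outposts/ingress-patch.yaml apps/localzone/ingress-patch.yaml \
#   "$outposts_subnet" "$localzone_subnets"
# retarget clusters/outposts/flux-system/infrastructure.yaml \
#   clusters/localzone/flux-system/infrastructure.yaml \
#   "infrastructure/outposts" "infrastructure/localzone"
```

Files that need no changes, such as the overlay kustomization files, can be copied as they are.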
Now let’s run Git to push the files we’ve added to the repository:
FluxCD in action
Now that the YAML files, kustomizations, and folder structure are created and pushed to the CodeCommit repository, let’s configure the Flux secrets and point Flux at the appropriate directory so that it deploys the resources in Kubernetes.
Switch context and connect to the Outposts cluster:
1. Install Flux, starting with your Outposts EKS cluster:
flux install
2. Create the Flux Kubernetes secrets.
Run the following and replace the <> values with your own values:
3. Get Flux watching the directory:
4. Create the Flux kustomization:
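Steps 2 to 4 can be sketched as one function that takes the cluster directory as a parameter. The function name, the object name `flux-system`, the `main` branch, and the intervals are assumptions; use the branch that your repository actually has.

```shell
# bootstrap_flux_source REPO_URL GIT_USER GIT_PASSWORD CLUSTER_PATH
bootstrap_flux_source() {
  # Step 2: store the CodeCommit Git credentials as a Kubernetes secret.
  flux create secret git flux-system \
    --url="$1" --username="$2" --password="$3"

  # Step 3: register the repository as a GitRepository source.
  flux create source git flux-system \
    --url="$1" --branch=main --secret-ref=flux-system --interval=1m

  # Step 4: reconcile the manifests under the cluster directory.
  flux create kustomization flux-system \
    --source=GitRepository/flux-system --path="$4" --prune=true --interval=10m
}

# Example with placeholders:
# bootstrap_flux_source "<code-commit-git-repo-url>" "<git-username>" "<git-password>" \
#   "./clusters/outposts/flux-system"
```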
Switch context and connect to the Local Zones cluster:
Repeat Steps 1, 2, and 3 for Local Zones.
For Step 4, update the --path field to --path="./clusters/localzone/flux-system" and execute the command.
Within a few minutes, you should see resources being deployed in the Outposts and Local Zones EKS clusters. Run the following command to reconcile Flux:
To validate Kubernetes resources:
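The reconcile and validation steps can be sketched together as follows. The kustomization name `flux-system`, the `game-2048` namespace, and the controller label are assumptions that depend on how you named your resources.

```shell
# check_deployment: run against the cluster in your current kubectl context.
check_deployment() {
  # Ask Flux to fetch the latest commit and apply it now.
  flux reconcile kustomization flux-system --with-source
  # Show the status of all Flux kustomizations in the flux-system namespace.
  flux get kustomizations
  # AWS Load Balancer Controller pods.
  kubectl get pods -n kube-system -l app.kubernetes.io/name=aws-load-balancer-controller
  # Sample application resources, including the Ingress address of the ALB.
  kubectl get deployment,service,ingress -n game-2048
}
```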
Repeat the preceding command on the Local Zone EKS cluster as well.
Example output:
Step C: DR and failover through Route 53
For demonstration purposes, we create a dummy public hosted zone in Route 53 called www.example.com, which points to the Outposts cluster ALB as the primary record and Local Zones ALB as the secondary. To simulate a failure, we configure the health checks to fail on the Outposts ALB. This triggers Route 53 to fail over and direct live traffic to the Local Zones cluster instead.
Create a public hosted zone in Route 53
First create a public hosted zone in Route 53. A public hosted zone is a container that holds information about how you want to route traffic on the internet for a specific domain, such as example.com, and its subdomains (acme.example.com, zenith.example.com). Replace the <> with your own values:
Validate that the public hosted zone has been created correctly:
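Creating and validating the hosted zone can be sketched with two small functions. The function names are hypothetical, and the timestamp-based caller reference is one simple way to make each request unique.

```shell
# create_public_zone DOMAIN
create_public_zone() {
  # The caller reference must be unique per request; a timestamp is used here.
  aws route53 create-hosted-zone --name "$1" --caller-reference "$(date +%s)"
}

# show_zone DOMAIN: prints the name and ID of the first matching hosted zone.
show_zone() {
  aws route53 list-hosted-zones-by-name --dns-name "$1" --max-items 1 \
    --query 'HostedZones[0].[Name,Id]' --output text
}

# Example with a placeholder domain:
# create_public_zone example.com
# show_zone example.com
```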
Example output:
Now let’s create Route 53 health checks, which we associate with primary and secondary DNS records.
Create a Route 53 health check for primary record:
The workload running on the Outpost is the primary record. Run the following command and replace the <DNS-of-Outposts-ALB> with your own value:
Create a Route 53 health check for secondary record:
The workload running in the Local Zone is the secondary record. Run the following command, and replace <DNS-of-LocalZone-ALB> with your own value:
Validate that health checks for both Outposts and Local Zones have been created, and store the Route 53 health check ID, which is needed for the next steps:
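The health check creation and listing steps can be sketched as follows. The function names, the label argument, and the health check settings (HTTP on port 80, path `/`, 30-second interval, failure threshold of 3) are assumptions; adjust them to match your application.

```shell
# create_alb_health_check LABEL ALB_DNS_NAME [PORT]
# LABEL (for example "outposts" or "localzone") keeps the caller reference
# unique and short.
create_alb_health_check() {
  aws route53 create-health-check \
    --caller-reference "$1-$(date +%s)" \
    --health-check-config "Type=HTTP,FullyQualifiedDomainName=$2,Port=${3:-80},ResourcePath=/,RequestInterval=30,FailureThreshold=3"
}

# list_health_checks: prints the ID and target domain of each health check.
list_health_checks() {
  aws route53 list-health-checks \
    --query 'HealthChecks[].[Id,HealthCheckConfig.FullyQualifiedDomainName]' \
    --output text
}

# Example with placeholders:
# create_alb_health_check outposts "<DNS-of-Outposts-ALB>"
# create_alb_health_check localzone "<DNS-of-LocalZone-ALB>"
# list_health_checks
```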
Example output:
Create A records for primary, and point to ALB alias for Outposts EKS
In the Route 53 public hosted zone create an A record pointing to the primary site ALB.
You need the following values:
- Public Hosted Zone ID
- DNS of Outposts ALB
- Hosted zone ID of the Outposts ALB: you can retrieve this in the console under Load balancers > Details. Under Hosted zone, you should see a 14-character value such as Z123456789KTTX
- Health Check ID for the Outposts ALB
Replace the preceding values in the following command and run:
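A sketch of this step follows: a helper writes the change batch for a failover alias record, which is then passed to `aws route53 change-resource-record-sets`. The helper name, file name, `UPSERT` action, and use of the failover role as the set identifier are assumptions.

```shell
# write_failover_record FILE ROLE RECORD_NAME ALB_DNS ALB_ZONE_ID HEALTH_CHECK_ID
# ROLE is PRIMARY or SECONDARY.
write_failover_record() {
  cat > "$1" <<EOF
{
  "Changes": [
    {
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "$3",
        "Type": "A",
        "SetIdentifier": "$2",
        "Failover": "$2",
        "HealthCheckId": "$6",
        "AliasTarget": {
          "HostedZoneId": "$5",
          "DNSName": "$4",
          "EvaluateTargetHealth": true
        }
      }
    }
  ]
}
EOF
}

# Example with placeholders:
# write_failover_record primary.json PRIMARY www.example.com \
#   "<DNS-of-Outposts-ALB>" "<ALB-hosted-zone-ID>" "<health-check-ID>"
# aws route53 change-resource-record-sets \
#   --hosted-zone-id "<public-hosted-zone-ID>" --change-batch file://primary.json
```

Passing SECONDARY as the role, with the Local Zone ALB values, produces the record for the failover site.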
Create A records for secondary and point to ALB alias for Local Zone EKS
Now repeat the preceding step with the secondary site:
Note that DNS records may take some time to propagate. Browse to the site. In this post we are using a dummy domain www.example.com.
You can use the dig command to note which public IP addresses are serving this application on Outposts:
Example output:
Currently, this application is being served from the workload running on the Outpost. To simulate a failover, we modify the Route 53 health check for Outposts by changing it to a dummy port, 801, which is not listening for traffic. This causes the health check to fail, and Route 53 directs traffic to the Local Zone EKS cluster, which is set as secondary.
Navigate to the Route 53 Console, and under Health Check change the port to 801 for the Outposts health check.
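If you prefer the AWS CLI to the console, the same change can be sketched with `aws route53 update-health-check`. The function name is hypothetical, and restoring to port 80 assumes that was the original health check port.

```shell
# set_health_check_port HEALTH_CHECK_ID PORT
set_health_check_port() {
  aws route53 update-health-check --health-check-id "$1" --port "$2"
}

# Simulate a failure on the Outposts health check (placeholder ID):
# set_health_check_port "<health-check-ID>" 801
# Restore it afterwards (assumes the original port was 80):
# set_health_check_port "<health-check-ID>" 80
```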
We see that the status for the health check for Outposts is now unhealthy:
Once the health checks report unhealthy, the DNS is updated. Browse to www.example.com again and the application is accessible. This time, the traffic is being served by the application deployed on the Local Zone.
To confirm, let’s run the dig command again and check whether the IP addresses have changed and now point to the Local Zones application.
Example output:
As we can see, www.example.com now resolves to a different set of IP addresses associated with the Local Zones workloads, which validates that our DR failover is working correctly.
Cleaning up
To avoid incurring future charges, delete resources created as part of this walkthrough.
1. Delete the Route 53 hosted zones, following this document.
2. Delete the Route 53 health checks, following this document.
3. Run the following command to delete the game-2048 namespace and the AWS Load Balancer Controller:
5. Switch context and connect to the Local Zone EKS cluster and repeat the previous Steps 3 and 4 in this section.
6. Finally, delete the CodeCommit repository:
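Two of the cleanup commands can be sketched as follows. The function names and example values are hypothetical. While Flux kustomizations still exist in a cluster, Flux may recreate resources that you delete by hand, so remove or suspend them first.

```shell
# delete_sample_namespace: removes the game-2048 namespace from the current cluster.
delete_sample_namespace() {
  kubectl delete namespace game-2048
}

# delete_repo REPO_NAME REGION: deletes the CodeCommit repository.
delete_repo() {
  aws codecommit delete-repository --repository-name "$1" --region "$2"
}

# Example with placeholder values:
# delete_sample_namespace
# delete_repo eks-dr-gitops us-west-2
```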
Conclusion
In this post, we showed you how AWS Local Zones can be used as a DR option for Amazon EKS workloads running on AWS Outposts. By creating separate Amazon EKS control planes for the Outpost and Local Zones, and using a GitOps strategy with Flux CD to deploy resources to both environments, we implemented an active-passive DR setup. Amazon Route 53 with health checks and failover routing policies allowed us to serve traffic from the Outpost cluster as primary, failing over to the Local Zone cluster on an outage. This architecture shows how Amazon EKS workloads on Outposts can achieve resiliency against site failures by using Local Zones for recovery. The techniques presented provide an option for users looking to implement low-cost DR for critical Amazon EKS applications hosted across Outposts and Local Zones.