Disaster Recovery on AWS Outposts to AWS Local Zones with a GitOps approach for Amazon EKS

7 months ago 51

Users often need to host their Kubernetes workloads in specific locations, geographies, or on-premises to meet data locality or low-latency requirements. Amazon Elastic Kubernetes Service (EKS) has a broad range of deployment options, from in the cloud to on-premises on customer-managed hardware with Amazon EKS Anywhere. To extend AWS infrastructure and APIs on-premises, users can deploy AWS Outposts.

With Amazon EKS on Outposts, you can choose to run in extended or local clusters. With extended clusters, the Kubernetes control plane runs in an AWS Region, and the nodes run on Outposts. You could also choose to run the nodes on AWS Local Zones.

There are multiple approaches we can take to design resiliency and Disaster Recovery (DR) for Amazon EKS workloads running on Outposts:

Use multiple physical Outposts for DR: To increase resilience or expand capacity users might decide to deploy the Amazon EKS data plane across multiple physical Outposts (see the infra-VPC AWS Containers post for more information), but it might not always be a viable option, for example with data center constraints.

Use Local Zones for DR: In this scenario, users could consider a Local Zone if it is available in a suitable location, as it brings AWS services closer to the end users with on-demand scaling and pay-as-you-go pricing. Where there is a suitable AWS Region, we recommend deploying there.

In this post, we describe how Local Zones can be used as a DR option for Amazon EKS workloads running on Outposts.

Solution overview

In this solution, we deploy separate Amazon EKS control planes for the Outposts and Local Zone environments. This separation allows for independent operations and failure isolation between the two sites. The Outpost is paired with an AWS Region that is different from the Local Zone's parent Region to provide geographical redundancy and reduce the risk of a Regional failure impacting both sites simultaneously. Local Zones are used as a cost-effective DR site because they share consistent AWS APIs and operational models with Outposts.

This solution adopts a GitOps-based methodology for managing and synchronizing workloads across the Outposts and Local Zone environments. GitOps provides a declarative approach to infrastructure and application management, enabling version control, automated deployments, and consistent configuration across multiple environments. This approach simplifies operational workflows and helps keep deployments reliable and repeatable.

To facilitate failover, Amazon Route 53 health checks continuously monitor the availability and responsiveness of the workloads running on the Outposts. In the event of a failure or degraded performance, Route 53 automatically redirects traffic to the failover site in the Local Zone to maintain service continuity.

Note that this design focuses on stateless workload failover and does not address stateful data replication or persistent storage considerations.

The following diagram shows a high-level design pattern for this solution:

High Level Design Pattern

Walkthrough

In this post, we demonstrate this solution by deploying a game application through GitOps on two separate EKS clusters: one for Outposts and another for the Local Zone. Worker nodes for each Kubernetes cluster run on an Outpost and Local Zone, respectively. Then, we show failing over the game application from the primary EKS cluster to the secondary.

The high-level steps of the solution are as follows:

  1. Set up AWS CodeCommit repository
  2. Configure Flux CD and deploy a sample application
  3. Fail over the workload from the Outpost to the Local Zone using Route 53

Prerequisites

This walkthrough has the following prerequisites:

  • A command line environment with the following tools installed: Git, kubectl, helm, the AWS CLI, and the Flux CLI. The AWS Command Line Interface (AWS CLI) is used as part of the solution, but you can use your preferred infrastructure as code tool instead.
  • An Outpost for your primary EKS cluster
  • A Local Zone enabled in the Region where you would like to deploy your secondary EKS cluster. For more information, see the getting started with Local Zones documentation. In addition, the Local Zones feature page in the AWS documentation outlines which services are available in which Local Zones.
  • Two EKS clusters created with IAM Roles for Service Accounts (IRSA) enabled:
    • One dedicated to the Outpost deployment and the other to the Local Zone.
    • Deploy worker nodes in Outposts and Local Zones, respectively.

Step A: Set up a CodeCommit repository and variables

In this solution, a CodeCommit repository is used to store the Flux CD configuration files, which include the game2048 application manifests and related Kubernetes objects. Follow Steps 1 to 3 of the CodeCommit prerequisite guide, Setup for HTTPS users using Git credentials, to configure Git credentials for CodeCommit.

Populate environment variables for Outposts. We need values for the outposts_subnet (subnet where the Application Load Balancer (ALB) is going to reside), and the outposts_eks_cluster_name.

export outposts_subnet=subnet-0123xxxxx
export outposts_eks_cluster_name=<your-outposts-eks-cluster>

Run the following command to create a CodeCommit repository, replacing the repository name, AWS Region, and description with your own values:

aws codecommit create-repository \
  --repository-name eks-outposts-lz-repo \
  --repository-description "Amazon-EKS-Outposts-LZ-DR" \
  --region <region>

Next, replace <code-commit-git-repo-url> with your own Git repository URL and run the command to clone the repository:

mkdir ~/flux
cd ~/flux
git clone <code-commit-git-repo-url> eks-outposts-lz-repo

Step B: Configure Flux CD and deploy a sample application

Now that your CodeCommit repository is set up, deploy the Kubernetes application backend and its resources. Flux CD is used to automate the deployment and management of applications and associated resources on the two EKS clusters across Outposts and Local Zones.

Run the following commands to create the Flux directory structure:

cd ~/flux/eks-outposts-lz-repo
mkdir -p apps/{base/game-2048,localzone,outposts}
mkdir -p infrastructure/{base,localzone,outposts,base/ingress-controller}
mkdir -p clusters/{outposts/flux-system,localzone/flux-system}

Configure and upload aws-load-balancer-controller

Create the manifest to download the AWS Load Balancer Controller application through the Helm chart:

cat << EoF > ~/flux/eks-outposts-lz-repo/infrastructure/base/ingress-controller/alb.yaml
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: aws-load-balancer-controller
  namespace: kube-system
spec:
  chart:
    spec:
      chart: aws-load-balancer-controller
      reconcileStrategy: ChartVersion
      sourceRef:
        kind: HelmRepository
        name: eks
        namespace: flux-system
      version: 1.5.4
  interval: 10m0s
  timeout: 10m0s
  releaseName: ee
  values:
    serviceAccount:
      name: aws-load-balancer-controller
EoF

Set up a kustomization file that sets the namespace to kube-system for the AWS Load Balancer Controller:

cat << EoF > ~/flux/eks-outposts-lz-repo/infrastructure/base/ingress-controller/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: kube-system
resources:
  - alb.yaml
EoF

Configure and upload game2048 application

Create the manifests for the game2048 application, which uses an image from a public repository: a namespace, a deployment, a service, and an ingress, plus a kustomization file that ties them together. The ingress is fulfilled by the AWS Load Balancer Controller and exposes the application externally:

echo Creating initial Flux GitOps configuration for the App

cat <<EOF > ~/flux/eks-outposts-lz-repo/apps/base/game-2048/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: game-2048
EOF

cat <<EOF > ~/flux/eks-outposts-lz-repo/apps/base/game-2048/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: game-2048
  name: deployment-2048
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: app-2048
  replicas: 1
  template:
    metadata:
      labels:
        app.kubernetes.io/name: app-2048
    spec:
      containers:
        - image: public.ecr.aws/l6m2t8p7/docker-2048:latest
          imagePullPolicy: Always
          name: app-2048
          ports:
            - containerPort: 80
EOF

cat <<EOF > ~/flux/eks-outposts-lz-repo/apps/base/game-2048/service.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: game-2048
  name: service-2048
spec:
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
  type: NodePort
  selector:
    app.kubernetes.io/name: app-2048
EOF

cat <<EOF > ~/flux/eks-outposts-lz-repo/apps/base/game-2048/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: game-2048
  name: ingress-2048
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: service-2048
                port:
                  number: 80
EOF

cat <<EOF > ~/flux/eks-outposts-lz-repo/apps/base/game-2048/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: game-2048
resources:
  - namespace.yaml
  - deployment.yaml
  - service.yaml
  - ingress.yaml
EOF

Configure and upload outposts configuration

Let’s create a helm-patch.yaml file to customize and modify the values of the AWS Load Balancer Controller HelmRelease to set the Outposts environment specific configuration.

We need to create an AWS Identity and Access Management (IAM) policy and role to associate it with the Kubernetes service account used by the AWS Load Balancer controller. Follow Step 1: Create IAM Role using eksctl in the AWS Load Balancer Controller guide and then execute the following command.
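The command that writes helm-patch.yaml does not appear in the text here. As a minimal sketch, assuming the patch only needs to set the chart's clusterName value (the later sed step that swaps cluster names in this file suggests it contains that value), it could look like the following. The patch body and the fallback cluster name are assumptions, not the original file; depending on your environment you may also need chart values such as region, vpcId, or serviceAccount.create.

```shell
# Hypothetical helm-patch.yaml; the values below are assumptions, not the original file.
# Use a placeholder cluster name if the variable from Step A is not set.
: "${outposts_eks_cluster_name:=my-outposts-eks-cluster}"
repo_dir="${repo_dir:-$HOME/flux/eks-outposts-lz-repo}"
mkdir -p "$repo_dir/infrastructure/outposts"

# Strategic merge patch: same kind, name, and namespace as the base HelmRelease,
# overriding only the values that differ for this site.
cat << EoF > "$repo_dir/infrastructure/outposts/helm-patch.yaml"
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: aws-load-balancer-controller
  namespace: kube-system
spec:
  values:
    clusterName: $outposts_eks_cluster_name
EoF
```

Because the heredoc delimiter is unquoted, the shell expands the cluster name when the file is written, so the committed file contains the literal name rather than a variable reference.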

Create a kustomization file that configures the deployment of the AWS Load Balancer Controller Helm release by applying the patch:

cat << EoF > ~/flux/eks-outposts-lz-repo/infrastructure/outposts/kustomization.yaml
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base/ingress-controller
patches:
  - path: helm-patch.yaml
EoF

Here we create a patch ingress file to deploy Ingress for game2048 in the subnet dedicated for the ALB:

cat << EoF > ~/flux/eks-outposts-lz-repo/apps/outposts/ingress-patch.yaml
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: game-2048
  name: ingress-2048
  annotations:
    alb.ingress.kubernetes.io/subnets: $outposts_subnet
EoF

Create the kustomization file that applies the ingress patch, so that the base game2048 manifests don't need modification. This follows GitOps principles by keeping the base manifests generic and the site-specific changes declarative. Flux applies the changes automatically on each reconcile:

cat << EoF > ~/flux/eks-outposts-lz-repo/apps/outposts/kustomization.yaml
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../base/game-2048
patches:
  - path: ingress-patch.yaml
EoF

The last step is to add Flux Kustomization objects to the flux-system namespace, so that Flux deploys the AWS Load Balancer Controller and the game2048 application.

Add the following manifest file, which defines the Helm repository and the two Kustomizations. The game-2048 Kustomization depends on the AWS Load Balancer Controller, so the controller is deployed first:

cat << EoF > ~/flux/eks-outposts-lz-repo/clusters/outposts/flux-system/apps.yaml
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: eks
  namespace: flux-system
spec:
  interval: 10m0s
  url: https://aws.github.io/eks-charts
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: aws-load-balancer-controller
  namespace: flux-system
spec:
  interval: 10m0s
  path: ./infrastructure/outposts
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  wait: true
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta2
kind: Kustomization
metadata:
  name: game-2048
  namespace: flux-system
spec:
  dependsOn:
    - name: aws-load-balancer-controller
  interval: 10m0s
  path: ./apps/outposts
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  wait: true
EoF

Configure and upload Local Zones configuration

The Local Zones steps are going to be almost identical to Outposts except for the patch values for the AWS Load Balancer Controller and the game2048 application.

Switch context and connect to the Local Zones cluster:

kubectl config use-context <localzone-context-name>

Follow the same AWS Load Balancer Controller policy and role creation guide (Step 1) as we did for Outposts.

Copy the Outposts files, then use the sed command to replace the Outposts-specific values with Local Zone values:

cd ~/flux/eks-outposts-lz-repo
cp infrastructure/outposts/* infrastructure/localzone/
cp apps/outposts/* apps/localzone/
cp clusters/outposts/flux-system/* clusters/localzone/flux-system/
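The sed commands that follow read two variables, localzone_eks_cluster_name and localzone_subnets, that are not set anywhere earlier in the walkthrough. A minimal sketch of defining them, mirroring the Outposts variables from Step A; the values shown are placeholders, not real resources:

```shell
# Placeholder values (assumptions): replace with your Local Zone cluster name and subnet ID(s).
export localzone_eks_cluster_name="<your-localzone-eks-cluster>"
export localzone_subnets="subnet-0456xxxxx"

# Warn about any variable the sed substitutions depend on that is still empty.
for name in outposts_subnet outposts_eks_cluster_name localzone_subnets localzone_eks_cluster_name; do
  eval "value=\${$name:-}"
  [ -n "$value" ] || echo "warning: $name is not set"
done
```

Checking all four variables before running sed is a cheap guard: an empty search or replacement string would not produce the intended substitution.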

Adjust helm-patch.yaml to point to the Local Zone cluster:

sed -i -e "s/$outposts_eks_cluster_name/$localzone_eks_cluster_name/g" ~/flux/eks-outposts-lz-repo/infrastructure/localzone/helm-patch.yaml

Adjust ingress-patch.yaml and insert the Local Zone subnets:

sed -i "s/$outposts_subnet/$localzone_subnets/g" ~/flux/eks-outposts-lz-repo/apps/localzone/ingress-patch.yaml

The last step is to update the path of the manifest file:

sed -i "s/outposts/localzone/g" ~/flux/eks-outposts-lz-repo/clusters/localzone/flux-system/apps.yaml
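Before committing, it can be worth confirming that the copied files no longer point at the Outposts paths. The helper below is a sketch; check_no_outposts_refs is a hypothetical name, and the check only looks for the literal string "outposts", so it will not catch an Outposts cluster name that does not contain that word.

```shell
# Print every line under the given paths that still mentions "outposts".
# Returns 0 when nothing is found, 1 otherwise.
check_no_outposts_refs() {
  if grep -rn "outposts" "$@"; then
    echo "leftover Outposts references found" >&2
    return 1
  fi
  return 0
}

# Usage, from the repository root:
#   check_no_outposts_refs clusters/localzone apps/localzone infrastructure/localzone
```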

Now let’s run Git to push the files we’ve added to the repository:

cd ~/flux/eks-outposts-lz-repo
git add .
git commit -m "initial setup"
git push

FluxCD in action

Now that the YAML files, kustomizations, and folder structure are created and pushed to the CodeCommit repository, let’s configure the Flux secret and point Flux at the appropriate directory so that it deploys the resources in Kubernetes.

Switch context and connect to the Outposts cluster:

kubectl config use-context <outposts-context-name>

1. Install Flux, starting with your Outposts EKS cluster:

flux install

2. Create the Flux Kubernetes secret.

Run the following and replace the <> values with your own values:

flux create secret git flux-system \
  --url=<git-repo-url> \
  --username=<username> \
  --password=<password>

3. Point Flux at the repository (change --branch if the branch you pushed to is not master):

flux create source git flux-system \
  --url=<git-repo-url> \
  --branch=master \
  --secret-ref=flux-system \
  --interval=1m

4. Create the Flux kustomization:

flux create kustomization flux-system \
  --source=GitRepository/flux-system \
  --path="./clusters/outposts/flux-system" \
  --prune=true \
  --interval=10m

Switch context and connect to the Local Zones cluster:

kubectl config use-context <localzone-context-name>

Repeat Steps 1, 2, and 3 for Local Zones.

For Step 4, change the --path flag to --path="./clusters/localzone/flux-system" and run the command.

Within a few minutes you should see resources being deployed in the Outposts and Local Zones EKS clusters. To trigger a reconciliation immediately instead of waiting for the next interval, run the following command:

flux reconcile source git flux-system
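Rather than polling by hand, the reconciliation can be awaited explicitly. The sketch below assumes the two Kustomization names defined in apps.yaml (aws-load-balancer-controller and game-2048); wait_for_flux is a hypothetical helper name and the five-minute timeout is an arbitrary choice.

```shell
# Block until both Flux Kustomizations report Ready, then print their status.
# Run once per cluster, after switching kubectl context.
wait_for_flux() {
  for ks in aws-load-balancer-controller game-2048; do
    kubectl -n flux-system wait "kustomization/$ks" \
      --for=condition=ready --timeout=5m || return 1
  done
  flux get kustomizations
}

# Usage:
#   kubectl config use-context <outposts-context-name> && wait_for_flux
```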

To validate Kubernetes resources:

kubectl get ingress,pod,service -n game-2048

Repeat the preceding command on the Local Zone EKS cluster as well.

Step C: DR and failover through Route 53

For demonstration purposes, we create a public hosted zone in Route 53 for a placeholder domain and add a www.example.com record that points to the Outposts cluster ALB as the primary record and to the Local Zone ALB as the secondary. To simulate a failure, we configure the health check on the Outposts ALB to fail. This triggers Route 53 to fail over and direct live traffic to the Local Zone cluster instead.

Create a public hosted zone in Route 53

First create a public hosted zone in Route 53. A public hosted zone is a container that holds information about how you want to route traffic on the internet for a specific domain, such as example.com, and its subdomains (acme.example.com, zenith.example.com). Replace the <> with your own values:

aws route53 create-hosted-zone \
  --name <hosted-zone-name> \
  --caller-reference <hosted-zone-caller-reference>

Validate that the public hosted zone has been created correctly:

aws route53 list-hosted-zones

Now let’s create Route 53 health checks, which we associate with primary and secondary DNS records.

Create a Route 53 health check for primary record:

The workload running on the Outpost is the primary record. Run the following command and replace the <DNS-of-Outposts-ALB> with your own value:

aws route53 create-health-check \
  --caller-reference "Primary" \
  --health-check-config '{"Port": 80, "Type": "HTTP", "ResourcePath": "/", "FullyQualifiedDomainName": "<DNS-of-Outposts-ALB>", "RequestInterval": 30, "FailureThreshold": 3}'

Create a Route 53 health check for secondary record:

The workload running in the Local Zone is the secondary record. Run the following command, and replace <DNS-of-LocalZone-ALB> with your own value:

aws route53 create-health-check \
  --caller-reference "Secondary" \
  --health-check-config '{"Port": 80, "Type": "HTTP", "ResourcePath": "/", "FullyQualifiedDomainName": "<DNS-of-LocalZone-ALB>", "RequestInterval": 30, "FailureThreshold": 3}'

Validate that health checks for both Outposts and Local Zones have been created, and store the Route 53 health check ID, which is needed for the next steps:

aws route53 list-health-checks

Create A records for primary, and point to ALB alias for Outposts EKS

In the Route 53 public hosted zone create an A record pointing to the primary site ALB.

You need the following values:

  • Public Hosted Zone ID
  • DNS of Outposts ALB
  • Hosted zone ID of the Outposts ALB – to retrieve it, browse to Load balancers → Details in the console; under Hosted zone you should see a 14-character value such as Z123456789KTTX
  • Health Check ID for the Outposts ALB
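These values can also be looked up from the command line. The following is a sketch rather than part of the original walkthrough: lookup_record_inputs is a hypothetical helper, it assumes the AWS CLI default Region is the one hosting the ALB and that kubectl points at the matching cluster, and it relies on the health check caller references used earlier ("Primary" and "Secondary").

```shell
# Print, in order: hosted zone ID, ALB DNS name, ALB hosted zone ID, health check ID.
# $1 = hosted zone name with trailing dot (for example "example.com.")
# $2 = health check caller reference ("Primary" or "Secondary")
lookup_record_inputs() {
  zone_name="$1"
  caller_ref="$2"

  # Public hosted zone ID (returned in the form /hostedzone/<id>)
  aws route53 list-hosted-zones \
    --query "HostedZones[?Name=='$zone_name'].Id" --output text

  # ALB DNS name, as published on the Ingress status by the controller
  alb_dns=$(kubectl get ingress ingress-2048 -n game-2048 \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
  echo "$alb_dns"

  # Canonical hosted zone ID of that ALB
  aws elbv2 describe-load-balancers \
    --query "LoadBalancers[?DNSName=='$alb_dns'].CanonicalHostedZoneId" --output text

  # ID of the health check created with the given caller reference
  aws route53 list-health-checks \
    --query "HealthChecks[?CallerReference=='$caller_ref'].Id" --output text
}

# Usage:
#   lookup_record_inputs "example.com." "Primary"
```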

Replace the preceding values in the following command and run:

aws route53 change-resource-record-sets \
  --hosted-zone-id <Public-hosted-zone-id> \
  --change-batch '{"Changes":[{"Action":"CREATE","ResourceRecordSet":{"Name":"www.example.com","Type":"A","AliasTarget":{"HostedZoneId":"<zone-id-of-Outposts-ALB>","DNSName":"<DNS-of-Outposts-ALB>","EvaluateTargetHealth": true},"Failover":"PRIMARY","SetIdentifier":"outposts","HealthCheckId":"<health-check-id-for-Outposts-ALB>"}}]}'

Create A records for secondary and point to ALB alias for Local Zone EKS

Now repeat the preceding step with the secondary site:

aws route53 change-resource-record-sets \
  --hosted-zone-id <Public-hosted-zone-id> \
  --change-batch '{"Changes":[{"Action":"CREATE","ResourceRecordSet":{"Name":"www.example.com","Type":"A","AliasTarget":{"HostedZoneId":"<zone-id-of-LocalZone-ALB>","DNSName":"<DNS-of-LocalZone-ALB>","EvaluateTargetHealth": true},"Failover":"SECONDARY","SetIdentifier":"localzone","HealthCheckId":"<health-check-id-for-LocalZone-ALB>"}}]}'
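The primary and secondary change batches differ only in the failover role, the set identifier, and the three site-specific values. As an illustration, a small helper can generate either one; failover_change_batch is a hypothetical name, and the record name www.example.com is the placeholder used in this post.

```shell
# Print a change batch for one failover alias record.
# Arguments: $1 role (PRIMARY or SECONDARY), $2 set identifier,
#            $3 ALB hosted zone ID, $4 ALB DNS name, $5 health check ID.
failover_change_batch() {
  cat << EOF
{
  "Changes": [{
    "Action": "CREATE",
    "ResourceRecordSet": {
      "Name": "www.example.com",
      "Type": "A",
      "AliasTarget": {
        "HostedZoneId": "$3",
        "DNSName": "$4",
        "EvaluateTargetHealth": true
      },
      "Failover": "$1",
      "SetIdentifier": "$2",
      "HealthCheckId": "$5"
    }
  }]
}
EOF
}

# Usage (quote the placeholders so the shell does not treat < and > as redirections):
#   aws route53 change-resource-record-sets \
#     --hosted-zone-id "<Public-hosted-zone-id>" \
#     --change-batch "$(failover_change_batch SECONDARY localzone \
#       "<zone-id-of-LocalZone-ALB>" "<DNS-of-LocalZone-ALB>" "<health-check-id-for-LocalZone-ALB>")"
```

Generating both records from one template keeps them identical apart from the intended differences, which avoids placeholder typos between the two commands.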

Note that DNS records may take some time to propagate. Browse to the site. In this post we are using a dummy domain www.example.com.

You can use the dig command to note which public IP addresses are serving this application on Outposts:

dig +short www.example.com

Currently this application is being served from the workload running on the Outpost. To simulate a failover, we modify the Route 53 health check for Outposts by changing it to a dummy port, 801, which is not listening for traffic. This causes the health check to fail and directs traffic to the Local Zone EKS cluster, which is set as the secondary.

Navigate to the Route 53 Console, and under Health Check change the port to 801 for the Outposts health check.
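The same change can be made from the AWS CLI instead of the console. This is a sketch with hypothetical helper names; the health check ID is the one noted earlier for the Outposts ALB.

```shell
# Change the port a health check probes. $1 = health check ID, $2 = port.
set_health_check_port() {
  aws route53 update-health-check --health-check-id "$1" --port "$2"
}

# Show what the Route 53 health checkers currently report. $1 = health check ID.
show_health_check_status() {
  aws route53 get-health-check-status --health-check-id "$1" \
    --query "HealthCheckObservations[].StatusReport.Status" --output text
}

# Usage:
#   set_health_check_port "<health-check-id-for-Outposts-ALB>" 801   # simulate a failure
#   show_health_check_status "<health-check-id-for-Outposts-ALB>"
#   set_health_check_port "<health-check-id-for-Outposts-ALB>" 80    # restore afterwards
```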

After the failure threshold is reached, the status of the Outposts health check changes to unhealthy.

Once the health check reports unhealthy, Route 53 starts answering queries with the secondary record. Browse to www.example.com again and the application is still accessible. This time, the traffic is being served by the application deployed in the Local Zone.

To confirm, let’s run the dig command again and check that the IP addresses have changed and now point to the Local Zone application.

dig +short www.example.com

The www.example.com record now resolves to a different set of IP addresses, those associated with the Local Zone workload, which validates that our DR failover is working correctly.

Cleaning up

To avoid incurring future charges, delete resources created as part of this walkthrough.

  1. Delete the Route 53 Zones, following this document.
  2. Delete Route 53 Health Checks, following this document.
  3. Run the following commands to delete the game-2048 namespace and the AWS Load Balancer Controller:

kubectl delete namespace game-2048
kubectl delete helmrelease aws-load-balancer-controller -n kube-system

4. Uninstall Flux

flux uninstall

5. Switch context and connect to the Local Zone EKS cluster and repeat the previous Steps 3 and 4 in this section.

6. Finally, delete the CodeCommit repository:

aws codecommit delete-repository --repository-name eks-outposts-lz-repo

Conclusion

In this post, we showed you how AWS Local Zones can be used as a DR option for Amazon EKS workloads running on AWS Outposts. By creating separate Amazon EKS control planes for the Outpost and Local Zones, and using a GitOps strategy with Flux CD to deploy resources to both environments, we implemented an active-passive DR setup. Amazon Route 53 with health checks and failover routing policies allowed us to serve traffic from the Outpost cluster as primary, failing over to the Local Zone cluster on an outage. This architecture showcases how Amazon EKS workloads on Outposts can achieve resiliency against site failures by using Local Zones for recovery. The techniques presented provide an option for users looking to implement a low-cost DR for critical Amazon EKS applications hosted across Outposts and Local Zones.
