Optimize AZ traffic costs using Amazon EKS, Karpenter, and Istio


In the evolving cloud-native landscape, enterprises using Amazon Elastic Kubernetes Service (Amazon EKS) often encounter challenges that hinder operational efficiency and cost-effectiveness. Notable among these are the costs associated with cross-Availability Zone (AZ) traffic, difficulties in achieving seamless scalability, hurdles in provisioning right-sized instances for nodes, and the intricacies of managing service networking via topology-aware load balancing. These hurdles not only escalate operational costs but also threaten high availability and efficient scaling of resources.

This post presents a robust solution that integrates Istio, Karpenter, and Amazon EKS to address these challenges and pain points directly. The proposed solution provides a roadmap for optimizing cross-AZ traffic costs, ensuring high availability in a cost-optimized manner, and scaling the underlying compute resources efficiently.

By diving into this implementation, you’ll gain insights into using the combined potential of Istio, Karpenter, and Amazon EKS to overcome scalability challenges, provision right-sized instances, and manage service networking effectively – all while maintaining tight control over operational costs. Through this guidance, we aim to equip enterprises with the knowledge required to optimize their Amazon EKS environments for enhanced performance, scalability, and cost-efficiency.

A smart way to cut costs and ensure smooth operations is by spreading your workloads (i.e., pods) across different zones. Key Kubernetes features like Pod affinity, Pod anti-affinity, NodeAffinity, and Pod Topology Spread can help with this, and tools like Istio and Karpenter further enhance this strategy.

Let’s learn about these Kubernetes constructs in detail:

Pod affinity:

Pod affinity lets you influence Pod scheduling based on the presence or absence of other Pods. By defining rules, you can dictate whether Pods should be co-located or dispersed across different AZs, which aids network cost optimization and can enhance application performance.

Pod anti-affinity:

Acting as the counter to Pod affinity, Pod anti-affinity ensures that specified Pods aren’t scheduled on the same node, which is instrumental for high availability and redundancy. This mechanism guards against the simultaneous loss of critical services during a node failure.

NodeAffinity:

Operating along similar lines as Pod affinity, NodeAffinity zeroes in on nodes based on specific labels like instance type or availability zone. This feature facilitates optimized Amazon EKS cluster management by assigning Pods to suitable nodes, which can lead to cost reduction or performance improvement.

Pod Topology Spread:

Pod Topology Spread provides the means to evenly distribute Pods across defined topology domains, such as nodes, racks, or AZs. Employing topology spread constraints promotes better load balancing and fault tolerance, paving the way for a more resilient and balanced cluster.
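To make these constructs concrete, here is a minimal, illustrative Deployment snippet that combines a zone-level topology spread constraint with pod anti-affinity and node affinity. It is a sketch only – the name, labels, and zone values are placeholders, not one of the manifests deployed later in this post:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                                # keep per-zone replica counts within 1 of each other
          topologyKey: topology.kubernetes.io/zone  # spread across AZs
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: web
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname # prefer at most one replica per node
                labelSelector:
                  matchLabels:
                    app: web
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/zone # restrict scheduling to these AZs
                    operator: In
                    values: ["us-west-2a", "us-west-2b", "us-west-2c"]
      containers:
        - name: web
          image: public.ecr.aws/k3d0y0m9/express-test:1.1.2 # sample image used later in this post

With maxSkew: 1, the scheduler keeps the number of matching Pods in each zone within one of every other zone, while the preferred anti-affinity term discourages stacking replicas on the same node.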

Beyond these Kubernetes features, tools like Istio and Karpenter further contribute to refining your Amazon EKS cluster. Istio addresses service networking challenges, while Karpenter tackles right-sized instance provisioning, both pivotal for scaling efficiency and cost management.

Istio

Istio is a service mesh that uses the high-performance Envoy proxy to streamline the connection, management, and security of microservices. It simplifies traffic management, security, and observability, allowing developers to delegate these tasks to the service mesh, while also enabling detailed metric collection on network traffic between microservices via sidecar proxies.

Karpenter

Karpenter is designed to provide the right compute resources to match your application’s needs in seconds instead of minutes. It observes the aggregate resource requests of unschedulable pods and makes decisions to launch and terminate nodes to minimize scheduling latencies.

By combining Pod affinity, Pod Anti-affinity, Node Affinity, Pod topology spread, Istio, and Karpenter, organizations can optimize their network traffic within Amazon EKS clusters, reducing cross-AZ traffic and potentially saving on associated costs.

By incorporating these best practices into your Amazon EKS cluster configuration, you can achieve an optimized, cost-effective, and highly available Kubernetes environment on AWS.

Continuously evaluate your cluster’s performance and adjust the configuration as needed to maintain the ideal balance between cost savings, high availability, and redundancy.

Optimize AZ traffic costs

We’re going to deploy two versions of a simple application behind a ClusterIP service, which makes it easier to identify the outcomes of the load balancing configurations later on. Each version of the application returns a distinct response for the same endpoint, so when you query the istio-ingressgateway LoadBalancer with the application endpoint, the responses tell you which destination served each request.

Solution Overview

Architecture showcasing Istio, EKS, Karpenter for Optimizing Costs

We’re going to use Istio’s weighted distribution feature by configuring a DestinationRule object to control the traffic between AZs. The goal is to route the majority of the traffic coming from the load balancer to the pods running in the same AZ.

Prerequisites

  1. Provision an Amazon EKS Cluster using eksdemo command line interface (CLI):
eksdemo create cluster istio-demo -i m5.large -N 3
  2. Install Karpenter to manage node provisioning for the cluster:
eksdemo install autoscaling-karpenter -c istio-demo
  3. Install eks-node-viewer for visualizing dynamic node usage within the cluster.

After the installation, run eks-node-viewer in the terminal to see the three existing nodes of your Amazon EKS cluster provisioned during cluster creation.

Example output:

eks-node-viewer output

  4. Install Istio version 1.19.1 using the istioctl CLI with the demo profile:
istioctl install --set profile=demo
  5. Verify that all the pods and services have been installed:
kubectl -n istio-system get pods

Example output:

istio pods in istio-system namespace

  6. Validate that the Istio ingress gateway has a public external IP assigned:
kubectl -n istio-system get svc

Example output:

istio services in the istio-system namespaces
Let’s now proceed with deploying the solution. All the deployment configurations we’re going to use in this post are available in the GitHub repo – amazon-eks-istio-karpenter.

Deploy Karpenter NodePool and NodeClasses:

NodePools are a fundamental component of Karpenter: they serve as a set of rules that dictate the characteristics of the nodes Karpenter provisions and the pods that can be scheduled on them. With NodePools, users can define taints to control pod placement, set temporary startup taints, restrict nodes to particular zones, instance types, and architectures, and establish node expiration defaults. It’s crucial to have at least one NodePool configured, as Karpenter relies on them to function. Karpenter evaluates each NodePool in sequence, skipping any whose taints are incompatible with a pod’s tolerations. Creating mutually exclusive NodePools is recommended to ensure clear, unambiguous pod scheduling; in cases of overlap, Karpenter opts for the highest-weighted NodePool.

Node Classes enable configuration of AWS specific settings. Each NodePool must reference an EC2NodeClass using spec.template.spec.nodeClassRef. Multiple NodePools may point to the same EC2NodeClass.

We’ll use the default namespace for all the resources to keep things simple. Now, let’s deploy the NodePool and EC2NodeClass.

Update your Amazon EKS cluster name in the EC2NodeClass specification section and run the following command to deploy the express-test NodePool and EC2NodeClass.

Review the nodepool.yaml file on GitHub.

cat <<EoF> nodepool.yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: express-test
spec:
  template:
    metadata:
      labels:
        managedBy: karpenter
        nodepool: express-test
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-west-2a", "us-west-2b", "us-west-2c"]
      nodeClassRef:
        name: express-test
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h # 30 * 24h = 720h
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: express-test
spec:
  amiFamily: AL2 # Amazon Linux 2
  role: "KarpenterNodeRole-<cluster-name>" # replace with your cluster name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "<cluster-name>" # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "<cluster-name>" # replace with your cluster name
EoF
kubectl apply -f nodepool.yaml

Now, whenever there’s an unschedulable pod, Karpenter uses this express-test NodePool to launch On-Demand and Spot instances based on availability in the us-west-2a, us-west-2b, and us-west-2c Availability Zones. The instances will come from the c, m, and r categories with amd64 architecture.
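To confirm that both resources were created, you can list them by their full resource names as a quick sanity check:

kubectl get nodepools.karpenter.sh,ec2nodeclasses.karpenter.k8s.aws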

There are more configuration options you can use to effectively manage your instances with Karpenter. Check the Karpenter concepts documentation to explore all the options.

Deploy Istio Virtual Service and Gateway

The VirtualService is responsible for directing traffic coming in from the istio-ingressgateway, which is established using the express-test-gateway resource definition. Looking at the routing rules in the VirtualService spec, you’ll notice that it directs traffic for the /test URI to the ClusterIP service associated with the application.

Run the following two commands to deploy express-test-virtualservice and express-test-gateway.

Review the virtual-service.yaml file on GitHub.

cat <<EoF> virtual-service.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: express-test-virtualservice
  namespace: default
spec:
  hosts:
    - "*"
  gateways:
    - express-test-gateway
  http:
    - match:
        - uri:
            prefix: /test
      route:
        - destination:
            host: express-test
            port:
              number: 8080
      retries:
        attempts: 3
        perTryTimeout: 2s
EoF
kubectl apply -f virtual-service.yaml

Review the gateway.yaml file on GitHub.

cat <<EoF> gateway.yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: express-test-gateway
  namespace: default
spec:
  selector:
    istio: ingressgateway # use istio default controller
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
EoF
kubectl apply -f gateway.yaml
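After applying both manifests, a quick way to confirm that the VirtualService and Gateway objects exist in the default namespace is to list them by their full resource names:

kubectl -n default get virtualservices.networking.istio.io,gateways.networking.istio.io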

Advanced routing can be implemented using the DestinationRule resource, which we’ll explore later in this post.

Deploy Sample application and ClusterIP service

Run the following command to create two deployments of the same application in two different versions:

  • express-test deployment runs single replica of version 1.1.2
  • express-test-2 deployment runs single replica of version 1.1.4

Review the deployment.yaml file on GitHub.

cat <<EoF> deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: express-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: express-test
  template:
    metadata:
      labels:
        app: express-test
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - express-test
              topologyKey: "topology.kubernetes.io/zone"
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: managedBy
                    operator: In
                    values:
                      - karpenter
                  - key: nodepool
                    operator: In
                    values:
                      - express-test
      terminationGracePeriodSeconds: 0
      containers:
        - name: express-test
          image: public.ecr.aws/k3d0y0m9/express-test:1.1.2
          resources:
            requests:
              cpu: "1"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: express-test-2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: express-test
  template:
    metadata:
      labels:
        app: express-test
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: managedBy
                    operator: In
                    values:
                      - karpenter
                  - key: nodepool
                    operator: In
                    values:
                      - express-test
      terminationGracePeriodSeconds: 0
      containers:
        - name: express-test
          image: public.ecr.aws/k3d0y0m9/express-test:1.1.4
          resources:
            requests:
              cpu: "1"
EoF
kubectl apply -f deployment.yaml

The podAntiAffinity configuration uses the AZ topology key, so it repels pod replicas created by the other deployment, which carry the identical label app: express-test. This ensures high availability by allowing only one pod with that label to run per AZ.

Node affinity is also tailored to guarantee that the pods will only be scheduled onto nodes provisioned by Karpenter. Pods are scheduled only if both labels – managedBy: karpenter and nodepool: express-test – match.

Both deployments sit behind a ClusterIP service that routes traffic to their replicas. Thanks to the selector, the service treats the pods from both deployments as the same application, so traffic is forwarded to all of them.

Run the following command to deploy the ClusterIP service:

Review the service.yaml file on GitHub.

cat <<EoF> service.yaml
apiVersion: v1
kind: Service
metadata:
  name: express-test
spec:
  selector:
    app: express-test
  type: ClusterIP
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
EoF
kubectl apply -f service.yaml

Now, within a few seconds Karpenter provisions two new instances to schedule the pods from both deployments. The existing nodes in the cluster don’t have the labels that match the deployment spec, so the pods can’t be scheduled on them.

You can monitor the Karpenter logs to see how the process works using the following command:

kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter

You should now see two new nodes provisioned in separate AZs to run the two pods of express-test and express-test-2, respectively.
Run eks-node-viewer to see the new nodes Karpenter provisioned:

Example output:

eks-node-viewer showing karpenter provisioned nodes

As noted previously, Karpenter launched two c7a.large Spot instances to schedule the pods.
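You can also confirm the zone of each Karpenter-provisioned node directly with kubectl; the -L flag prints the zone label as an extra column:

kubectl get nodes -L topology.kubernetes.io/zone -l karpenter.sh/nodepool=express-test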

Check the status of the deployments and pods using the following commands:

kubectl get svc 

Example output:

showing kubernetes service

kubectl get pods -o wide

Example output:

output of kubectl get pods

kubectl get nodes --selector "karpenter.sh/nodepool=express-test,topology.kubernetes.io/zone=us-west-2a"

Example output:

output of kubectl get nodes

kubectl get nodes --selector "karpenter.sh/nodepool=express-test,topology.kubernetes.io/zone=us-west-2b"

Example output:

No nodes are deployed in us-west-2b

kubectl get nodes --selector "karpenter.sh/nodepool=express-test,topology.kubernetes.io/zone=us-west-2c"

Example output:

nodes matching labels

In my case, the two pods of express-test and express-test-2 are scheduled on nodes running in us-west-2c and us-west-2a, respectively. Hence, the query for us-west-2b returned No resources found.

Now, let’s move on to controlling the traffic flow between AZs using Istio’s DestinationRule resource. A DestinationRule sets policies for traffic going to a service after routing has occurred. It defines how to communicate with the resources behind the service, such as pods – how to share traffic between pods, how many connections a pod can have, and more.

We’ll explore this DestinationRule resource in three scenarios below: 

  1. Check the traffic flow without creating the DestinationRule resource
  2. Create the DestinationRule object and modify weights to direct all the traffic to us-west-2a
  3. Modify weights on the DestinationRule object to distribute traffic among three AZs: us-west-2a, us-west-2b, and us-west-2c

To test the traffic, we’ll use the following test script, which sends 10 requests to the Istio ingress gateway through its LoadBalancer; the gateway then routes the traffic to the application pods.

Set the GATEWAY_URL environment variable first using the following command:

export GATEWAY_URL=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "$GATEWAY_URL"

Create the loadbalancing-test-script.sh script using the following command (you can also review it on GitHub). The heredoc delimiter is quoted so the shell variables inside are written into the script as-is; GATEWAY_URL is read from the environment when the script runs.

cat <<'EoF'> loadbalancing-test-script.sh
#!/bin/bash
# Set variables
counter=1
FILE=test-results.txt
HEADING="-----WEIGHTED LOCALITY LOAD BALANCING TEST RESULTS-----"
URL="http://${GATEWAY_URL}/test"
# Check if the test-results file exists. If so, delete its content.
if test -f "$FILE"; then
  echo "$FILE exists." > $FILE
  cat $FILE
fi
# Add heading
echo $HEADING > $FILE
# Query the endpoint 10x
until [ $counter -gt 10 ]
do
  echo >> $FILE
  curl $URL >> $FILE
  ((counter++))
done
cat $FILE
EoF
chmod +x loadbalancing-test-script.sh # make the script executable

Scenario 1:

Check the traffic flow without creating the DestinationRule resource. Let’s see how the load balancer directs the traffic to the pods.

Run the script using the following command:

 ./loadbalancing-test-script.sh 

Example output:

load test script results

Each request returns a response with the application version, as seen above.

“Simple Node App Working v1.1.2” indicates that the request was served by the express-test deployment running in us-west-2c.

“Simple Node App Working v1.1.4” indicates that the request was served by the express-test-2 deployment running in us-west-2a.

The traffic is distributed evenly between the two pods across the total of 10 requests.
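You can also send a single request manually to see one response at a time, using the GATEWAY_URL variable exported earlier:

curl "http://${GATEWAY_URL}/test"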

Scenario 2:

Create the DestinationRule object and modify weights to direct all the traffic to us-west-2a.

Now, let’s create the DestinationRule object with the following rules:

Run the following command to create the express-test-dr DestinationRule object. Review the destination-rule.yaml file on GitHub.

cat <<EoF> destination-rule.yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: express-test-dr
spec:
  host: express-test.default.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        distribute:
          - from: us-west-2/us-west-2a/*
            to:
              "us-west-2/us-west-2a/*": 98
              "us-west-2/us-west-2b/*": 1
              "us-west-2/us-west-2c/*": 1
          - from: us-west-2/us-west-2b/*
            to:
              "us-west-2/us-west-2a/*": 98
              "us-west-2/us-west-2b/*": 1
              "us-west-2/us-west-2c/*": 1
          - from: us-west-2/us-west-2c/*
            to:
              "us-west-2/us-west-2a/*": 98
              "us-west-2/us-west-2b/*": 1
              "us-west-2/us-west-2c/*": 1
    connectionPool:
      http:
        http2MaxRequests: 10
        maxRequestsPerConnection: 10
    outlierDetection:
      consecutiveGatewayErrors: 1
      interval: 1m
      baseEjectionTime: 30s
EoF
kubectl apply -f destination-rule.yaml

Under localityLbSetting, you can see weights assigned to different AZ destinations based on where the traffic originates. The weights above ensure the traffic is directed to us-west-2a irrespective of the origin.

Run the loadbalancing-test-script.sh script again to see that all the results now come from pods running version 1.1.4:

Example output:

load test script results

As seen above, all the requests are directed to the express-test-2 deployment running in us-west-2a. This configuration is useful in situations like an AZ failure, when you want to fail over all the traffic to the remaining active AZs and avoid any impact.

Scenario 3:

Modify the weights on the DestinationRule object to distribute traffic among three AZs: us-west-2a, us-west-2b, and us-west-2c. For critical workloads, it is recommended to distribute weight across multiple AZs to ensure high availability and resilience to AZ failures. The weights below ensure that most requests originating in a specific AZ are routed to a pod running in the same AZ, which avoids cross-AZ traffic. Edit the existing DestinationRule object and modify the weights similar to the following:

kubectl edit destinationrule express-test-dr

Example output:

edit the istio destination rule
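For reference, the distribute section after editing could look like the following. These weights are illustrative (they keep roughly 80% of the traffic in the originating AZ and spill the rest to the other two zones); adjust them to match your availability and cost goals:

        distribute:
          - from: us-west-2/us-west-2a/*
            to:
              "us-west-2/us-west-2a/*": 80
              "us-west-2/us-west-2b/*": 10
              "us-west-2/us-west-2c/*": 10
          - from: us-west-2/us-west-2b/*
            to:
              "us-west-2/us-west-2a/*": 10
              "us-west-2/us-west-2b/*": 80
              "us-west-2/us-west-2c/*": 10
          - from: us-west-2/us-west-2c/*
            to:
              "us-west-2/us-west-2a/*": 10
              "us-west-2/us-west-2b/*": 10
              "us-west-2/us-west-2c/*": 80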

Run the loadbalancing-test-script.sh again to check the traffic flow.

Example output:

load test script results

In my case, the request origin is us-west-2a, so most of the requests (8/10) reached express-test-2 running in us-west-2a while only a few (2/10) reached express-test running in us-west-2c. The weights configured in this scenario act as an example of how to reduce costs by cutting cross-AZ traffic while still ensuring high availability in case of an AZ failure.

Overall, this demonstration shows that by activating Istio and setting up a destination rule for our ClusterIP Service, we enable topology-aware routing, minimizing notable cross-AZ costs. Istio also ensures high availability by defaulting to pods in other Availability Zones when a pod in the same zone is unavailable.

Additionally, integrating Karpenter as your node autoscaling solution offers further cost optimization through efficient instance deployments.

Let’s look at a quick demonstration of how Karpenter’s consolidation helps in efficiently scaling resources and optimizing costs.

If consolidation is enabled for a NodePool, Karpenter attempts to reduce the overall cost of the nodes launched by that NodePool. How does it do this? Karpenter simulates evicting all the pods from a chosen node and checks whether they could fit on a combination of the existing nodes and a new, cheaper one, while respecting all the rules you’ve set for your pods and NodePools. If the pods fit only on the existing nodes, the chosen node is removed. If they fit on the existing nodes plus a cheaper one, the cheaper node is launched and the chosen one is removed. The affected pods restart and are scheduled on the new instance.

We’ve enabled consolidation already while creating the NodePool. Review the NodePool specifications by running the following command.

kubectl describe nodepool express-test

Example output:

kubectl describe nodepool output

The consolidation policy is set to “WhenUnderutilized”. This means Karpenter considers all nodes for consolidation and attempts to remove or replace a node when it discovers that the node is underutilized and could be changed to reduce cost. Review more options for consolidation under the Karpenter Disruption concept.
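For reference, consolidation behavior is controlled entirely by the NodePool’s disruption block. A sketch of the alternative policy, based on the Karpenter v1beta1 API, only consolidates nodes once they are completely empty:

  disruption:
    consolidationPolicy: WhenEmpty  # only consider nodes with no non-daemon pods
    consolidateAfter: 30s           # wait 30 seconds after the node becomes empty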

Let’s use our same application to observe this behavior. Run the following command to create an express-test-3 deployment:

Review the express-test-3.yaml file on GitHub.

cat <<EoF> express-test-3.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: express-test-3
spec:
  replicas: 1
  selector:
    matchLabels:
      app: express-test
  template:
    metadata:
      labels:
        app: express-test
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: managedBy
                    operator: In
                    values:
                      - karpenter
                  - key: nodepool
                    operator: In
                    values:
                      - express-test
      terminationGracePeriodSeconds: 0
      containers:
        - name: express-test
          image: public.ecr.aws/k3d0y0m9/express-test:1.1.2
          resources:
            requests:
              cpu: "1"
EoF
kubectl apply -f express-test-3.yaml

Due to the node affinity, pod anti-affinity, and resource constraints placed on the express-test and express-test-2 deployments, Karpenter launches a new c7a.large node to schedule the express-test-3 pod.

You can visualize the new node using eks-node-viewer:

Example output:

eks-node-viewer showing karpenter provisioned nodes

Now, scale the express-test-3 deployment to five replicas. All five cannot fit on the new node due to resource limitations; hence, Karpenter creates a new node with larger capacity to schedule them.

Run the following command to scale the deployment:

kubectl scale deployment express-test-3 --replicas 5 
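If you want to watch the consolidation as it happens, you can follow the node churn and the Karpenter logs in separate terminals, using the same label selectors as earlier in this post:

kubectl get nodes -l karpenter.sh/nodepool=express-test -w
kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter -f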

Because the consolidation feature is enabled, Karpenter simulates the most cost-effective way to manage the instances and launches a bigger c7i.2xlarge instance to schedule all five replicas of express-test-3 plus the pod of express-test-2. The express-test-2 deployment doesn’t have pod anti-affinity constraints, so it can be scheduled alongside the express-test-3 pods. Once all the pods are scheduled on the new instance, Karpenter terminates the two old c7a.large instances.

Example output:

eks-node-viewer showing karpenter provisioned nodes

You can also check the status of the pods to confirm that all the pods of express-test-3 and express-test-2 are running on the new c7i.2xlarge instance.

Run the following command to check the pod status:

kubectl get pods -o wide 

Example output:

command output showing kubectl get pods output

The above output confirms that all five replicas of express-test-3 and the single replica of express-test-2 are running on the same instance. As the results show, with consolidation enabled, Karpenter optimized the cluster compute costs by terminating two of the three instances and consolidating onto one bigger instance. The reduced data plane cost can be confirmed by comparing the two eks-node-viewer screenshots.

Best practices to optimize AZ traffic costs:

  • Evaluate your workload to pinpoint pods that would gain from co-location (i.e., Pod affinity) or segregation (i.e., Pod anti-affinity), aiding in crafting suitable rules for your distinct scenario.
  • Employ NodeAffinity to align Pods with nodes showcasing specific traits, like instance type or AZ, fostering cost optimization and performance enhancement.
  • Utilize Pod Topology Spread to ascertain a balanced Pod dispersion across varied domains, boosting fault tolerance and availability.
  • Incorporate Istio within your Amazon EKS cluster to leverage its sophisticated traffic management and security functionalities, assisting in trimming down AZ traffic expenses while elevating service resilience.
  • Configure Karpenter for automatic node provisioning and scaling aligned with your workload demands, contributing to cost and resource wastage reduction by ensuring utilization of only indispensable resources.

Cleaning up

To avoid incurring additional operational costs, remember to destroy all the infrastructure you created by running this command:

eksdemo delete cluster -c <cluster-name>
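If you only want to remove the demo resources while keeping the cluster, you can instead delete the manifests created earlier:

kubectl delete -f express-test-3.yaml -f destination-rule.yaml -f service.yaml -f deployment.yaml -f virtual-service.yaml -f gateway.yaml -f nodepool.yaml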

Conclusion

In this post, we showed you how optimizing cross-AZ traffic and ensuring high availability in your Amazon EKS cluster is achievable by embracing a three-pronged strategy. First, deploying Pod affinity, Node affinity, and Pod Topology Spread helps distribute your Pods evenly across multiple AZs, thereby minimizing cross-AZ traffic costs and ensuring redundancy.

Second, integrate Istio to refine traffic management in EKS, with features like locality load balancing and fine-grained routing rules promoting efficient traffic flow within the same locality. Istio’s AZ-aware service discovery, controlled failover, and telemetry tools provide strategic traffic management and valuable insights to help control costs further.

Lastly, implementing Karpenter in your Amazon EKS clusters enables intelligent scaling based on actual workload demands. Karpenter’s ability to select suitable Amazon EC2 instance types and integrate with Spot Instances promotes optimal resource utilization, reducing over-provisioning and leading to significant cost savings. By adhering to the strategies and best practices highlighted in this post, you can significantly curtail cross-AZ data transfer costs, which are a notable factor in your Amazon EKS cluster’s overall expenditure.
