This blog was updated by Irene Garcia Lopez, Solutions Architect, and Mehdi Yosofie, Solutions Architect, in April 2024 to reflect Karpenter beta changes.
Overview
Karpenter is a dynamic, high-performance, open-source cluster autoscaling solution for Kubernetes, introduced at re:Invent 2021. Customers choose an autoscaling solution for a number of reasons, including improving the availability and reliability of their workloads while reducing costs. With EC2 Spot Instances, customers can reduce costs by up to 90% compared to On-Demand prices. Spot Instances run on spare EC2 capacity and can be interrupted, so the workloads that use them must be fault-tolerant, flexible, and stateless. Because containers are meant to be immutable and ephemeral, they are well suited to Spot. By combining a high-performance cluster autoscaler like Karpenter with EC2 Spot Instances, Amazon Elastic Kubernetes Service (Amazon EKS) clusters can acquire compute capacity within minutes while keeping costs low.
In this blog post, you will learn how to use Karpenter with EC2 Spot Instances and handle Spot Instance interruptions.
Getting started
To get started with Karpenter on AWS, you need a Kubernetes cluster. This post uses an Amazon EKS cluster throughout. To provision an Amazon EKS cluster and install Karpenter, follow the Getting Started guide in the Karpenter documentation.
Karpenter’s single responsibility is to provision compute capacity for your Kubernetes clusters, and it is configured through a custom resource called NodePool. When a new pod is created, for example when the Horizontal Pod Autoscaler (HPA) scales out a Deployment, kube-scheduler is responsible for finding the best feasible node so that the kubelet on that node can run it. If no existing node satisfies the pod’s scheduling constraints, the pod stays in the Pending state and is marked as unschedulable. Karpenter relies on kube-scheduler: it watches for these unschedulable pods and then provisions new node(s) to accommodate them.
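To make that trigger concrete, here is a minimal Python sketch that picks out the pods kube-scheduler has marked unschedulable. It works on the JSON returned by kubectl get pods -A -o json and looks only at the standard PodScheduled condition (status False, reason Unschedulable). It illustrates the signal Karpenter reacts to; it is not Karpenter’s own code, and the function name unschedulable_pods is hypothetical.

```python
import json
from typing import Any


def unschedulable_pods(pod_list: dict[str, Any]) -> list[str]:
    """Return "namespace/name" for every Pending pod that kube-scheduler
    could not place, based on the standard PodScheduled condition."""
    result = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        for cond in status.get("conditions", []):
            if (
                cond.get("type") == "PodScheduled"
                and cond.get("status") == "False"
                and cond.get("reason") == "Unschedulable"
            ):
                meta = pod["metadata"]
                result.append(f"{meta['namespace']}/{meta['name']}")
                break
    return result


if __name__ == "__main__":
    # Trimmed example input shaped like `kubectl get pods -A -o json` output.
    sample = json.loads("""
    {"items": [
      {"metadata": {"namespace": "default", "name": "web-1"},
       "status": {"phase": "Pending",
                  "conditions": [{"type": "PodScheduled", "status": "False",
                                  "reason": "Unschedulable"}]}},
      {"metadata": {"namespace": "default", "name": "web-2"},
       "status": {"phase": "Running",
                  "conditions": [{"type": "PodScheduled", "status": "True"}]}}
    ]}
    """)
    print(unschedulable_pods(sample))  # ['default/web-1']
```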
Diversification and flexibility are important when using Spot Instances: across instance types, instance sizes, Availability Zones, and even Regions. The more flexible you are, the wider the choice of spare capacity pools Karpenter has, which reduces the risk of interruption. The following code snippet shows an example of a Spot NodePool configuration specifying some constraints:
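A minimal sketch of such a Spot NodePool follows, assuming the Karpenter v1beta1 API (karpenter.sh/v1beta1) and an existing EC2NodeClass named default. It builds the manifest as a Python dictionary and prints it as JSON, which kubectl apply -f also accepts. The instance categories, the generation threshold, and the CPU limit are illustrative assumptions, not recommendations from this post.

```python
import json


def spot_nodepool(name: str = "default", node_class: str = "default") -> dict:
    """Build a karpenter.sh/v1beta1 NodePool that launches only Spot capacity
    while staying flexible across instance categories, generations and sizes."""
    return {
        "apiVersion": "karpenter.sh/v1beta1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "requirements": [
                        # Only Spot capacity for this NodePool.
                        {"key": "karpenter.sh/capacity-type",
                         "operator": "In", "values": ["spot"]},
                        {"key": "kubernetes.io/arch",
                         "operator": "In", "values": ["amd64"]},
                        # Compute, general purpose and memory optimized families.
                        {"key": "karpenter.k8s.aws/instance-category",
                         "operator": "In", "values": ["c", "m", "r"]},
                        # Any instance generation newer than 2.
                        {"key": "karpenter.k8s.aws/instance-generation",
                         "operator": "Gt", "values": ["2"]},
                        # No topology.kubernetes.io/zone requirement: every Availability
                        # Zone covered by the EC2NodeClass subnets stays eligible.
                    ],
                    "nodeClassRef": {
                        "apiVersion": "karpenter.k8s.aws/v1beta1",
                        "kind": "EC2NodeClass",
                        "name": node_class,
                    },
                }
            },
            # Cap on the total CPU this NodePool may provision (illustrative value).
            "limits": {"cpu": "1000"},
        },
    }


if __name__ == "__main__":
    # Save the output (for example as nodepool.json) and run: kubectl apply -f nodepool.json
    print(json.dumps(spot_nodepool(), indent=2))
```

Leaving out a zone requirement and allowing several instance categories keeps the set of eligible Spot pools large, which is the flexibility described above.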
Node selection
Karpenter makes provisioning decisions based on the scheduling constraints defined on each pending pod. It collects pending pods in batches and bin-packs them by their CPU and memory requests to find the most efficient (that is, the smallest) instance types that fit. From the diversified range of instance types allowed by the NodePool, Karpenter selects those that can fit the pod batch and passes them to the Amazon EC2 Fleet API. EC2 Fleet then uses an allocation strategy to choose which instance to launch.
For Spot Instances, EC2 Fleet uses the price-capacity-optimized (PCO) allocation strategy to select a Spot capacity pool from that diversified list. PCO considers both price and available capacity: it identifies the pools with the lowest chance of interruption and then picks the lowest-priced of those, which reduces the frequency of Spot interruptions while still optimizing for cost. For On-Demand Instances, the lowest-price allocation strategy is used to launch the cheapest instance type.
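The candidate-selection step described above, up to the point where the list is handed to EC2 Fleet, can be illustrated with a deliberately simplified Python model. It sums the CPU and memory requests of a pod batch and returns every allowed instance type that can hold the whole batch, smallest first. It ignores much of what the real scheduler accounts for (DaemonSet and system-reserved overhead, packing across several nodes, pricing), and the small catalog is a hand-picked example, so treat it as an illustration of the idea rather than Karpenter’s algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpu: float
    memory_gib: float


@dataclass(frozen=True)
class PodRequest:
    cpu: float          # requested vCPU
    memory_gib: float   # requested memory in GiB


def candidate_instance_types(pods: list[PodRequest],
                             allowed: list[InstanceType]) -> list[InstanceType]:
    """Toy bin-packing step: keep allowed types that fit the whole batch, smallest first."""
    need_cpu = sum(p.cpu for p in pods)
    need_mem = sum(p.memory_gib for p in pods)
    fits = [t for t in allowed if t.vcpu >= need_cpu and t.memory_gib >= need_mem]
    # "Smallest" here means fewest vCPUs, then least memory; the name breaks ties.
    return sorted(fits, key=lambda t: (t.vcpu, t.memory_gib, t.name))


if __name__ == "__main__":
    # Hand-picked subset of a NodePool's allowed types (vCPU, GiB).
    catalog = [
        InstanceType("c5.large", 2, 4),
        InstanceType("m5.large", 2, 8),
        InstanceType("r5.large", 2, 16),
        InstanceType("c5.xlarge", 4, 8),
        InstanceType("m5.xlarge", 4, 16),
    ]
    batch = [PodRequest(0.5, 1.5)] * 3  # three identical pods: 1.5 vCPU, 4.5 GiB in total
    print([t.name for t in candidate_instance_types(batch, catalog)])
    # ['m5.large', 'r5.large', 'c5.xlarge', 'm5.xlarge']
```

For this batch, c5.large drops out because 4 GiB of memory is too little. A list like the remaining one, rather than a single type, is what EC2 Fleet receives before applying its allocation strategy.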
You can check which instance type has been launched by executing:
You should see a “created nodeclaim” message that lists instance types that can fit your Pods, and a “launched nodeclaim” message indicating the instance type selected:
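If you save the controller logs to a file, a small filter can pull out the launch decisions. This Python sketch assumes JSON-formatted log lines in which the message field reads “launched nodeclaim” and the details sit in fields named nodeclaim, instance-type, capacity-type, and zone. Those field names are an assumption based on recent Karpenter releases, so check them against your own controller output; the function name launched_nodeclaims is hypothetical.

```python
import json
import sys
from typing import Iterable, Iterator


def launched_nodeclaims(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a short summary for each "launched nodeclaim" log entry.

    Assumed field names: message, nodeclaim, instance-type, capacity-type, zone."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip plain-text or truncated lines
        if not isinstance(entry, dict) or entry.get("message") != "launched nodeclaim":
            continue
        yield {
            "nodeclaim": entry.get("nodeclaim"),
            "instance_type": entry.get("instance-type"),
            "capacity_type": entry.get("capacity-type"),
            "zone": entry.get("zone"),
        }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python launched_nodeclaims.py karpenter-controller.log
    with open(sys.argv[1], encoding="utf-8") as log_file:
        for summary in launched_nodeclaims(log_file):
            print(summary)
```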
Capacity Type
When creating a NodePool, you can allow Spot, On-Demand, or both capacity types. If you allow both and a pod does not explicitly require one of them, Karpenter prefers Spot when provisioning a node. If Spot capacity is not available, Karpenter falls back to On-Demand Instances. Diversifying your instance types increases the chance that Karpenter finds Spot capacity and does not need to provision On-Demand capacity, so your workloads stay on Spot longer and costs stay lower.
Note that if your account has reached its Spot Instance quota, launches can fail with a MaxSpotInstanceCountExceeded error. In this case, Karpenter does not fall back to On-Demand. Implement adequate monitoring for quotas and these errors, create the necessary alerts, and contact AWS Support to request a quota increase when needed.
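The capacity-type behavior described above can be summarized as a small decision function. This is a simplified model of what this post describes, not Karpenter’s implementation: it only looks at a pod’s nodeSelector on the well-known karpenter.sh/capacity-type label (node affinity could express the same constraint), and it models the MaxSpotInstanceCountExceeded case as one with no On-Demand fallback. The function and parameter names are hypothetical.

```python
from typing import Optional

CAPACITY_TYPE_LABEL = "karpenter.sh/capacity-type"
QUOTA_ERROR = "MaxSpotInstanceCountExceeded"


def choose_capacity_type(nodepool_allowed: list[str],
                         pod_node_selector: dict[str, str],
                         spot_available: bool,
                         spot_error: Optional[str] = None) -> Optional[str]:
    """Simplified model: return "spot", "on-demand", or None if nothing can launch."""
    pinned = pod_node_selector.get(CAPACITY_TYPE_LABEL)
    # The pod's constraint must fall within the NodePool's constraints.
    allowed = [t for t in nodepool_allowed if pinned is None or t == pinned]
    if "spot" in allowed:
        if spot_available:
            return "spot"  # Spot is preferred whenever it is allowed and available
        if spot_error == QUOTA_ERROR:
            return None  # account-level Spot quota reached: no fallback
    if "on-demand" in allowed:
        return "on-demand"  # fallback when Spot capacity is unavailable
    return None
```

In this model, the None result for the quota case is exactly the situation your quota monitoring and alerts should catch.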
To configure Spot as the capacity type, add this constraint in the NodePool’s requirements block:
Resiliency
Karpenter handles Spot Instance interruptions natively: it automatically cordons and drains the node ahead of the interruption. As soon as Karpenter receives the Spot Instance interruption warning, which signals that Amazon EC2 will reclaim the instance in two minutes, it launches a replacement node.
To enable Spot interruption handling, you need an Amazon SQS queue that Karpenter watches for interruption events, and Amazon EventBridge rules that forward interruption events from AWS services to that queue. The CloudFormation template in the Karpenter Getting Started guide provisions this infrastructure. Then, set the --interruption-queue CLI argument (the settings.interruptionQueue Helm value) to the name of that queue so that Karpenter processes interruption events.
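To illustrate what arrives on that queue, the following sketch classifies a message body forwarded by EventBridge. It relies only on the documented shape of the EC2 Spot Instance interruption warning event (source aws.ec2, detail-type “EC2 Spot Instance Interruption Warning”, and the instance ID under detail.instance-id). The function name is hypothetical, and the sketch neither reads from SQS nor touches the node; Karpenter does that for you.

```python
import json
from typing import Optional

SPOT_WARNING = "EC2 Spot Instance Interruption Warning"


def spot_interruption_target(message_body: str) -> Optional[str]:
    """Return the instance ID if the message is a Spot interruption warning, else None."""
    try:
        event = json.loads(message_body)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict):
        return None
    if event.get("source") != "aws.ec2" or event.get("detail-type") != SPOT_WARNING:
        return None  # other event types (for example AWS Health events) are ignored here
    detail = event.get("detail")
    if not isinstance(detail, dict):
        return None
    return detail.get("instance-id")


if __name__ == "__main__":
    # Body shaped like the documented EC2 Spot Instance interruption warning event.
    body = json.dumps({
        "source": "aws.ec2",
        "detail-type": SPOT_WARNING,
        "detail": {"instance-id": "i-1234567890abcdef0", "instance-action": "terminate"},
    })
    print(spot_interruption_target(body))  # i-1234567890abcdef0
```

Karpenter performs the equivalent matching internally and then cordons and drains the corresponding node, so you only need to provision the queue and rules and point Karpenter at the queue.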
Another useful Karpenter feature for Spot Instances is consolidation. By default, Karpenter sets consolidationPolicy to WhenUnderutilized, so it automatically detects underutilized nodes that can be removed (deletion consolidation) or replaced with a smaller, cheaper node (replacement consolidation). You can change the consolidation behavior of your NodePools in the disruption block, as follows:
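A sketch of that disruption block, assuming the v1beta1 NodePool API, follows. The helper builds the spec.disruption section and enforces the v1beta1 rule that consolidateAfter can only be combined with the WhenEmpty policy. The helper name, the 30s value, and the 720h expireAfter value are illustrative assumptions.

```python
import json
from typing import Optional

VALID_POLICIES = {"WhenEmpty", "WhenUnderutilized"}


def disruption_block(policy: str = "WhenUnderutilized",
                     consolidate_after: Optional[str] = None,
                     expire_after: str = "720h") -> dict:
    """Build spec.disruption for a karpenter.sh/v1beta1 NodePool."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown consolidationPolicy: {policy}")
    if consolidate_after is not None and policy != "WhenEmpty":
        # v1beta1 only accepts consolidateAfter together with WhenEmpty.
        raise ValueError("consolidateAfter requires consolidationPolicy WhenEmpty")
    block = {"consolidationPolicy": policy, "expireAfter": expire_after}
    if consolidate_after is not None:
        # How long a node must stay empty before Karpenter removes it.
        block["consolidateAfter"] = consolidate_after
    return block


if __name__ == "__main__":
    # Only remove nodes once they have been empty for 30 seconds.
    print(json.dumps({"disruption": disruption_block("WhenEmpty", "30s")}, indent=2))
```

With WhenUnderutilized, Karpenter may delete or replace nodes that still run pods; WhenEmpty limits consolidation to removing nodes that no longer run workload pods.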
Karpenter versions prior to v0.34.0 supported replacement consolidation only for On-Demand Instances; for Spot Instances, only deletion consolidation was available. Since v0.34.0, you can enable the SpotToSpotConsolidation feature gate to use Spot-to-Spot consolidation. You can read more about it in the blog post Applying Spot-to-Spot consolidation best practices with Karpenter (https://aws.amazon.com/blogs/compute/applying-spot-to-spot-consolidation-best-practices-with-karpenter/).
Handling SIGTERM is also a best practice for any kind of container interruption. When an interruption is about to happen, Kubernetes sends a SIGTERM signal to the main process (PID 1) of each container in the Pod being evicted. It then waits for the termination grace period (30 seconds by default, set by terminationGracePeriodSeconds) so the containers can shut down gracefully, before sending the final SIGKILL signal that terminates them. To ensure your processes terminate gracefully, handle SIGTERM properly.
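The following minimal Python sketch shows one common way to do this: a SIGTERM handler that only sets a flag, and a work loop that checks the flag between short units of work so it can clean up and exit before SIGKILL arrives. The loop body and the function names are placeholders for your own application logic.

```python
import signal
import threading
import time

shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Only record the request; the main loop does the actual shutdown work."""
    shutdown_requested.set()


def run_worker(max_iterations: int = 1_000_000) -> int:
    """Process work in short units until SIGTERM arrives; return units completed."""
    signal.signal(signal.SIGTERM, handle_sigterm)
    completed = 0
    while not shutdown_requested.is_set() and completed < max_iterations:
        time.sleep(0.01)  # stand-in for one short, self-contained unit of work
        completed += 1
    # Flush buffers, close connections and checkpoint progress here, before exiting.
    return completed
```

Whatever cleanup runs after the loop must finish within the Pod’s termination grace period. Also start the process in exec form (for example ENTRYPOINT ["python", "app.py"] in a Dockerfile) so that it runs as PID 1 and receives SIGTERM directly; in shell form, /bin/sh becomes PID 1 and does not pass the signal on.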
To keep workloads running while you use Spot Instances with Karpenter to cost-optimize your EKS workloads, you can use Kubernetes PodDisruptionBudgets (PDBs) or topology spread constraints (topologySpreadConstraints). Karpenter respects these, along with other Kubernetes-native scheduling constraints such as node selectors, node affinity, and taints and tolerations; however, a pod’s scheduling constraints must fall within a NodePool’s constraints.
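As a minimal sketch, the helpers below build a policy/v1 PodDisruptionBudget and a topologySpreadConstraints entry that spreads pods across Availability Zones. The app label web, the -pdb name suffix, and a minAvailable of 2 are illustrative assumptions; adapt them to your workload’s labels and replica count.

```python
import json


def pod_disruption_budget(app: str, min_available: int) -> dict:
    """policy/v1 PodDisruptionBudget keeping at least min_available pods of the app."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{app}-pdb"},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app}},
        },
    }


def zone_spread_constraint(app: str, max_skew: int = 1) -> dict:
    """One topologySpreadConstraints entry spreading the app's pods across zones."""
    return {
        "maxSkew": max_skew,
        "topologyKey": "topology.kubernetes.io/zone",
        "whenUnsatisfiable": "DoNotSchedule",
        "labelSelector": {"matchLabels": {"app": app}},
    }


if __name__ == "__main__":
    print(json.dumps(pod_disruption_budget("web", 2), indent=2))
    print(json.dumps(zone_spread_constraint("web"), indent=2))
```

Add the spread constraint to your Deployment’s spec.template.spec.topologySpreadConstraints list. With the PDB in place, the eviction API refuses evictions that would leave fewer than two available pods.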
Monitoring
Spot interruptions can occur at any time. Monitoring Kubernetes cluster metrics and logs lets you create notifications when Karpenter fails to acquire capacity. Set up adequate monitoring at the Kubernetes cluster level for all Kubernetes objects, and monitor your Karpenter NodePools. In this post, you use Prometheus and Grafana to collect metrics for the cluster and Karpenter, and Amazon CloudWatch Logs to collect logs.
To get started with Prometheus and Grafana on Amazon EKS, follow the Prometheus and Grafana installation instructions in the Karpenter Getting Started guide. Grafana comes preinstalled with dashboards containing controller, node, and pod metrics.
Using the Pod Phase panel in the pre-built Karpenter Capacity Grafana dashboard, you can check for pods that have been in the Pending state longer than a predefined period (for example, 3 minutes). This helps you find workloads that cannot be scheduled.
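If you also want the same check outside Grafana, for example in a scheduled job, the following sketch flags pods that have been Pending for longer than a threshold, using kubectl get pods -A -o json output. It measures age from metadata.creationTimestamp, which only approximates time spent Pending; the function name is hypothetical, and the 3-minute threshold mirrors the example above.

```python
import json
import sys
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def pods_pending_longer_than(pod_list: dict[str, Any], threshold: timedelta,
                             now: Optional[datetime] = None) -> list[str]:
    """Return "namespace/name" for pods still Pending longer than threshold.

    Age comes from metadata.creationTimestamp (RFC 3339 in UTC, e.g. 2024-04-01T12:00:00Z)."""
    now = now or datetime.now(timezone.utc)
    stuck = []
    for pod in pod_list.get("items", []):
        if pod.get("status", {}).get("phase") != "Pending":
            continue
        meta = pod["metadata"]
        created = datetime.strptime(meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ")
        if now - created.replace(tzinfo=timezone.utc) > threshold:
            stuck.append(f"{meta['namespace']}/{meta['name']}")
    return stuck


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get pods -A -o json > pods.json && python pending.py pods.json
    with open(sys.argv[1], encoding="utf-8") as f:
        for name in pods_pending_longer_than(json.load(f), timedelta(minutes=3)):
            print("pending for more than 3 minutes:", name)
```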
Karpenter controller logs can be sent to CloudWatch Logs using either Fluent Bit or Fluentd. (For setup details, see the Amazon CloudWatch documentation on getting started with CloudWatch Logs for Amazon EKS.) To view the Karpenter controller logs, open the log group /aws/containerinsights/cluster-name/application, where cluster-name is the name of your cluster, and search for Karpenter.
In the log stream, search the Karpenter controller logs for “Provisioning failed” messages to find any provisioning failures. The example below shows a provisioning failure caused by reaching the account limit for Spot Instances.
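For logs you have exported or downloaded, a simple filter performs the same search. This sketch returns every line containing “Provisioning failed” (case-insensitive) and flags those that mention MaxSpotInstanceCountExceeded, the Spot quota error described in the Capacity Type section. It makes no other assumptions about the log format, and the function name is hypothetical.

```python
import sys
from typing import Iterable

QUOTA_ERROR = "MaxSpotInstanceCountExceeded"


def provisioning_failures(lines: Iterable[str]) -> list[tuple[str, bool]]:
    """Return (line, mentions_spot_quota_error) for each "Provisioning failed" line."""
    failures = []
    for line in lines:
        if "provisioning failed" in line.lower():
            failures.append((line.rstrip("\n"), QUOTA_ERROR in line))
    return failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python failures.py exported-karpenter-logs.txt
    with open(sys.argv[1], encoding="utf-8") as log_file:
        for line, quota in provisioning_failures(log_file):
            print(("[SPOT QUOTA] " if quota else "") + line)
```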
Clean up
To avoid incurring additional charges, clean up the resources you created. If you followed the Getting Started guide in the Karpenter documentation, see its “Delete the cluster” section. In general, the steps are:
- Uninstall the Karpenter controller. The command depends on how you installed Karpenter; for a Helm installation, use helm uninstall.
- Delete the Karpenter service account. If you created it with eksctl, delete it with eksctl delete iamserviceaccount.
- Delete the stack using aws cloudformation delete-stack --stack-name Karpenter-${CLUSTER_NAME} or, if you used Terraform, terraform destroy -var cluster_name=$CLUSTER_NAME
- If you created a cluster for this walkthrough, delete it using eksctl delete cluster --name ${CLUSTER_NAME}
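The same sequence can be scripted. The Python sketch below prints the commands in order and only runs them when dry_run is False. It assumes a Helm release and an eksctl-managed service account both named karpenter, a kube-system namespace, and the CloudFormation stack rather than Terraform; all of these are assumptions, so adjust them to match your installation. The function names and the cluster name my-cluster are hypothetical.

```python
import shlex
import subprocess


def cleanup_commands(cluster_name: str,
                     karpenter_namespace: str = "kube-system") -> list[list[str]]:
    """Cleanup steps in order. Release, service account and namespace names are assumptions."""
    return [
        # 1. Uninstall the Karpenter controller (Helm release assumed to be "karpenter").
        ["helm", "uninstall", "karpenter", "--namespace", karpenter_namespace],
        # 2. Delete the IAM service account created with eksctl.
        ["eksctl", "delete", "iamserviceaccount", "--cluster", cluster_name,
         "--name", "karpenter", "--namespace", karpenter_namespace],
        # 3. Delete the CloudFormation stack from the Getting Started guide.
        ["aws", "cloudformation", "delete-stack",
         "--stack-name", f"Karpenter-{cluster_name}"],
        # 4. Delete the cluster itself (only if you created it for this walkthrough).
        ["eksctl", "delete", "cluster", "--name", cluster_name],
    ]


def run_cleanup(cluster_name: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False, stopping at the first failure."""
    for cmd in cleanup_commands(cluster_name):
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_cleanup("my-cluster")  # dry run: prints the commands without executing them
```

Keep the default dry run until the printed commands match your setup, because deleting the stack and the cluster cannot be undone.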
Conclusion
In this blog post, you learned about Karpenter and how you can use EC2 Spot Instances with Karpenter to scale compute capacity in an Amazon EKS cluster. Check out the Further Reading section below to discover more about Karpenter.
Further Reading
- Karpenter – https://karpenter.sh/docs/concepts/
- Karpenter Spot Prioritization – https://karpenter.sh/docs/faq/#what-if-there-is-no-spot-capacity-will-karpenter-use-on-demand
- Spot Instance Best Practices – https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-best-practices.html#be-instance-type-flexible
- Karpenter Blueprints on GitHub – https://github.com/aws-samples/karpenter-blueprints