Deliver Namespace as a Service multi tenancy for Amazon EKS using Karpenter

Introduction

Karpenter is an open-source, high-performance Kubernetes cluster autoscaler that automatically provisions new nodes in response to unschedulable pods. Customers choose Karpenter for many reasons, such as improving the efficiency and cost of running workloads in their clusters. Karpenter is configured through a custom resource called a Provisioner, which sets constraints on the nodes Karpenter can create and on the pods that can run on those nodes.
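
For orientation, here is a minimal Provisioner sketch (using the karpenter.sh/v1alpha5 API, which this post uses throughout); the requirements and limits shown are illustrative placeholders rather than values this walkthrough depends on:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: example
spec:
  # Constrain what Karpenter is allowed to launch for unschedulable pods
  requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["on-demand"]
  - key: kubernetes.io/arch
    operator: In
    values: ["amd64"]
  # Cap the total capacity this Provisioner may create
  limits:
    resources:
      cpu: "100"
  # Reference an AWSNodeTemplate (we create one named "default" later in this post)
  providerRef:
    name: default
  # Remove empty nodes after 30 seconds
  ttlSecondsAfterEmpty: 30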

Customers who are considering a multi-tenant Amazon Elastic Kubernetes Service (Amazon EKS) cluster are looking to share cluster resources across different teams (i.e., tenants), but they still require node isolation for critical and highly regulated workloads. When deploying Karpenter on a multi-tenant Amazon EKS cluster, note that Karpenter doesn't support namespaces as metadata on its own custom resources (both AWSNodeTemplate and Provisioner are cluster-scoped). This post shows how we used Karpenter to provision nodes and scale the cluster up and down per tenant without impacting the other tenants.

Walkthrough

This solution uses the Open Policy Agent (OPA) Gatekeeper admission controller to enforce tolerations and a nodeSelector on pods in specific namespaces, so that those pods land on the nodes Karpenter creates for their tenant's node pool.

In this example, we're going to use the following scenario:

  • We have two tenants, TenantA and TenantB, that need to run deployments in different namespaces
  • TenantA will use a namespace named tenant-a and TenantB will use a namespace named tenant-b
  • TenantA workloads must run on nodes that are part of PoolA, and TenantB workloads must run on nodes that are part of PoolB
  • As consumers of the Amazon EKS cluster, the tenants are completely isolated from each other, and they don't need to make changes to their pod specs to have their pods scheduled onto their pool; they only deploy into their own namespace

Prerequisites

Create two namespaces

Create two namespaces called tenant-a and tenant-b with the commands:

kubectl create ns tenant-a
kubectl create ns tenant-b

Confirm you have two newly created namespaces with the following command:

kubectl get ns | grep -i tenant

Create a default deny-all network policy

By default, pods aren't isolated for egress and ingress traffic: all inbound and outbound connections are allowed. In a multi-tenant environment, where users share the same Amazon EKS cluster, they require isolation between their namespaces, pods, and external services. Kubernetes NetworkPolicy helps control traffic flow at the IP address or port level. Please check this section of the Amazon EKS best practices guide on building a multi-tenant EKS cluster.

The VPC CNI supports network policies natively starting from version 1.14 on Amazon EKS 1.25 or later. It integrates with the upstream Kubernetes NetworkPolicy Application Programming Interface (API), ensuring compatibility and adherence to Kubernetes standards, and you can define policies using the identifiers supported by the upstream API. As a best practice, network policies should follow the principle of least privilege: first create a deny-all policy that restricts all inbound and outbound traffic across namespaces, and then allow specific traffic, such as Domain Name System (DNS) queries. For more details, you can check this section of the Amazon EKS best practices guide.

In this example, we'll use a network policy that denies all ingress traffic from other namespaces, plus a policy that allows DNS queries for service name resolution:

cat << EOF > deny-all.yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-from-other-namespaces
spec:
  podSelector:
    matchLabels:
  ingress:
  - from:
    - podSelector: {}
EOF
kubectl create -f deny-all.yaml -n tenant-a
kubectl create -f deny-all.yaml -n tenant-b

cat << EOF > allow-dns-access.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-access
spec:
  podSelector:
    matchLabels: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
EOF
kubectl create -f allow-dns-access.yaml -n tenant-a
kubectl create -f allow-dns-access.yaml -n tenant-b

Verify that all the network policies are in place:

kubectl get networkpolicies -A
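
As an optional sanity check (a sketch, assuming the busybox:1.36 image can be pulled in your cluster), you can confirm that DNS resolution still works from a tenant namespace with these policies in place, since the allow-dns-access policy permits UDP egress to kube-dns on port 53:

kubectl run dns-test -n tenant-a --rm -it --restart=Never \
  --image=busybox:1.36 -- nslookup kubernetes.default.svc.cluster.local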

Install Karpenter with proper Provisioner files for node pools configuration

After installing Karpenter on the Amazon EKS cluster, create an AWSNodeTemplate and two Provisioners as shown below. We'll create the node pools ourselves using a combination of taints/tolerations and node labels. Use the following schema when creating the Provisioners:

  • Nodes in PoolA will have:
    • A NoSchedule taint with key node-pool and value pool-a
    • A label with key node-pool and value pool-a
  • Nodes in PoolB will have:
    • A NoSchedule taint with key node-pool and value pool-b
    • A label with key node-pool and value pool-b

Create manifest default-awsnodetemplate.yaml:

export CLUSTER_NAME=<YOUR_EKS_CLUSTER_NAME>
cat << EOF > default-awsnodetemplate.yaml
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:
    karpenter.sh/discovery: $CLUSTER_NAME
  securityGroupSelector:
    karpenter.sh/discovery: $CLUSTER_NAME
  instanceProfile: KarpenterNodeInstanceProfile-$CLUSTER_NAME
  tags:
    karpenter.sh/discovery: $CLUSTER_NAME
EOF

Create manifest called pool-a.yaml:

cat << 'EOF' > pool-a.yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: pool-a
spec:
  providerRef:
    name: default
  taints:
  - key: node-pool
    value: pool-a
    effect: NoSchedule
  labels:
    node-pool: pool-a
  ttlSecondsAfterEmpty: 30
EOF

Create manifest called pool-b.yaml:

cat << 'EOF' > pool-b.yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: pool-b
spec:
  providerRef:
    name: default
  taints:
  - key: node-pool
    value: pool-b
    effect: NoSchedule
  labels:
    node-pool: pool-b
  ttlSecondsAfterEmpty: 30
EOF

You can apply the AWSNodeTemplate and Provisioners to your cluster by running the following commands:

kubectl create -f default-awsnodetemplate.yaml
kubectl create -f pool-a.yaml
kubectl create -f pool-b.yaml
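
To confirm that the resources were created, you can list them (output varies by cluster, so it isn't shown here):

kubectl get awsnodetemplates
kubectl get provisioners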

Deploy OPA Gatekeeper policies

Confirm OPA Gatekeeper is deployed and running in your cluster with this command:

kubectl get deployment -n gatekeeper-system
NAME                            READY   UP-TO-DATE   AVAILABLE   AGE
gatekeeper-audit                1/1     1            1           24h
gatekeeper-controller-manager   3/3     3            3           24h
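
If Gatekeeper isn't installed yet, one common way to install it is via its Helm chart. The following is a sketch (chart values and versions may differ in your environment); note that the mutation feature used below is enabled by default in recent Gatekeeper releases:

helm repo add gatekeeper https://open-policy-agent.github.io/gatekeeper/charts
helm repo update
helm install gatekeeper gatekeeper/gatekeeper \
  --namespace gatekeeper-system --create-namespace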

Forcing deployments in specific namespaces onto the proper node pool

Using OPA Gatekeeper, we can force a Deployment onto the proper node pool based on its namespace. With an admission controller, we can mutate each pod to add a nodeSelector and tolerations to its spec. Using a nodeSelector still allows teams to define their own nodeAffinity to provide additional guidance on how Karpenter should provision nodes. Rather than writing our own admission controller, we'll use OPA Gatekeeper and its mutation capability.

Here are the Assign policies we'll use for Pool A and Pool B; a similar policy is needed for each namespace (node pool).

Create a policy called nodepool-selector-pool-a:

cat << 'EOF' > nodepool-selector-pool-a.yaml
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: nodepool-selector-pool-a
spec:
  applyTo:
  - groups:
    - ""
    kinds:
    - Pod
    versions:
    - v1
  location: spec.nodeSelector
  match:
    kinds:
    - apiGroups:
      - '*'
      kinds:
      - Pod
    namespaces:
    - tenant-a
    scope: Namespaced
  parameters:
    assign:
      value:
        node-pool: pool-a
EOF
kubectl create -f nodepool-selector-pool-a.yaml

Create a policy called nodepool-selector-pool-b:

cat << 'EOF' > nodepool-selector-pool-b.yaml
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: nodepool-selector-pool-b
spec:
  applyTo:
  - groups:
    - ""
    kinds:
    - Pod
    versions:
    - v1
  location: spec.nodeSelector
  match:
    kinds:
    - apiGroups:
      - '*'
      kinds:
      - Pod
    namespaces:
    - tenant-b
    scope: Namespaced
  parameters:
    assign:
      value:
        node-pool: pool-b
EOF
kubectl create -f nodepool-selector-pool-b.yaml

The pods also need to tolerate the taints on each pool's worker nodes so that they can be scheduled on the nodes Karpenter creates. Create the tolerations using the manifests shown below:

cat << 'EOF' > nodepool-toleration-pool-a.yaml
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: nodepool-toleration-pool-a
spec:
  applyTo:
  - groups:
    - ""
    kinds:
    - Pod
    versions:
    - v1
  location: spec.tolerations
  match:
    kinds:
    - apiGroups:
      - '*'
      kinds:
      - Pod
    namespaces:
    - tenant-a
    scope: Namespaced
  parameters:
    assign:
      value:
      - key: node-pool
        operator: Equal
        value: pool-a
EOF

cat << 'EOF' > nodepool-toleration-pool-b.yaml
apiVersion: mutations.gatekeeper.sh/v1
kind: Assign
metadata:
  name: nodepool-toleration-pool-b
spec:
  applyTo:
  - groups:
    - ""
    kinds:
    - Pod
    versions:
    - v1
  location: spec.tolerations
  match:
    kinds:
    - apiGroups:
      - '*'
      kinds:
      - Pod
    namespaces:
    - tenant-b
    scope: Namespaced
  parameters:
    assign:
      value:
      - key: node-pool
        operator: Equal
        value: pool-b
EOF

kubectl create -f nodepool-toleration-pool-a.yaml
kubectl create -f nodepool-toleration-pool-b.yaml
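
As a quick check (a sketch), you can list the Assign mutations you just created and use a server-side dry run to confirm that a pod created in tenant-a would be mutated. A server-side dry run passes through mutating admission webhooks (assuming the Gatekeeper mutating webhook is registered with sideEffects: None, which is its default), so the output should include the injected nodeSelector and tolerations:

kubectl get assign
kubectl run mutation-check -n tenant-a --image=nginx --dry-run=server -o yaml \
  | grep -E -A 1 "nodeSelector|tolerations"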

Testing it out

Now that we have our node pools defined and the mutation policies in place, let's create a deployment for each of our tenants and make sure everything is functioning.

Run the following commands to create the deployments:

kubectl create deployment nginx --image=nginx --replicas 3 -n tenant-a
kubectl expose deployment nginx --port=8080 --target-port=80 -n tenant-a
kubectl create deployment nginx --image=nginx --replicas 3 -n tenant-b
kubectl expose deployment nginx --port=8080 --target-port=80 -n tenant-b
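
To watch the scheduling happen, you can check where the pods land and follow the Karpenter controller logs. The namespace and deployment name below assume a default Helm installation of Karpenter and may differ in your cluster:

kubectl get pods -n tenant-a -o wide
kubectl get pods -n tenant-b -o wide
kubectl logs -n karpenter deploy/karpenter -f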

When the deployments are created in the tenant namespaces, OPA Gatekeeper adds the nodeSelector and tolerations to the pod specs, and Karpenter provisions nodes in the matching pools:

kubectl get nodes -L node-pool
NAME                                          STATUS   ROLES    AGE   VERSION               NODE-POOL
...
ip-10-100-10-109.us-west-2.compute.internal   Ready    <none>   36m   v1.27.4-eks-8ccc7ba   pool-b
ip-10-100-22-46.us-west-2.compute.internal    Ready    <none>   65m   v1.27.4-eks-8ccc7ba   pool-a
...

In the following pod specification, note the node-selectors and tolerations:

kubectl describe pods nginx-55f598f8d-6q5fs -n tenant-a
Name:             nginx-55f598f8d-6q5fs
Namespace:        tenant-a
Priority:         0
Service Account:  default
Node:             ip-10-100-19-121.us-west-2.compute.internal/10.100.19.121
Start Time:       Sat, 23 Sep 2023 19:08:47 +0200
Labels:           app=nginx
                  pod-template-hash=55f598f8d
Annotations:      <none>
Status:           Running
IP:               10.100.16.70
IPs:
  IP:  10.100.16.70
Controlled By:  ReplicaSet/nginx-55f598f8d
Containers:
  nginx:
    Container ID:   containerd://f410c5f0979dffcac1e5299a18960f523aaabbb251969113ef1ff182a0711b24
    Image:          nginx
    Image ID:       docker.io/library/nginx@sha256:32da30332506740a2f7c34d5dc70467b7f14ec67d912703568daff790ab3f755
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sat, 23 Sep 2023 19:09:12 +0200
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-k92mc (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  kube-api-access-k92mc:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              node-pool=pool-a
Tolerations:                 node-pool=pool-a
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
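
If you prefer a shorter check than the full describe output, a jsonpath query like the following (a sketch) prints just the mutated nodeSelector for each nginx pod:

kubectl get pods -n tenant-a -l app=nginx \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.nodeSelector}{"\n"}{end}'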

Let's confirm the new nodes are running with the following command:

kubectl get nodes -L node-pool | grep -i pool

Finally, let's make sure that traffic between the two tenant namespaces is blocked:

kubectl create deployment curl-tenant-a --image=curlimages/curl:latest -n tenant-a -- sleep 3600
kubectl get pods -n tenant-a
NAME                    READY   STATUS    RESTARTS   AGE
curl-bdff4577d-rhcwv    1/1     Running   0          11m
nginx-55f598f8d-6q5fs   1/1     Running   0          14m
nginx-55f598f8d-cqjnj   1/1     Running   0          14m
nginx-55f598f8d-sx4d7   1/1     Running   0          14m

We use the curl command from tenant-a against the nginx service running in tenant-b:

kubectl exec -i -t -n tenant-a curl-bdff4577d-rhcwv -- curl -v --max-time 3 nginx.tenant-b.svc.cluster.local:8080
  Trying 172.20.81.215:8080...
Connection timed out after 3000 milliseconds
Closing connection
curl: (28) Connection timed out after 3000 milliseconds
command terminated with exit code 28

Now you should see the proper worker nodes starting up for each tenant, with every tenant's workloads isolated from the others, thanks to Karpenter and the network policies implemented by the VPC CNI!

Note: it might take a moment or two for the nodes to start up and join the cluster.

Cleaning up

After you complete this experiment, you can delete the Kubernetes deployment and respective resources.

kubectl delete -f default-awsnodetemplate.yaml
kubectl delete -f pool-a.yaml
kubectl delete -f pool-b.yaml
kubectl delete ns tenant-a
kubectl delete ns tenant-b
kubectl delete ns gatekeeper-system
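
The Gatekeeper Assign mutations are cluster-scoped, so they aren't removed when the tenant namespaces are deleted; delete them using the manifests created earlier:

kubectl delete -f nodepool-selector-pool-a.yaml
kubectl delete -f nodepool-selector-pool-b.yaml
kubectl delete -f nodepool-toleration-pool-a.yaml
kubectl delete -f nodepool-toleration-pool-b.yaml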

Finally, delete your EKS cluster (the method depends on how you created it).

Conclusion

In this post, we showed you how to use Karpenter along with an admission controller such as OPA Gatekeeper. We assigned nodeSelectors and tolerations to our tenants' pods that match the labels and taints Karpenter places on newly created nodes through different Provisioners (i.e., one Provisioner per namespace). Together with the network policies provided by the VPC CNI, we built a scalable multi-tenant environment on top of Amazon EKS in which each tenant's workloads are isolated from the others.
