This post was co-authored by Vignesh Senapathy, Principal DevOps Engineer, HPE.
About Hewlett Packard Enterprise (HPE) Aruba Networking
HPE Aruba Networking EdgeConnect Cloud Orchestrator is a cloud-native Software-Defined Wide Area Network (SD-WAN) orchestrator within HPE Aruba Networking’s portfolio. Serving as a centralized SD-WAN controller, it oversees both physical and virtual SD-WAN gateways throughout the enterprise edge. By establishing a secure virtual network overlay, EdgeConnect Cloud Orchestrator effectively governs all SD-WAN application policies that steer, secure, and optimize application traffic across the WAN, ensuring the highest quality of experience for end users while simplifying administration and operations.
HPE Aruba Networking was an early adopter of containerized workloads, choosing to run EdgeConnect Cloud Orchestrator on Docker Swarm. Docker Swarm was initially an effective solution for cluster management, but looking forward, a more scalable framework that could simplify the lifecycle operations of large clusters was needed. To address these challenges and meet their current and future needs, HPE Aruba Networking needed a new container platform.
In this blog post, we discuss HPE Aruba Networking’s journey to modernize its container platform with Amazon Elastic Kubernetes Service (Amazon EKS), from the challenges they faced with their previous container platform to their current and future requirements for a new container platform. We’ll also cover why they chose Amazon EKS to overcome those challenges, and go into detail about the implementation process.
Solution Overview
Key Drivers to Modernize Container Platform
Docker Swarm is a powerful container orchestration platform that can be a good fit for relatively simple and small-to-medium scale clusters. However, it is less well suited to manage and scale large and complex clusters. As HPE Aruba Networking’s workload grew, Docker Swarm introduced the following challenges:
- Self-managing Docker Swarm on Amazon Elastic Compute Cloud (Amazon EC2) required a non-trivial amount of time and effort to set up and maintain master nodes, which increased the overall cost of maintaining the cluster.
- Docker Swarm frequently suffered from quorum loss, particularly when operating with fewer than three nodes in the quorum. This instability disrupted production deployments and compelled the team to frequently reinitialize the cluster to restore functionality.
- System upgrades proved to be a challenging task, which occasionally required the cluster to be taken out of service. Attempts to upgrade the development cluster resulted in a complete loss of the cluster, necessitating a full setup from scratch.
- Overlay Network and node port configurations added complexity and made it difficult to scale within a single cluster.
- The process of recovering from a disaster was complex and time-consuming, requiring numerous manual steps to restore the cluster while maintaining consistency across clusters.
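The quorum problem in the list above follows from majority-based consensus: Docker Swarm managers use Raft, which only makes progress while a majority of managers can reach each other. The short sketch below works through that arithmetic; the helper names are illustrative and not part of any Docker tooling.

```python
def quorum_size(managers: int) -> int:
    """Return the smallest majority of `managers` voting members."""
    if managers < 1:
        raise ValueError("a cluster needs at least one manager")
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """Return how many managers can fail while a majority remains."""
    return managers - quorum_size(managers)


if __name__ == "__main__":
    for n in (1, 2, 3, 5, 7):
        print(
            f"{n} managers: quorum={quorum_size(n)}, "
            f"tolerated failures={tolerated_failures(n)}"
        )
```

With one or two managers, no failure can be tolerated, which is consistent with the instability described for quorums of fewer than three nodes.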
New Container Platform Evaluation
To reduce operational costs and meet the technical requirements outlined above, HPE Aruba Networking evaluated alternative container orchestration platforms to run container workloads. After considering various options, HPE Aruba Networking chose Kubernetes, a powerful and flexible open-source container orchestration platform widely used by organizations of all sizes, including other teams within HPE. Kubernetes is an active open-source project that is regularly updated and improved by the community. This gave the HPE Aruba Networking team confidence that Kubernetes would continue to meet its needs in the future.
New Container Platform Architecture
To deploy the new container orchestration platform, HPE Aruba Networking selected Amazon EKS, the AWS managed Kubernetes service that simplifies operations and management. Figure 1 shows a high-level overview of the solution architecture.
Figure 1: Amazon EKS platform architecture
- HPE Aruba Networking has a dedicated Amazon Virtual Private Cloud (Amazon VPC) for every Amazon EKS cluster and each cluster has multiple worker nodes in three different Availability Zones (AZs).
- A CNAME record for the Application Load Balancer (ALB) is registered in Amazon Route 53 so that end users can reach the service. Each request is then routed to the corresponding target by the tenant’s unique listener rule.
- Finally, the services store and query persistent data in an Amazon Relational Database Service (Amazon RDS) for MySQL database.
How Amazon EKS Solved the Problem
Amazon EKS provides managed, in-place control plane upgrades. If any readiness check fails during a control plane upgrade, Amazon EKS automatically reverts to the prior version, so the cluster does not end up in a non-deterministic or unrecoverable state.
Amazon EKS add-ons simplify the management and upgrades of a curated set of related plug-ins such as kube-proxy, CoreDNS, and Amazon VPC Container Network Interface (CNI) with the latest security patches and bug fixes, all validated by AWS to run on Amazon EKS. The upgrade process for Amazon EKS is well-documented in the Amazon EKS best practices guides, which helped HPE Aruba Networking plan, test, and implement upgrades smoothly.
Simplified Networking
AWS VPC CNI plugin (aws-vpc-cni-k8s) simplified HPE Aruba Networking’s network configuration by eliminating the need for a complex overlay network. To implement layer 4 network policies and restrict unnecessary communication, HPE Aruba Networking adopted Cilium in chaining mode in conjunction with the AWS VPC CNI plugin.
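As a rough illustration of a layer 4 policy of the kind described above, the sketch below builds a standard Kubernetes `NetworkPolicy` manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts. The policy name, namespace, labels, and port are hypothetical; the actual policies used by HPE Aruba Networking are not shown in this post.

```python
import json


def allow_ingress_policy(name, namespace, target_labels, source_labels, port):
    """Build a NetworkPolicy that admits TCP traffic on `port` to pods
    matching `target_labels`, only from pods matching `source_labels`."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods the policy applies to.
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # Only these peers may connect, and only on this port.
                    "from": [{"podSelector": {"matchLabels": source_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # All names and labels below are placeholders for illustration.
    policy = allow_ingress_policy(
        name="allow-orchestrator-to-child",
        namespace="tenant-a",
        target_labels={"app": "child-service"},
        source_labels={"app": "orchestrator"},
        port=8443,
    )
    print(json.dumps(policy, indent=2))
```

Once a pod is selected by an ingress policy like this one, traffic that no rule admits is dropped, which is how unnecessary communication gets restricted.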
With ALBs, HPE Aruba Networking could easily scale its services. ALBs support target groups, which route requests to multiple registered targets without manual node port configuration. By using the IP target type on the ALB, HPE Aruba Networking could send traffic directly to the Kubernetes pods behind the service, eliminating the extra network hop through the worker nodes in the Kubernetes cluster.
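One common way to get IP-mode targets is through the AWS Load Balancer Controller, which reads annotations on a Kubernetes `Ingress`. The post does not say which mechanism HPE Aruba Networking used, so the sketch below is an assumption for illustration: it builds an Ingress with one host rule per tenant and the `target-type: ip` annotation. Hostnames, Service names, and the namespace are hypothetical.

```python
import json


def tenant_ingress(name, namespace, tenants, service_port=80):
    """Build an Ingress with one host rule per tenant.

    `tenants` maps a tenant hostname to the Service that serves it.
    """
    rules = [
        {
            "host": host,
            "http": {
                "paths": [
                    {
                        "path": "/",
                        "pathType": "Prefix",
                        "backend": {
                            "service": {
                                "name": service,
                                "port": {"number": service_port},
                            }
                        },
                    }
                ]
            },
        }
        for host, service in sorted(tenants.items())
    ]
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                "alb.ingress.kubernetes.io/scheme": "internet-facing",
                # Register pod IPs as targets instead of node ports.
                "alb.ingress.kubernetes.io/target-type": "ip",
            },
        },
        "spec": {"ingressClassName": "alb", "rules": rules},
    }


if __name__ == "__main__":
    # Placeholder tenants for illustration only.
    ingress = tenant_ingress(
        name="orchestrator",
        namespace="orchestrator",
        tenants={
            "tenant-a.example.com": "tenant-a-svc",
            "tenant-b.example.com": "tenant-b-svc",
        },
    )
    print(json.dumps(ingress, indent=2))
```

With this controller, each host rule becomes a listener rule on the ALB, and pod IPs are registered in the target group directly.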
Security
Amazon EKS is integrated with many AWS services, such as AWS Identity and Access Management (AWS IAM) for authentication, Amazon VPC for network isolation, Amazon CloudWatch for control plane logging, and Amazon GuardDuty for threat detection. Specifically, HPE Aruba Networking took the following steps to secure their Amazon EKS deployment:
- Implemented AWS IAM roles for service accounts (IRSA), which provides fine-grained permissions and credential isolation and eliminates the need to manually manage and rotate AWS credentials.
- Deployed a separate Amazon EKS cluster in each VPC with subnets spread across three AZs in order to isolate cellular failure and minimize the impact of outages.
- Used Amazon CloudWatch to easily access Kubernetes audit logs out-of-the-box, separately from the Amazon EKS clusters.
- Applied AWS Web Application Firewall (WAF) and AWS Shield Advanced to enable automatic application layer distributed denial-of-service (DDoS) mitigation for ALB resources, in addition to CloudFront distributions.
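To make the IRSA step above more concrete, here is a minimal sketch of the two pieces it relies on: an IAM role trust policy that lets one specific Kubernetes service account assume the role through the cluster's OIDC provider, and a `ServiceAccount` annotated with that role's ARN. The account ID, OIDC provider ID, role name, namespace, and service account name are all placeholders, not values from HPE Aruba Networking's environment.

```python
import json


def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build a trust policy scoped to a single service account.

    `oidc_provider` is the cluster's OIDC issuer without the https:// prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": (
                        f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                    )
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Restrict the role to one namespace/service account.
                        f"{oidc_provider}:sub": (
                            f"system:serviceaccount:{namespace}:{service_account}"
                        ),
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }


def annotated_service_account(name, namespace, role_arn):
    """Build a ServiceAccount manifest that references the IAM role."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


if __name__ == "__main__":
    # Placeholder identifiers for illustration only.
    provider = "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
    print(json.dumps(irsa_trust_policy("111122223333", provider, "velero", "velero"), indent=2))
    print(json.dumps(
        annotated_service_account(
            "velero", "velero", "arn:aws:iam::111122223333:role/example-velero-role"
        ),
        indent=2,
    ))
```

Pods that run under the annotated service account receive short-lived credentials for the role, so no long-lived access keys need to be stored or rotated by hand.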
Open-source Plugins
Amazon EKS runs upstream conformant open-source Kubernetes, so HPE Aruba Networking could fully use open-source tooling from the Kubernetes community to meet requirements such as disaster recovery, network isolation, and cost visibility. The extension of the Amazon EKS add-ons to include AWS Marketplace for Containers helped them easily find required third-party Kubernetes software with a compatible version from the Amazon EKS console and deploy it to Amazon EKS clusters seamlessly.
- Implemented Cilium, an open-source technology that provides Kubernetes Network Policies, to achieve L3/L4/L7 network isolation between the parent orchestration service and child services.
- Implemented Velero, an open-source solution to regularly back up clusters and achieve disaster recovery requirements within their recovery time objective (RTO) and recovery point objective (RPO). This was one of the high priority requirements to achieve Service Organization Control (SOC) 2 Type 2 compliance.
- Implemented Kubecost and Amazon Managed Service for Prometheus to provide better cost visibility across the Amazon EKS clusters, which helps the team manage and reduce Kubernetes spend.
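Regular Velero backups are typically expressed as a `Schedule` resource. The sketch below builds one as a Python dictionary; the cron expression, namespaces, and retention period are hypothetical, since the post does not state HPE Aruba Networking's actual RTO, RPO, or backup settings.

```python
import json


def velero_schedule(name, cron, namespaces, ttl_hours):
    """Build a Velero Schedule that backs up `namespaces` on a cron cadence
    and keeps each backup for `ttl_hours` hours."""
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Schedule",
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "schedule": cron,
            "template": {
                "includedNamespaces": list(namespaces),
                "ttl": f"{ttl_hours}h0m0s",
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical example: back up every 6 hours, keep backups for 30 days.
    schedule = velero_schedule(
        name="orchestrator-every-6h",
        cron="0 */6 * * *",
        namespaces=["orchestrator"],
        ttl_hours=720,
    )
    print(json.dumps(schedule, indent=2))
```

The interval between scheduled backups bounds the worst-case data loss, so a backup every six hours would only suit an RPO of six hours or more.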
Saving Operational Costs and Efforts
Amazon EKS automatically manages the availability and scalability of the Kubernetes control plane running across multiple AZs for high availability and fault tolerance with 99.95% SLA. Amazon EKS’s scalable fault tolerance reduces the operational burden that HPE Aruba Networking previously dealt with in provisioning, scaling, and upgrading multiple control planes across clusters.
Because Amazon EKS automatically detects and replaces unhealthy control plane nodes, HPE Aruba Networking doesn’t have to worry about losing a control plane and manually recovering it. The control plane is also scaled automatically based on various metrics, as detailed in the post “Amazon EKS improves control plane scaling and update speed by up to 4x,” to match the traffic demand from the pods on worker nodes in 10 minutes or less.
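To put the 99.95% figure in perspective, the following sketch converts an availability percentage into a downtime budget. It is a back-of-the-envelope reading only; the SLA itself defines how availability is measured, and the function name is illustrative.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: float) -> float:
    """Return the downtime budget, in minutes, implied by an availability
    percentage over a period of `period_days` days."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    monthly = allowed_downtime_minutes(99.95, 30)
    yearly = allowed_downtime_minutes(99.95, 365)
    print(f"30-day month: {monthly:.1f} minutes")
    print(f"365-day year: {yearly / 60:.2f} hours")
```

For a 30-day month this works out to about 21.6 minutes, and for a 365-day year to about 4.38 hours.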
Achieved Outcomes
In August 2023, the HPE Aruba Networking team launched its first production Amazon EKS cluster, achieving the following major outcomes:
- Reduced operational overhead and increased focus on core business. Amazon EKS’s managed services freed up the team to focus on developing the core platform of HPE Aruba Networking’s SD-WAN solution. Their DevOps team reported that they saved 30% on maintenance effort with the new architecture in place. They also said that “Upgrading EKS cluster and its add-ons is as easy as clicking a button.”
- Improved disaster recovery and security compliance. Velero helped HPE Aruba Networking meet its disaster recovery requirements and maintain SOC 2 Type 2 compliance.
- Increased scalability and reduced manual intervention. Amazon EKS scalability features allowed HPE Aruba Networking to scale its clusters to hundreds of nodes without manual intervention.
- Access to AWS 24×7 enterprise support. Amazon EKS provided HPE Aruba Networking with access to the AWS 24×7 enterprise support team and subject matter experts, who can help resolve issues quickly and efficiently.
Future Improvements
HPE Aruba Networking is expanding its use of Amazon EKS clusters to multiple regions, and is evaluating the following improvements to make it easier, more cost efficient, and more secure to run EdgeConnect Cloud Orchestrator on AWS:
- Develop multi-architecture container images that can run on both x86 and AWS Graviton processors to improve price performance and efficiency.
- Use Karpenter to automate the scaling of Amazon EKS clusters to optimize costs and ensure sufficient resources for their workloads.
- Use VPC CNI to improve the security and isolation of Amazon EKS workloads, and use fine-grained Kubernetes role-based access control (RBAC) to further protect workloads from unauthorized access.
- Integrate Kubecost with AWS Cost and Usage Report and enable Single Sign-On to improve cost visibility and harden security in multi-cluster environments.
Conclusion
In this post, we discussed how HPE Aruba Networking modernized its container platform with Amazon EKS, a testament to the successful collaboration between HPE and AWS. With the support of HPE Aruba Networking leadership, we helped to transform its container platform into a modern, scalable, and secure solution. By automating many of the tasks involved in managing Kubernetes clusters, Amazon EKS has freed up HPE Aruba Networking’s engineering team to focus on application improvements and enabled EdgeConnect Cloud Orchestrator to support more complex and advanced capabilities.
Looking ahead, HPE Aruba Networking plans to expand its use of Amazon EKS to deliver EdgeConnect Cloud Orchestrator to businesses worldwide. If you are looking for a modern, scalable, and secure container platform that can simplify your operations, then Amazon EKS is a great option. To learn more about how Amazon EKS can help you transform your business, visit the Amazon EKS webpage.