Transforming Istio into an enterprise-ready service mesh for Amazon ECS

This post is authored by John Howard (Senior Architect, Solo.io), Petr McAllister (Engineer on the Partner Team, Solo.io), Christian Posta (VP, Global Field CTO, Solo.io), and Jooyoung Kim (Senior Containers Specialist Solutions Architect, AWS).

Introduction

Amazon Elastic Container Service (Amazon ECS) is a fully managed service that streamlines the deployment, management, and scaling of containerized applications. Although Amazon ECS has deep integration with other AWS services, it also offers compatibility with open source projects, particularly those under the Cloud Native Computing Foundation (CNCF). The Amazon ECS support for CNCF projects allows enterprises to use the flexibility of open source technologies alongside the powerful native AWS services, making it a versatile choice for organizations looking to balance ease of use with open standards and extensibility.

Istio is widely regarded as one of the most widely deployed, mature, and stable service meshes available today. It provides a straightforward way to implement mutual TLS (mTLS), enhance observability, achieve resilience, and apply HTTP routing and traffic control to applications and APIs.

Solo.io, a leading contributor to the Istio project, is driving innovation in this domain with the introduction of a sidecarless model known as “Ambient Mesh.” This new approach aims to lower both the barrier to adoption and the cost of powerful service-mesh features. It also streamlines the expansion of Istio capabilities to compute environments beyond Kubernetes. With Ambient Mesh, Amazon ECS workloads can join the mesh, with full support for workloads running on both AWS Fargate and Amazon Elastic Compute Cloud (Amazon EC2) on Amazon ECS.

In this post, we explore how Ambient Mesh can be integrated with Amazon ECS, particularly with AWS Fargate. We delve into how this innovative approach streamlines service mesh adoption on Amazon ECS.

Ambient Mesh

Historically, Istio has adopted a “Kubernetes-first” approach, treating other environments as secondary. Although the goal was to unify heterogeneous compute platforms, this vision remains unfulfilled. Furthermore, Istio primarily uses Envoy proxy as its sidecar. However, running a powerful Layer 7 proxy presents challenges related to bootstrap, configuration, and CPU management, with increased operational burden at scale.

Istio’s ambient mode, also known as Ambient Mesh, is another data plane option that significantly streamlines this complexity, allowing users to onboard their Amazon ECS services into the mesh with greater ease.

Ambient mode addresses these challenges by separating the all-in-one data plane into two distinct agents: ztunnel (Zero Trust Tunnel) and waypoint proxy, as shown in the following figure.

Figure 1. Istio ambient mode

Ztunnel is a lightweight, Rust-based agent that establishes mTLS connections on behalf of an application to other services by default. The ztunnel uses SPIFFE-based X.509 certificates to authenticate workloads for mTLS connections, allowing for stable workload identity to be used for policy enforcement. A ztunnel agent can run as a daemon agent on a host or as a simple sidecar, depending on environmental constraints.
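To make the identity format concrete, the following sketch builds and parses a SPIFFE ID in the spiffe://<trust-domain>/ns/<namespace>/sa/<service-account> layout that Istio uses for workload identities. The helper names (spiffe_id, spiffe_service_account) and the example values are illustrative assumptions and are not part of Istio or ztunnel.

```shell
#!/bin/sh
# Build a SPIFFE ID in the layout Istio uses for workload identity:
#   spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>
spiffe_id() {
  # $1 = trust domain, $2 = namespace, $3 = service account
  printf 'spiffe://%s/ns/%s/sa/%s\n' "$1" "$2" "$3"
}

# Extract the service account (everything after the last "/sa/") from a SPIFFE ID.
spiffe_service_account() {
  # $1 = SPIFFE ID
  printf '%s\n' "${1##*/sa/}"
}

# Hypothetical usage:
#   spiffe_id cluster.local ecs shell
#   prints spiffe://cluster.local/ns/ecs/sa/shell
```

Because the identity is derived from the namespace and service account rather than from an IP address, an authorization policy written against it stays valid when tasks are replaced or rescheduled.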

The waypoint proxy is an L7 proxy, such as Envoy, that enables powerful L7 policy control and enforcement. It runs “in the network” rather than as a sidecar, which allows for independent scaling and tuning, with more fine-grained control over tenancy and blast radius. For example, an SRE team might use two waypoint proxies to handle traffic for 10 tasks of a specific Amazon ECS service, as compared to 10 sidecar proxies in sidecar mode. A single set of waypoint proxies can also manage traffic for multiple Amazon ECS services. Although the waypoint adds another hop, it can still be faster than sidecar mode, because L7 processing occurs only once on the waypoint instead of on both the client’s and server’s sidecars. This architecture is also more flexible and resource-efficient than the traditional sidecar model. However, not all use cases necessitate a waypoint proxy. If your primary requirement is mTLS between Amazon ECS services, then you can bypass the waypoint entirely, which leads to faster traffic flow.

We can deploy Istio Ambient across various AWS computing resources, including Amazon ECS, Amazon Elastic Kubernetes Service (Amazon EKS), Amazon EC2, and even AWS Lambda, offering flexible service-mesh solutions for different environments. For users with hybrid container environments in particular, Istio ambient mode can streamline service-to-service networking.

Benefits of Ambient Mesh

Istio ambient mode enhances Amazon ECS services with robust security, traffic control, resiliency, and observability. From a security perspective, it offers a direct solution for environments that require mTLS compliance through the ztunnel agent, which operates within an Amazon ECS task or as a daemon. Istio transparently manages certificate issuance, rotation, and integration with options such as Amazon Private Certificate Authority (Amazon Private CA). Using SPIFFE workload identity allows Istio to enable powerful service authorization policies, controlling access based on request attributes such as headers, HTTP methods, paths, and even payload inspection.

Furthermore, Istio provides a robust set of traffic control features commonly used for canary releases and blue/green deployments. In Amazon ECS, these capabilities include request routing based on various HTTP attributes, and traffic splitting by percentage. Istio also supports more advanced scenarios, such as regex matching, direct response actions, redirection, and HTTP rewriting.
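As a rough illustration of percentage-based traffic splitting, the following sketch assigns each request to one of 100 buckets and sends the lowest buckets to the canary. This is not how Istio or its proxies implement weighted routing (a real proxy selects a destination at random according to the configured weights). The function names and the canary and stable labels are hypothetical.

```shell
#!/bin/sh
# Decide which version serves a request, given a bucket in 0..99
# and the percentage of traffic assigned to the canary.
route_for_bucket() {
  # $1 = bucket (0..99), $2 = canary weight in percent
  if [ "$1" -lt "$2" ]; then
    echo canary
  else
    echo stable
  fi
}

# Count how many of the 100 buckets land on the canary for a given weight.
canary_share() {
  # $1 = canary weight in percent
  count=0
  bucket=0
  while [ "$bucket" -lt 100 ]; do
    if [ "$(route_for_bucket "$bucket" "$1")" = canary ]; then
      count=$((count + 1))
    fi
    bucket=$((bucket + 1))
  done
  echo "$count"
}
```

With a weight of 10, buckets 0 through 9 reach the canary and the remaining 90 stay on the stable version. Raising the weight step by step is the basic mechanic behind a canary release.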

Moreover, Istio offers powerful failover and resilience mechanisms, including retries, timeouts, outlier detection, and health checks. For example, you can configure request retry budgets and backoffs to prevent retry storms that could disrupt your services. Customizing these settings for your architecture is essential for maintaining uptime. Features such as circuit breaking and outlier detection should be tailored for each team, service, or cluster while considering zonal and regional locality to avoid unnecessary egress costs. Istio’s functionality is highly configurable using a declarative YAML format, which can integrate into GitOps workflows.
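The following sketch shows the arithmetic behind two of these ideas: a capped exponential backoff and a retry budget expressed as a percentage of in-flight requests. It is a simplified model and not the algorithm Istio or Envoy uses (for instance, it omits jitter). The function names, parameters, and example numbers are assumptions made for illustration.

```shell
#!/bin/sh
# Capped exponential backoff: the delay doubles with each attempt
# until it reaches the maximum.
backoff_ms() {
  # $1 = zero-based retry attempt, $2 = base delay (ms), $3 = maximum delay (ms)
  delay=$(( $2 << $1 ))
  if [ "$delay" -gt "$3" ]; then
    delay=$3
  fi
  echo "$delay"
}

# Retry budget: permit another retry only while active retries stay
# below the given percentage of active requests.
retry_allowed() {
  # $1 = active requests, $2 = active retries, $3 = budget in percent
  [ $(( $2 * 100 )) -lt $(( $1 * $3 )) ]
}

# Hypothetical usage:
#   backoff_ms 3 25 250      prints 200
#   backoff_ms 4 25 250      prints 250 (capped)
#   retry_allowed 100 19 20  succeeds; retry_allowed 100 20 20 fails
```

The cap keeps the wait between attempts bounded, while the budget limits how much extra load retries can add when a backend is already struggling, which is what prevents a retry storm.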

Lastly, one of the most powerful features of a service mesh is its observability and plug-in capability for various backend telemetry collection systems. Solo.io’s Istio Ambient Mesh delivers robust L7 telemetry through the ztunnel component. Built-in OpenTelemetry support allows Istio Ambient to integrate with open source tools such as Prometheus, and popular third-party observability platforms such as Datadog and Splunk. What’s more, enabling distributed tracing with Istio Ambient provides detailed insights into service path traversal and latency.

Integrating Istio with Amazon ECS

Solo.io streamlines running Istio on Amazon ECS and other environments, enabling organizations to extend Istio to both fully Kubernetes-independent deployments and seamless integration with various workloads.

In Amazon ECS, Istio ambient mode is compatible with both Amazon EC2 and AWS Fargate. In Amazon EC2, the ztunnel component operates as a daemon on the EC2 container instance, which allows Amazon ECS tasks to seamlessly integrate with the mesh without awareness of its presence. This integration allows organizations to use Istio’s capabilities, including traffic management and encryption through ztunnel, while also using monitoring and access control features through waypoint, all without needing to modify Amazon ECS services or task configurations, as shown in the following figure.

Figure 2. Architecture of Istio ambient mode with Amazon ECS and Amazon EC2

In AWS Fargate, the ztunnel is integrated as a sidecar within the Amazon ECS tasks, connecting the service to the mesh, as shown in the following figure. This lightweight ztunnel agent streamlines usage as compared to a full Envoy proxy. Previously, third-party service meshes faced challenges due to the elevated privileges needed for sidecars, but this implementation resolves that. Additionally, ztunnel is resource-efficient, needing only 5 to 10 MiB of memory and 1 to 5 vCPU units, which is less than Istio’s Envoy-based sidecar, allowing for smoother deployment in your Fargate environment.

Figure 3. Architecture of Istio ambient mode with Amazon ECS and AWS Fargate

A service mesh provides valuable Layer 7 capabilities for application traffic. The ambient architecture allows you to deploy L7 waypoints in a way that suits your needs, whether for individual applications, groups of services, or entire ECS clusters, as shown in the following figure. You also have full control over where the waypoint runs, with options to deploy it as an Amazon ECS task, on a virtual machine (VM) in Amazon EC2, as a pod on Amazon EKS, or as a managed service. Istio ambient mode enables you to optimize your setup according to your application’s configuration, infrastructure requirements, and specific use cases, thus ensuring you aren’t limited to a single deployment model.

Figure 4. Architecture of Istio ambient mode in Amazon ECS and AWS Fargate with waypoint

Walkthrough

In the following section, we explore how to set up Ambient Mesh on Amazon ECS with AWS Fargate, which involves adding Istio capabilities to an Amazon ECS task. For more detailed guidance, refer to this repository.

Prerequisites

For this walkthrough, you need the following prerequisites:

  • istioctl CLI (Solo.io version, license needed).
  • An Istio control plane (self-deployed or managed through Solo.io).

The following steps walk you through this solution.

1) Generate a bootstrap token for the Amazon ECS service. The command automatically detects the necessary configuration and workload attestations, ensuring SPIFFE compatibility for connecting the Amazon ECS task to the mesh:

istioctl bootstrap --service-account ${ECS_SERVICE_ACCOUNT_NAME} --platform=ecs

2) Add a ztunnel container to the Amazon ECS task definition, attaching the token from Step 1 as an environment variable. The following is an example of an Amazon ECS task definition that enables the service to join the service mesh:

…
"containerDefinitions": [
  {
    "name": "shell",
    "image": "curlimages/curl:latest",
    "memory": 512,
    "cpu": 256,
    "command": ["sleep infinity"],
    "entryPoint": ["sh", "-c"]
  },
  {
    "name": "ztunnel",
    "image": "${ZTUNNEL_IMAGE}",
    "environment": [
      {
        "name": "BOOTSTRAP_TOKEN",
        "value": "${BOOTSTRAP_TOKEN}"
      }
    ],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "${AWSLOGS_GROUP.NAME}",
        "awslogs-region": "${AWS_REGION}",
        "awslogs-stream-prefix": "ztunnel"
      }
    }
  }
],
tags = {
  "ecs.solo.io/service-account" = var.ecs_service_account_name
}
…

Important notes:

  • The Istio Ambient ztunnel image connects the workload to the service mesh and automatically handles redirection, thus eliminating the need for manual setup.
  • The tags value specifies the authorized service account for connecting to the Istio control plane, providing an additional security layer against unauthorized access to the service mesh.
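If you template the task definition from a script, the ztunnel container entry from Step 2 could be rendered with a small helper like the one below. This is a sketch under stated assumptions: the function name ztunnel_container_json and its positional parameters are hypothetical, the fields mirror only what the task definition in Step 2 shows, and the values are not JSON-escaped, so they must not contain quotes or backslashes.

```shell
#!/bin/sh
# Print the ztunnel container definition as JSON.
# Assumption: argument values contain no characters that need JSON escaping.
ztunnel_container_json() {
  # $1 = ztunnel image, $2 = bootstrap token,
  # $3 = CloudWatch Logs group, $4 = AWS Region
  cat <<EOF
{
  "name": "ztunnel",
  "image": "$1",
  "environment": [
    { "name": "BOOTSTRAP_TOKEN", "value": "$2" }
  ],
  "logConfiguration": {
    "logDriver": "awslogs",
    "options": {
      "awslogs-group": "$3",
      "awslogs-region": "$4",
      "awslogs-stream-prefix": "ztunnel"
    }
  }
}
EOF
}
```

The output is a single JSON object that can be placed in the containerDefinitions array next to the application container. Generating it from one place keeps the bootstrap token and log settings consistent across task definitions.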

3) Use the preceding definition to create an Amazon ECS task with your preferred tools, such as the AWS CLI or Terraform. The following is an example of creating an Amazon ECS service using the Amazon ECS task definition:

aws ecs create-service --cluster ${ECS_CLUSTER_NAME} \
  --service-name ${ECS_SERVICE_NAME} \
  --task-definition ${TASK_DEFINITION} \
  --desired-count 1 \
  --launch-type FARGATE \
  --tags key=ecs.solo.io/service-account,value=$ECS_SERVICE_ACCOUNT_NAME \
  --network-configuration 'awsvpcConfiguration={subnets=['${PRIVATE_SUBNETS}'],securityGroups=['${SG_ID}']}'

4) When the task is part of the service mesh, it can be accessed from other services using the <service_name>.<namespace> format. Istiod uses the Amazon ECS API to query and register running services and tasks in its internal registry. The following is an example command that connects to an Amazon ECS task and, from a container named “shell”, calls a container named “echo” in the ECS namespace:

aws ecs execute-command \
  --cluster "${ECS_CLUSTER_NAME}" \
  --task "${TASK_ID}" \
  --container "shell" \
  --interactive \
  --command "curl echo.ecs.local:8080"

5) As a result, you can expect the output to show a 200 status code from the HTTP call.

With this setup, the Amazon ECS task can proxy traffic through the lightweight ztunnel proxy, enabling features such as automatic mutual TLS between mesh-enabled workloads, service discovery, configurable retries, timeouts, circuit breaking, observability, and cross-platform connectivity. This allows communication between workloads on different platforms, such as Amazon EKS or cloud environments, without complex networking configurations.

By using this configuration, the demo shows how Amazon ECS tasks running on AWS Fargate can communicate within a service mesh and extend connectivity to services in Amazon EKS. This demonstrates the flexibility of creating a service mesh across both Amazon ECS and Amazon EKS environments.

Conclusion

Istio has established itself as a mature, industry-standard solution, offering capabilities such as traffic control, load balancing, health monitoring, encryption, and endpoint identity through mTLS. With the introduction of ambient mode, Istio minimizes overhead and provides flexibility in deployment without relying on Kubernetes. This enables Amazon ECS users to concentrate on their application logic rather than managing complex network configurations. As the demand for robust networking features grows in microservices and container deployments on Amazon ECS, Istio integrates seamlessly to deliver secure mTLS communication, identity-based authorization policies, and traffic management controls such as blue/green and canary deployments, along with configurable retries, timeouts, and circuit breaking for enhanced resilience. If you’re interested in learning more about Istio Ambient Mesh, feel free to reach out for further information.
