One of Kubernetes’ big selling points is that each Pod has its own network address. This makes the Pod behave a bit like a VM, and frees developers from worrying about pesky things like port conflicts. It’s a property of Kubernetes that makes things easier for developers and operators, and has been credited as one of the design features that made it so popular as a container orchestrator. Google Kubernetes Engine (GKE) additionally adopts a flat network structure for all clusters in a VPC, which means that each Pod in each cluster has its own IP in the VPC and can communicate with Pods in other clusters directly (without needing NAT), a useful property which enables advanced features like container-native load balancing.
While this addressing layout has many advantages, the trade-off is that you can consume IPs rather quickly. With every Pod in every cluster allocated an IP from the same VPC, plus headroom in each range for growth, address space runs out fast. IPv6 has long been proposed as the industry-wide solution to all these problems, and one day that will no doubt be true, but GKE doesn’t support single-stack IPv6 for Pod addressing, and not everyone is ready to drop IPv4 in any case. So how do you solve this problem with IPv4 ranges today?
In my travels as a product manager on GKE, the best solution I have seen is to use a non-RFC1918 IP range for Pods. While there are alternative approaches to solving this problem, what follows is the specific solution I have seen deployed successfully by multiple customers on GKE. Let’s take a closer look.
Have your 10.0.0.0/8 space utilization and eat it too
Most GKE clusters are created with the nodes in RFC 1918 space, specifically 10.0.0.0/8. Did you know that you can keep all the benefits of a flat network structure, such as container-native load balancing, while conserving your 10.0.0.0/8 space? The solution is to keep nodes in that CIDR range and to allocate just the Pod ranges (which use by far the most IPs) out of larger non-RFC 1918 address ranges like 100.64.0.0/10 or 240.0.0.0/4. Google Cloud VPC has native support for these other ranges, so within Google Cloud everything “just works.” For example, Pods can connect to services like Cloud SQL, and to Pods in other clusters.
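Because every Pod gets an address that is routable across the VPC, each cluster still needs its own non-overlapping Pod range. A minimal sketch of carving per-cluster Pod ranges out of 100.64.0.0/10 with Python’s standard `ipaddress` module follows; the cluster names and the /14 per-cluster size are hypothetical values chosen only for illustration.

```python
import ipaddress


def carve_pod_ranges(parent: str, new_prefix: int, clusters: list[str]) -> dict[str, ipaddress.IPv4Network]:
    """Give each cluster its own non-overlapping Pod range carved from `parent`."""
    # subnets() lazily yields consecutive, non-overlapping blocks of the new size.
    blocks = ipaddress.ip_network(parent).subnets(new_prefix=new_prefix)
    ranges = {}
    for name in clusters:
        block = next(blocks, None)
        if block is None:
            raise ValueError(f"{parent} has no room left for another /{new_prefix}")
        ranges[name] = block
    return ranges


if __name__ == "__main__":
    # Hypothetical cluster names; a /14 gives each cluster 262,144 Pod addresses.
    for name, net in carve_pod_ranges("100.64.0.0/10", 14, ["prod-a", "prod-b", "staging"]).items():
        print(name, net)
```

Handing out blocks sequentially from one parent range keeps the allocation simple to audit; a real deployment would also need to track which blocks are already in use across existing clusters.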
By keeping nodes in RFC 1918 space, you can use IP masquerading to replace the Pod’s address with the node’s, so that endpoints outside your VPC never see a non-RFC 1918 address. Every endpoint outside of Google Cloud (or whichever destinations your masquerading rules cover) will see the 10.0.0.0/8 IP that it expects. Best of both worlds. The 100.64.0.0/10 range is reserved as shared address space, making it a great candidate to use first: it gives you about 4 million Pod addresses right off the bat. And with roughly a quarter billion IPs available for your Pods in 240.0.0.0/4, there’s plenty of room to grow beyond that.
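The address counts above are easy to check. A short sketch using Python’s `ipaddress` module computes the size of each range and confirms that neither Pod candidate overlaps the three RFC 1918 blocks (the RFC 1918 list is written out explicitly rather than relying on `is_private`, whose definition varies across Python versions):

```python
import ipaddress

# The three RFC 1918 private blocks.
RFC1918 = [ipaddress.ip_network(c) for c in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def range_summary(cidr: str) -> tuple[int, bool]:
    """Return (address count, whether the range overlaps any RFC 1918 block)."""
    net = ipaddress.ip_network(cidr)
    return net.num_addresses, any(net.overlaps(block) for block in RFC1918)


if __name__ == "__main__":
    for cidr in ("10.0.0.0/8", "100.64.0.0/10", "240.0.0.0/4"):
        count, in_rfc1918 = range_summary(cidr)
        print(f"{cidr:<14} {count:>12,} addresses  overlaps RFC 1918: {in_rfc1918}")
```

These are raw address counts. GKE hands each node a fixed-size slice of the Pod range, so the number of Pods you can actually schedule is lower than the total.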
It may not be immediately apparent that 240.0.0.0/4 is an acceptable range to use in your VPC for Kubernetes Pods. After all, on the public internet this range has been reserved for future use (since 1989, in fact), and could in theory be assigned one day. Private use of this range in your VPC doesn’t affect the public internet (no routes using this range will be advertised outside your VPC), but is there any downside? If the range were ever allocated, hosts in your network wouldn’t be able to initiate outbound connections to hosts in that range. Inbound connections that go through load balancing (as most do) would be unaffected. In other words, should that range ever be allocated, you can still serve customers on those addresses.
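The outbound limitation follows from ordinary longest-prefix-match routing: a route for your Pod range is more specific than the default internet route, so traffic to any address inside it stays in the VPC. The sketch below is a toy model, assuming a hypothetical route table with a default route, a node subnet, and a Pod range carved from 240.0.0.0/4; it is not how Google Cloud implements routing internally.

```python
import ipaddress

# Hypothetical VPC route table: (destination prefix, next hop label).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "default-internet-gateway"),
    (ipaddress.ip_network("10.128.0.0/16"), "vpc: node subnet"),
    (ipaddress.ip_network("240.10.0.0/16"), "vpc: Pod range"),
]


def next_hop(dst: str) -> str:
    """Return the next hop of the most specific (longest-prefix) matching route."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, hop) for net, hop in ROUTES if addr in net]
    # 0.0.0.0/0 always matches, so `matches` is never empty.
    _, best_hop = max(matches, key=lambda route: route[0].prefixlen)
    return best_hop


if __name__ == "__main__":
    for dst in ("8.8.8.8", "10.128.0.5", "240.10.3.4"):
        print(f"{dst:<12} -> {next_hop(dst)}")
```

A public host that was someday assigned an address like 240.10.3.4 would resolve to the in-VPC route, which is why outbound connections to it would fail.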
The other concern I’ve heard about using 240.0.0.0/4 ranges is that on-prem routers don’t support them, and neither do Windows hosts. There is a really simple solution to both concerns: configure IP masquerading for any destination that doesn’t support these ranges, so the only source IP those services see is one from your 10.0.0.0/8 primary range.
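The masquerading decision can be modeled in a few lines. In this sketch, destinations listed as non-masquerade CIDRs see the real Pod IP, and everything else (such as a hypothetical on-prem network at 192.168.0.0/16) sees the node’s RFC 1918 address. All addresses are hypothetical. The config dictionary at the end follows the field names of the open-source ip-masq-agent (`nonMasqueradeCIDRs`, `masqLinkLocal`, `resyncInterval`); check the GKE IP masquerade agent documentation for how that config is deployed in your cluster.

```python
import ipaddress
import json

# Hypothetical addresses for illustration only.
NODE_IP = ipaddress.ip_address("10.128.0.5")   # node, from the 10.0.0.0/8 primary range
POD_IP = ipaddress.ip_address("240.10.0.7")    # Pod, from a 240.0.0.0/4 secondary range

# Destinations that should see the real Pod IP (the flat VPC network).
# Anything not listed here, such as an on-prem network, is SNATed to the node IP.
NON_MASQUERADE_CIDRS = ["10.128.0.0/16", "240.10.0.0/16"]


def source_ip_seen_by(dst: str) -> ipaddress.IPv4Address:
    """Return the source address a destination observes for traffic from POD_IP."""
    addr = ipaddress.ip_address(dst)
    if any(addr in ipaddress.ip_network(cidr) for cidr in NON_MASQUERADE_CIDRS):
        return POD_IP    # not masqueraded: Pod IP preserved
    return NODE_IP       # masqueraded: rewritten to the node's RFC 1918 address


# The same list in the shape ip-masq-agent reads from its config.
agent_config = {
    "nonMasqueradeCIDRs": NON_MASQUERADE_CIDRS,
    "masqLinkLocal": False,
    "resyncInterval": "60s",
}

if __name__ == "__main__":
    print(json.dumps(agent_config, indent=2))
    for dst in ("240.10.3.4", "10.128.9.9", "192.168.10.20"):
        print(f"{dst:<15} sees {source_ip_seen_by(dst)}")
```

Treating "masquerade" as the default and listing only the ranges that can handle Pod IPs means a newly connected destination is safe unless you explicitly opt it in.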
Some Kubernetes platforms outside Google Cloud offer an “island mode” network design where you reuse the same Pod IP ranges in every cluster, and I’ve heard requests for this in GKE as well. The approach documented here is better in my view: you get the advantage of the flat network within the VPC (enabling things like container-native load balancing), while traffic can still be NATed to the node’s IP when needed. By comparison, an “island mode” design will NAT all traffic that leaves the cluster (including Pod-to-Pod traffic between clusters), limiting what you can do inside the VPC.
Service IP ranges
So that’s Pods, but what about Services? Service ranges are another concern for IP allocation. GKE in Autopilot mode now automatically reuses the same /20 range for every cluster, giving you 4k Service IPs without allocating any of your network (Service IPs are virtual and have no meaning outside the cluster, so there is no need to make them unique across clusters). In node-based GKE Standard mode, or if you need more than 4k Services, you can create your own named secondary range of whatever size you need (including from 240.0.0.0/4 space) and reuse it for every cluster in the region as well, by passing its name in the --services-secondary-range-name parameter when creating each cluster.
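For Standard mode, the reuse pattern amounts to one shared Services range plus a distinct Pod range per cluster. The sketch below only builds and prints `gcloud` commands rather than running them. The region, subnet, range names, and CIDRs are hypothetical, and apart from `--services-secondary-range-name` (named above), the flags used (`--add-secondary-ranges`, `--enable-ip-alias`, `--subnetwork`, `--cluster-secondary-range-name`) should be verified against the current gcloud reference before use.

```python
import shlex

# Hypothetical names throughout: region, subnet, ranges, and clusters.
REGION = "us-central1"
SUBNET = "gke-subnet"
SERVICES_RANGE = "shared-services"  # one secondary range reused by every cluster


def add_ranges_cmd(ranges: dict[str, str]) -> list[str]:
    """Build a gcloud command that adds named secondary ranges to the regional subnet."""
    spec = ",".join(f"{name}={cidr}" for name, cidr in ranges.items())
    return ["gcloud", "compute", "networks", "subnets", "update", SUBNET,
            f"--region={REGION}", f"--add-secondary-ranges={spec}"]


def create_cluster_cmd(cluster: str, pod_range: str) -> list[str]:
    """Build a gcloud command for a Standard cluster with its own Pod range and the shared Services range."""
    return ["gcloud", "container", "clusters", "create", cluster,
            f"--region={REGION}", "--enable-ip-alias", f"--subnetwork={SUBNET}",
            f"--cluster-secondary-range-name={pod_range}",
            f"--services-secondary-range-name={SERVICES_RANGE}"]


if __name__ == "__main__":
    print(shlex.join(add_ranges_cmd({
        SERVICES_RANGE: "240.0.0.0/16",   # 65,536 Service IPs, shared by all clusters
        "pods-prod-a": "240.1.0.0/16",
        "pods-prod-b": "240.2.0.0/16",
    })))
    for cluster in ("prod-a", "prod-b"):
        print(shlex.join(create_cluster_cmd(cluster, f"pods-{cluster}")))
```

Only the Pod range name changes between the two `create` commands; the Services range name is identical, which is what lets every cluster in the region share it.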
In summary, a recommended way to reduce IP usage while maintaining all the benefits of a flat network structure is to:
- Allocate Node IP ranges from your main ranges (like 10.0.0.0/8).
- Allocate Pod IP ranges from non-RFC 1918 space, like 100.64.0.0/10 and 240.0.0.0/4, while using IP masquerading to NAT to the node’s IP for on-prem destinations or anywhere that expects an RFC 1918 address.
- Use Autopilot mode, which automatically provides 4k IPs for your Services, or create a named secondary range for Services and reuse it for all clusters by passing it during cluster creation.
With these steps, I have seen several customers overcome their IP constraints and adopt this strategy as a bridge to one day running clusters with single-stack IPv6 Pod addressing.
Next steps:
- Learn about the IPv4 ranges supported by Cloud VPC
- Learn about IP Masquerading in GKE
- Try GKE’s Autopilot mode for a workload-based API that also improves operational efficiency