AWS Networking Lessons: Transit Gateways, Shared VPCs, and the IRSA Gotcha
I picked up three AWS networking lessons at work recently that I wish someone had just told me upfront. None of these are deep secrets, but they’re the kind of thing that doesn’t click until you run into them in a real environment. So here they are.
1. Transit Gateways: Stop Peering Everything
When you have a handful of VPCs that need to talk to each other, VPC peering works fine. It’s a direct, one-to-one connection. Simple.
The problem is it doesn’t scale. VPC peering is non-transitive — if VPC A peers with VPC B, and VPC B peers with VPC C, A and C can’t talk to each other through B. You need a separate peering connection for every pair. The math gets ugly fast.
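Non-transitivity is easier to see in a tiny model. The sketch below is illustrative only. It assumes each peering connection is an unordered pair of VPC names, and that traffic can flow only between VPCs that share a direct peering:

```python
def can_route_via_peering(peerings: set[frozenset[str]], src: str, dst: str) -> bool:
    """Return True only if src and dst are peered directly.

    Peering is non-transitive, so a path like A -> B -> C does NOT count.
    """
    return frozenset((src, dst)) in peerings


# A peers with B, and B peers with C; there is no A <-> C peering.
peerings = {frozenset(("VPC-A", "VPC-B")), frozenset(("VPC-B", "VPC-C"))}

print(can_route_via_peering(peerings, "VPC-A", "VPC-B"))  # True: direct peering
print(can_route_via_peering(peerings, "VPC-B", "VPC-C"))  # True: direct peering
print(can_route_via_peering(peerings, "VPC-A", "VPC-C"))  # False: B does not forward for A
```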
```text
VPC Peering (Mesh)

VPC-A ----- VPC-B
  | \       / |
  |   \   /   |
  |     X     |
  |   /   \   |
  | /       \ |
VPC-C ----- VPC-D

Connections needed: n * (n-1) / 2
4 VPCs  = 6 connections
10 VPCs = 45 connections
```
Every new VPC means peering with every existing one. Route tables multiply. It becomes a mess.
```text
Transit Gateway (Hub-and-Spoke)

          VPC-A
            |
  VPC-B -- TGW -- VPC-C
            |
          VPC-D

Connections needed: n
4 VPCs  = 4 connections
10 VPCs = 10 connections
```
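The two formulas are simple enough to check directly. Here is a small sketch that compares a full peering mesh with hub-and-spoke Transit Gateway attachments for a few VPC counts:

```python
def peering_connections(n: int) -> int:
    """Full mesh: one peering per unordered pair of VPCs, n * (n - 1) / 2."""
    return n * (n - 1) // 2


def tgw_attachments(n: int) -> int:
    """Hub-and-spoke: one Transit Gateway attachment per VPC."""
    return n


for n in (4, 10, 50):
    print(f"{n} VPCs: {peering_connections(n)} peerings vs {tgw_attachments(n)} TGW attachments")
```

At 50 VPCs the mesh needs 1,225 peering connections, while the hub needs 50 attachments. Every new VPC adds one attachment instead of one peering per existing VPC.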
A Transit Gateway acts as a central hub. Each VPC attaches to the TGW once, and the TGW's route tables handle routing between all of them. Add a new VPC? One attachment, done. The same hub can also connect on-premises networks (through Site-to-Site VPN or Direct Connect), other regions (through Transit Gateway peering), and other accounts (by sharing the TGW through RAM).
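To make "the TGW handles routing" a bit more concrete, here is a simplified model of a TGW route table that picks the most specific matching route. This is a sketch under stated assumptions, not how AWS implements it. It assumes every VPC attachment propagates its CIDR into a single shared route table. The attachment IDs and the default route to a central egress VPC are hypothetical:

```python
import ipaddress
from typing import Optional

# Hypothetical attachment IDs; assumes each VPC attachment propagates its CIDR
# into one shared TGW route table.
TGW_ROUTE_TABLE = {
    ipaddress.ip_network("10.0.0.0/16"): "tgw-attach-vpc-a",
    ipaddress.ip_network("10.1.0.0/16"): "tgw-attach-vpc-b",
    ipaddress.ip_network("10.2.0.0/16"): "tgw-attach-vpc-c",
    ipaddress.ip_network("10.3.0.0/16"): "tgw-attach-vpc-d",
    ipaddress.ip_network("0.0.0.0/0"): "tgw-attach-egress",  # hypothetical central egress VPC
}


def next_hop(dst_ip: str) -> Optional[str]:
    """Return the attachment for the most specific route containing dst_ip, or None."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [net for net in TGW_ROUTE_TABLE if addr in net]
    if not matches:
        return None
    # Longest prefix wins: a /16 beats the 0.0.0.0/0 default route.
    return TGW_ROUTE_TABLE[max(matches, key=lambda net: net.prefixlen)]


print(next_hop("10.2.5.9"))  # tgw-attach-vpc-c
print(next_hop("8.8.8.8"))   # tgw-attach-egress
```

Adding a fifth VPC in this model means one more propagated route. None of the existing entries change.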
When to Use What
| | VPC Peering | Transit Gateway |
|---|---|---|
| Cost | No hourly charge; data transfer within the same AZ is free | Hourly per-attachment fee plus per-GB data processing |
| Scale | Gets painful past 5-10 VPCs | Handles thousands of attachments |
| Routing | Non-transitive, manual | Centralized route tables |
| Cross-account | Yes, but each peering is set up manually | Yes, with simpler management |
| Use case | 2-3 VPCs, simple setup | Multi-VPC, multi-account, hybrid |
The takeaway: if you’re in a multi-account setup or expect to grow past a few VPCs, go Transit Gateway from the start. Retrofitting it later is not fun.
2. Shared VPCs via AWS RAM
This one was a “wait, that exists?” moment for me. AWS Resource Access Manager (RAM) lets you share VPC subnets across AWS accounts. That means multiple accounts can launch resources into the same VPC without peering, without Transit Gateways — they just use the same network.
The owning account creates the VPC and uses RAM to share specific subnets with other accounts in the same AWS Organization. Participating accounts can then deploy EC2 instances, RDS databases, Lambda functions, and so on directly into those shared subnets.
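The sharing step is a single RAM resource share that lists subnet ARNs and participant accounts. The sketch below only builds the parameters for RAM's `CreateResourceShare` call and never talks to AWS. The account IDs, subnet IDs, region, and share name are placeholders:

```python
def subnet_arn(region: str, owner_account: str, subnet_id: str) -> str:
    # EC2 subnet ARN format: arn:aws:ec2:<region>:<account-id>:subnet/<subnet-id>
    return f"arn:aws:ec2:{region}:{owner_account}:subnet/{subnet_id}"


def build_subnet_share_request(region: str, owner_account: str,
                               subnet_ids: list[str], participant_accounts: list[str]) -> dict:
    """Parameters for a RAM CreateResourceShare call sharing subnets with participants."""
    return {
        "name": "shared-network-subnets",  # placeholder share name
        "resourceArns": [subnet_arn(region, owner_account, s) for s in subnet_ids],
        "principals": list(participant_accounts),
        # Subnet sharing stays inside the AWS Organization, so no external principals.
        "allowExternalPrincipals": False,
    }


request = build_subnet_share_request(
    region="us-east-1",
    owner_account="111111111111",                           # Account A (owner)
    subnet_ids=["subnet-0aaa1111", "subnet-0bbb2222"],      # Subnet-1, Subnet-2
    participant_accounts=["222222222222", "333333333333"],  # Accounts B and C
)
print(request["resourceArns"])
```

With boto3, you could pass a dict like this to `boto3.client("ram").create_resource_share(**request)`. In a Terraform codebase, the equivalent is usually a resource share plus associations for the subnets and principals.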
```text
      Account A (Owner)
+----------------------------+
|  VPC 10.0.0.0/16           |
|  +----------+ +----------+ |
|  | Subnet-1 | | Subnet-2 | |
|  | (shared) | | (shared) | |
|  +----+-----+ +----+-----+ |
+-------|------------|-------+
        |            |
        v            v
    Account B    Account C
 (deploys here) (deploys here)
```
You can find shared resources in the AWS console under RAM (Resource Access Manager). In each participating account, the shared subnets show up in the VPC console as if they were local. Only the owner account can modify the VPC and its subnets.
Why this matters: it drastically simplifies networking in multi-account setups. Instead of giving each account its own VPC and wiring them all together, you share one well-designed network. That means less routing, fewer NAT gateways, and lower cost.
3. The IRSA/OIDC Gotcha That Kills Canary Deployments
This is the one that surprised me the most, and honestly the reason I’m writing this post. If you’re running EKS clusters managed by Terraform and you want to do canary deployments (spinning up a new cluster alongside the old one, gradually shifting traffic), you’re going to hit a wall with IRSA.
What’s IRSA?
IAM Roles for Service Accounts (IRSA) lets Kubernetes pods assume AWS IAM roles. Under the hood, it works through OpenID Connect (OIDC). When you create an EKS cluster, AWS generates a unique OIDC issuer URL for it. To use IRSA, you register that issuer as an IAM OIDC identity provider, and each IAM role's trust policy names that provider.
The Problem
Here’s where it breaks down for canary deployments:
```text
Cluster A (current)
  OIDC: oidc.eks.us-east-1.amazonaws.com/id/AAAAA111
  +-- IAM Role trusts THIS specific OIDC provider
  +-- Pods authenticate through THIS provider

Cluster B (canary)
  OIDC: oidc.eks.us-east-1.amazonaws.com/id/BBBBB222
  +-- DIFFERENT OIDC provider
  +-- IAM Roles from Cluster A DON'T WORK HERE
```
Each cluster gets its own OIDC provider ID. The IAM role trust policies reference a specific OIDC provider. So when you spin up Cluster B as a canary, its pods can’t assume the same IAM roles that Cluster A’s pods use — because the trust relationship points to Cluster A’s OIDC provider, not Cluster B’s.
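Here is a toy model of the check STS performs for `AssumeRoleWithWebIdentity`, showing why the same service account fails from Cluster B. It is deliberately simplified. Real STS also validates the token's signature, expiry, and audience, and none of that is modeled here. The account ID is a placeholder, and the issuer IDs match the diagram:

```python
ACCOUNT_ID = "111122223333"  # placeholder
ISSUER_A = "oidc.eks.us-east-1.amazonaws.com/id/AAAAA111"
ISSUER_B = "oidc.eks.us-east-1.amazonaws.com/id/BBBBB222"
SUB = "system:serviceaccount:default:my-app"


def provider_arn(issuer: str) -> str:
    # IAM OIDC provider ARNs embed the issuer host/path (no "https://").
    return f"arn:aws:iam::{ACCOUNT_ID}:oidc-provider/{issuer}"


# A role created for Cluster A: it trusts only Cluster A's provider.
ROLE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Federated": provider_arn(ISSUER_A)},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {f"{ISSUER_A}:sub": SUB}},
        }
    ],
}


def can_assume(policy: dict, token_issuer: str, token_sub: str) -> bool:
    """Allow only if a statement names the token's provider AND its StringEquals conditions match."""
    claims = {f"{token_issuer}:sub": token_sub}
    for stmt in policy["Statement"]:
        if stmt["Effect"] != "Allow":
            continue
        if stmt["Principal"].get("Federated") != provider_arn(token_issuer):
            continue
        conditions = stmt.get("Condition", {}).get("StringEquals", {})
        if all(claims.get(key) == value for key, value in conditions.items()):
            return True
    return False


print(can_assume(ROLE_TRUST_POLICY, ISSUER_A, SUB))  # True: pod in Cluster A
print(can_assume(ROLE_TRUST_POLICY, ISSUER_B, SUB))  # False: same service account, Cluster B
```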
In Terraform terms, the dependency chain looks like this:
EKS Cluster --> OIDC Provider --> IAM Role Trust Policy --> K8s Service Account
Every piece is coupled to the specific cluster. You can't just create a second cluster and expect workloads to seamlessly use the same IAM permissions. You'd need to create an IAM OIDC provider for the canary cluster's issuer, and then either:
- Update every IAM role trust policy to also trust the new provider, or
- Duplicate all the IAM roles for the canary cluster
Neither option is clean. Updating the trust policies means modifying production IAM roles during a canary deployment, which defeats the purpose of a safe, isolated rollout. Duplicating the roles means managing two sets of roles and keeping them in sync.
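Here is what the "update every trust policy" route boils down to. You apply this transformation to every IRSA role the workloads use, and you reverse it once the old cluster is retired. In this sketch, the account ID and issuers are placeholders, and the `aud` condition mirrors AWS's own IRSA examples. Keep in mind that IAM caps trust policy size (2,048 characters by default), so adding one statement per cluster doesn't scale far.

```python
import copy

ACCOUNT_ID = "111122223333"  # placeholder


def irsa_statement(issuer: str, namespace: str, service_account: str) -> dict:
    """One trust statement allowing a service account from one cluster's OIDC issuer."""
    return {
        "Effect": "Allow",
        "Principal": {"Federated": f"arn:aws:iam::{ACCOUNT_ID}:oidc-provider/{issuer}"},
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
            "StringEquals": {
                f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                f"{issuer}:aud": "sts.amazonaws.com",
            }
        },
    }


def also_trust(policy: dict, issuer: str, namespace: str, service_account: str) -> dict:
    """Return a copy of `policy` with an extra statement for another cluster's issuer."""
    updated = copy.deepcopy(policy)  # leave the production policy object untouched
    updated["Statement"].append(irsa_statement(issuer, namespace, service_account))
    return updated


prod_policy = {
    "Version": "2012-10-17",
    "Statement": [irsa_statement("oidc.eks.us-east-1.amazonaws.com/id/AAAAA111", "default", "my-app")],
}
canary_policy = also_trust(prod_policy, "oidc.eks.us-east-1.amazonaws.com/id/BBBBB222", "default", "my-app")

print(len(prod_policy["Statement"]), len(canary_policy["Statement"]))  # 1 2
```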
What This Looks Like in Terraform
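To make it concrete, here's the kind of trust policy an IRSA role carries. Terraform typically renders it from the cluster's OIDC provider, and 111122223333 is a placeholder account ID:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/AAAAA111"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/AAAAA111:sub": "system:serviceaccount:default:my-app",
          "oidc.eks.us-east-1.amazonaws.com/id/AAAAA111:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
```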
See that AAAAA111? That’s hardcoded to Cluster A. Cluster B gets BBBBB222. Your canary cluster’s pods will get AccessDenied when they try to assume this role. And in Terraform, both the OIDC provider and the trust policy are resources that depend on the cluster — so you can’t decouple them without significant refactoring.
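The hardcoded ID comes from how the policy is generated: the issuer URL flows out of the cluster and into both the provider ARN and the condition keys. In Terraform, that value usually comes from the cluster's `identity[0].oidc[0].issuer` attribute. The sketch below stands in for that rendering step in Python. The account ID and issuer URLs are placeholders:

```python
import json


def render_irsa_trust_policy(account_id: str, issuer_url: str, namespace: str, service_account: str) -> str:
    """Render an IRSA trust policy for a single cluster.

    `issuer_url` is the cluster's OIDC issuer as EKS reports it, for example
    "https://oidc.eks.us-east-1.amazonaws.com/id/AAAAA111". IAM refers to the
    provider, and to the condition keys, without the "https://" scheme.
    """
    issuer = issuer_url.removeprefix("https://")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{issuer}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }
    return json.dumps(policy, indent=2)


cluster_a = render_irsa_trust_policy(
    "111122223333", "https://oidc.eks.us-east-1.amazonaws.com/id/AAAAA111", "default", "my-app")
cluster_b = render_irsa_trust_policy(
    "111122223333", "https://oidc.eks.us-east-1.amazonaws.com/id/BBBBB222", "default", "my-app")

# Same namespace and service account, yet the policies differ:
# the cluster's issuer ID ends up in every identifier.
print(cluster_a == cluster_b)  # False
```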
The Fix: EKS Pod Identity
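AWS has since released EKS Pod Identity as a simpler alternative to IRSA. Instead of a per-cluster OIDC provider, Pod Identity uses the EKS Pod Identity Agent (an add-on that runs on every node) and a role trust policy that trusts the `pods.eks.amazonaws.com` service principal rather than any specific cluster. The link between a service account and a role lives in a per-cluster Pod Identity association, so a canary cluster needs its own associations but no changes to the roles themselves.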
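The difference is easiest to see in the trust policy. The policy below follows the one AWS documents for Pod Identity roles. The dataclass is a hypothetical stand-in whose fields mirror the EKS `CreatePodIdentityAssociation` API (cluster name, namespace, service account, role ARN), which is what Terraform's `aws_eks_pod_identity_association` resource manages. The cluster names and role ARN are placeholders:

```python
import json
from dataclasses import dataclass

# Trust policy for a role used with EKS Pod Identity.
# There is no OIDC provider and no cluster ID anywhere in it.
POD_IDENTITY_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "pods.eks.amazonaws.com"},
            "Action": ["sts:AssumeRole", "sts:TagSession"],
        }
    ],
}


@dataclass(frozen=True)
class PodIdentityAssociation:
    """Hypothetical model of one association: cluster + service account -> IAM role."""
    cluster_name: str
    namespace: str
    service_account: str
    role_arn: str


ROLE_ARN = "arn:aws:iam::111122223333:role/my-app"  # placeholder role

# The canary cluster reuses the same role; only a new association is added.
associations = [
    PodIdentityAssociation("cluster-a", "default", "my-app", ROLE_ARN),
    PodIdentityAssociation("cluster-b-canary", "default", "my-app", ROLE_ARN),
]

print({a.role_arn for a in associations})               # a single role shared by both clusters
print("oidc" in json.dumps(POD_IDENTITY_TRUST_POLICY))  # False: nothing cluster-specific to edit
```

Spinning up the canary becomes an additive change (new associations) rather than an edit to production IAM.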
| | IRSA | EKS Pod Identity |
|---|---|---|
| OIDC provider per cluster | Yes (unique provider) | No |
| IAM role coupling | Trust policy tied to a specific cluster's provider | Trust policy is cluster-independent |
| Canary-friendly | No | Yes |
| Terraform complexity | High (OIDC provider + trust policies) | Lower (Pod Identity associations) |
| Cross-cluster portability | Manual IAM updates | Same role; add an association per cluster |
If you’re starting fresh, use EKS Pod Identity instead of IRSA. If you’re already on IRSA and need canary deployments, migrating to Pod Identity is worth the effort. It’s not a trivial migration — you’ll need to update your Terraform modules and redeploy workloads — but the operational simplicity on the other side is significant.
Wrapping Up
These three lessons boil down to the same theme: AWS networking decisions made early have long-lasting consequences. Transit Gateways save you from a peering nightmare. Shared VPCs via RAM can eliminate unnecessary network complexity. And if you’re on EKS with Terraform, understand the IRSA/OIDC coupling before you commit to a deployment strategy — or better yet, start with EKS Pod Identity.
None of this was obvious to me before I saw it in a real environment. Hopefully it saves you some time.