AWS Networking Lessons: Transit Gateways, Shared VPCs, and the IRSA Gotcha

I picked up three AWS networking lessons at work recently that I wish someone had just told me upfront. None of these are deep secrets, but they’re the kind of thing that doesn’t click until you run into them in a real environment. So here they are.

1. Transit Gateways: Stop Peering Everything

When you have a handful of VPCs that need to talk to each other, VPC peering works fine. It’s a direct, one-to-one connection. Simple.

The problem is it doesn’t scale. VPC peering is non-transitive — if VPC A peers with VPC B, and VPC B peers with VPC C, A and C can’t talk to each other through B. You need a separate peering connection for every pair. The math gets ugly fast.

VPC Peering (Mesh)

VPC-A ------- VPC-B
  |  \       /  |
  |    \   /    |
  |      X      |
  |    /   \    |
  |  /       \  |
VPC-C ------- VPC-D

Connections needed: n * (n-1) / 2
4 VPCs = 6 connections
10 VPCs = 45 connections

Every new VPC means peering with every existing one, and every VPC’s route table needs an entry for each peer. It becomes a mess.
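The arithmetic above is easy to sanity-check. Here is a minimal Python sketch that counts full-mesh peering connections versus Transit Gateway attachments. The function names are made up for illustration and are not an AWS API. It also counts the routes each VPC needs in a mesh, assuming one CIDR and one route table per VPC.

```python
def peering_connections(n: int) -> int:
    """Full mesh: every pair of VPCs needs its own peering connection."""
    return n * (n - 1) // 2


def tgw_attachments(n: int) -> int:
    """Hub-and-spoke: each VPC attaches to the Transit Gateway once."""
    return n


def mesh_routes_per_vpc(n: int) -> int:
    """In a mesh, each VPC's route table needs one entry per peer CIDR
    (assumes one CIDR per VPC and a single route table per VPC)."""
    return n - 1


for n in (4, 10, 25):
    print(f"{n:>3} VPCs: {peering_connections(n):>4} peerings, "
          f"{mesh_routes_per_vpc(n):>3} routes per VPC, "
          f"{tgw_attachments(n):>3} TGW attachments")
```

At 25 VPCs the mesh is already 300 peering connections, while the hub needs only 25 attachments.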

Transit Gateway (Hub-and-Spoke)

        VPC-A
          |
VPC-B -- TGW -- VPC-C
          |
        VPC-D

Connections needed: n
4 VPCs = 4 connections
10 VPCs = 10 connections

A Transit Gateway acts as a central hub. Each VPC connects to the TGW once, and the TGW handles routing between all of them. Add a new VPC? One attachment, done. The same hub can also connect on-prem networks (via Site-to-Site VPN or Direct Connect), other regions (via Transit Gateway peering), and other accounts (by sharing the TGW through RAM).
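To see why transitivity matters, here is a toy reachability model in Python. It is not how AWS evaluates routes internally. It only assumes that a peering connection carries traffic between its two VPCs and nowhere else, while a single shared TGW route table lets any attached VPC reach any other.

```python
from itertools import combinations


def peering_reachable(peerings: set[frozenset[str]], src: str, dst: str) -> bool:
    """Peering is non-transitive: traffic only flows over a direct peering."""
    return frozenset((src, dst)) in peerings


def tgw_reachable(attachments: set[str], src: str, dst: str) -> bool:
    """Toy hub model: with one shared TGW route table, any two attached
    VPCs can reach each other through the hub."""
    return src != dst and src in attachments and dst in attachments


def unreachable_pairs(vpcs, reachable) -> list[tuple[str, str]]:
    """List VPC pairs that cannot talk under a given reachability rule."""
    return [(a, b) for a, b in combinations(sorted(vpcs), 2) if not reachable(a, b)]


vpcs = {"VPC-A", "VPC-B", "VPC-C"}
# A<->B and B<->C are peered, but there is no A<->C peering.
peerings = {frozenset(("VPC-A", "VPC-B")), frozenset(("VPC-B", "VPC-C"))}

print(unreachable_pairs(vpcs, lambda a, b: peering_reachable(peerings, a, b)))
print(unreachable_pairs(vpcs, lambda a, b: tgw_reachable(vpcs, a, b)))
```

The first line prints `[('VPC-A', 'VPC-C')]` and the second prints `[]`. Real Transit Gateways can deliberately break the "everyone reaches everyone" rule with separate route tables for segmentation. The model above only covers the default single-table case.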

When to Use What

	VPC Peering	Transit Gateway
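Cost	No hourly charge; same-AZ data transfer free, cross-AZ and cross-region billed	Hourly per-attachment fee + per-GB data processing
Scale	Gets painful past 5-10 VPCs	Thousands of attachments per TGW
Routing	Non-transitive; routes added per peering	Transitive via centralized TGW route tables
Cross-account	Yes, but each peering is requested and accepted manually	Yes; share the TGW once via RAM, then attach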
Use case	2-3 VPCs, simple setup	Multi-VPC, multi-account, hybrid

The takeaway: if you’re in a multi-account setup or expect to grow past a few VPCs, go Transit Gateway from the start. Retrofitting it later is not fun.

2. Shared VPCs via AWS RAM

This one was a “wait, that exists?” moment for me. AWS Resource Access Manager (RAM) lets you share VPC subnets with other AWS accounts in the same AWS Organization. That means multiple accounts can launch resources into the same VPC without peering and without Transit Gateways: they just use the same network.

The owning account creates the VPC and shares specific subnets through RAM. Participating accounts can then deploy EC2 instances, RDS databases, Lambda functions, etc., directly into those shared subnets.
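For reference, the share itself is a single RAM API call made from the owner account. The Python sketch below only builds the keyword arguments for boto3's `ram.create_resource_share`. The account IDs, subnet IDs, region, and share name are all made-up placeholders. The actual call is commented out so the snippet runs without AWS credentials.

```python
def subnet_arn(region: str, owner_account: str, subnet_id: str) -> str:
    """EC2 subnet ARN format: arn:aws:ec2:<region>:<account>:subnet/<subnet-id>."""
    return f"arn:aws:ec2:{region}:{owner_account}:subnet/{subnet_id}"


def build_subnet_share(name: str, region: str, owner_account: str,
                       subnet_ids: list[str], participant_accounts: list[str]) -> dict:
    """Keyword arguments for RAM CreateResourceShare (placeholder values)."""
    return {
        "name": name,
        "resourceArns": [subnet_arn(region, owner_account, s) for s in subnet_ids],
        "principals": list(participant_accounts),
        # VPC sharing only works within the same AWS Organization, so keep
        # the share from reaching accounts outside it.
        "allowExternalPrincipals": False,
    }


share = build_subnet_share(
    name="shared-app-subnets",                              # hypothetical share name
    region="us-east-1",
    owner_account="111111111111",                           # Account A (owner), placeholder
    subnet_ids=["subnet-0aaa1111", "subnet-0bbb2222"],      # placeholder subnet IDs
    participant_accounts=["222222222222", "333333333333"],  # Accounts B and C, placeholders
)
print(share["resourceArns"])

# With credentials for the owner account, the real call would look like:
# import boto3
# boto3.client("ram", region_name="us-east-1").create_resource_share(**share)
```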

  Account A (Owner)
  +--------------------------+
  |  VPC 10.0.0.0/16         |
  |  +----------+ +---------+|
  |  |Subnet-1  | |Subnet-2 ||
  |  |(shared)  | |(shared) ||
  |  +----+-----+ +----+----+|
  +-------|------------|-----+
          |            |
    +-----+            +-----+
    v                        v
Account B              Account C
(deploys here)         (deploys here)

To see what has been shared with an account, open the RAM (Resource Access Manager) console and look under “Shared with me.” The shared subnets also show up in the participating account’s VPC console as if they were local.
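If you want to tell shared subnets apart programmatically, each subnet in EC2's DescribeSubnets response includes an `OwnerId`. For a shared subnet, that is the owner's account rather than yours. Here is a small Python sketch that runs against a hand-written sample response instead of a live `boto3` call. All IDs are placeholders.

```python
def shared_subnets(describe_subnets_response: dict, my_account_id: str) -> list[str]:
    """Return IDs of subnets owned by another account, i.e. shared with us via RAM."""
    return [
        s["SubnetId"]
        for s in describe_subnets_response.get("Subnets", [])
        if s["OwnerId"] != my_account_id
    ]


# Shape mirrors boto3 ec2.describe_subnets() output; values are placeholders.
sample = {
    "Subnets": [
        {"SubnetId": "subnet-0aaa1111", "OwnerId": "111111111111", "VpcId": "vpc-0123abcd"},
        {"SubnetId": "subnet-0ccc3333", "OwnerId": "222222222222", "VpcId": "vpc-0456cdef"},
    ]
}
print(shared_subnets(sample, my_account_id="222222222222"))  # ['subnet-0aaa1111']
```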

Why this matters: it drastically simplifies networking in multi-account setups. Instead of each account having its own VPC and wiring them all together, you share one well-designed network. That means less routing, fewer NAT gateways, and lower cost.

3. The IRSA/OIDC Gotcha That Kills Canary Deployments

This is the one that surprised me the most, and honestly the reason I’m writing this post. If you’re running EKS clusters managed by Terraform and you want to do canary deployments (spinning up a new cluster alongside the old one, gradually shifting traffic), you’re going to hit a wall with IRSA.

What’s IRSA?

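IAM Roles for Service Accounts (IRSA) lets Kubernetes pods assume AWS IAM roles. Under the hood, it works through OpenID Connect (OIDC). When you create an EKS cluster, AWS generates a unique OIDC issuer URL for it, and you register that issuer as an IAM OIDC identity provider in your account. Each pod gets a service account token signed by its cluster’s issuer. The AWS SDK then exchanges that token for role credentials by calling sts:AssumeRoleWithWebIdentity.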
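The cluster coupling is visible in the token itself. The service account token IRSA hands to a pod is a JWT. Its `iss` claim is the cluster's OIDC issuer URL, and its `sub` claim is `system:serviceaccount:<namespace>:<name>`. This Python sketch builds an unsigned example token by hand and decodes its payload to show where those values live. The issuer ID is a placeholder. A real token is signed, and this code does no signature verification.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """JWT segments are unpadded base64url."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(segment: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (inspection only)."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


# Hand-built example token for Cluster A (placeholder issuer ID).
claims = {
    "iss": "https://oidc.eks.us-east-1.amazonaws.com/id/AAAAA111",
    "sub": "system:serviceaccount:default:my-app",
    "aud": ["sts.amazonaws.com"],
}
header = b64url_encode(json.dumps({"alg": "RS256"}).encode())
token = f"{header}.{b64url_encode(json.dumps(claims).encode())}.signature-omitted"

decoded = jwt_claims(token)
print(decoded["iss"])  # the issuer URL is unique to the cluster that minted the token
print(decoded["sub"])
```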

The Problem

Here’s where it breaks down for canary deployments:

Cluster A (current)
  OIDC: oidc.eks.us-east-1.amazonaws.com/id/AAAAA111
  +-- IAM Role trusts THIS specific OIDC provider
  +-- Pods authenticate through THIS provider

Cluster B (canary)
  OIDC: oidc.eks.us-east-1.amazonaws.com/id/BBBBB222
  +-- DIFFERENT OIDC provider
  +-- IAM Roles from Cluster A DON'T WORK HERE

Each cluster gets its own OIDC provider ID. The IAM role trust policies reference a specific OIDC provider. So when you spin up Cluster B as a canary, its pods can’t assume the same IAM roles that Cluster A’s pods use — because the trust relationship points to Cluster A’s OIDC provider, not Cluster B’s.

In Terraform terms, the dependency chain looks like this:

EKS Cluster --> OIDC Provider --> IAM Role Trust Policy --> K8s Service Account

Every piece is coupled to the specific cluster. You can’t just create a second cluster and expect workloads to seamlessly use the same IAM permissions. At minimum you’d have to register a new OIDC provider for the canary cluster, and then pick one of two options:

Option A: update every IAM role trust policy to also trust the new provider.
Option B: duplicate all the IAM roles for the canary cluster.

Neither option is clean. Option A means modifying production IAM roles during a canary deployment, which defeats the purpose of a safe, isolated rollout. Option B means managing duplicate roles and keeping them in sync.
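Updating every trust policy to also trust the canary looks roughly like this. The Python sketch below derives each cluster's IAM OIDC provider ARN from its issuer URL and emits one trust-policy statement per cluster. The helper names are hypothetical, and the account ID and issuer IDs are placeholders. Every new canary cluster means another statement in every role, which is exactly the production IAM churn described above.

```python
import json

ACCOUNT_ID = "111122223333"  # placeholder account ID


def oidc_provider_arn(account_id: str, issuer_url: str) -> str:
    """IAM OIDC provider ARNs embed the issuer host/path without the scheme."""
    return f"arn:aws:iam::{account_id}:oidc-provider/{issuer_url.removeprefix('https://')}"


def irsa_trust_policy(account_id: str, issuer_urls: list[str],
                      namespace: str, service_account: str) -> dict:
    """One statement per cluster: every extra cluster widens the same role."""
    statements = []
    for issuer in issuer_urls:
        host_path = issuer.removeprefix("https://")
        statements.append({
            "Effect": "Allow",
            "Principal": {"Federated": oidc_provider_arn(account_id, issuer)},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{host_path}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    f"{host_path}:aud": "sts.amazonaws.com",
                }
            },
        })
    return {"Version": "2012-10-17", "Statement": statements}


clusters = [
    "https://oidc.eks.us-east-1.amazonaws.com/id/AAAAA111",  # Cluster A (current)
    "https://oidc.eks.us-east-1.amazonaws.com/id/BBBBB222",  # Cluster B (canary)
]
policy = irsa_trust_policy(ACCOUNT_ID, clusters, "default", "my-app")
print(json.dumps(policy, indent=2))
```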

What the Trust Policy Looks Like

To make it concrete, here’s the kind of trust policy every IRSA-enabled role carries, whether Terraform renders it or you write it by hand:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/AAAAA111"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/AAAAA111:sub": "system:serviceaccount:default:my-app",
          "oidc.eks.us-east-1.amazonaws.com/id/AAAAA111:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}

See that AAAAA111? It appears in both the principal and the condition keys, and it’s hardcoded to Cluster A. Cluster B gets BBBBB222, so when your canary cluster’s pods try to assume this role, STS rejects them. You’ll see AccessDenied, or InvalidIdentityToken if the canary’s OIDC provider isn’t registered yet. And in Terraform, both the OIDC provider and the trust policy are resources that depend on the cluster, so you can’t decouple them without significant refactoring.
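To see the failure mode without touching AWS, here is a deliberately simplified Python model of the trust check. It is not IAM's real policy engine. It only handles `Allow` statements with a `Federated` principal and `StringEquals` conditions, and it assumes a token is matched to a provider by its issuer URL. The account ID and issuer IDs are placeholders.

```python
ACCOUNT_ID = "111122223333"  # placeholder account ID
ISSUER_PREFIX = "https://oidc.eks.us-east-1.amazonaws.com/id/"


def provider_arn_for(issuer_url: str) -> str:
    """IAM OIDC provider ARN for an issuer URL (scheme stripped)."""
    return f"arn:aws:iam::{ACCOUNT_ID}:oidc-provider/{issuer_url.removeprefix('https://')}"


def claim_matches(claims: dict, name: str, expected: str) -> bool:
    """Token claims such as aud may be a string or a list of strings."""
    actual = claims.get(name)
    return expected in actual if isinstance(actual, list) else actual == expected


def trust_allows(policy: dict, claims: dict) -> bool:
    """Toy check: does any Allow statement trust this token's issuer and claims?"""
    issuer = claims["iss"].removeprefix("https://")
    for stmt in policy["Statement"]:
        if stmt.get("Effect") != "Allow":
            continue
        if stmt.get("Principal", {}).get("Federated") != provider_arn_for(claims["iss"]):
            continue
        conditions = stmt.get("Condition", {}).get("StringEquals", {})
        # Condition keys look like "<issuer host/path>:sub" and "<issuer host/path>:aud".
        if all(
            key.startswith(issuer + ":") and claim_matches(claims, key.rsplit(":", 1)[1], value)
            for key, value in conditions.items()
        ):
            return True
    return False


cluster_a = ISSUER_PREFIX + "AAAAA111"
host_a = cluster_a.removeprefix("https://")
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Federated": provider_arn_for(cluster_a)},
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {"StringEquals": {
            f"{host_a}:sub": "system:serviceaccount:default:my-app",
            f"{host_a}:aud": "sts.amazonaws.com",
        }},
    }],
}


def pod_claims(issuer_id: str) -> dict:
    """Claims a pod's token would carry (placeholder issuer IDs)."""
    return {
        "iss": ISSUER_PREFIX + issuer_id,
        "sub": "system:serviceaccount:default:my-app",
        "aud": ["sts.amazonaws.com"],
    }


print(trust_allows(policy, pod_claims("AAAAA111")))  # True: Cluster A's pod
print(trust_allows(policy, pod_claims("BBBBB222")))  # False: the canary pod is rejected
```

The only thing that differs between the two pods is the issuer. That is enough to flip the result.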

The Fix: EKS Pod Identity

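AWS released EKS Pod Identity in late 2023 as a simpler alternative to IRSA. Instead of a per-cluster OIDC provider, Pod Identity relies on two pieces. The first is an agent: the EKS Pod Identity Agent add-on, which runs as a DaemonSet on each node. The second is an IAM trust policy that trusts the pods.eks.amazonaws.com service principal. The link between a service account and a role is a pod identity association created on each cluster, so the role itself no longer references any particular cluster. That removes the tight coupling that breaks canary patterns.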
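For comparison, here is the Pod Identity side sketched in Python. It assumes the EKS Pod Identity Agent add-on is installed on both clusters. The role's trust policy names the `pods.eks.amazonaws.com` service principal with `sts:AssumeRole` and `sts:TagSession`, and contains nothing cluster-specific. The cluster binding lives in a per-cluster association, created with the EKS `CreatePodIdentityAssociation` API (`create_pod_identity_association` in boto3). The cluster names and role ARN are placeholders, and the boto3 calls are commented out so the snippet runs offline.

```python
import json

ROLE_ARN = "arn:aws:iam::111122223333:role/my-app"  # placeholder role ARN

# Same trust policy no matter how many clusters use the role.
POD_IDENTITY_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "pods.eks.amazonaws.com"},
            "Action": ["sts:AssumeRole", "sts:TagSession"],
        }
    ],
}


def association_request(cluster_name: str, namespace: str, service_account: str) -> dict:
    """Keyword arguments for EKS CreatePodIdentityAssociation."""
    return {
        "clusterName": cluster_name,
        "namespace": namespace,
        "serviceAccount": service_account,
        "roleArn": ROLE_ARN,
    }


# Hypothetical cluster names: the current cluster and its canary.
requests = [association_request(c, "default", "my-app") for c in ("cluster-a", "cluster-b")]
print(json.dumps(POD_IDENTITY_TRUST_POLICY, indent=2))
for req in requests:
    print(req)

# Adding the canary only adds an association; the IAM role is untouched.
# import boto3
# eks = boto3.client("eks", region_name="us-east-1")
# for req in requests:
#     eks.create_pod_identity_association(**req)
```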

	IRSA	EKS Pod Identity
OIDC provider per cluster	Yes (unique provider)	No
IAM role coupling	Tied to specific cluster	Cluster-independent
Canary-friendly	No	Yes
Terraform complexity	High (OIDC + trust policies)	Lower (Pod Identity associations)
Cross-cluster portability	Manual IAM updates	Same role, one association per cluster

If you’re starting fresh, use EKS Pod Identity instead of IRSA. One exception: Pod Identity doesn’t support pods running on Fargate, so those still need IRSA. If you’re already on IRSA and need canary deployments, migrating to Pod Identity is worth the effort. It’s not a trivial migration, since you’ll need to update your Terraform modules and redeploy workloads, but the operational simplicity on the other side is significant.

Wrapping Up

These three lessons boil down to the same theme: AWS networking decisions made early have long-lasting consequences. Transit Gateways save you from a peering nightmare. Shared VPCs via RAM can eliminate unnecessary network complexity. And if you’re on EKS with Terraform, understand the IRSA/OIDC coupling before you commit to a deployment strategy — or better yet, start with EKS Pod Identity.

None of this was obvious to me before I saw it in a real environment. Hopefully it saves you some time.