Episode 6 — Scaling Reliability Microservices Web3 / 6.3 — AWS Cloud Native Deployment
6.3.d — VPC Networking and IAM
In one sentence: A VPC (Virtual Private Cloud) is your isolated network on AWS where you control subnets, routing, and firewall rules, while IAM roles grant your ECS tasks the minimum permissions needed to call AWS services — together they form the security foundation of every cloud deployment.
1. What Is a VPC and Why Does It Matter?
A Virtual Private Cloud (VPC) is a logically isolated section of the AWS cloud where you launch resources. Think of it as your own private data center inside AWS — you control:
- IP address ranges (CIDR blocks)
- Subnets (subdivisions of the network)
- Route tables (where traffic goes)
- Security groups (firewalls around resources)
- Network ACLs (firewalls around subnets)
- Internet/NAT gateways (connections to the outside world)
┌─────────────────────────────────────────────────────────────────────────┐
│ AWS REGION (us-east-1) │
│ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ YOUR VPC (10.0.0.0/16) │ │
│ │ │ │
│ │ 65,536 IP addresses — your isolated network │ │
│ │ │ │
│ │ ┌─────────────────────┐ ┌─────────────────────┐ │ │
│ │ │ Availability Zone A │ │ Availability Zone B │ │ │
│ │ │ │ │ │ │ │
│ │ │ ┌─────────────────┐ │ │ ┌─────────────────┐ │ │ │
│ │ │ │Public Subnet │ │ │ │Public Subnet │ │ │ │
│ │ │ │10.0.1.0/24 │ │ │ │10.0.2.0/24 │ │ │ │
│ │ │ │ [ALB] │ │ │ │ [ALB] │ │ │ │
│ │ │ └─────────────────┘ │ │ └─────────────────┘ │ │ │
│ │ │ │ │ │ │ │
│ │ │ ┌─────────────────┐ │ │ ┌─────────────────┐ │ │ │
│ │ │ │Private Subnet │ │ │ │Private Subnet │ │ │ │
│ │ │ │10.0.10.0/24 │ │ │ │10.0.20.0/24 │ │ │ │
│ │ │ │ [ECS Tasks] │ │ │ │ [ECS Tasks] │ │ │ │
│ │ │ └─────────────────┘ │ │ └─────────────────┘ │ │ │
│ │ │ │ │ │ │ │
│ │ │ ┌─────────────────┐ │ │ ┌─────────────────┐ │ │ │
│ │ │ │Data Subnet │ │ │ │Data Subnet │ │ │ │
│ │ │ │10.0.100.0/24 │ │ │ │10.0.200.0/24 │ │ │ │
│ │ │ │ [RDS, Redis] │ │ │ │ [RDS, Redis] │ │ │ │
│ │ │ └─────────────────┘ │ │ └─────────────────┘ │ │ │
│ │ └─────────────────────┘ └─────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
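The address counts in the diagram follow directly from the prefix length. As a small sketch (the function names `cidr_size` and `aws_usable_ips` are made up for this example), the arithmetic can be checked in plain shell. AWS reserves five addresses in every subnet (the first four and the last one), so a /24 leaves 251 for your resources.

```shell
# Number of addresses in an IPv4 CIDR block: 2^(32 - prefix length).
cidr_size() {
  local prefix="${1#*/}"          # "10.0.0.0/16" -> "16"
  echo $(( 1 << (32 - prefix) ))
}

# AWS reserves 5 addresses per subnet (first four + last one).
aws_usable_ips() {
  echo $(( $(cidr_size "$1") - 5 ))
}

cidr_size 10.0.0.0/16        # 65536
cidr_size 10.0.1.0/24        # 256
aws_usable_ips 10.0.1.0/24   # 251
```

A /24 per subnet is comfortable for a handful of services; each Fargate task takes one address, so size the private subnets for your peak task count.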
Why not use the default VPC?
AWS creates a default VPC in every region. While convenient for learning, production workloads should use a custom VPC because:
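- The default VPC has only public subnets, so instances launched there receive public IP addresses unless you opt out
- The default security group allows all outbound traffic and all traffic between its members, which is broader than most workloads need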
- No private subnets for databases and application servers
- Harder to control CIDR ranges and avoid conflicts with other networks
2. Subnets: Public vs Private
A subnet is a range of IP addresses within your VPC. The critical distinction is between public and private subnets.
Public Subnets
- Have a route to an Internet Gateway (IGW)
- Resources can have public IP addresses
- Directly reachable from the internet
- Used for: ALB, NAT Gateway, bastion hosts
Private Subnets
- No direct route to the internet
- Resources only have private IP addresses
- Can reach the internet via a NAT Gateway (for outbound only — pulling images, calling APIs)
- Used for: ECS tasks, application servers, databases
Why put ECS tasks in private subnets?
PUBLIC subnet (BAD for tasks):
Internet → ECS Task ← Anyone can reach your container directly
Attack surface is huge
PRIVATE subnet (GOOD for tasks):
Internet → ALB (public) → ECS Task (private)
← Only the ALB can reach your containers
← Internet cannot directly access them
← Defense in depth
Creating subnets
# Create VPC
VPC_ID=$(aws ec2 create-vpc \
--cidr-block 10.0.0.0/16 \
--query 'Vpc.VpcId' \
--output text)
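# Enable DNS hostnames (required for ECS service discovery and for private DNS on interface VPC endpoints)
# The attribute takes an explicit value; the bare flag is rejected by the CLI
aws ec2 modify-vpc-attribute --vpc-id "$VPC_ID" --enable-dns-hostnames '{"Value":true}'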
# Create public subnets (2 AZs for high availability)
PUBLIC_SUBNET_1=$(aws ec2 create-subnet \
--vpc-id "$VPC_ID" \
--cidr-block 10.0.1.0/24 \
--availability-zone us-east-1a \
--query 'Subnet.SubnetId' --output text)
PUBLIC_SUBNET_2=$(aws ec2 create-subnet \
--vpc-id "$VPC_ID" \
--cidr-block 10.0.2.0/24 \
--availability-zone us-east-1b \
--query 'Subnet.SubnetId' --output text)
# Create private subnets (for ECS tasks)
PRIVATE_SUBNET_1=$(aws ec2 create-subnet \
--vpc-id "$VPC_ID" \
--cidr-block 10.0.10.0/24 \
--availability-zone us-east-1a \
--query 'Subnet.SubnetId' --output text)
PRIVATE_SUBNET_2=$(aws ec2 create-subnet \
--vpc-id "$VPC_ID" \
--cidr-block 10.0.20.0/24 \
--availability-zone us-east-1b \
--query 'Subnet.SubnetId' --output text)
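The diagram in section 1 also shows a data tier (10.0.100.0/24 and 10.0.200.0/24) that the commands above do not create. Here is a sketch of one way to add it, reusing the same `create-subnet` call. The function name `create_data_subnets` and the variables `DATA_SUBNET_1`, `DATA_SUBNET_2`, and `DATA_RT` are assumptions introduced here, not names from the walkthrough. The dedicated route table keeps only the implicit local route, which is what keeps these subnets off the internet.

```shell
# Hypothetical helper: creates the two data subnets from the diagram plus a
# route table that keeps only the implicit local route (no IGW, no NAT).
# Sets DATA_SUBNET_1, DATA_SUBNET_2 and DATA_RT for later commands.
create_data_subnets() {
  vpc_id="$1"

  DATA_SUBNET_1=$(aws ec2 create-subnet \
    --vpc-id "$vpc_id" \
    --cidr-block 10.0.100.0/24 \
    --availability-zone us-east-1a \
    --query 'Subnet.SubnetId' --output text) || return 1

  DATA_SUBNET_2=$(aws ec2 create-subnet \
    --vpc-id "$vpc_id" \
    --cidr-block 10.0.200.0/24 \
    --availability-zone us-east-1b \
    --query 'Subnet.SubnetId' --output text) || return 1

  # A new route table contains only the local VPC route; adding no
  # 0.0.0.0/0 route is what isolates the data tier.
  DATA_RT=$(aws ec2 create-route-table \
    --vpc-id "$vpc_id" \
    --query 'RouteTable.RouteTableId' --output text) || return 1

  aws ec2 associate-route-table --route-table-id "$DATA_RT" \
    --subnet-id "$DATA_SUBNET_1" >/dev/null || return 1
  aws ec2 associate-route-table --route-table-id "$DATA_RT" \
    --subnet-id "$DATA_SUBNET_2" >/dev/null || return 1
}

# Usage (requires AWS credentials and the VPC created above):
#   create_data_subnets "$VPC_ID"
```

RDS would then take both subnet IDs in a DB subnet group, so the primary and standby can sit in different Availability Zones.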
3. Internet Gateway and NAT Gateway
Internet Gateway (IGW)
An Internet Gateway allows resources in public subnets to communicate with the internet.
# Create and attach an Internet Gateway
IGW_ID=$(aws ec2 create-internet-gateway \
--query 'InternetGateway.InternetGatewayId' --output text)
aws ec2 attach-internet-gateway \
--internet-gateway-id "$IGW_ID" \
--vpc-id "$VPC_ID"
# Create a route table for public subnets
PUBLIC_RT=$(aws ec2 create-route-table \
--vpc-id "$VPC_ID" \
--query 'RouteTable.RouteTableId' --output text)
# Add route: all internet traffic goes to the IGW
aws ec2 create-route \
--route-table-id "$PUBLIC_RT" \
--destination-cidr-block 0.0.0.0/0 \
--gateway-id "$IGW_ID"
# Associate public subnets with this route table
aws ec2 associate-route-table --route-table-id "$PUBLIC_RT" --subnet-id "$PUBLIC_SUBNET_1"
aws ec2 associate-route-table --route-table-id "$PUBLIC_RT" --subnet-id "$PUBLIC_SUBNET_2"
NAT Gateway
A NAT Gateway lets resources in private subnets make outbound connections to the internet (e.g., pulling Docker images from ECR, calling external APIs) without being directly reachable from the internet.
Private Subnet Task → NAT Gateway (in public subnet) → Internet Gateway → Internet
Direction: OUTBOUND only — internet cannot initiate connections to private resources
# Allocate an Elastic IP for the NAT Gateway
EIP_ALLOC=$(aws ec2 allocate-address \
--domain vpc \
--query 'AllocationId' --output text)
# Create NAT Gateway in a public subnet
NAT_GW=$(aws ec2 create-nat-gateway \
--subnet-id "$PUBLIC_SUBNET_1" \
--allocation-id "$EIP_ALLOC" \
--query 'NatGateway.NatGatewayId' --output text)
# Wait for NAT Gateway to become available
aws ec2 wait nat-gateway-available --nat-gateway-ids "$NAT_GW"
# Create a route table for private subnets
PRIVATE_RT=$(aws ec2 create-route-table \
--vpc-id "$VPC_ID" \
--query 'RouteTable.RouteTableId' --output text)
# Route internet traffic from private subnets through the NAT Gateway
aws ec2 create-route \
--route-table-id "$PRIVATE_RT" \
--destination-cidr-block 0.0.0.0/0 \
--nat-gateway-id "$NAT_GW"
# Associate private subnets
aws ec2 associate-route-table --route-table-id "$PRIVATE_RT" --subnet-id "$PRIVATE_SUBNET_1"
aws ec2 associate-route-table --route-table-id "$PRIVATE_RT" --subnet-id "$PRIVATE_SUBNET_2"
NAT Gateway cost consideration
In us-east-1, a NAT Gateway costs about $0.045/hour (roughly $32 for a 30-day month) plus $0.045 per GB of data processed; rates vary by region. For cost optimization:
- Use VPC Endpoints for AWS services (ECR, S3, CloudWatch Logs) to avoid routing that traffic through NAT
- Use one NAT Gateway per AZ for high availability, or a single NAT Gateway for cost savings (less resilient)
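To see how the two charges combine, here is a back-of-the-envelope calculation using the rates quoted above. The function name `nat_monthly_cost` and the 720-hour (30-day) month are assumptions for the example, and actual rates depend on the region.

```shell
# Rough monthly cost of one NAT Gateway at $0.045/hour + $0.045/GB.
#   $1 = GB processed per month
#   $2 = hours in the month (defaults to 720, i.e. 30 days)
nat_monthly_cost() {
  awk -v gb="$1" -v hours="${2:-720}" \
    'BEGIN { printf "%.2f\n", hours * 0.045 + gb * 0.045 }'
}

nat_monthly_cost 0      # 32.40  (idle gateway)
nat_monthly_cost 500    # 54.90  (500 GB of image pulls, API calls, logs)
```

At 500 GB the data charge ($22.50) is already a sizable share of the bill, which is why moving ECR, S3, and CloudWatch Logs traffic onto VPC endpoints tends to pay off. Running one gateway per AZ multiplies the hourly part by the number of AZs.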
4. Security Groups
A security group acts as a virtual firewall for your resources. Security groups are stateful — if you allow inbound traffic, the response is automatically allowed outbound.
Security group rules for a microservices architecture
┌─────────────────────────────────────────────────────────────────────┐
│ SECURITY GROUP CHAIN │
│ │
│ sg-alb (ALB Security Group) │
│ Inbound: │
│ Port 80 from 0.0.0.0/0 (HTTP redirect) │
│ Port 443 from 0.0.0.0/0 (HTTPS from internet) │
│ Outbound: │
│ Port 3000 to sg-ecs-tasks (to ECS tasks) │
│ │
│ sg-ecs-tasks (ECS Tasks Security Group) │
│ Inbound: │
│ Port 3000 from sg-alb (from ALB only) │
│ Outbound: │
│ Port 443 to 0.0.0.0/0 (external APIs, AWS services) │
│ Port 5432 to sg-database (PostgreSQL) │
│ Port 6379 to sg-cache (Redis) │
│ │
│ sg-database (RDS Security Group) │
│ Inbound: │
│ Port 5432 from sg-ecs-tasks (from ECS tasks only) │
│ Outbound: │
│ None (databases don't initiate connections) │
│ │
│ sg-cache (ElastiCache Security Group) │
│ Inbound: │
│ Port 6379 from sg-ecs-tasks (from ECS tasks only) │
│ Outbound: │
│ None │
└─────────────────────────────────────────────────────────────────────┘
Creating security groups
# ALB security group
SG_ALB=$(aws ec2 create-security-group \
--group-name sg-alb \
--description "ALB - allows HTTP/HTTPS from internet" \
--vpc-id "$VPC_ID" \
--query 'GroupId' --output text)
aws ec2 authorize-security-group-ingress --group-id "$SG_ALB" \
--protocol tcp --port 80 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id "$SG_ALB" \
--protocol tcp --port 443 --cidr 0.0.0.0/0
# ECS tasks security group
SG_ECS=$(aws ec2 create-security-group \
--group-name sg-ecs-tasks \
--description "ECS Tasks - allows traffic from ALB only" \
--vpc-id "$VPC_ID" \
--query 'GroupId' --output text)
# Allow traffic from ALB only (reference security group, not CIDR)
aws ec2 authorize-security-group-ingress --group-id "$SG_ECS" \
--protocol tcp --port 3000 --source-group "$SG_ALB"
# Database security group
SG_DB=$(aws ec2 create-security-group \
--group-name sg-database \
--description "RDS - allows connections from ECS tasks only" \
--vpc-id "$VPC_ID" \
--query 'GroupId' --output text)
aws ec2 authorize-security-group-ingress --group-id "$SG_DB" \
--protocol tcp --port 5432 --source-group "$SG_ECS"
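The commands above only add inbound rules. Every new security group also starts with a default rule that allows all outbound traffic, so the outbound column of the diagram is not enforced yet, and `sg-cache` has not been created. The sketch below is one way to close that gap for the data tier. The function names `revoke_default_egress` and `lock_down_data_tier` and the variable `SG_CACHE` are introduced for illustration; the rules use the `--ip-permissions` JSON form.

```shell
# Remove the default "allow all outbound" rule that every new security
# group starts with.
revoke_default_egress() {
  aws ec2 revoke-security-group-egress --group-id "$1" \
    --ip-permissions '[{"IpProtocol":"-1","IpRanges":[{"CidrIp":"0.0.0.0/0"}]}]' \
    >/dev/null
}

# Hypothetical helper: creates sg-cache and removes outbound access from
# both data-tier groups. Expects VPC_ID, SG_ECS and SG_DB from the steps above.
lock_down_data_tier() {
  SG_CACHE=$(aws ec2 create-security-group \
    --group-name sg-cache \
    --description "ElastiCache - allows connections from ECS tasks only" \
    --vpc-id "$VPC_ID" \
    --query 'GroupId' --output text) || return 1

  # Redis port, reachable from ECS tasks only
  aws ec2 authorize-security-group-ingress --group-id "$SG_CACHE" \
    --protocol tcp --port 6379 --source-group "$SG_ECS" >/dev/null || return 1

  revoke_default_egress "$SG_DB" || return 1
  revoke_default_egress "$SG_CACHE" || return 1
}

# Usage (requires AWS credentials):
#   lock_down_data_tier
```

The ALB and ECS task groups need the same treatment, followed by scoped outbound rules added with `aws ec2 authorize-security-group-egress` for the ports listed in the diagram. Because security groups are stateful, removing outbound rules from the database group does not block replies to connections the tasks opened.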
Security groups vs Network ACLs
| Feature | Security Groups | Network ACLs |
|---|---|---|
| Level | Resource (ENI) level | Subnet level |
| Stateful | Yes (return traffic auto-allowed) | No (must explicitly allow both directions) |
| Rules | Allow rules only | Allow and deny rules |
| Evaluation | All rules evaluated together | Rules evaluated in number order (first match wins) |
| Default | New group: deny all inbound, allow all outbound | Default NACL: allow all inbound and outbound; custom NACL: deny all until you add rules |
| Use case | Primary firewall for all resources | Extra defense layer, blocking specific IPs |
Best practice: Use security groups as your primary firewall. Use NACLs only for edge cases like blocking known malicious IP ranges at the subnet level.
5. Network ACLs (NACLs)
Network ACLs provide an additional layer of security at the subnet level. Unlike security groups, NACLs are stateless — you must configure both inbound and outbound rules.
# Create a NACL for private subnets
NACL_ID=$(aws ec2 create-network-acl \
--vpc-id "$VPC_ID" \
--query 'NetworkAcl.NetworkAclId' --output text)
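# Allow inbound app traffic (port 3000) from both ALB subnets
aws ec2 create-network-acl-entry --network-acl-id "$NACL_ID" \
  --rule-number 100 --protocol tcp --port-range From=3000,To=3000 \
  --cidr-block 10.0.1.0/24 --rule-action allow --ingress
aws ec2 create-network-acl-entry --network-acl-id "$NACL_ID" \
  --rule-number 110 --protocol tcp --port-range From=3000,To=3000 \
  --cidr-block 10.0.2.0/24 --rule-action allow --ingress
# Allow inbound replies to connections the tasks open (HTTPS, PostgreSQL, Redis).
# NACLs are stateless, so return traffic needs its own rule; Linux clients
# typically use ephemeral ports from 32768 upward.
aws ec2 create-network-acl-entry --network-acl-id "$NACL_ID" \
  --rule-number 120 --protocol tcp --port-range From=32768,To=65535 \
  --cidr-block 0.0.0.0/0 --rule-action allow --ingress
# Allow outbound traffic on ports 1024-65535: replies to the ALB,
# plus connections to PostgreSQL (5432) and Redis (6379)
aws ec2 create-network-acl-entry --network-acl-id "$NACL_ID" \
  --rule-number 100 --protocol tcp --port-range From=1024,To=65535 \
  --cidr-block 0.0.0.0/0 --rule-action allow --egress
# Allow outbound HTTPS (external APIs and AWS services via the NAT Gateway)
aws ec2 create-network-acl-entry --network-acl-id "$NACL_ID" \
  --rule-number 110 --protocol tcp --port-range From=443,To=443 \
  --cidr-block 0.0.0.0/0 --rule-action allow --egress
# The NACL takes effect only after it is associated with the private subnets
# (aws ec2 replace-network-acl-association)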
6. VPC Endpoints (Avoiding NAT for AWS Services)
VPC Endpoints let your private subnets talk to AWS services without going through the NAT Gateway — saving cost and improving security.
Without VPC Endpoint:
ECS Task → NAT Gateway → Internet → ECR (public endpoint)
Cost: NAT data transfer charges
With VPC Endpoint:
ECS Task → VPC Endpoint → ECR (private connection)
Cost: Interface endpoints bill hourly plus per GB (usually cheaper than NAT); the S3 gateway endpoint has no extra charge
Essential VPC endpoints for ECS Fargate
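# Interface endpoints accept HTTPS on port 443, so the security group attached
# to them must allow that inbound. The endpoints below reuse $SG_ECS, so add a
# self-referencing rule (a dedicated endpoint security group is a cleaner option).
aws ec2 authorize-security-group-ingress --group-id "$SG_ECS" \
  --protocol tcp --port 443 --source-group "$SG_ECS"
# ECR API endpoint (for docker pull metadata)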
aws ec2 create-vpc-endpoint \
--vpc-id "$VPC_ID" \
--service-name com.amazonaws.us-east-1.ecr.api \
--vpc-endpoint-type Interface \
--subnet-ids "$PRIVATE_SUBNET_1" "$PRIVATE_SUBNET_2" \
--security-group-ids "$SG_ECS"
# ECR Docker endpoint (for image layers)
aws ec2 create-vpc-endpoint \
--vpc-id "$VPC_ID" \
--service-name com.amazonaws.us-east-1.ecr.dkr \
--vpc-endpoint-type Interface \
--subnet-ids "$PRIVATE_SUBNET_1" "$PRIVATE_SUBNET_2" \
--security-group-ids "$SG_ECS"
# S3 Gateway endpoint (ECR stores layers in S3)
aws ec2 create-vpc-endpoint \
--vpc-id "$VPC_ID" \
--service-name com.amazonaws.us-east-1.s3 \
--vpc-endpoint-type Gateway \
--route-table-ids "$PRIVATE_RT"
# CloudWatch Logs endpoint (for ECS logging)
aws ec2 create-vpc-endpoint \
--vpc-id "$VPC_ID" \
--service-name com.amazonaws.us-east-1.logs \
--vpc-endpoint-type Interface \
--subnet-ids "$PRIVATE_SUBNET_1" "$PRIVATE_SUBNET_2" \
--security-group-ids "$SG_ECS"
# Secrets Manager endpoint (for task secrets)
aws ec2 create-vpc-endpoint \
--vpc-id "$VPC_ID" \
--service-name com.amazonaws.us-east-1.secretsmanager \
--vpc-endpoint-type Interface \
--subnet-ids "$PRIVATE_SUBNET_1" "$PRIVATE_SUBNET_2" \
--security-group-ids "$SG_ECS"
7. IAM Roles for ECS Tasks
ECS uses two distinct IAM roles — this is one of the most commonly confused concepts.
┌─────────────────────────────────────────────────────────────────────┐
│ TWO IAM ROLES IN ECS │
│ │
│ ┌──────────────────────────────┐ │
│ │ EXECUTION ROLE │ │
│ │ (executionRoleArn) │ │
│ │ │ │
│ │ WHO: The ECS agent │ │
│ │ WHEN: Before your app starts │ │
│ │ WHAT: │ │
│ │ - Pull images from ECR │ │
│ │ - Send logs to CloudWatch │ │
│ │ - Fetch secrets from SM │ │
│ │ - Fetch params from SSM │ │
│ │ │ │
│ │ Analogy: The janitor who │ │
│ │ opens the building and │ │
│ │ turns on the lights │ │
│ └──────────────────────────────┘ │
│ │
│ ┌──────────────────────────────┐ │
│ │ TASK ROLE │ │
│ │ (taskRoleArn) │ │
│ │ │ │
│ │ WHO: Your application code │ │
│ │ WHEN: While your app runs │ │
│ │ WHAT: │ │
│ │ - Read/write S3 buckets │ │
│ │ - Send messages to SQS │ │
│ │ - Publish to SNS topics │ │
│ │ - Query DynamoDB tables │ │
│ │ - Call any AWS service │ │
│ │ │ │
│ │ Analogy: The employee who │ │
│ │ does the actual work │ │
│ └──────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Creating the Execution Role
# Create the execution role
aws iam create-role \
--role-name ecsTaskExecutionRole \
--assume-role-policy-document '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": { "Service": "ecs-tasks.amazonaws.com" },
"Action": "sts:AssumeRole"
}
]
}'
# Attach the AWS managed policy for ECS task execution
aws iam attach-role-policy \
--role-name ecsTaskExecutionRole \
--policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
# If using Secrets Manager or SSM Parameter Store, add permissions to fetch those values
aws iam put-role-policy \
--role-name ecsTaskExecutionRole \
--policy-name SecretsAccess \
--policy-document '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"secretsmanager:GetSecretValue"
],
"Resource": [
"arn:aws:secretsmanager:us-east-1:123456789012:secret:user-service/*"
]
},
{
"Effect": "Allow",
"Action": [
"ssm:GetParameters"
],
"Resource": [
"arn:aws:ssm:us-east-1:123456789012:parameter/user-service/*"
]
}
]
}'
Creating a Task Role (per service)
Each microservice should have its own task role with only the permissions it needs:
# Task role for the user-service
aws iam create-role \
--role-name userServiceTaskRole \
--assume-role-policy-document '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": { "Service": "ecs-tasks.amazonaws.com" },
"Action": "sts:AssumeRole"
}
]
}'
# Grant only the permissions the user-service needs
aws iam put-role-policy \
--role-name userServiceTaskRole \
--policy-name UserServicePermissions \
--policy-document '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::user-profile-photos/*"
},
{
"Effect": "Allow",
"Action": [
"sqs:SendMessage"
],
"Resource": "arn:aws:sqs:us-east-1:123456789012:email-notifications"
},
{
"Effect": "Allow",
"Action": [
"ssmmessages:CreateControlChannel",
"ssmmessages:CreateDataChannel",
"ssmmessages:OpenControlChannel",
"ssmmessages:OpenDataChannel"
],
"Resource": "*",
"Sid": "ECSExec"
}
]
}'
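Both roles are attached to a service through its task definition. The following sketch writes a minimal Fargate task definition that references the two roles created above. The file name `taskdef.json`, the `user-service` family and image tag, the CPU and memory sizes, and the log group name are placeholder assumptions rather than values from this course.

```shell
# Write a minimal task definition that wires in both roles.
# Account ID, image, sizes and log group are placeholders.
cat > taskdef.json <<'EOF'
{
  "family": "user-service",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/userServiceTaskRole",
  "containerDefinitions": [
    {
      "name": "user-service",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/user-service:latest",
      "essential": true,
      "portMappings": [{ "containerPort": 3000, "protocol": "tcp" }],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/user-service",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      }
    }
  ]
}
EOF

# Register it (requires AWS credentials):
#   aws ecs register-task-definition --cli-input-json file://taskdef.json
```

ECS assumes the execution role to pull the image and write to the `/ecs/user-service` log group, which must already exist. Code inside the container receives only the task role's credentials, so it can reach the S3 bucket and SQS queue granted above and nothing else.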
8. The Least Privilege Principle
Least privilege means granting only the minimum permissions needed for a role to function. This is the single most important security practice in IAM.
Bad: Overly permissive
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "*",
"Resource": "*"
}
]
}
This grants full access to everything in your AWS account — all services, all resources. A compromised container could delete databases, launch cryptocurrency miners, or exfiltrate data.
Good: Scoped permissions
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["s3:GetObject"],
"Resource": "arn:aws:s3:::user-profile-photos/*"
}
]
}
This grants read-only access to the objects in one specific S3 bucket and nothing else.
Least privilege checklist
| Level | How to Scope |
|---|---|
| Action | Specific actions (s3:GetObject), not wildcards (s3:*) |
| Resource | Specific ARNs (arn:aws:s3:::my-bucket/*), not * |
| Condition | Add conditions where possible (source IP, time, tags) |
| Per-service roles | Each microservice gets its own task role |
| No shared roles | Never share IAM roles across services |
| Regular audits | Review permissions quarterly with IAM Access Analyzer |
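Part of this checklist can be automated. As a rough sketch (the function name `has_wildcards` is made up here, and a text match is no substitute for IAM Access Analyzer), the function below flags policy files that contain a bare `*` action or resource, or a service-wide wildcard such as `s3:*`. Some legitimate statements, such as `ecr:GetAuthorizationToken`, require `"Resource": "*"`, so treat a match as a prompt for review rather than an error.

```shell
# Returns 0 (true) if the policy file contains a wildcard worth reviewing:
#   - "Action": "*" or "Resource": "*" (optionally inside [ ... ])
#   - a lone "*" on its own line (element of a multi-line array)
#   - a service-wide action such as "s3:*"
has_wildcards() {
  grep -Eq '"(Action|Resource)"[[:space:]]*:[[:space:]]*\[?[[:space:]]*"\*"|^[[:space:]]*"\*"[[:space:]]*,?[[:space:]]*$|"[a-z0-9-]+:\*"' "$1"
}

# Example: check every policy document in a directory (path is hypothetical).
# for f in policies/*.json; do
#   has_wildcards "$f" && echo "REVIEW: $f"
# done
```

Resource ARNs that end in `/*`, like `arn:aws:s3:::user-profile-photos/*`, are not flagged, because they are still scoped to a single bucket.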
9. Common IAM Policies for ECS
ECS Task Execution Role (standard)
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:CreateLogGroup"
],
"Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/ecs/*"
}
]
}
S3 read/write for a specific bucket
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::my-bucket",
"arn:aws:s3:::my-bucket/*"
]
}
SQS producer (send messages)
{
"Effect": "Allow",
"Action": [
"sqs:SendMessage",
"sqs:GetQueueAttributes"
],
"Resource": "arn:aws:sqs:us-east-1:123456789012:order-processing-queue"
}
SQS consumer (receive and delete messages)
{
"Effect": "Allow",
"Action": [
"sqs:ReceiveMessage",
"sqs:DeleteMessage",
"sqs:GetQueueAttributes"
],
"Resource": "arn:aws:sqs:us-east-1:123456789012:order-processing-queue"
}
DynamoDB read/write for specific table
{
"Effect": "Allow",
"Action": [
"dynamodb:GetItem",
"dynamodb:PutItem",
"dynamodb:UpdateItem",
"dynamodb:DeleteItem",
"dynamodb:Query"
],
"Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/sessions"
}
10. Network Architecture for a Typical Microservices Deployment
┌─────────────────────────────────────────────────────────────────────────────┐
│ VPC: 10.0.0.0/16 │
│ │
│ ┌──── AZ-a ────────────────────────┐ ┌──── AZ-b ────────────────────────┐│
│ │ │ │ ││
│ │ Public 10.0.1.0/24 │ │ Public 10.0.2.0/24 ││
│ │ ┌────────┐ ┌────────────────┐ │ │ ┌────────────────┐ ││
│ │ │NAT GW │ │ALB (node) │ │ │ │ALB (node) │ ││
│ │ └────────┘ └────────────────┘ │ │ └────────────────┘ ││
│ │ │ │ │ ││
│ │ ───────┼──────────────────── │ │ ────────────────────────── ││
│ │ │ │ │ ││
│ │ Private 10.0.10.0/24 │ │ Private 10.0.20.0/24 ││
│ │ ┌──────────┐ ┌──────────┐ │ │ ┌──────────┐ ┌──────────┐ ││
│ │ │User Svc │ │Order Svc │ │ │ │User Svc │ │Order Svc │ ││
│ │ │Task │ │Task │ │ │ │Task │ │Task │ ││
│ │ └──────────┘ └──────────┘ │ │ └──────────┘ └──────────┘ ││
│ │ │ │ ││
│ │ ─────────────────────────── │ │ ────────────────────────── ││
│ │ │ │ ││
│ │ Data 10.0.100.0/24 │ │ Data 10.0.200.0/24 ││
│ │ ┌──────────┐ ┌──────────┐ │ │ ┌──────────┐ ┌──────────┐ ││
│ │ │RDS │ │ElastiCache│ │ │ │RDS │ │ElastiCache│ ││
│ │ │(primary) │ │(primary) │ │ │ │(standby) │ │(replica) │ ││
│ │ └──────────┘ └──────────┘ │ │ └──────────┘ └──────────┘ ││
│ │ │ │ ││
│ └──────────────────────────────────┘ └──────────────────────────────────┘│
│ │
│ Route Tables: │
│ Public: 0.0.0.0/0 → Internet Gateway │
│ Private: 0.0.0.0/0 → NAT Gateway │
│ Data: No internet route (fully isolated) │
└─────────────────────────────────────────────────────────────────────────────┘
11. Putting It All Together
Here is the order of operations to build a production VPC for ECS:
Step 1: Create VPC (10.0.0.0/16)
Step 2: Create subnets (public, private, data) in 2+ AZs
Step 3: Create Internet Gateway → attach to VPC
Step 4: Create NAT Gateway in a public subnet
Step 5: Create route tables (public → IGW, private → NAT, data → none)
Step 6: Associate subnets with route tables
Step 7: Create security groups (ALB, ECS tasks, databases)
Step 8: Create VPC endpoints (ECR, S3, CloudWatch Logs, Secrets Manager)
Step 9: Create IAM roles (execution role, task roles per service)
Step 10: Deploy ALB in public subnets, ECS in private subnets, RDS in data subnets
12. Key Takeaways
- VPC is your isolated network — always use a custom VPC for production, never the default.
- Public subnets route to an Internet Gateway (for ALB, NAT). Private subnets have no direct route to the internet (for ECS tasks, databases).
- NAT Gateway gives private subnets outbound-only internet access (pulling images, calling APIs).
- Security groups are stateful firewalls — chain them: ALB → ECS → Database. Reference SG IDs, not CIDRs.
- Two IAM roles: Execution role (ECS agent pulls images, writes logs) vs Task role (your app calls AWS services).
- Least privilege — scope every IAM policy to specific actions, specific resources, per service.
- VPC Endpoints let private subnets reach AWS services without NAT — saves cost and improves security.
- Multi-AZ everything — subnets, ALB nodes, ECS tasks, RDS replicas — for high availability.
Explain-It Challenge
- A new ECS task fails to start with "CannotPullContainerError." The image exists in ECR. Walk through all the networking and IAM reasons this could happen.
- Explain to a non-technical stakeholder why your ECS tasks run in private subnets instead of public subnets, using a physical building security analogy.
- Your security team asks why the order-service can read from the user-profile-photos S3 bucket. Explain how per-service task roles prevent this and why shared roles are dangerous.