How to Build an AWS Auto Scaling Group With Terraform
Terraform Auto Scaling Group Architecture Overview
I designed and implemented a highly available, fault-tolerant auto-scaling infrastructure using Terraform. The architecture spans three Availability Zones and includes both public and private subnets, demonstrating real-world security and scalability practices.
Why Use AWS Auto Scaling Groups in Production
In today’s dynamic digital landscape, applications need to adapt to changing demands while maintaining security and cost efficiency. This infrastructure addresses several critical business needs:
- Handling Traffic Spikes
  - Manages unpredictable traffic from marketing campaigns
  - Adapts to product launches and seasonal events
  - Maintains performance during peak loads
- Cost Management
  - Dynamically scales resources based on demand
  - Prevents over-provisioning during low-traffic periods
  - Optimizes cloud spending without sacrificing performance
- High Availability and Reliability
  - Distributes load across multiple instances
  - Ensures seamless failover capabilities
  - Minimizes downtime through redundancy
- Enhanced User Experience
  - Maintains consistent performance during traffic surges
  - Ensures responsive application behavior
  - Supports customer satisfaction and retention
- Future-Proof Scalability
  - Provides foundation for business growth
  - Adapts to increasing traffic demands
  - Requires no significant architectural changes
- Security-First Design
  - Places web servers in private subnets
  - Restricts direct internet access
  - Implements defense-in-depth through ALB
AWS Auto Scaling Infrastructure With ALB and VPC
How AWS Auto Scaling Triggers EC2 Instance Creation
Creating the Auto Scaling Architecture With Terraform
CPU Usage High (>70%) → Auto Scaling Group → Uses Launch Template → Creates New EC2 in Private Subnet
ALB to EC2 Traffic Flow in Private Subnets
- Internet traffic → ALB (public subnet) → Target Group → EC2 instances (private subnets)
- Health checks ensure traffic only routes to healthy instances
- Auto scaling maintains service availability based on demand
Terraform Implementation Process
Before deploying our infrastructure, we need to set up our AWS credentials and understand our module structure.
Configuring AWS Credentials for Terraform
First, configure your AWS credentials:
# Set your AWS profile
"export AWS_PROFILE=my-profile"
# Verify your account access
"aws sts get-caller-identity"
# Note: You'll need the account ID for the monitoring configuration
Terraform Workflow:
Our implementation follows this systematic approach:
- Write each module
- Validate module configuration
- Plan and review changes
- Repeat until completion
- Apply final configuration
Terraform Init, Validate, and Plan
# Initialize Terraform working directory
terraform init
# Validate module syntax and configuration
terraform validate
# Review planned changes
terraform plan
Terraform Root Module Configuration
Our root module orchestrates all components:
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
# Configure the AWS Provider
provider "aws" {
region = var.aws_region
profile = var.aws_profile #profile name
}
module "vpc" {
source = "./modules/vpc"
project_name = var.project_name
}
module "launch_template" {
source = "./modules/launch-template"
project_name = var.project_name
vpc_id = module.vpc.vpc_id
}
module "alb" {
source = "./modules/alb"
project_name = var.project_name
vpc_id = module.vpc.vpc_id
public_subnet_ids = module.vpc.public_subnet_ids
alb_security_groups_id = module.launch_template.alb_security_groups_id
}
module "asg" {
source = "./modules/asg"
project_name = var.project_name
private_subnet_ids = module.vpc.private_subnet_ids
launch_template_id = module.launch_template.launch_template_id
launch_template_version = module.launch_template.launch_template_version
lb_target_group_arn = module.alb.lb_target_group_arn
}
module "monitoring" {
source = "./modules/monitoring"
project_name = var.project_name
autoscaling_group_name = module.asg.autoscaling_group_name
autoscaling_policy_dwn_arn = module.asg.autoscaling_policy_dwn_arn
account-id = var.account-id
autoscaling_policy_up_arn = module.asg.autoscaling_policy_up_arn
notification_email = var.notification_email
}
Key Features:
- Modular Design
  - Each component is a separate module
  - Clear dependency chain
  - Easy to maintain and update
- Resource Flow
  - VPC provides network foundation
  - Launch template defines instance configuration
  - ALB handles traffic distribution
  - ASG manages scaling
  - Monitoring provides oversight
- Output Configuration
output "alb_dns_name" {
value = module.alb.alb_dns_name
}
This output provides the ALB DNS name for accessing our application.
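The root module also needs to declare the input variables referenced above. Here is a minimal sketch, with the names taken directly from the module calls; the region default is an assumption that matches the AZ list used later:
variable "aws_region" {
  type        = string
  description = "AWS region to deploy into"
  default     = "us-east-1"
}
variable "aws_profile" {
  type        = string
  description = "Named AWS CLI profile used by the provider"
}
variable "project_name" {
  type        = string
  description = "Prefix applied to resource names"
}
variable "account-id" {
  type        = string
  description = "AWS account ID used by the SNS topic policy"
}
variable "notification_email" {
  type        = string
  description = "Email address subscribed to scaling notifications"
}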
Terraform VPC Module With Public and Private Subnets
The first major component is creating a VPC module with public and private subnets.
VPC and Availability Zone Variables
variable "vpc_cidr" {
type = string
description = "CIDR block for VPC"
default = "10.0.0.0/16"
}
variable "azs" {
type = list(string)
description = "Availability Zones"
default = ["us-east-1a", "us-east-1b", "us-east-1c"]
}
Creating the AWS VPC With Terraform
# Create vpc
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_support = true
enable_dns_hostnames = true
}
The VPC is configured with:
- CIDR block 10.0.0.0/16
- DNS hostname support enabled
- DNS resolution enabled
Public Subnets, Internet Gateway, and Route Tables
# Create public subnets
resource "aws_subnet" "public" {
count = length(var.azs)
vpc_id = aws_vpc.main.id
cidr_block = "10.0.${count.index + 1}.0/24"
availability_zone = var.azs[count.index]
map_public_ip_on_launch = true
}
# internet gateway for public subnets
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}
# route table for public subnets
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}
# associate public subnets with the public route table
resource "aws_route_table_association" "public" {
  count          = length(var.azs)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}
This creates:
- Three public subnets across different AZs (10.0.1.0/24, 10.0.2.0/24, 10.0.3.0/24)
- Enabled auto-assign public IP setting for these subnets
- Attached an Internet Gateway to enable direct internet connectivity
- Configured a public route table with a route to 0.0.0.0/0 via the IGW
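The subnet CIDRs above are built with string interpolation. An equivalent approach that is not used in this module, shown here only as a sketch, is Terraform's built-in cidrsubnet() function, which derives each /24 block from the VPC CIDR:
# hypothetical alternative: derive each /24 from the VPC CIDR
resource "aws_subnet" "public" {
  count                   = length(var.azs)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index + 1) # 10.0.1.0/24, 10.0.2.0/24, ...
  availability_zone       = var.azs[count.index]
  map_public_ip_on_launch = true
}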
Private Subnets With NAT Gateway for EC2 Instances
# create private subnets
resource "aws_subnet" "private" {
count = length(var.azs)
vpc_id = aws_vpc.main.id
cidr_block = "10.0.${count.index + 10}.0/24"
availability_zone = var.azs[count.index]
tags = {
Name = "${var.project_name}-private-${count.index + 1}"
}
}
resource "aws_eip" "nat" {
domain = "vpc"
# nate gateway for private subnets internet access
resource "aws_nat_gateway" "main" {
allocation_id = aws_eip.nat.id
subnet_id = aws_subnet.public[0].id
depends_on = [ aws_internet_gateway.main ]
}
# route table for private subnets
resource "aws_route_table" "private" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main.id
}
resource "aws_route_table_association" "private" {
count = length(var.azs)
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private.id
}
This implements:
- Three private subnets (10.0.10.0/24, 10.0.11.0/24, 10.0.12.0/24)
- NAT Gateway in the first public subnet
- Private route table routing through NAT Gateway
- No public IP assignment
The architecture ensures reliability through:
- Multi-AZ deployment for redundancy
- Strategically placed NAT Gateway
- Paired public/private subnets in each AZ
- Optimized AWS data transfer paths
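Because the root module reads module.vpc.vpc_id, module.vpc.public_subnet_ids, and module.vpc.private_subnet_ids, the VPC module has to expose matching outputs. A minimal sketch of what that outputs file could look like:
output "vpc_id" {
  value = aws_vpc.main.id
}
output "public_subnet_ids" {
  value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}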
EC2 Launch Template and Security Group Configuration
This module defines our instance configurations and security boundaries. I use a security-first approach with separate security groups for the ALB and EC2 instances.
AWS Security Groups for ALB and EC2
- EC2 Security Group (Private Subnet)
# restrict inbound traffic to the ALB only
resource "aws_security_group" "ec2" {
name = "${var.project_name}-ec2-sg"
vpc_id = var.vpc_id
lifecycle {
create_before_destroy = true
}
}
# Allow only ALB traffic
resource "aws_security_group_rule" "allow_alb" {
type = "ingress"
security_group_id = aws_security_group.ec2.id
source_security_group_id = aws_security_group.alb.id
from_port = 80
protocol = "tcp"
to_port = 80
}
resource "aws_security_group_rule" "allow_ec2_egress" {
type = "egress"
security_group_id = aws_security_group.ec2.id
from_port = 0
protocol = "-1"
to_port = 0
cidr_blocks = [ "0.0.0.0/0" ]
}
Security measures:
- Restricts inbound access to ALB traffic only
- Allows outbound internet access via NAT Gateway
- Ensures instances remain private and secure
- ALB Security Group (Public Subnet)
# Alb security group
resource "aws_security_group" "alb" {
name = "${var.project_name}-alb-sg"
vpc_id = var.vpc_id
lifecycle {
create_before_destroy = true
}
}
resource "aws_security_group_rule" "allow_alb_ingress" {
type = "ingress"
security_group_id = aws_security_group.alb.id
cidr_blocks = ["0.0.0.0/0"]
from_port = 80
protocol = "tcp"
to_port = 80
}
resource "aws_security_group_rule" "allow_alb_egress" {
type = "egress"
security_group_id = aws_security_group.alb.id
cidr_blocks = ["0.0.0.0/0"]
to_port = 0
from_port = 0
protocol = "-1"
}
Features:
- Accepts HTTP traffic from internet
- Routes traffic to EC2 instances
- Acts as public entry point
AWS Launch Template With Ubuntu AMI and User Data
The launch template defines the EC2 instance configuration:
resource "aws_launch_template" "main" {
name_prefix = "${var.project_name}-"
image_id = data.aws_ami.ubuntu.id
instance_type = "t2.micro"
network_interfaces {
associate_public_ip_address = false
security_groups = [aws_security_group.ec2.id]
}
tag_specifications {
resource_type = "instance"
lifecycle {
create_before_destroy = true
}
monitoring {
enabled = "true"
}
user_data = filebase64("${abspath(path.root)}/scripts/ec2.sh")
iam_instance_profile {
name = aws_iam_instance_profile.ec2_profile.name
}
}
data "aws_ami" "ubuntu" {
most_recent = true
owners = [ "099720109477" ]
filter {
name = "name"
values = [ "ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*" ]
}
filter {
name = "virtualization-type"
values = [ "hvm" ]
}
}
Key Configurations:
- t2.micro for cost efficiency
- Uses Ubuntu AMI for consistency
- Disabled public IPs for security
- Enables detailed monitoring
- Includes a user data script for setup (a sketch of what it might contain follows below)
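The user data script itself (scripts/ec2.sh) isn't shown in this module, and as the troubleshooting section later in this post reveals, my initial version failed to install nginx. Purely for illustration, here is a hypothetical minimal version that would satisfy the ALB health check on port 80, inlined as a heredoc instead of filebase64:
# hypothetical equivalent of scripts/ec2.sh, inlined as a heredoc
  user_data = base64encode(<<-EOF
    #!/bin/bash
    apt-get update -y
    apt-get install -y nginx
    echo "<h1>Hello from $(hostname -f)</h1>" > /var/www/html/index.html
    systemctl enable --now nginx
  EOF
  )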
IAM Role and Instance Profile for EC2
Implements least-privilege access:
resource "aws_iam_role" "ec2_role" {
name = "${var.project_name}-ec2-role"
# Terraform's "jsonencode" function converts a
# Terraform expression result to valid JSON syntax.
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Sid = ""
Principal = {
Service = "ec2.amazonaws.com"
}
},
]
})
}
resource "aws_iam_instance_profile" "ec2_profile" {
name = "${var.project_name}-ec2-profile"
role = aws_iam_role.ec2_role.name
}
resource "aws_iam_role_policy" "ec2_custom_policy" {
name = "${var.project_name}-ec2-custom-policy"
role = aws_iam_role.ec2_role.name
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"elasticloadbalancing:Describe*",
"elasticloadbalancing:DeregisterTargets",
"elasticloadbalancing:RegisterTargets"
]
Resource = "*"
},
{
Effect = "Allow"
Action = [
"ec2:DescribeInstances",
"ec2:DescribeTags"
]
Resource = "*"
}
]
})
}
Permissions granted:
- Load balancer registration
- Instance metadata access
- Basic EC2 operations
Application Load Balancer and Auto Scaling Group Setup
This section configures our application’s load balancing and auto scaling capabilities.
Creating an Application Load Balancer With Terraform
First, I created the Application Load Balancer in our public subnets:
resource "aws_lb" "main" {
name = "${var.project_name}-alb"
internal = false
load_balancer_type = "application"
security_groups = [var.alb_security_groups_id]
subnets = var.public_subnet_ids
enable_deletion_protection = false
}
Key Features:
- Deployed in public subnets for internet accessibility
- Internet-facing for public access
- Uses security group allowing HTTP traffic
ALB Target Group With Health Check Configuration
# create target group
resource "aws_lb_target_group" "main" {
name = "${var.project_name}-alb-tg"
target_type = "instance"
port = 80
protocol = "HTTP"
vpc_id = var.vpc_id
health_check {
enabled = true
healthy_threshold = 2
interval = 30
matcher = "200"
path = "/"
port = "traffic-port"
timeout = 5
unhealthy_threshold = 2
}
}
Created a target group with health checks:
- Path: "/" for root endpoint testing
- 30-second check intervals
- Healthy threshold: 2 successful checks
- Unhealthy threshold: 2 failed checks
# create alb listener
resource "aws_lb_listener" "http" {
load_balancer_arn = aws_lb.main.arn
port = 80
protocol = "HTTP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.main.arn
}
}
This configures an HTTP listener on port 80 that forwards traffic to the target group.
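Like the VPC module, the ALB module exposes the values other modules consume: the root module reads module.alb.lb_target_group_arn for the ASG and module.alb.alb_dns_name for the final output. A minimal outputs sketch:
output "alb_dns_name" {
  value = aws_lb.main.dns_name
}
output "lb_target_group_arn" {
  value = aws_lb_target_group.main.arn
}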
Auto Scaling Group With Scaling Policies
The ASG manages our EC2 instances:
resource "aws_autoscaling_group" "main" {
name = "${var.project_name}-asg"
max_size = 9
min_size = 3
desired_capacity = 3
health_check_type = "ELB"
health_check_grace_period = 300
vpc_zone_identifier = var.private_subnet_ids
target_group_arns = [var.lb_target_group_arn]
launch_template {
id = var.launch_template_id
version = var.launch_template_version
}
tag {
key = "Name"
value = "${var.project_name}-asg-instance"
propagate_at_launch = true
}
}
Configuration Details:
- Placed EC2 instances in private subnets for enhanced security
- Capacity settings:
  - Minimum: 3 instances
  - Maximum: 9 instances
  - Initial: 3 instances
- Distribution:
  - Spread across multiple Availability Zones
  - Integrated with ALB target group for load distribution
- Scaling Policies:
resource "aws_autoscaling_policy" "scaleUp" {
name = "${var.project_name}-asg-up"
scaling_adjustment = 1
adjustment_type = "ChangeInCapacity"
cooldown = 300
autoscaling_group_name = aws_autoscaling_group.main.name
}
resource "aws_autoscaling_policy" "scaleDown" {
name = "${var.project_name}-asg-down"
scaling_adjustment = -1
adjustment_type = "ChangeInCapacity"
cooldown = 300
autoscaling_group_name = aws_autoscaling_group.main.name
}
- Scales up by 1 instance when triggered
- Scales down by 1 instance when triggered
- 5-minute cooldown between scaling actions
- Matches with CloudWatch alarms for CPU metrics
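As an aside that is not used in this build: purely CPU-driven scaling like this can also be expressed as a single target tracking policy, where AWS creates and manages the CloudWatch alarms for you. A hedged sketch:
# hypothetical alternative: one target tracking policy instead of
# separate step policies plus hand-built CloudWatch alarms
resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "${var.project_name}-cpu-target"
  autoscaling_group_name = aws_autoscaling_group.main.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 50.0
  }
}
I kept the explicit step policies here so the CloudWatch alarms and SNS wiring in the next section can be shown end to end.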
CloudWatch Monitoring and SNS Alerts for Auto Scaling
The monitoring strategy combines CloudWatch alarms with SNS notifications to provide comprehensive oversight of the auto-scaling infrastructure.
CloudWatch CPU Utilization Alarms for Scale-Up and Scale-Down
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
alarm_name = "${var.project_name}-highCpu-alarm"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = 1
metric_name = "CPUUtilization"
namespace = "AWS/EC2"
period = 120
statistic = "Average"
threshold = 70
dimensions = {
AutoScalingGroupName = var.autoscaling_group_name
}
alarm_description = "Scale out if CPU > 70%"
alarm_actions = [var.autoscaling_policy_up_arn]
}
resource "aws_cloudwatch_metric_alarm" "low_cpu" {
alarm_name = "${var.project_name}-lowCpu-alarm"
comparison_operator = "LessThanThreshold"
evaluation_periods = 1
metric_name = "CPUUtilization"
namespace = "AWS/EC2"
period = 120
statistic = "Average"
threshold = 20
dimensions = {
AutoScalingGroupName = var.autoscaling_group_name
}
alarm_description = "Scale in if CPU < 20%"
alarm_actions = [var.autoscaling_policy_dwn_arn]
}
Configuration Details:
- Scale-Out Alarm (High CPU)
  - Triggers when CPU > 70%
  - Evaluates every 120 seconds
  - Initiates immediate scaling
- Scale-In Alarm (Low CPU)
  - Triggers when CPU < 20%
  - Same evaluation period
  - Reduces capacity when load decreases
SNS Email Notifications for Auto Scaling Events
- SNS Topic Setup
resource "aws_sns_topic" "main" {
name = "${var.project_name}-sns-topic"
}
resource "aws_sns_topic_policy" "policy" {
arn = aws_sns_topic.main.arn
policy = data.aws_iam_policy_document.sns_topic_policy.json
}
data "aws_iam_policy_document" "sns_topic_policy" {
policy_id = "__default_policy_ID"
statement {
actions = [
"SNS:Subscribe",
"SNS:SetTopicAttributes",
"SNS:RemovePermission",
"SNS:Receive",
"SNS:Publish",
"SNS:ListSubscriptionsByTopic",
"SNS:GetTopicAttributes",
"SNS:DeleteTopic",
"SNS:AddPermission",
]
condition {
test = "StringEquals"
variable = "AWS:SourceOwner"
values = [
var.account-id,
]
}
effect = "Allow"
principals {
type = "AWS"
identifiers = ["*"]
}
resources = [aws_sns_topic.main.arn]
sid = "__default_statement_ID"
}
}
resource "aws_autoscaling_notification" "asg_notifications" {
group_names = [var.autoscaling_group_name]
topic_arn = aws_sns_topic.main.arn
notifications = [
"autoscaling:EC2_INSTANCE_LAUNCH",
"autoscaling:EC2_INSTANCE_TERMINATE",
"autoscaling:EC2_INSTANCE_LAUNCH_ERROR",
"autoscaling:EC2_INSTANCE_TERMINATE_ERROR"
]
}
- Email Alert Configuration
resource "aws_sns_topic_subscription" "email" {
topic_arn = aws_sns_topic.main.arn
protocol = "email"
endpoint = var.notification_email
}
Monitoring Coverage:
- Auto Scaling Events
  - Instance launches
  - Instance terminations
  - Scaling errors
  - Operation failures
- Real-Time Notifications
  - Scale-out actions
  - Scale-in actions
  - Infrastructure health alerts
  - Error conditions
Troubleshooting AWS Auto Scaling Deployment Issues
How to Fix 502 Bad Gateway Error on AWS ALB
When attempting to access the Application Load Balancer’s DNS name, I encountered a 502 Bad Gateway error, indicating a communication breakdown between the load balancer and the backend instances. This led me through a comprehensive troubleshooting process that demonstrates the complexity of debugging distributed systems.

Debugging Unhealthy Targets in ALB Target Group
After confirming that the EC2 instances were running in their designated private subnets, I discovered that while the instances were operational, the ALB target group showed all targets as unhealthy. This suggested a deeper application-level issue rather than an infrastructure problem. A thorough review of security group configurations and the launch template confirmed that the basic networking and instance setup were correct.

Using AWS SSM Session Manager to Access Private EC2 Instances
The architecture’s security-first design presented an interesting troubleshooting challenge. With EC2 instances deliberately placed in private subnets and no public IP addresses or SSH access configured, traditional debugging approaches weren’t viable. The solution required:
- Implementing AWS Systems Manager Session Manager access by:
  - Adding the appropriate IAM permissions to the EC2 role via Terraform
  - Creating VPC endpoints for Systems Manager connectivity
resource "aws_iam_role_policy_attachment" "ssm_policy" {
role = aws_iam_role.ec2_role.name
policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
# updated custom policy
resource "aws_iam_role_policy" "ec2_custom_policy" {
name = "${var.project_name}-ec2-custom-policy"
role = aws_iam_role.ec2_role.name
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"elasticloadbalancing:Describe*",
"elasticloadbalancing:DeregisterTargets",
"elasticloadbalancing:RegisterTargets",
"ssm:UpdateInstanceInformation",
"ssm:ListInstanceAssociations",
"ssm:DescribeDocument",
"ssm:GetDocument",
"ssm:GetParameter"
]
Resource = "*"
},
{
Effect = "Allow"
Action = [
"ec2:DescribeInstances",
"ec2:DescribeTags"
]
Resource = "*"
}
]
})
}
- Establishing secure instance access through the AWS console:
Go to VPC → Endpoints and click “Create Endpoint”.

Under services I added these 4 endpoints one by one:
- com.amazonaws.us-east-1.ssm
- com.amazonaws.us-east-1.ssmmessages
- com.amazonaws.us-east-1.ec2
- com.amazonaws.us-east-1.ec2messages

I selected the VPC of the Auto Scaling Group.
Make sure “Enable DNS name” is left unchecked, since we are working with private subnets.
Under Subnets, I selected the three private subnets, one per AZ. Make sure “Designate IP addresses” is also left unchecked, since we are working with private subnets.

For the endpoint policy, I chose Full Access since we are debugging.
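I created these endpoints in the console while debugging, but the same thing could be codified in Terraform. A sketch written at the root level; the security group reference and naming here are illustrative assumptions, not part of the original modules:
# hypothetical Terraform equivalent of the console steps above
resource "aws_vpc_endpoint" "ssm" {
  for_each = toset(["ssm", "ssmmessages", "ec2", "ec2messages"])

  vpc_id              = module.vpc.vpc_id
  service_name        = "com.amazonaws.us-east-1.${each.key}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = module.vpc.private_subnet_ids
  security_group_ids  = [aws_security_group.vpc_endpoints.id] # assumed SG allowing HTTPS (443) from the VPC CIDR
  private_dns_enabled = false # matches the console choice above
}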
Root Cause: Nginx Not Installed in Launch Template User Data
After gaining instance access through Session Manager, I followed a systematic troubleshooting approach to verify network connectivity and instance configuration:
First, I checked if the NAT Gateway was functioning correctly by testing internet connectivity:
ping google.com
This test confirmed the instance could reach the internet through the NAT Gateway in the public subnet, validating the network architecture was working as designed.

Next, I verified I was on the correct instance by retrieving its metadata:
curl http://169.254.169.254/latest/meta-data/instance-id
This step is crucial when working with auto-scaled instances to ensure I was troubleshooting the right target.

From there I ran sudo systemctl status nginx, which showed that nginx wasn’t installed on the instances. This explained the failed health checks: there was no web server present to respond to the ALB’s requests. The resolution involved:
- Manually installing nginx
- Creating the index.html file using sudo tee
- Enabling and restarting the nginx service
Resolution: Fixing Health Checks and Validating User Data Scripts

This troubleshooting experience reinforced several crucial aspects of AWS infrastructure management:
- The importance of validating user data scripts before deployment
- The value of AWS Systems Manager for secure instance management
- The critical relationship between ALB health checks and application state
- The effectiveness of defense-in-depth security practices
- The benefit of testing configurations with spot instances before production deployment