Terraform infrastructure as code

Have agents write Terraform with remote state, workspaces or directory-based environments, reusable modules, and validated variables—stress plan review and call out destructive changes.

The SKILL pins provider versions and required_version, and documents how .tfvars files and CI inject secrets. Never commit plaintext keys to the repo.

Module contracts: document inputs/outputs; use moved/import during refactors; spell out lifecycle and prevent_destroy guardrails for critical resources.

If policy-as-code (OPA/Sentinel) or cost estimation is wired in, require agents to attach plan highlights in the PR description.

Formatting: run terraform fmt, validate, and tflint in CI, in that order.
State: locks, drift detection, and approval for manual state rm.
Multi-cloud / multi-region: provider aliases and directory conventions for module calls.
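
The multi-region point can be sketched with a provider alias. This is a minimal illustration, not a prescribed layout: the alias name replica, the region eu-west-1, and the variable replica_subnet_ids are assumptions made for the example, and the module path points at the ec2-service module shown later in this document.

```hcl
# Default provider: used by any resource or module that does not select one
provider "aws" {
  region = "us-east-1"
}

# Aliased provider for a second region (alias name and region are illustrative)
provider "aws" {
  alias  = "replica"
  region = "eu-west-1"
}

# Hypothetical input: subnets that live in the replica region
variable "replica_subnet_ids" {
  type        = list(string)
  description = "Subnet IDs in the replica region"
}

# The module call passes the aliased provider explicitly, so the call site
# shows which region the module's resources land in
module "web_service_replica" {
  source = "../../modules/ec2-service"

  providers = {
    aws = aws.replica
  }

  service_name = "web"
  environment  = "production"
  subnet_ids   = var.replica_subnet_ids
}
```

Keeping one provider block per region in the root module and passing it through providers avoids configuring providers inside reusable modules.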

IaC main flow

  [ Pin: terraform / provider / module source versions ]
                    │
                    ▼
         [ Vars: type constraints, sensitive flags, .tfvars / CI secrets ]
                    │
                    ▼
    [ fmt → validate → (optional tflint) → plan -out=tfplan ]
                    │
           ┌────────┴────────┐
           ▼                 ▼
  [ Human/policy: review plan, flag destroy/replace ]     [ Merge gate: no apply without approval ]
           │                 │
           └────────┬────────┘
                    ▼
         [ Apply: named plan file / controlled env / rollback notes ]

Agent output must separate “plan only” vs “approved to apply”; for -destroy, replace, or sensitive resource changes, cite addresses and risks explicitly.

State: remote backend, locks, drift

Remote backend: document bucket/table/workspace prefix rules in the SKILL; never treat a shared local terraform.tfstate as source of truth.
Locks: document lock timeouts, retries for CI vs laptop, and who may force-unlock; agents should not suggest forced unlock without human approval.
Drift: baseline “clean plan” on a schedule or before releases; if drift appears outside Terraform, record import vs refresh vs manual alignment.
Risky ops: state rm, state mv, provider swaps, or backend migrations need checklists plus rollback.
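
When drift turns out to be a resource that was created outside Terraform, one way to record the import decision in code is an import block (available since Terraform 1.5), so the adoption appears in the reviewed plan instead of an ad hoc CLI command. A minimal sketch, assuming a hypothetical bucket named mycompany-legacy-assets:

```hcl
# Adopt an existing, manually created bucket into state on the next apply.
# The bucket name and the resource label are hypothetical.
import {
  to = aws_s3_bucket.legacy_assets
  id = "mycompany-legacy-assets"
}

resource "aws_s3_bucket" "legacy_assets" {
  bucket = "mycompany-legacy-assets"

  lifecycle {
    prevent_destroy = true   # guard the adopted bucket against accidental deletion
  }
}
```

The plan then reports the bucket as an import rather than a create, and the import block can be deleted once the apply has gone through.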

Remote backend configuration (S3 + DynamoDB lock) and variable type constraint examples:

# backend.tf — S3 remote state + DynamoDB distributed lock
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"   # lock major version; minor upgrades allowed
    }
  }

  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "services/myapp/production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "mycompany-terraform-locks"  # prevent concurrent writes

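    # Role assumption (OIDC + role) replaces long-lived access keys.
    # Terraform 1.6+ deprecates the top-level role_arn argument in favor of assume_role.
    assume_role = {
      role_arn = "arn:aws:iam::123456789012:role/TerraformStateRole"
    }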
  }
}

# variables.tf — type constraints and validation blocks
variable "environment" {
  type        = string
  description = "Deployment target environment (dev/staging/production)"

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "environment must be one of: dev, staging, or production."
  }
}

variable "instance_count" {
  type        = number
  description = "EC2 instance count (1-20)"
  default     = 1

  validation {
    condition     = var.instance_count >= 1 && var.instance_count <= 20
    error_message = "instance_count must be between 1 and 20."
  }
}

variable "allowed_cidr_blocks" {
  type        = list(string)
  description = "List of allowed CIDR blocks"
  default     = []

  validation {
    condition = alltrue([
      for cidr in var.allowed_cidr_blocks :
      can(cidrhost(cidr, 0))
    ])
    error_message = "Each value in allowed_cidr_blocks must be a valid CIDR format."
  }
}
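
The variables above cover type constraints and validation; the sensitive flag mentioned in the flow can be sketched the same way. The variable name db_password and the length rule are assumptions for illustration. The value would come from a CI secret exported as the TF_VAR_db_password environment variable, not from a committed .tfvars file.

```hcl
variable "db_password" {
  type        = string
  description = "Database password; supplied by CI via TF_VAR_db_password, never committed"
  sensitive   = true   # redacted in plan/apply output, but still stored in state

  validation {
    condition     = length(var.db_password) >= 16
    error_message = "db_password must be at least 16 characters long."
  }
}
```

Because the variable has no default, a run without the secret fails at plan time instead of silently falling back to a placeholder.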

Modules: interface, versions, refactors

Interface: document required inputs, defaults, and breaking changes in variables.tf/outputs.tf; cross-team modules should ship semver or pinned Git refs.
Composition: pick directory or workspace strategy for dev/stage/prod and stick to it—avoid implicit mixing in one root.
Refactors: use moved/import to avoid pointless destroy/create; large renames ship with state migration or phased PRs.
Critical resources: when using lifecycle { prevent_destroy = true } (databases, certificates, production entry points), document in the SKILL how exceptions are approved.
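
A rename handled with a moved block might look like the sketch below. The old address aws_instance.web is hypothetical; the new address matches the aws_instance.this resource in the module example that follows. Terraform treats both addresses as the same object, so the plan shows a move instead of a destroy and create.

```hcl
# Before the refactor the instance was declared as aws_instance.web.
# This block tells Terraform that the state recorded at the old address
# now belongs to the new address; the resource itself is left untouched.
moved {
  from = aws_instance.web
  to   = aws_instance.this
}
```

The block is usually kept for at least one release so that every environment's state has passed through the move before it is deleted.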

Complete Terraform module example (inputs / outputs / main.tf):

# modules/ec2-service/variables.tf
variable "service_name" {
  type        = string
  description = "Service name used as resource naming prefix"
}

variable "environment" {
  type        = string
  description = "Deployment environment (dev/staging/production)"
}

variable "instance_type" {
  type        = string
  description = "EC2 instance type"
  default     = "t3.micro"
}

variable "subnet_ids" {
  type        = list(string)
  description = "List of subnet IDs to deploy into (at least 2 availability zones)"
}

# modules/ec2-service/main.tf
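resource "aws_security_group" "this" {
  # name_prefix rather than name: with create_before_destroy the replacement group
  # exists alongside the old one, and two groups in a VPC cannot share a name
  name_prefix = "${var.service_name}-${var.environment}-"
  description = "Security group for ${var.service_name}"

  lifecycle {
    create_before_destroy = true   # create the new SG first, then delete the old one
  }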

  tags = {
    Name        = "${var.service_name}-${var.environment}"
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

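# Assumption: the example references data.aws_ami.al2023 without defining it;
# this lookup of the latest Amazon Linux 2023 x86_64 image is one plausible definition
data "aws_ami" "al2023" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"]
  }
}

resource "aws_instance" "this" {
  ami           = data.aws_ami.al2023.id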
  instance_type = var.instance_type
  subnet_id     = var.subnet_ids[0]

  vpc_security_group_ids = [aws_security_group.this.id]

  lifecycle {
    prevent_destroy = true   # production critical resource guard; exception approval in runbook
    ignore_changes  = [ami]  # AMI updates managed by rebuild process, not reflected in plan
  }
}

# modules/ec2-service/outputs.tf
output "instance_id" {
  description = "EC2 instance ID"
  value       = aws_instance.this.id
}

output "security_group_id" {
  description = "Security group ID for reference by upstream modules"
  value       = aws_security_group.this.id
  sensitive   = false
}

# Root module usage example (environments/production/main.tf)
# module "web_service" {
#   source = "../../modules/ec2-service"
#   # Version pin: use fixed Git ref or published tag
#   # source = "git::https://github.com/myorg/tf-modules.git//ec2-service?ref=v1.2.0"
#
#   service_name  = "web"
#   environment   = "production"
#   instance_type = "t3.medium"
#   subnet_ids    = module.vpc.private_subnet_ids
# }

Pipeline: fmt / validate / plan

Suggested order: terraform fmt -check → terraform validate → (optional) tflint/tfsec → non-interactive plan with read-only creds or mocks.
PR artifacts: store plan text or structured output so reviewers/bots can compare counts of changes/destroys.
Sensitive plans: keep in CI secret storage or redact; never paste full state or raw secrets in comments.

Terraform plan CI integration with Infracost cost estimation:

# .github/workflows/terraform.yml
on:
  pull_request:   # later steps read github.event.pull_request.number
jobs:
  plan:
    runs-on: ubuntu-24.04
    permissions:
      id-token: write
      contents: read
      pull-requests: write   # for commenting plan summary
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.7.0"

      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ vars.TF_PLAN_ROLE_ARN }}   # read-only role
          aws-region: us-east-1

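      - name: Terraform fmt check
        run: terraform fmt -check -recursive

      - name: Terraform init
        run: terraform init -input=false   # validate and plan need providers, modules, and the backend initialized

      - name: Terraform validate
        run: terraform validate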

      - name: Set up tflint
        uses: terraform-linters/setup-tflint@v4
        with:
          tflint_version: v0.50.0
      - name: tflint
        run: tflint --recursive

      - name: Terraform plan
        id: plan
        shell: bash   # explicit bash runs with pipefail, so a failed plan is not masked by tee
        run: |
          terraform plan \
            -out=tfplan \
            -no-color \
            -input=false \
            2>&1 | tee plan.txt
          # Extract the summary line (add/change/destroy counts, or "No changes")
          echo "summary=$(grep -E 'Plan:|No changes' plan.txt | tail -1)" >> "$GITHUB_OUTPUT"

      - name: Infracost cost estimate
        uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}
      - run: |
          infracost breakdown --path tfplan --format json > infracost.json
          infracost comment github \
            --path infracost.json \
            --github-token ${{ secrets.GITHUB_TOKEN }} \
            --pull-request ${{ github.event.pull_request.number }} \
            --behavior update

      - name: Comment plan on PR
        uses: marocchino/sticky-pull-request-comment@v2
        with:
          message: |
            ### Terraform Plan Summary
            ${{ steps.plan.outputs.summary }}
            <details><summary>Full plan</summary>

            ```
            ${{ steps.plan.outputs.stdout }}
            ```
            </details>
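
The tflint step in the workflow can be configured through a .tflint.hcl file. A minimal sketch of such a file follows; the preset choice and the AWS ruleset version are assumptions and should be pinned to whatever the team has actually tested.

```hcl
# .tflint.hcl
# Bundled Terraform ruleset (naming, unused declarations, version pins)
plugin "terraform" {
  enabled = true
  preset  = "recommended"
}

# AWS ruleset: catches provider-specific mistakes such as invalid instance types.
# The version shown is an assumption; pin the release you have verified.
plugin "aws" {
  enabled = true
  version = "0.30.0"
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}
```

Third-party plugins such as the AWS ruleset are downloaded by tflint --init, which would need to run before tflint --recursive.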

Workspace & resource address check

Workspace names: non-empty, at most 256 characters, only [A-Za-z0-9_-]. An input without dots is checked against the workspace rules. An input with dots must have one of these shapes: type.local, module…type.local, or exactly three segments data.type.local. Each segment is an identifier, optionally followed by a non-negative index in brackets, e.g. module.vpc.aws_subnet.private[0].
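
The workspace rule above maps directly onto a variable validation. This sketch encodes only the rule as stated here (non-empty, at most 256 characters, [A-Za-z0-9_-]); the variable name workspace_name is hypothetical, and the resource address rules are not covered.

```hcl
variable "workspace_name" {
  type        = string
  description = "Workspace name to check against the rules above"

  validation {
    # The + quantifier rejects the empty string; length() enforces the upper bound
    condition = (
      length(var.workspace_name) <= 256 &&
      can(regex("^[A-Za-z0-9_-]+$", var.workspace_name))
    )
    error_message = "workspace_name must be 1-256 characters from A-Z, a-z, 0-9, underscore, or hyphen."
  }
}
```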

---
name: terraform-iac
description: Modular Terraform with remote state and safe plan/apply habits
tags: [terraform, iac, devops, aws]
---
# State management
1. Remote backend: S3 bucket + DynamoDB table for state storage and distributed locking
2. Backend role_arn obtained via OIDC; long-lived AWS Access Keys must not enter CI
3. Each environment (dev/staging/prod) uses an independent state key; shared state is forbidden
4. Drift detection: clean plan on schedule or before releases; non-TF changes must be imported or aligned

# Module design
5. variables.tf: every variable carries a description and a type constraint; required inputs are documented and validation blocks enforce allowed values
6. outputs.tf: sensitive = true marks sensitive outputs; not shown in plaintext in plan
7. lifecycle.prevent_destroy = true guards production critical resources (DB, certs)
8. moved blocks rename resources without destroy/create; import blocks onboard existing resources

# CI pipeline
9. Order: terraform fmt -check → validate → tflint → plan -out=tfplan
10. Plan summary (change count/destroy count) commented on PR; human approval required before apply
11. infracost breakdown estimates cost changes and auto-comments on PR
12. Apply must specify plan file (terraform apply tfplan); planless apply is forbidden

# Security & governance
13. Plaintext secrets must never appear in .tf files or plan output
14. sensitive = true redacts values in plan and apply output, but they are still written to state in plaintext; restrict access to state and supply secret values through CI secrets
15. state rm/mv requires a checklist and rollback path, with human approval
16. tfsec / checkov scans for security misconfigurations (e.g. public S3 buckets, unencrypted storage)
