Terraform infrastructure as code
Have agents write Terraform with remote state, workspaces or directory-based environments, reusable modules, and validated variables. Stress plan review and call out destructive changes.
The SKILL pins provider versions and required_version, and documents how .tfvars files and CI inject secrets; never commit plaintext keys to the repo.
Module contracts: document inputs/outputs; use moved/import during refactors; spell out lifecycle and prevent_destroy guardrails for critical resources.
If policy-as-code (OPA/Sentinel) or cost estimation is wired in, require agents to attach plan highlights in the PR description.
- Formatting: order terraform fmt, validate, and tflint in CI.
- State: locks, drift detection, and approval for manual state rm.
- Multi-cloud / multi-region: provider aliases and directory conventions for module calls.
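As a sketch of the multi-region point, provider aliases can be declared once in the root module and passed explicitly to each module call. The alias name `replica`, the two regions, and the module path `./modules/bucket` are illustrative assumptions, not conventions defined by this SKILL.

```hcl
# providers.tf (root module): default provider plus one aliased provider
provider "aws" {
  region = "us-east-1"
}

provider "aws" {
  alias  = "replica" # hypothetical alias name
  region = "eu-west-1"
}

# main.tf: pass providers explicitly so every module call states its region
module "primary_bucket" {
  source = "./modules/bucket" # hypothetical module path

  providers = {
    aws = aws
  }
}

module "replica_bucket" {
  source = "./modules/bucket"

  providers = {
    aws = aws.replica
  }
}
```

Passing `providers` on every call, even for the default, keeps the region visible in review instead of relying on implicit inheritance.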
IaC main flow
                      [ Pin: terraform / provider / module source versions ]
                                                 │
                                                 ▼
                 [ Vars: type constraints, sensitive flags, .tfvars / CI secrets ]
                                                 │
                                                 ▼
                     [ fmt → validate → (optional tflint) → plan -out=tfplan ]
                                                 │
                         ┌───────────────────────┴───────────────────────┐
                         ▼                                               ▼
[ Human/policy: review plan, flag destroy/replace ]  [ Merge gate: no apply without approval ]
                         │                                               │
                         └───────────────────────┬───────────────────────┘
                                                 ▼
                   [ Apply: named plan file / controlled env / rollback notes ]
For destroy, replace, or sensitive resource changes, cite addresses and risks explicitly.
State: remote backend, locks, drift
- Remote backend: document bucket/table/workspace prefix rules in the SKILL; never treat a shared local terraform.tfstate as source of truth.
- Locks: document lock timeouts, retries for CI vs laptop, and who may force-unlock; agents should not suggest forced unlock without human approval.
- Drift: baseline “clean plan” on a schedule or before releases; if drift appears outside Terraform, record import vs refresh vs manual alignment.
- Risky ops: state rm, state mv, provider swaps, or backend migrations need checklists plus rollback.
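For the `state rm` case, one option is the config-driven `removed` block, which routes the change through the normal plan review instead of an out-of-band CLI command. This is a sketch that assumes Terraform 1.7 or later (the CI example in this document pins 1.7.0); the resource address `aws_s3_bucket.legacy_logs` is hypothetical.

```hcl
# Drop aws_s3_bucket.legacy_logs from state without destroying the real bucket.
# Delete the matching resource block in the same change, then review the plan.
removed {
  from = aws_s3_bucket.legacy_logs # hypothetical address

  lifecycle {
    destroy = false # forget the object in state; leave the bucket in place
  }
}
```

Because the removal is part of the configuration, it appears in the PR diff and plan output, which fits the checklist-plus-approval rule above. Once applied, the `removed` block can be deleted.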
Remote backend configuration (S3 + DynamoDB lock) and variable type constraint examples:
# backend.tf: S3 remote state + DynamoDB distributed lock
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # pin major version 5; minor and patch upgrades allowed
    }
  }

  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "services/myapp/production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "mycompany-terraform-locks" # state locking; prevents concurrent writes

    # Role assumption (OIDC + role) replaces long-lived access keys.
    # Terraform 1.6+ expects role settings inside assume_role; top-level role_arn is deprecated.
    assume_role = {
      role_arn = "arn:aws:iam::123456789012:role/TerraformStateRole"
    }
  }
}

# variables.tf: type constraints and validation blocks
variable "environment" {
  type        = string
  description = "Deployment target environment (dev/staging/production)"

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "environment must be one of: dev, staging, or production."
  }
}

variable "instance_count" {
  type        = number
  description = "EC2 instance count (1-20)"
  default     = 1

  validation {
    condition     = var.instance_count >= 1 && var.instance_count <= 20
    error_message = "instance_count must be between 1 and 20."
  }
}

variable "allowed_cidr_blocks" {
  type        = list(string)
  description = "List of allowed CIDR blocks"
  default     = []

  validation {
    condition = alltrue([
      for cidr in var.allowed_cidr_blocks :
      can(cidrhost(cidr, 0))
    ])
    error_message = "Each value in allowed_cidr_blocks must be a valid CIDR format."
  }
}
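
The variables above cover type constraints and validation; the sensitive flag and CI injection mentioned earlier might look like the following sketch. The variable name `db_password`, the 16-character minimum, and the `TF_VAR_db_password` environment variable that CI would set are illustrative assumptions.

```hcl
# variables.tf: no default, so the value must come from CI or an untracked .tfvars file
variable "db_password" { # hypothetical variable name
  type        = string
  description = "Database master password, injected by CI as TF_VAR_db_password"
  sensitive   = true # redacted in plan/apply output, but still stored in state

  validation {
    condition     = length(var.db_password) >= 16 # assumed policy, adjust as needed
    error_message = "db_password must be at least 16 characters."
  }
}
```

Marking the variable sensitive only hides it from CLI output, so access to the remote state bucket still needs to be restricted.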
Modules: interface, versions, refactors
- Interface: document required inputs, defaults, and breaking changes in variables.tf/outputs.tf; cross-team modules should ship semver or pinned Git refs.
- Composition: pick directory or workspace strategy for dev/stage/prod and stick to it; avoid implicit mixing in one root.
- Refactors: use moved/import to avoid pointless destroy/create; large renames ship with state migration or phased PRs.
- Critical resources: when using lifecycle { prevent_destroy = true } (DB, certs, prod entry), document exception approvals in the SKILL.
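A minimal sketch of the refactor tools named above, assuming Terraform 1.5 or later for `import` blocks. The old address `aws_instance.web`, the bucket name, and the resource `aws_s3_bucket.logs` are hypothetical; `aws_instance.this` is assumed to be declared elsewhere in the same module.

```hcl
# Rename without destroy/create: state for the old address moves to the new one.
moved {
  from = aws_instance.web # hypothetical old address
  to   = aws_instance.this
}

# Adopt an existing bucket into state; the ID below is a placeholder bucket name.
import {
  to = aws_s3_bucket.logs
  id = "mycompany-app-logs"
}

resource "aws_s3_bucket" "logs" {
  bucket = "mycompany-app-logs"

  lifecycle {
    prevent_destroy = true # guardrail; exceptions need documented approval
  }
}
```

Both blocks surface in `terraform plan` as a move or an import rather than a destroy and create, which is the signal a reviewer should look for in a refactor PR.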
Complete Terraform module example (inputs / outputs / main.tf):
# modules/ec2-service/variables.tf
variable "service_name" {
  type        = string
  description = "Service name used as resource naming prefix"
}

variable "environment" {
  type        = string
  description = "Deployment environment (dev/staging/production)"
}

variable "instance_type" {
  type        = string
  description = "EC2 instance type"
  default     = "t3.micro"
}

variable "subnet_ids" {
  type        = list(string)
  description = "List of subnet IDs to deploy into (at least 2 availability zones)"
}

# modules/ec2-service/main.tf
# Latest Amazon Linux 2023 AMI (the name filter is an assumed pattern; adjust per architecture)
data "aws_ami" "al2023" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"]
  }
}

# Look up the first subnet so the security group is created in the same VPC as the instance
data "aws_subnet" "primary" {
  id = var.subnet_ids[0]
}

resource "aws_security_group" "this" {
  # name_prefix instead of name: create_before_destroy needs the old and new groups to coexist
  name_prefix = "${var.service_name}-${var.environment}-"
  description = "Security group for ${var.service_name}"
  vpc_id      = data.aws_subnet.primary.vpc_id

  lifecycle {
    create_before_destroy = true # create new SG first, then delete old one for zero downtime
  }

  tags = {
    Name        = "${var.service_name}-${var.environment}"
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

resource "aws_instance" "this" {
  ami                    = data.aws_ami.al2023.id
  instance_type          = var.instance_type
  subnet_id              = var.subnet_ids[0] # single instance: placed in the first subnet
  vpc_security_group_ids = [aws_security_group.this.id]

  lifecycle {
    prevent_destroy = true  # production critical resource guard; exception approval in runbook
    ignore_changes  = [ami] # AMI updates managed by rebuild process, not reflected in plan
  }
}

# modules/ec2-service/outputs.tf
output "instance_id" {
  description = "EC2 instance ID"
  value       = aws_instance.this.id
}

output "security_group_id" {
  description = "Security group ID for reference by upstream modules"
  value       = aws_security_group.this.id
  sensitive   = false # the default, stated explicitly
}

# Root module usage example (environments/production/main.tf)
# module "web_service" {
#   source = "../../modules/ec2-service"
#   # Version pin: use fixed Git ref or published tag
#   # source = "git::https://github.com/myorg/tf-modules.git//ec2-service?ref=v1.2.0"
#
#   service_name  = "web"
#   environment   = "production"
#   instance_type = "t3.medium"
#   subnet_ids    = module.vpc.private_subnet_ids
# }
Pipeline: fmt / validate / plan
- Suggested order: terraform fmt -check → terraform validate → (optional) tflint/tfsec → non-interactive plan with read-only creds or mocks.
- PR artifacts: store plan text or structured output so reviewers/bots can compare counts of changes/destroys.
- Sensitive plans: keep in CI secret storage or redact; never paste full state or raw secrets in comments.
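The optional tflint step reads its rules from a `.tflint.hcl` file at the repository root. The following is a sketch only: the AWS ruleset version shown is an illustrative pin, and which rules to enable is a team decision rather than something this SKILL prescribes.

```hcl
# .tflint.hcl
plugin "terraform" {
  enabled = true
  preset  = "recommended"
}

plugin "aws" {
  enabled = true
  version = "0.30.0" # illustrative pin; use the release your team has vetted
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

# Enforce the version pins this SKILL asks for
rule "terraform_required_version" {
  enabled = true
}

rule "terraform_required_providers" {
  enabled = true
}
```

With a plugin block like this in place, CI needs to run `tflint --init` once before `tflint --recursive` so the ruleset is downloaded.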
Terraform plan CI integration with Infracost cost estimation:
# .github/workflows/terraform.yml
name: terraform
on:
  pull_request:

jobs:
  plan:
    runs-on: ubuntu-24.04
    permissions:
      id-token: write      # OIDC token for AWS role assumption
      contents: read
      pull-requests: write # for commenting plan summary
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "1.7.0"

      - name: Configure AWS credentials (OIDC)
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ vars.TF_PLAN_ROLE_ARN }} # read-only role
          aws-region: us-east-1

      - name: Terraform fmt check
        run: terraform fmt -check -recursive

      # validate and plan both need an initialized working directory
      - name: Terraform init
        run: terraform init -input=false

      - name: Terraform validate
        run: terraform validate

      - name: Set up tflint
        uses: terraform-linters/setup-tflint@v4
        with:
          tflint_version: v0.50.0

      - name: tflint
        run: tflint --recursive

      - name: Terraform plan
        id: plan
        run: |
          set -o pipefail # keep plan's exit code; the pipe to tee would otherwise mask a failure
          terraform plan \
            -out=tfplan \
            -no-color \
            -input=false \
            2>&1 | tee plan.txt
          # Extract summary: change count and destroy count
          echo "summary=$(grep -E 'Plan:|No changes' plan.txt | tail -1)" >> "$GITHUB_OUTPUT"

      - name: Set up Infracost
        uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}

      - name: Infracost cost estimate
        run: |
          infracost breakdown --path tfplan --format json > infracost.json
          infracost comment github \
            --path infracost.json \
            --repo "$GITHUB_REPOSITORY" \
            --github-token ${{ secrets.GITHUB_TOKEN }} \
            --pull-request ${{ github.event.pull_request.number }} \
            --behavior update

      # steps.plan.outputs.stdout is provided by the setup-terraform wrapper.
      # Redact or drop the full plan if it may contain sensitive values.
      - name: Comment plan on PR
        uses: marocchino/sticky-pull-request-comment@v2
        with:
          message: |
            ### Terraform Plan Summary
            ${{ steps.plan.outputs.summary }}

            <details><summary>Full plan</summary>

            ```
            ${{ steps.plan.outputs.stdout }}
            ```

            </details>
Workspace & resource address check
The check accepts a workspace name (letters, digits, -, _) or a resource address (e.g. module.vpc.aws_subnet.private[0]).
Workspace: non-empty, length ≤ 256, only [A-Za-z0-9_-]. Addresses without dots use the workspace rules; addresses with dots follow type.local, module…type.local, or exactly three segments data.type.local. Each segment is an identifier with an optional [non-negative index].
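The workspace rule above can also be enforced inside Terraform itself with a validation block. This is a sketch under the assumption that the workspace name is passed in as an input; the variable name `workspace_name` is hypothetical, and the resource address rules are not covered here.

```hcl
variable "workspace_name" { # hypothetical variable name
  type        = string
  description = "Workspace name: 1-256 characters from [A-Za-z0-9_-]"

  validation {
    # Non-empty, at most 256 characters, only letters, digits, '-' and '_'
    condition     = can(regex("^[A-Za-z0-9_-]{1,256}$", var.workspace_name))
    error_message = "workspace_name must be 1-256 characters and use only letters, digits, '-' or '_'."
  }
}
```

A value such as `prod-eu_1` passes this check, while an empty string or `prod.eu` is rejected at plan time.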
---
name: terraform-iac
description: Modular Terraform with remote state and safe plan/apply habits
tags: [terraform, iac, devops, aws]
---
# State management
1. Remote backend: S3 bucket + DynamoDB table for state storage and distributed locking
2. Backend role is assumed via OIDC; long-lived AWS access keys must not enter CI
3. Each environment (dev/staging/prod) uses an independent state key; shared state is forbidden
4. Drift detection: clean plan on schedule or before releases; non-TF changes must be imported or aligned
# Module design
5. variables.tf: description documents required fields, type constraints, and validation blocks
6. outputs.tf: sensitive = true marks sensitive outputs; not shown in plaintext in plan
7. lifecycle.prevent_destroy = true guards production critical resources (DB, certs)
8. moved blocks rename resources without destroy/create; import blocks onboard existing resources
# CI pipeline
9. Order: terraform fmt -check → validate → tflint → plan -out=tfplan
10. Plan summary (change count/destroy count) commented on PR; human approval required before apply
11. infracost breakdown estimates cost changes and auto-comments on PR
12. Apply must specify plan file (terraform apply tfplan); planless apply is forbidden
# Security & governance
13. Plaintext secrets must never appear in .tf files or plan output
14. sensitive = true redacts variable values in plan text but not in state; restrict state access and supply secret values through CI secrets
15. state rm/mv requires a checklist and rollback path, with human approval
16. tfsec / checkov scans for security misconfigurations (e.g. public S3 buckets, unencrypted storage)