Security audit checklist

Walk agents through pre-release or architecture reviews: a unified checklist for auth, input validation, supply chain, and operations.

This page provides check commands for seven OWASP Top 10 categories (A01, A02, A03, A05, A06, A07, A09), STRIDE threat modeling questions across 6 dimensions, vulnerable-vs-secure code comparisons, semgrep/bandit/trivy CLI commands, and a ready-to-use audit report JSON template. Every item is verifiable; none relies on vague language.

In the SKILL, mark each item pass / gap / not applicable with evidence location (file path / commit / config line).
Security exceptions must be approved with a re-review date; dependencies and images must be traceable to their pipeline build.

OWASP Top 10 checks and STRIDE threat modeling

Each OWASP check item includes a corresponding tool command or code pattern; each STRIDE dimension has two concrete verification questions.

OWASP Top 10 check commands

# A01 Broken Access Control — check for missing authorization decorators
semgrep --config=p/owasp-top-ten src/

# A02 Cryptographic Failures — detect weak algorithms
bandit -r . -t B303,B304,B305,B413

# A03 Injection — SQL string concatenation
semgrep --pattern 'db.query("..." + $X)' --lang python src/

# A05 Security Misconfiguration — container image vulnerabilities
trivy image --severity HIGH,CRITICAL myapp:latest

# A06 Vulnerable Components — dependency scan
trivy fs --scanners vuln .

# A07 Authentication Failures — weak password hashing
bandit -r . -t B303,B324

# A09 Insufficient Logging — missing audit log
grep -rn -A2 "except:" src/  # review each bare except whose next lines do not log

Vulnerable vs Secure code comparison

// ❌ Dangerous: string-concatenated SQL (A03 Injection)
const sql = `SELECT * FROM users WHERE id = ${req.params.id}`;
db.query(sql);

// ✅ Secure: parameterized query
db.query('SELECT * FROM users WHERE id = $1', [req.params.id]);

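// ❌ Dangerous: MD5 password storage (A02 Cryptographic Failures)
const crypto = require('crypto');
const md5Hash = crypto.createHash('md5').update(password).digest('hex');

// ✅ Secure: bcrypt with cost factor ≥ 12 (call from inside an async function)
const bcrypt = require('bcrypt');
const bcryptHash = await bcrypt.hash(password, 12);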
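
The injection contrast above can also be reproduced with Python's standard-library `sqlite3` module. This is a minimal sketch, not a reference implementation: the `users` table, its rows, and the function names are hypothetical, and `sqlite3` uses `?` placeholders rather than the `$1` style of the parameterized query above.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build an in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(1, "alice"), (2, "bob")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, user_id: str):
    # Vulnerable: the caller-controlled value becomes part of the SQL text.
    sql = f"SELECT id, name FROM users WHERE id = {user_id}"
    return conn.execute(sql).fetchall()


def find_user_safe(conn: sqlite3.Connection, user_id: str):
    # Secure: the value is bound as a parameter and never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE id = ?", (user_id,)
    ).fetchall()


conn = make_db()
payload = "1 OR 1=1"                           # classic injection input
unsafe_rows = find_user_unsafe(conn, payload)  # condition is always true
safe_rows = find_user_safe(conn, payload)      # treated as a plain value
```

With the concatenated query the payload widens the `WHERE` clause to every row; with the bound parameter it is compared as a literal value and matches nothing.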

STRIDE threat modeling — 6 dimensions

Spoofing: Do service-to-service calls use mTLS or signed JWT? Are password-reset links bound to a one-time token?
Tampering: Do DB writes verify resource ownership? Are webhook payloads verified with HMAC-SHA256?
Repudiation: Do critical operations have immutable audit logs with user IDs? Are audit logs tamper-proof (no delete after write)?
Information Disclosure: Do error responses expose stack traces or internal paths? Do list endpoints restrict results to authorized records?
Denial of Service: Do expensive endpoints have rate limiting and timeouts? Do file uploads enforce size and type limits?
Elevation of Privilege: Does the backend verify that each requested resource ID belongs to the caller, so one user cannot read another user's data (IDOR)? Are admin functions enforced at the DB layer with dedicated roles?
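
The Tampering question about HMAC-SHA256 webhook verification can be checked against a sketch like the one below. It assumes the sender signs the raw request body and transmits a hex digest; real providers often add a prefix or timestamp, and the function names and secret are hypothetical.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Compare the received signature with the expected one in constant time."""
    expected = sign_payload(secret, body)
    # Encode both sides so non-ASCII input cannot raise inside compare_digest.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


secret = b"hypothetical-shared-secret"
body = b'{"event":"ping"}'
signature = sign_payload(secret, body)

ok = verify_webhook(secret, body, signature)               # untouched body
tampered = verify_webhook(secret, body + b" ", signature)  # modified body
```

Verifying against the raw bytes, before any JSON parsing or re-serialization, matters here because re-encoding can change the bytes and break a valid signature.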

If an area is not deployed or not in this change, mark not applicable with a one-line rationale; do not confuse “not reviewed” with “not applicable.”
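
The distinction between "not reviewed" and "not applicable" can be enforced mechanically. The sketch below is one possible shape for a checklist row; the `ChecklistRow` type and `row_problems` function are hypothetical names, and the three status values come from this page.

```python
from dataclasses import dataclass

VALID_STATUSES = {"pass", "gap", "not applicable"}


@dataclass
class ChecklistRow:
    item: str
    status: str
    evidence: str = ""   # file path, commit, or config line
    rationale: str = ""  # required when status is "not applicable"


def row_problems(row: ChecklistRow) -> list[str]:
    """Return the reasons a row is incomplete; an empty list means it is fine."""
    problems = []
    if row.status not in VALID_STATUSES:
        problems.append(f"{row.item}: unknown status {row.status!r}")
    elif row.status == "not applicable":
        if not row.rationale.strip():
            problems.append(f"{row.item}: 'not applicable' needs a one-line rationale")
    elif not row.evidence.strip():
        problems.append(f"{row.item}: '{row.status}' needs an evidence location")
    return problems
```

A row that was simply skipped has neither evidence nor a rationale, so it cannot pass this check under any of the three statuses.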

Finding severity and scan tool commands

Severity ties to release blocking and SLAs; keep definitions consistent within a repo and map them to OWASP or an internal risk matrix in the report.

# semgrep: registry ruleset scan (can run in CI)
semgrep --config=p/security-audit \
        --config=p/secrets \
        --json -o report.json src/

# bandit: Python source code security scan
bandit -r src/ -f json -o bandit-report.json \
  -ll  # report medium and above only

# trivy: container image and filesystem scan
trivy image --severity HIGH,CRITICAL \
            --format json \
            --output trivy-report.json \
            myapp:$(git rev-parse --short HEAD)

# trivy: dependency vulnerabilities (sbom mode)
trivy fs --scanners vuln,secret \
         --format cyclonedx \
         --output sbom.json .
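
A CI step can read a JSON scan report and decide whether to fail the build. The sketch below assumes the Trivy JSON layout of a top-level `Results` list whose entries may carry a `Vulnerabilities` list with a `Severity` field; confirm this against the Trivy version you run. The function names are hypothetical and the CVE IDs in the sample are placeholders, not real advisories.

```python
import json
from collections import Counter


def count_by_severity(report: dict) -> Counter:
    """Count vulnerabilities per severity across all scan targets."""
    counts = Counter()
    for result in report.get("Results") or []:
        # Targets without findings may omit the Vulnerabilities key entirely.
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


def should_fail_build(report: dict, blocking=("HIGH", "CRITICAL")) -> bool:
    """Return True when any vulnerability has a blocking severity."""
    counts = count_by_severity(report)
    return any(counts[level] > 0 for level in blocking)


# Inline sample standing in for the contents of trivy-report.json.
sample = json.loads("""
{"Results": [
  {"Target": "requirements.txt",
   "Vulnerabilities": [
     {"VulnerabilityID": "CVE-0000-0001", "Severity": "HIGH"},
     {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"}]},
  {"Target": "Dockerfile"}
]}
""")

fail = should_fail_build(sample)
```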

Level	Meaning (examples)	Expected handling
`Critical`	Remote unauthenticated execution, large-scale data exposure, production credentials exposed in plaintext and readily exploitable.	Block release or hotfix; mitigation path and owner within 24h.
`High`	Clear authz bypass, reliably reproducible injection or SSRF, weak crypto on sensitive data.	Fix in the current train; exceptions need written approval and expiry.
`Medium`	Exploit needs narrow conditions or lower likelihood, missing defense-in-depth, logs leaking helper details.	Schedule in iteration; prioritize with product risk.
`Low` / `Info`	Hardening, stale guidance in docs/comments, improvements weakly tied to the threat model.	Backlog; optional before merge.
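
The table can be turned into a release gate. The sketch below encodes one reading of it, which is an assumption rather than a rule stated on this page: an open Critical finding always blocks, and an open High finding blocks unless it carries an approved exception that has not expired. The `Finding` type and `release_blockers` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Finding:
    finding_id: str
    severity: str                            # "Critical", "High", "Medium", "Low", "Info"
    status: str = "open"
    exception_expiry: Optional[date] = None  # set only with written approval


def release_blockers(findings, today: date) -> list[str]:
    """Return the IDs of open findings that should block the release."""
    blockers = []
    for f in findings:
        if f.status != "open":
            continue
        if f.severity == "Critical":
            blockers.append(f.finding_id)
        elif f.severity == "High":
            has_valid_exception = (
                f.exception_expiry is not None and f.exception_expiry >= today
            )
            if not has_valid_exception:
                blockers.append(f.finding_id)
    return blockers
```

Medium and lower findings never block in this sketch; they are scheduled or backlogged as the table describes.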

Audit execution flow

  [ Confirm scope + threat-model entry ]
        │
        ▼
  ┌─────────────┐     Checklist: auth / input / data / integrations / ops
  │ Walk items  │──── Each row: status, evidence (path/config/commit), notes
  └─────────────┘
        │
        ▼
  ┌─────────────┐     Map severity; Critical/High mark blockers + SLA
  │ Roll up     │──── Owners: app / platform / compliance; link Finding IDs
  │  findings   │
  └─────────────┘
        │
        ▼
  ┌─────────────┐     Track to closure; log exception review dates
  │ Retest &    │──── Deliverables: audit summary + open findings
  │  closeout   │
  └─────────────┘

Audit report JSON template (Finding structure)

{
  "finding_id": "VV-SEC-API-0001",
  "severity": "High",
  "title": "GraphQL endpoint missing depth limit, vulnerable to DoS",
  "cwe": "CWE-400",
  "owasp": "A05:2021-Security Misconfiguration",
  "description": "POST /graphql accepts unbounded query nesting depth; an attacker can craft a 100-level deep query to exhaust CPU. No depthLimit set at src/graphql/server.ts:42.",
  "reproduction": "curl -X POST /graphql -d '{users{posts{comments{author{posts{...}}}}}}' triggers 100% CPU.",
  "recommendation": "Add graphql-depth-limit with maxDepth=7; also add query cost analysis.",
  "evidence": "src/graphql/server.ts:42, load test recording ./evidence/gql-dos.mp4",
  "responsible_domain": "application",
  "estimated_effort": "2h",
  "sla_days": 14,
  "status": "open"
}

Agent outputs should restate scope and assumptions before findings; each item needs ID, severity, repro or evidence, and remediation—avoid vague “be more secure” notes.
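
The minimum content of a finding can be checked before it is accepted into a report. This sketch assumes findings are plain dictionaries using the field names from the JSON template; `finding_problems` is a hypothetical helper, and the severity set mirrors the levels used on this page.

```python
SEVERITIES = {"Critical", "High", "Medium", "Low", "Info"}
REQUIRED_FIELDS = ("finding_id", "severity", "recommendation")


def finding_problems(finding: dict) -> list[str]:
    """Return a list of problems; an empty list means the finding is usable."""

    def present(name: str) -> bool:
        # Treat missing, None, and whitespace-only values as absent.
        return bool(str(finding.get(name) or "").strip())

    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not present(name)]
    if not (present("reproduction") or present("evidence")):
        problems.append("needs reproduction or evidence")
    if present("severity") and finding["severity"] not in SEVERITIES:
        problems.append(f"unknown severity: {finding['severity']!r}")
    return problems
```

For example, a finding with an ID, a severity of `High`, a recommendation, and an evidence path yields no problems, while an empty dictionary yields four.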

Finding ID format

Stable IDs ease references across PRs, tickets, and retests. The format is {PREFIX}-SEC-{AREA}-{4-digit-seq}, where the prefix and area contain uppercase letters and digits only, for example VV-SEC-API-0001.

The prefix is uppercased and stripped of non-alphanumeric characters; the sequence is clamped to 1–9999. Keep generated IDs consistent with whatever example you embed in the SKILL body.
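
The ID rules can be sketched as a small formatter. Two details are assumptions beyond what this page states: the area is cleaned the same way as the prefix, and an input that cleans down to an empty string is rejected. The name `make_finding_id` is hypothetical.

```python
import re


def _clean(part: str) -> str:
    """Uppercase the text, then drop every character outside A-Z and 0-9."""
    return re.sub(r"[^A-Z0-9]", "", part.upper())


def make_finding_id(prefix: str, area: str, seq: int) -> str:
    """Build an ID of the form {PREFIX}-SEC-{AREA}-{4-digit-seq}."""
    prefix_clean, area_clean = _clean(prefix), _clean(area)
    if not prefix_clean or not area_clean:
        raise ValueError("prefix and area must contain at least one letter or digit")
    seq_clamped = max(1, min(9999, seq))  # clamp to 1..9999
    return f"{prefix_clean}-SEC-{area_clean}-{seq_clamped:04d}"


first_id = make_finding_id("vv", "api", 1)
```

Here `first_id` is `VV-SEC-API-0001`, matching the example in the JSON template.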

---
name: security-audit
description: Audit app security design and implementation per OWASP Top 10 and STRIDE, producing actionable findings
---
# Steps
1. Confirm audit scope: list in-scope and explicitly excluded boundaries in the SKILL or ticket
2. Run automated scans: semgrep / bandit / trivy, generate JSON reports
3. Walk OWASP Top 10: for each item record pass / gap / not applicable with evidence (path/commit)
4. STRIDE threat modeling: for each of 6 dimensions, raise ≥2 specific verification questions
5. Code-level comparison: vulnerable vs secure, annotate CWE numbers
6. Fill Finding JSON: finding_id / severity / title / cwe / description / recommendation
7. Assign responsible domain (application / platform / compliance) and estimated effort
8. Critical/High: block release, provide mitigation path and owner within 24h
9. Medium: schedule in iteration; exceptions require written approval with expiry date
10. Retest: after fixes, re-run tool scans, close Finding and update status
11. Generate audit summary: scope + tool versions + list of open findings
12. Log security exceptions: attach ticket, approver, and re-review date

# Anti-patterns
- Do NOT write "not applicable" for items that were simply not checked
- Do NOT use vague language like "recommend improving security"
- Do NOT auto-output "audit passed" conclusion without evidence
