Code review checklist

Before merge, have the agent walk the diff across security, performance, and testability. This page chains author → reviewer → merge, adds a severity matrix, and shows a filterable comment demo.

Break review into checkboxes: injection & auth, sensitive data, error handling, edge cases, test coverage, breaking-change notes. The agent can tune priority by file type and diff size; in the SKILL, spell out hard “block merge” rules vs gray areas that need a human.

For large PRs, emit a change summary and risk tier before the full checklist; for docs-only or config-only changes, shrink security checks and keep formatting and link checks. Stay aligned with CI so the checklist matches real pipeline gates.

Flow (author → reviewer → merge)

Clear boundaries keep the agent from mixing “tests the author must add” with “architecture the reviewer must approve.” The flow below fits human collaboration; if the agent acts as reviewer, label the role (author / reviewer / merge-gate).

  [ Author: self-test, description, minimal reviewable diff ]
                    │
                    ▼
  [ Reviewer: read description → read diff → tag P0/P1/P2 ]
                    │
          ┌─────────┴─────────┐
          ▼                   ▼
   [ Blocking: must fix ]   [ Non-blocking: before merge ]
          │                   │
          └─────────┬─────────┘
                    ▼
  [ Merge: CI green, branch protection, (optional) second approval ]
                    │
                    ▼
            [ Post-merge: monitor / rollback plan ]
Expand: author deliverables

Self-checklist, PR description with intent and risk, linked issues, screenshots or API contract deltas; for huge changes, split PRs or add a “reading map.”

Copy-ready PR description template:

## Motivation
<!-- Why this PR? Which bug or feature does it address? Related issue: #xxx -->

## Impact scope
<!-- Which modules/APIs/schemas changed? Any breaking changes?
     Dependents: does frontend / mobile / other services need to upgrade? -->

## Test plan
- [ ] Unit tests (which cases were added/modified)
- [ ] Integration/E2E (optional, describe scenarios covered)
- [ ] Manual verification steps (screenshots/recordings can be attached)

## Rollback steps
<!-- How to quickly roll back if something goes wrong? Feature flag name? DB column safe to ignore? -->
1. Revert commit: `git revert <sha>`
2. If DB migration involved: run `make db-rollback`

## Screenshots / Trace (optional)
<!-- Before / After screenshots, or API request/response comparison -->
Expand: reviewer focus

Read description before code; each comment has severity and an actionable outcome; P0 must block merge; mark speculative feedback as nit or P2 so real security issues stay visible.

Expand: merge gate

Branch protection, required reviewers, status checks (CI, signed commits), and team rule “no open P0s”; before the agent merges, restate whether these are satisfied.

Severity matrix & definitions

Teams can tune SLAs; P0/P1 meaning must stay consistent across the repo.

Level Typical meaning Merge policy Expected handling (example)
P0 Security/compliance/data loss/production crash Do not merge; fix or revert Same day or hotfix window
P1 Clear logic bug, missing critical tests, error-handling gap Block by default; rare exceptions with tracking Current iteration
P2 Style, naming, small polish, doc typos Non-blocking; follow-up OK Backlog or good first issue
Severity vs merge policy (example—replace with your team policy).
P0
Would ship exploit, privilege bypass, plaintext secrets, or a deterministic crash path.
P1
Hurts correctness, observability, or rollback; missing tests on critical branches.
P2
Readability and consistency; suggestions that don’t affect correctness or security.
Hard vs gray: hard examples include plaintext secrets, removed auth, unvalidated serialization boundaries; gray examples include micro-optimizations on hot paths or copy needing product sign-off—in the SKILL, the agent must label gray items “needs human confirmation,” not P0.

10 red-line items that must block merge

🔴 1. Hard-coded credentials
   const apiKey = "sk-prod-abc123";   // ← plaintext secret in codebase

🔴 2. Auth bypass (route/middleware commented out or always-true condition)
   // router.use(authMiddleware);      // ← auth middleware commented out

🔴 3. User input concatenated directly into SQL
   `SELECT * FROM users WHERE id = ${req.params.id}`  // SQL injection

🔴 4. Deserializable entry point with no type/schema validation
   const body = JSON.parse(rawInput);  // used without Zod/Joi/yup validation

🔴 5. Sensitive fields logged in plaintext
   logger.info("User login", { password, token });  // password/token in logs

🔴 6. IDOR: resource ID from client without ownership check
   const order = await Order.findById(req.body.orderId);  // missing .userId check

🔴 7. Critical tests deleted or skipped (coverage gamed artificially)
   it.skip("concurrent refund test", ...)  // ← skipped to keep coverage green

🔴 8. DB column dropped without expand-contract migration (direct DROP COLUMN)
   ALTER TABLE orders DROP COLUMN legacy_status;  // no rollback migration path

🔴 9. Unbounded recursion or infinite loop in main code path
   function retry(fn) { return retry(fn); }  // ← no termination condition

🔴 10. Production env flag forced to debug/development mode
    NODE_ENV=development  # hard-coded in prod Dockerfile or k8s configmap

Filter comments by P0 / P1 (demo)

Static sample list below—use chips to filter; contrast with orchestration checklists, this demo focuses on structured comment grading.

  • P0 auth/login.ts

    New endpoint does not validate session and trusts user id from the client—use server session or signed verification.

  • P1 services/order.ts

    Refund branch lacks integration tests; add a case for partial refund + concurrency to avoid regressions.

  • P2 README.md

    Env table missing new FEATURE_X; can follow in a docs PR after merge.

  • P1 api/handlers.ts

    Error path only logs to console; no unified error code—doesn’t match existing API conventions.

  • P0 db/queries.ts · line 42

    User ID concatenated directly into SQL string: WHERE id = ${req.params.id} — must be rewritten as a parameterized query WHERE id = $1 with a bound parameter. Current code is vulnerable to 1; DROP TABLE users-- injection.

  • P1 services/payment.ts · line 87

    processRefund() swallows StripeError in the catch block and returns only false; callers cannot distinguish “network timeout” from “insufficient balance,” and no metrics are reported. Map the error type to a business error code or re-throw it.

  • P2 utils/format.ts · line 15

    formatCurrency hard-codes the 'en-US' locale; consider accepting locale from the caller for multi-language reuse. Non-blocking; track as a follow-up issue.

Dimension checklist

The agent can trim by diff type; map rows to automation or a quick human review.

Dimension What to check When it fails
Security Auth, injection, sensitive logs, dependency CVEs, secrets on disk P0 / P1
Correctness Edge cases, concurrency, idempotency, contract/migration alignment P0 / P1
Performance Hot paths, N+1, large payloads, unbounded caches Usually P1; extreme P0
Testability Critical branches covered, mock boundaries, flake risk Usually P1
Operability Log fields, metrics, feature flags, rollback path P1 / P2

SKILL snippet

Encode stages and severity in the SKILL so the model outputs structured review results.

---
name: code-review-checklist
description: Layered security and quality review of PR diffs before merge; output structured comments
---

# Role declaration (clarify current role before each review)
current_role: reviewer   # options: author | reviewer | merge-gate

# Step 1: read PR description (2 min)
- Confirm motivation is clear; if no description, ask author to fill it before proceeding
- Check whether impact scope is noted (breaking changes / DB migrations / API changes)
- Confirm rollback steps exist and are actionable

# Step 2: security scan (hard block red lines)
- [ ] No hard-coded credentials/secrets/tokens (grep: `sk-`, `password =`, `secret =`)
- [ ] User input not directly concatenated into SQL / shell / HTML (check all `${req.*}` injection points)
- [ ] Auth middleware not commented out / bypassed (check route layer)
- [ ] No sensitive fields in logs (password, token, ssn, card_number)
- [ ] IDOR: resource IDs must pass ownership checks

# Step 3: correctness (P0/P1)
- [ ] Concurrency paths: any race conditions (non-atomic check-then-act)?
- [ ] Boundaries: null / zero / negative / oversized input handled?
- [ ] Consistent with API contract, DB schema, and message format?

# Step 4: testability (P1)
- [ ] New happy path has corresponding unit test
- [ ] Error branches (network/DB timeout, business exception) have tests
- [ ] Critical integration path has integration test or contract test

# Step 5: operability (P1/P2)
- [ ] New endpoints/services have logs and metrics
- [ ] New features have a feature flag (if gradual rollout needed)
- [ ] Failure is rollback-safe (no irreversible side effects)

# Output format (required)
For each comment output:
  severity: P0 | P1 | P2
  file: relative path
  line: L42 (if applicable)
  issue: one-line problem description
  suggestion: actionable fix direction or code snippet

Back to skills More skills