Code review checklist
Before merge, have the agent walk the diff across security, performance, and testability. This page chains author → reviewer → merge, adds a severity matrix, and shows a filterable comment demo.
Break review into checkboxes: injection & auth, sensitive data, error handling, edge cases, test coverage, breaking-change notes. The agent can tune priority by file type and diff size; in the SKILL, spell out hard “block merge” rules vs gray areas that need a human.
For large PRs, emit a change summary and risk tier before the full checklist; for docs-only or config-only changes, shrink security checks and keep formatting and link checks. Stay aligned with CI so the checklist matches real pipeline gates.
Flow (author → reviewer → merge)
Clear boundaries keep the agent from mixing “tests the author must add” with “architecture the reviewer must approve.” The flow below fits human collaboration; if the agent acts as reviewer, label the role (author / reviewer / merge-gate).
[ Author: self-test, description, minimal reviewable diff ]
│
▼
[ Reviewer: read description → read diff → tag P0/P1/P2 ]
│
┌─────────┴─────────┐
▼ ▼
[ Blocking: must fix ] [ Non-blocking: before merge ]
│ │
└─────────┬─────────┘
▼
[ Merge: CI green, branch protection, (optional) second approval ]
│
▼
[ Post-merge: monitor / rollback plan ]
Expand: author deliverables
Self-checklist, PR description with intent and risk, linked issues, screenshots or API contract deltas; for huge changes, split PRs or add a “reading map.”
Copy-ready PR description template:
## Motivation
<!-- Why this PR? Which bug or feature does it address? Related issue: #xxx -->
## Impact scope
<!-- Which modules/APIs/schemas changed? Any breaking changes?
Dependents: does frontend / mobile / other services need to upgrade? -->
## Test plan
- [ ] Unit tests (which cases were added/modified)
- [ ] Integration/E2E (optional, describe scenarios covered)
- [ ] Manual verification steps (screenshots/recordings can be attached)
## Rollback steps
<!-- How to quickly roll back if something goes wrong? Feature flag name? DB column safe to ignore? -->
1. Revert commit: `git revert <sha>`
2. If DB migration involved: run `make db-rollback`
## Screenshots / Trace (optional)
<!-- Before / After screenshots, or API request/response comparison -->
Expand: reviewer focus
Read description before code; each comment has severity and an actionable outcome; P0 must block merge; mark speculative feedback as nit or P2 so real security issues stay visible.
Expand: merge gate
Branch protection, required reviewers, status checks (CI, signed commits), and team rule “no open P0s”; before the agent merges, restate whether these are satisfied.
Severity matrix & definitions
Teams can tune SLAs; P0/P1 meaning must stay consistent across the repo.
| Level | Typical meaning | Merge policy | Expected handling (example) |
|---|---|---|---|
| P0 | Security/compliance/data loss/production crash | Do not merge; fix or revert | Same day or hotfix window |
| P1 | Clear logic bug, missing critical tests, error-handling gap | Block by default; rare exceptions with tracking | Current iteration |
| P2 | Style, naming, small polish, doc typos | Non-blocking; follow-up OK | Backlog or good first issue |
- P0
- Would ship exploit, privilege bypass, plaintext secrets, or a deterministic crash path.
- P1
- Hurts correctness, observability, or rollback; missing tests on critical branches.
- P2
- Readability and consistency; suggestions that don’t affect correctness or security.
10 red-line items that must block merge
🔴 1. Hard-coded credentials
const apiKey = "sk-prod-abc123"; // ← plaintext secret in codebase
🔴 2. Auth bypass (route/middleware commented out or always-true condition)
// router.use(authMiddleware); // ← auth middleware commented out
🔴 3. User input concatenated directly into SQL
`SELECT * FROM users WHERE id = ${req.params.id}` // SQL injection
🔴 4. Deserializable entry point with no type/schema validation
const body = JSON.parse(rawInput); // used without Zod/Joi/yup validation
🔴 5. Sensitive fields logged in plaintext
logger.info("User login", { password, token }); // password/token in logs
🔴 6. IDOR: resource ID from client without ownership check
const order = await Order.findById(req.body.orderId); // missing .userId check
🔴 7. Critical tests deleted or skipped (coverage gamed artificially)
it.skip("concurrent refund test", ...) // ← skipped to keep coverage green
🔴 8. DB column dropped without expand-contract migration (direct DROP COLUMN)
ALTER TABLE orders DROP COLUMN legacy_status; // no rollback migration path
🔴 9. Unbounded recursion or infinite loop in main code path
function retry(fn) { return retry(fn); } // ← no termination condition
🔴 10. Production env flag forced to debug/development mode
NODE_ENV=development # hard-coded in prod Dockerfile or k8s configmap
Filter comments by P0 / P1 (demo)
Static sample list below—use chips to filter; contrast with orchestration checklists, this demo focuses on structured comment grading.
Dimension checklist
The agent can trim by diff type; map rows to automation or a quick human review.
| Dimension | What to check | When it fails |
|---|---|---|
| Security | Auth, injection, sensitive logs, dependency CVEs, secrets on disk | P0 / P1 |
| Correctness | Edge cases, concurrency, idempotency, contract/migration alignment | P0 / P1 |
| Performance | Hot paths, N+1, large payloads, unbounded caches | Usually P1; extreme P0 |
| Testability | Critical branches covered, mock boundaries, flake risk | Usually P1 |
| Operability | Log fields, metrics, feature flags, rollback path | P1 / P2 |
SKILL snippet
Encode stages and severity in the SKILL so the model outputs structured review results.
---
name: code-review-checklist
description: Layered security and quality review of PR diffs before merge; output structured comments
---
# Role declaration (clarify current role before each review)
current_role: reviewer # options: author | reviewer | merge-gate
# Step 1: read PR description (2 min)
- Confirm motivation is clear; if no description, ask author to fill it before proceeding
- Check whether impact scope is noted (breaking changes / DB migrations / API changes)
- Confirm rollback steps exist and are actionable
# Step 2: security scan (hard block red lines)
- [ ] No hard-coded credentials/secrets/tokens (grep: `sk-`, `password =`, `secret =`)
- [ ] User input not directly concatenated into SQL / shell / HTML (check all `${req.*}` injection points)
- [ ] Auth middleware not commented out / bypassed (check route layer)
- [ ] No sensitive fields in logs (password, token, ssn, card_number)
- [ ] IDOR: resource IDs must pass ownership checks
# Step 3: correctness (P0/P1)
- [ ] Concurrency paths: any race conditions (non-atomic check-then-act)?
- [ ] Boundaries: null / zero / negative / oversized input handled?
- [ ] Consistent with API contract, DB schema, and message format?
# Step 4: testability (P1)
- [ ] New happy path has corresponding unit test
- [ ] Error branches (network/DB timeout, business exception) have tests
- [ ] Critical integration path has integration test or contract test
# Step 5: operability (P1/P2)
- [ ] New endpoints/services have logs and metrics
- [ ] New features have a feature flag (if gradual rollout needed)
- [ ] Failure is rollback-safe (no irreversible side effects)
# Output format (required)
For each comment output:
severity: P0 | P1 | P2
file: relative path
line: L42 (if applicable)
issue: one-line problem description
suggestion: actionable fix direction or code snippet
New endpoint does not validate session and trusts user id from the client—use server session or signed verification.
Refund branch lacks integration tests; add a case for partial refund + concurrency to avoid regressions.
Env table missing new
FEATURE_X; can follow in a docs PR after merge.Error path only logs to console; no unified error code—doesn’t match existing API conventions.
User ID concatenated directly into SQL string:
WHERE id = ${req.params.id}— must be rewritten as a parameterized queryWHERE id = $1with a bound parameter. Current code is vulnerable to1; DROP TABLE users--injection.processRefund()swallowsStripeErrorin the catch block and returns onlyfalse; callers cannot distinguish “network timeout” from “insufficient balance,” and no metrics are reported. Map the error type to a business error code or re-throw it.formatCurrencyhard-codes the'en-US'locale; consider accepting locale from the caller for multi-language reuse. Non-blocking; track as a follow-up issue.