Agent engineering

5 skills · Category 2 of 20

This category turns “one completion” into a runnable system: how multi-step work is split and handed off, how tool APIs are made testable, how benchmarks and regression tests constrain behavior, how guardrails reduce privilege abuse and harmful output, and how image, audio, and other multimodal payloads enter the same orchestration. Where prompt engineering sets intent and format, this category sets structure and governance.

In the hub, this category follows prompt engineering; combine it with the testing and security categories for a full loop. The five entries below match those listed on the main hub.

Quick links

Agent orchestration

Multi-step tasks and handoffs.

Tool use design

Schemas, idempotency, errors.

Eval harness

Cases and regression metrics.

Guardrails

Refusal and privilege boundaries.

Multimodal input

Image, audio, structured payloads.

In depth

Agent orchestration

Define how Planner, Worker, Reviewer, and similar roles hand off state, where that state lives, and which step retries on failure. Align these choices with your branching and CI setup so that orchestration works in real pipelines, not only on paper.
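As a rough illustration of the handoff described above, the sketch below passes one shared state object from a planner to a worker to a reviewer, with retries confined to the worker step. Everything here is an assumption made for the example: the `RunState` shape, the role functions, the " then " task format, and the `make_flaky_worker` helper that simulates a transient failure. A real system would replace the stand-ins with model calls and durable storage.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RunState:
    """Shared state handed from role to role (hypothetical shape)."""
    task: str
    plan: List[str] = field(default_factory=list)
    results: Dict[str, str] = field(default_factory=dict)
    attempts: Dict[str, int] = field(default_factory=dict)
    approved: bool = False


def planner(state: RunState) -> RunState:
    # Stand-in for a model call: split the task on " then ".
    state.plan = [step.strip() for step in state.task.split(" then ")]
    return state


def run_worker(state: RunState, work: Callable[[str], str],
               max_retries: int = 2) -> RunState:
    # Only the worker step retries; attempt counts are recorded in state.
    for step in state.plan:
        for attempt in range(1, max_retries + 2):
            state.attempts[step] = attempt
            try:
                state.results[step] = work(step)
                break
            except RuntimeError:
                if attempt == max_retries + 1:
                    raise  # retries exhausted: surface the failure
    return state


def reviewer(state: RunState) -> RunState:
    # Approve only if every planned step produced a result.
    state.approved = all(step in state.results for step in state.plan)
    return state


def orchestrate(task: str, work: Callable[[str], str]) -> RunState:
    return reviewer(run_worker(planner(RunState(task=task)), work))


def make_flaky_worker(fail_first: str) -> Callable[[str], str]:
    """Hypothetical worker that fails once on one step, then succeeds."""
    seen = set()

    def work(step: str) -> str:
        if step == fail_first and step not in seen:
            seen.add(step)
            raise RuntimeError(f"transient failure in {step}")
        return f"done:{step}"

    return work


final = orchestrate("lint then build then test", make_flaky_worker("build"))
print(final.attempts, final.approved)
```

Keeping the attempt counts inside the state object is one possible choice; it lets a CI job inspect how many retries a run needed without parsing logs.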

Tool use design

For each tool, specify JSON-schema-level parameters, idempotency keys, timeouts, and parseable errors so the model can retry intelligently. Tool definitions are cheapest to maintain when they align with MCP and internal API gateways.

Eval harness

Use fixed case sets and metrics (accuracy, tool-call correctness, regression diffs) to show that each iteration actually helps; wire golden sets into CI or release gates instead of relying on vibes.
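A minimal sketch of such a harness follows, assuming an agent can be modelled as a function that returns an answer and the name of the tool it called. The golden cases, both agents, and the metric names are invented for illustration; real cases would come from recorded traffic or curated examples.

```python
from typing import Callable, Dict, Optional, Tuple

Agent = Callable[[str], Tuple[str, Optional[str]]]  # prompt -> (answer, tool)

# Hypothetical golden set: fixed inputs with expected answer and tool.
GOLDEN_CASES = [
    {"input": "2+2", "expected_answer": "4", "expected_tool": "calculator"},
    {"input": "capital of France", "expected_answer": "Paris", "expected_tool": "search"},
    {"input": "say hi", "expected_answer": "hi", "expected_tool": None},
]


def evaluate(agent: Agent) -> Dict[str, float]:
    correct = 0
    tool_ok = 0
    for case in GOLDEN_CASES:
        answer, tool = agent(case["input"])
        correct += answer == case["expected_answer"]
        tool_ok += tool == case["expected_tool"]
    n = len(GOLDEN_CASES)
    return {"accuracy": correct / n, "tool_call_correctness": tool_ok / n}


def regressions(baseline: Dict[str, float], candidate: Dict[str, float],
                tolerance: float = 0.0) -> Dict[str, float]:
    # Report only metrics that got worse than the baseline by more than tolerance.
    return {m: candidate[m] - baseline[m]
            for m in baseline if candidate[m] < baseline[m] - tolerance}


# Two stand-in agents backed by lookup tables instead of a model.
BASELINE = {"2+2": ("4", "calculator"), "capital of France": ("Paris", "search"),
            "say hi": ("hello", None)}
CANDIDATE = {"2+2": ("4", "calculator"), "capital of France": ("Paris", "search"),
             "say hi": ("hi", "search")}

baseline_metrics = evaluate(lambda prompt: BASELINE[prompt])
candidate_metrics = evaluate(lambda prompt: CANDIDATE[prompt])
diff = regressions(baseline_metrics, candidate_metrics)
gate_passes = not diff  # a CI job could fail the build when this is False
print(candidate_metrics, diff, gate_passes)
```

In this example the candidate improves accuracy but starts calling a tool it should not, so the gate fails on tool-call correctness even though the headline metric went up.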

Guardrails

Apply policies before and after the model call: confirm risky actions, detect exfiltration, and refuse jailbreaks and unauthorized tool calls. Align these policies with identity, audit logs, and compliance requirements.
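The before and after checks could look roughly like the sketch below. The role table, the list of risky tools, the decision strings, and the secret pattern are assumptions chosen for the example; the regular expression is only an illustrative detector and would miss most real secrets.

```python
import re
from typing import Dict, List, Set

# Hypothetical policy tables; a real system would load these from identity config.
ROLE_TOOLS: Dict[str, Set[str]] = {
    "viewer": {"read_file"},
    "admin": {"read_file", "delete_file"},
}
RISKY_TOOLS: Set[str] = {"delete_file"}

# Illustrative patterns only: an access-key-like token and a PEM private key header.
SECRET_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}|-----BEGIN [A-Z ]*PRIVATE KEY-----")

audit_log: List[Dict[str, str]] = []


def check_tool_call(role: str, tool: str, confirmed: bool = False) -> str:
    """Pre-execution check: authorization first, then confirmation for risky tools."""
    if tool not in ROLE_TOOLS.get(role, set()):
        decision = "deny"
    elif tool in RISKY_TOOLS and not confirmed:
        decision = "needs_confirmation"
    else:
        decision = "allow"
    # Every decision is recorded, including denials.
    audit_log.append({"role": role, "tool": tool, "decision": decision})
    return decision


def check_output(text: str) -> str:
    """Post-model check: redact anything that matches the secret pattern."""
    return SECRET_PATTERN.sub("[REDACTED]", text)


print(check_tool_call("viewer", "delete_file"))
print(check_tool_call("admin", "delete_file"))
print(check_tool_call("admin", "delete_file", confirmed=True))
print(check_output("key AKIAABCDEFGHIJKLMNOP here"))
```

Logging the decision at the point where it is made keeps the audit trail independent of whether the tool later succeeds or fails.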

Multimodal input

Normalize screenshots, audio, and diagrams into structured payloads and metadata (source, time, sensitivity) before orchestration. Handle privacy and storage deliberately so that multimodal input does not become a new leak surface.
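A normalization step along these lines might be sketched as follows. The `MediaPayload` fields, the allowed kinds, the sensitivity levels, and the rule that restricted media keeps only a hash and size are all assumptions for the example, not a prescribed schema.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

ALLOWED_KINDS = {"image", "audio", "diagram"}                 # hypothetical
SENSITIVITY_LEVELS = ("public", "internal", "restricted")     # hypothetical


@dataclass(frozen=True)
class MediaPayload:
    """Structured record passed to orchestration instead of raw bytes."""
    kind: str
    source: str
    captured_at: str            # ISO 8601 timestamp in UTC
    sensitivity: str
    sha256: str
    size_bytes: int
    extracted_text: Optional[str]


def normalize(raw: bytes, kind: str, source: str, captured_at: datetime,
              sensitivity: str, extracted_text: Optional[str] = None) -> MediaPayload:
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unsupported kind: {kind}")
    if sensitivity not in SENSITIVITY_LEVELS:
        raise ValueError(f"unknown sensitivity: {sensitivity}")
    if captured_at.tzinfo is None:
        raise ValueError("captured_at must be timezone-aware")
    if sensitivity == "restricted":
        # Assumed policy: restricted media contributes no extracted text downstream.
        extracted_text = None
    return MediaPayload(
        kind=kind,
        source=source,
        captured_at=captured_at.astimezone(timezone.utc).isoformat(),
        sensitivity=sensitivity,
        sha256=hashlib.sha256(raw).hexdigest(),
        size_bytes=len(raw),
        extracted_text=extracted_text,
    )


payload = normalize(
    b"fake-png-bytes", "image", "example-upload",
    datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
    "restricted", extracted_text="text read from the screenshot",
)
print(payload)
```

Passing a hash and metadata instead of the raw bytes is one way to let downstream steps reference the media without each of them becoming a storage location for it.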
