Event-driven design

Guides agents to separate commands from events and define publisher ownership; align “database committed—with “message visible—using a transactional outbox; document Saga orchestration/choreography, compensation, and human touchpoints. This page includes a collapsible table of contents and an event-card draft lab (evt-page script).

Name domain events in the past tense and carry the smallest useful payload; separate integration events from internal ones so aggregate internals do not leak to the global bus.

For cross-service consistency, state the eventual-consistency window and human touchpoints explicitly; each Saga step must be retryable, compensatable, or traceably abortable with a reason.

In a SKILL, require a consumer list and subscription versioning strategy so implicit coupling does not create deployment-order hell.

One-page summary

  • Commands change state; events state facts; the publisher owns integration-event schema and semantic versioning.
  • When visibility must align with a local transaction, use an outbox (or audited CDC) to avoid dual-write races.
  • Long-running work across boundaries uses a Saga: explicit state machine, compensation order, and message delivery semantics (at-least-once + idempotency) aligned.
// CloudEvents 1.0 canonical event format (JSON)
{
  "specversion": "1.0",
  "id": "b46d462a-4c6f-4c1e-9e8b-3f5d1c7a9b2c",
  "type": "com.acme.ordering.order.placed.v1",
  "source": "/ordering-service/orders",
  "subject": "order/88421",
  "time": "2024-03-15T10:30:00Z",
  "datacontenttype": "application/json",
  "dataschema": "https://schemas.acme.com/ordering/order-placed/v1.json",
  "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
  "data": {
    "orderId": "88421",
    "customerId": "cust-12345",
    "totalAmount": 99.50,
    "currency": "USD",
    "items": [{ "skuId": "sku-001", "qty": 2, "price": 49.75 }]
  }
}

// Event naming convention: {domain}.{aggregate}.{event}.{version}
// type: "com.acme.ordering.order.placed.v1"
//       ───────── ──────── ───── ─────── ──
//       org        domain  agg   event   version
//
// Partition key (Kafka / ordering guarantee): use aggregate ID from subject
// kafka_key = "order/88421"  → all events for the same order go to the same partition
// Event schema versioning: forward/backward compatibility strategy
// Backward-compatible (new consumer can read old events):
//   ✅ Add optional fields (with defaults)
//   ✅ Extend enum values
//   ❌ Remove fields / change field types / rename fields

// v1 event (old)
{ "orderId": "88421", "status": "placed" }

// v2 event (added optional field, backward-compatible)
{ "orderId": "88421", "status": "placed", "customerId": "cust-123" }

// Forward-compatible (old consumer can read new events, ignores unknown fields):
// Consumer must use @JsonIgnoreProperties(ignoreUnknown=true) or equivalent
// For breaking changes: publish a new type (.v2) and maintain dual-version consumer window

// Python example (pydantic forward compatibility)
from pydantic import BaseModel, ConfigDict
class OrderPlacedV1(BaseModel):
    model_config = ConfigDict(extra="ignore")   # ignore unknown fields
    orderId: str
    status: str
    customerId: str | None = None               # new optional field has a default

End-to-end flow (skill-flow-block)

  [ Command ingress: API / scheduler / integration adapter ]
                    │
                    ▼
         ┌──────────────────────│
         │ Aggregate txn: state │──── In-domain: domain events (in-memory collection)
         │ + outbox row (same   │     Integration payload trimmed at mapping layer
         │   transaction)       │
         └──────────────────────│
                    │
                    ▼
         ┌──────────────────────│
         │ Outbox dispatcher:   │──── Mark published / retry / DLQ;
         │ poll / CDC / log tail│     align with broker at-least-once
         └──────────────────────│
                    │
                    ▼
         ┌──────────────────────│
         │ Message bus: topic / │──── Ordering key, TTL, DLQ, schema registry
         │ partition key / group│
         └──────────────────────│
                    │
        ┌───────────┴───────────│
        ▼                      ▼
  [ Consumer: idempotent ]  [ Saga: orchestrator or choreography ]
        │                      │
        └───────────┬───────────│
                    ▼
              [ Observability: trace / business metrics / audit ]

If you skip the outbox, document in the SKILL the acceptable loss window or alternative (e.g. synchronous API + retry-storm risk); do not assume fire-and-forget delivery.

Transactional outbox

Within one database transaction, write business rows and an outbox table (or equivalent log entry); a separate process publishes in order so “business committed—and “externally visible message—move together monotonically.

  • Table shape: event type, payload version, aggregate id, correlation/causation ids, publish status and timestamps.
  • Dispatcher: single writer or partition lock to avoid duplicate publish; backoff and poison handling documented with DLQ policy.
  • vs CDC: outbox models message boundaries explicitly; CDC must handle change noise and projection lag—document SLIs in the SKILL.
-- Outbox table DDL (PostgreSQL)
CREATE TABLE outbox_events (
    id              UUID          PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_type  VARCHAR(64)   NOT NULL,   -- e.g. 'Order'
    aggregate_id    VARCHAR(128)  NOT NULL,   -- e.g. '88421'
    event_type      VARCHAR(128)  NOT NULL,   -- e.g. 'com.acme.ordering.order.placed.v1'
    payload         JSONB         NOT NULL,
    correlation_id  UUID,                      -- trace causality chain
    causation_id    UUID,                      -- ID of upstream event that triggered this one
    status          VARCHAR(16)   NOT NULL DEFAULT 'pending',  -- pending/published/failed
    created_at      TIMESTAMPTZ   NOT NULL DEFAULT NOW(),
    published_at    TIMESTAMPTZ,
    retry_count     INT           NOT NULL DEFAULT 0
);

CREATE INDEX idx_outbox_pending ON outbox_events (created_at) WHERE status = 'pending';

-- Write business data + outbox in one transaction (Python + psycopg2)
def place_order(conn, order_data: dict) -> str:
    with conn.cursor() as cur:
        # 1. Write business data
        cur.execute("INSERT INTO orders (...) VALUES (...) RETURNING id", order_data)
        order_id = cur.fetchone()[0]
        # 2. Write outbox in same transaction
        cur.execute("""
            INSERT INTO outbox_events (aggregate_type, aggregate_id, event_type, payload)
            VALUES ('Order', %s, 'com.acme.ordering.order.placed.v1', %s)
        """, (str(order_id), json.dumps({"orderId": str(order_id), **order_data})))
    conn.commit()   # both committed atomically
    return order_id

Saga: orchestration, choreography, compensation

Orchestration: a central process issues commands and waits for completion events—simple mental model but coordinator availability is a risk. Choreography: services subscribe to each other’s events—decentralized but needs strong contracts and observability.

  • Per step define: input event, local side effect, success event, compensation or retry on failure, timeout and human ticket entry.
  • Compensation should be semantic (cancel booking, refund) rather than physical deletes; when not possible, document account freeze or ops workflow.
  • With outbox: Saga log and local transaction in the same transaction or clearly separated eventual-consistency segments to avoid “next step sent while prior step not persisted” ghost states.
## Saga pattern comparison

### Choreography — decentralized, each service subscribes to others' events
# OrderService publishes OrderPlaced
# InventoryService listens → deducts stock → publishes InventoryReserved
# PaymentService listens → charges → publishes PaymentProcessed
# Compensation: PaymentFailed → InventoryService listens → releases stock → publishes InventoryReleased

# Pros: no single point of failure, loosely coupled
# Cons: event chain is hard to trace, testing is complex, adding a service requires updating multiple subscriptions

### Orchestration — central coordinator explicitly controls each step
class OrderSaga:
    def __init__(self, order_id: str):
        self.order_id = order_id
        self.state = "STARTED"

    def execute(self):
        try:
            # Step 1: reserve inventory
            self.state = "RESERVING_INVENTORY"
            inventory_svc.reserve(self.order_id)

            # Step 2: charge payment
            self.state = "PROCESSING_PAYMENT"
            payment_svc.charge(self.order_id)

            # Step 3: ship order
            self.state = "SHIPPING"
            shipping_svc.ship(self.order_id)

            self.state = "COMPLETED"
        except InventoryError:
            self.compensate_inventory()
            self.state = "FAILED"
            raise
        except PaymentError:
            self.compensate_payment()
            self.compensate_inventory()  # compensation order is reverse of forward steps
            self.state = "FAILED"
            raise

    def compensate_inventory(self):
        # Semantic compensation: release reservation, not physical record deletion
        inventory_svc.release(self.order_id)

    def compensate_payment(self):
        payment_svc.refund(self.order_id)

# Pros: centralized flow, easy to trace, timeout and retry managed in one place
# Cons: coordinator is a single point of failure; needs high-availability deployment

Event contracts and consumers

  • Orchestration vs choreography: central coordinator SPOF vs decentralized trade-offs.
  • Ordering and duplicates: align with broker delivery semantics; idempotency keys and dedupe windows belong in the contract.
  • Evolution: forward/backward schema compatibility rules; deprecated fields and dual-write/read-new strategy.

SKILL front-matter example

---
name: event-driven-design
description: Map domain event flows, CloudEvents format, and cross-service Saga collaboration
---
# Event contract
Format: CloudEvents 1.0 (specversion/id/type/source/subject/time/data)
type naming: {org}.{domain}.{aggregate}.{event}.{version}
Partition key: aggregate ID in subject, guarantees same-aggregate event ordering
# Schema versioning
Backward-compatible: add optional fields (with defaults) / extend enums
Breaking change: publish new type (.v2), consumers window dual-version consumption
Consumer side: extra="ignore" (ignore unknown fields) ensures forward compatibility
# Outbox pattern
Table: id/aggregate_type/aggregate_id/event_type/payload/status/retry_count
Write business row + outbox in same transaction; dispatcher polls status=pending to publish
# Saga selection
Choreography: simple flow / loose coupling / fewer than 3 steps → each service subscribes to events
Orchestration: complex flow / unified tracing / timeout management → central coordinator
Compensation order: reverse of forward steps; semantic (refund/release) not physical deletes
# Output structure
Event card: name/type/source/payload fields/consumer list
Saga diagram: steps/success events/failure compensation/timeout entry points
Failure: retry (at-least-once + idempotency) / compensation / human ticket

Event card draft lab (evt-page JS)

For reviews or tickets: checkboxes append Outbox / Saga checklist lines automatically.

Pattern options

              

Lab output is for communication only; real contracts must match schema registry, consumer repos, and Saga state-machine definitions.

Back to skills More skills