Microservice boundaries

Use DDD bounded contexts as semantic boundaries aligned with data ownership, release cadence, and team collaboration. Have agents produce reviewable service candidates, context-map drafts, and an explicit “defer split” rationale, so the result avoids distributed monoliths and splitting for its own sake.

A SKILL should require each candidate service to list: bounded context, owned aggregates and storage, exposed queries/commands/events, and collaboration with other contexts (including explicit choices such as ACL or shared kernel).
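
The required fields can be captured as a small record that a reviewer or agent validates before accepting a candidate. This is a minimal sketch, not a prescribed schema: the class name `ServiceCandidate`, its field names, and the pattern labels in `KNOWN_PATTERNS` are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Pattern labels are illustrative assumptions based on context-mapping vocabulary
KNOWN_PATTERNS = {"customer-supplier", "conformist", "acl",
                  "shared-kernel", "ohs-pl", "separate-ways"}

@dataclass
class ServiceCandidate:
    name: str
    bounded_context: str
    owned_aggregates: list[str]
    storage: str
    commands: list[str] = field(default_factory=list)
    queries: list[str] = field(default_factory=list)
    events: list[str] = field(default_factory=list)
    # other context name -> collaboration pattern
    collaborations: dict[str, str] = field(default_factory=dict)

    def problems(self) -> list[str]:
        """Return the gaps a reviewer should see; an empty list means complete."""
        issues = []
        if not self.bounded_context:
            issues.append("missing bounded context")
        if not self.owned_aggregates:
            issues.append("no owned aggregates listed")
        if not self.storage:
            issues.append("no owned storage listed")
        if not (self.commands or self.queries or self.events):
            issues.append("no exposed queries, commands, or events")
        for ctx, pattern in self.collaborations.items():
            if pattern not in KNOWN_PATTERNS:
                issues.append(f"collaboration with '{ctx}' uses unknown pattern '{pattern}'")
        return issues

ordering = ServiceCandidate(
    name="order-service", bounded_context="ordering",
    owned_aggregates=["Order"], storage="orders_db",
    commands=["PlaceOrder"], events=["OrderPlaced"],
    collaborations={"inventory": "customer-supplier", "billing": "acl"},
)
draft = ServiceCandidate(
    name="misc-service", bounded_context="",
    owned_aggregates=[], storage="",
    collaborations={"ordering": "shared-db"},
)
print(ordering.problems())
print(draft.problems())
```

A complete candidate yields no problems, while the draft is rejected with one message per missing field, which keeps the review conversation about content rather than format.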

Two teams must not each “write” the same aggregate root without a clear process: either designate a single writer, with read replicas or event projections for everyone else, or admit a strong-consistency chain and assess whether splitting is premature.

DDD and bounded contexts

A bounded context is a boundary of semantic consistency: inside one context terminology and rules are unified (ubiquitous language); across boundaries the same name may mean different things—bridge with explicit translation (DTOs, ACL, integration contracts).

  • Start from business capabilities and subdomains (core / supporting / generic), then derive contexts—not by slicing technical layers (one service per controller tier).
  • Contexts outlive “microservices”: services may merge or split, but domain boundary changes should trigger model and language review.
  • Agent output should cite sample ubiquitous-language terms per context and internal invariants that must not leak.
# Domain glossary conflict detection (Python script example)
# Same term with different meanings across contexts → boundary signal

GLOSSARIES = {
    "ordering": {
        "customer": "User placing an order, includes shipping address and payment method",
        "product":  "Product snapshot at order time, includes the price at purchase",
        "order":    "Purchase intent submitted by user, contains multiple OrderItems",
    },
    "inventory": {
        "product":  "Physical item in stock, includes SKU and warehouse location",
        "quantity": "Available stock count, excludes reserved units",
    },
    "billing": {
        "customer": "Financial account holder, includes invoice details",
        "invoice":  "Billing summary for a billing period",
    },
}

def detect_term_conflicts(glossaries: dict) -> list[dict]:
    """Detect semantic differences for the same term across contexts"""
    term_contexts: dict[str, list] = {}
    for ctx, terms in glossaries.items():
        for term in terms:
            term_contexts.setdefault(term, []).append(ctx)

    conflicts = []
    for term, contexts in term_contexts.items():
        if len(contexts) < 2:
            continue
        definitions = {c: glossaries[c][term] for c in contexts}
        # Identical wording in every context is shared vocabulary, not a conflict
        if len(set(definitions.values())) < 2:
            continue
        conflicts.append({
            "term": term,
            "contexts": contexts,
            "definitions": definitions,
            "recommendation": f"'{term}' has different meanings across contexts → translate via ACL/DTO; do not share one model class"
        })
    return conflicts

conflicts = detect_term_conflicts(GLOSSARIES)
for c in conflicts:
    print(f"⚠ Conflicting term: {c['term']}")
    print(f"  Found in: {', '.join(c['contexts'])}")
    for ctx, defn in c['definitions'].items():
        print(f"  [{ctx}] {defn}")
    print(f"  Recommendation: {c['recommendation']}")

Mapping services to contexts

A common starting point is one bounded context per deployable, but it is not dogma: a service cut too small fragments operations and releases, while one cut too large tends toward a distributed monolith. Align on change frequency, team ownership, and transactional consistency needs, not on package or folder layout.

Lean toward separate services

  • Different release cadences and scaling profiles (read-heavy vs batch peaks)
  • Clear data ownership and a single write path
  • Team can own end-to-end delivery inside the context

Lean toward same process / repo

  • Strongly consistent transactions across aggregates with no acceptable eventual window
  • High-frequency fine-grained synchronous chains (orchestrate first, BFF, or merge boundaries)
  • Unclear domain model with only a “split by table” impulse

Data ownership and aggregates

Tables or collections should belong to a single service; others consume via API, read replicas, CDC, or domain events—not direct DB access. Aggregates are transactional and invariant boundaries: entities changed in one transaction should hang off one aggregate root.

  • Cross-aggregate consistency is eventual (events, outbox, Saga); distributed transactions need explicit cost and alternatives in the SKILL.
  • Shared read-only dimensions can live in a generic subdomain or be replicated per context with a documented update source—avoid implicit shared writes.
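
One way to get eventual consistency without a distributed transaction is a transactional outbox: the aggregate change and its event are committed together in the owning service's database, and a separate relay publishes pending events. The sketch below uses the standard-library `sqlite3` module with an in-memory database; the table layout, the `place_order` and `relay_once` helpers, and the `publish` callback are assumptions for illustration, not a production design.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        event_type TEXT NOT NULL,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")

def place_order(order_id: str) -> None:
    """Write the aggregate change and its event in one local transaction."""
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'PLACED')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("OrderPlaced", json.dumps({"orderId": order_id})),
        )

def relay_once(publish) -> int:
    """Publish pending events in sequence order and mark each as sent."""
    rows = conn.execute(
        "SELECT seq, event_type, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, event_type, payload in rows:
        publish(event_type, json.loads(payload))  # if this raises, the row stays pending
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)

delivered = []
place_order("order-88421")
sent = relay_once(lambda kind, body: delivered.append((kind, body)))
print(sent, delivered)
```

Because an event is marked as published only after `publish` returns, a crash between the two steps leads to redelivery, so consumers in other contexts should be idempotent.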

Agent checklist: does each aggregate root have exactly one writing service? Have cross-service “updates to the same business fact” been replaced by events or clear master-data ownership?
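
The first checklist question can be automated once each service declares which aggregate roots it writes. A minimal sketch, assuming a hand-maintained mapping from service name to written aggregates (the service and aggregate names are hypothetical):

```python
def find_multi_writer_aggregates(writes: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each aggregate root that has more than one writing service to its writers."""
    writers: dict[str, list[str]] = {}
    for service, aggregates in writes.items():
        for aggregate in aggregates:
            writers.setdefault(aggregate, []).append(service)
    return {agg: sorted(svcs) for agg, svcs in writers.items() if len(svcs) > 1}

# Hypothetical declarations: billing-service also writes Order, which breaks the rule
declared_writes = {
    "order-service": {"Order"},
    "inventory-service": {"StockItem"},
    "billing-service": {"Invoice", "Order"},
}
violations = find_multi_writer_aggregates(declared_writes)
for aggregate, services in violations.items():
    print(f"{aggregate} is written by: {', '.join(services)}")
```

Each reported aggregate needs either a single designated writer or an explicit note that the services involved should not be split yet.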

Context mapping and collaboration

Context mapping captures upstream/downstream dependencies and power dynamics: who owns the model, who adapts, whether there is a shared kernel. Wrong patterns create hidden coupling or duplicate work; the SKILL should name the pattern and implementation (e.g. which service hosts the ACL, who owns contract tests).

  • Customer–Supplier: downstream needs influence upstream prioritization; good when an internal “productized” upstream model exists.
  • Conformist: downstream adopts the upstream model wholesale—cost is less freedom to evolve independently.
  • Anti-Corruption Layer (ACL): downstream isolates foreign models—good for external systems or legacy fat APIs.
  • Shared Kernel: small shared code/model surface with strict gates and versioning discipline.
  • Open Host Service (OHS) + Published Language (PL): upstream offers a stable integration surface and documented contract.
  • Separate Ways: when integration cost exceeds duplication, allow limited duplication with recorded rationale.
# Anti-Corruption Layer (ACL): translate external API response into internal domain model
# Scenario: Billing context calls external payment gateway; ACL prevents model leakage

from dataclasses import dataclass
from decimal import Decimal

# ===== External payment gateway response (upstream model, not under our control) =====
class StripePaymentIntent:
    def __init__(self, data: dict):
        self.id = data["id"]
        self.amount = data["amount"]          # Stripe: amount in cents
        self.currency = data["currency"]      # Stripe: lowercase "usd"
        self.status = data["status"]          # Stripe: "succeeded"/"requires_payment_method"
        self.customer = data.get("customer")  # Stripe: customer ID

# ===== Internal domain model (Billing BC) =====
@dataclass
class PaymentResult:
    payment_id: str
    amount: Decimal                           # Internal: dollars/euros
    currency: str                             # Internal: uppercase "USD"
    status: str                               # Internal: "SUCCESS"/"FAILED"/"PENDING"/"UNKNOWN"
    customer_id: str | None

# ===== Anti-Corruption Layer: translator =====
class StripePaymentACL:
    STATUS_MAP = {
        "succeeded": "SUCCESS",
        "requires_payment_method": "FAILED",
        "processing": "PENDING",
        "requires_action": "PENDING",
    }

    def translate(self, stripe_intent: StripePaymentIntent) -> PaymentResult:
        """Translate Stripe external model into Billing internal PaymentResult"""
        return PaymentResult(
            payment_id=stripe_intent.id,
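            # Assumes a two-decimal currency; zero-decimal currencies (e.g. JPY) need their own rule
            amount=(Decimal(stripe_intent.amount) / 100).quantize(Decimal("0.01")),  # cents → dollars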
            currency=stripe_intent.currency.upper(),      # usd → USD
            status=self.STATUS_MAP.get(stripe_intent.status, "UNKNOWN"),
            customer_id=stripe_intent.customer,
        )

# Usage: external model only appears in the ACL layer; never leaks into Billing internals
acl = StripePaymentACL()
external_data = {"id": "pi_abc", "amount": 9950, "currency": "usd",
                 "status": "succeeded", "customer": "cus_123"}
internal = acl.translate(StripePaymentIntent(external_data))
# → PaymentResult(payment_id='pi_abc', amount=Decimal('99.50'),
#                 currency='USD', status='SUCCESS', customer_id='cus_123')
# Pact contract test: consumer-driven contract test example (Python pact-python)
# Scenario: OrderService (consumer) calls InventoryService (provider)

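# 1. Consumer side: define expectations (generates pact file)
# Assumes pact-python's v1-style API (Consumer/Provider/Like) and the requests library
import atexit

import requests
from pact import Consumer, Like, Provider

pact = Consumer("OrderService").has_pact_with(Provider("InventoryService"))
pact.start_service()                # start the local mock provider
atexit.register(pact.stop_service)  # stop it when the test process exits

def reserve(base_url, sku_id, quantity, order_id):
    """Illustrative stand-in for OrderService's real inventory client"""
    resp = requests.post(f"{base_url}/inventory/reserve",
                         json={"skuId": sku_id, "quantity": quantity, "orderId": order_id})
    resp.raise_for_status()
    return resp.json()

def test_reserve_inventory_consumer():
    (pact
     .given("SKU sku-001 has 100 units available")
     .upon_receiving("a reserve inventory request")
     .with_request("POST", "/inventory/reserve",
                   body={"skuId": "sku-001", "quantity": 2, "orderId": "order-88421"})
     .will_respond_with(200, body={
         "reservationId": Like("res-uuid-abc"),          # match by type, not exact value
         "skuId": "sku-001",
         "quantity": 2,
         "expiresAt": Like("2024-03-15T11:00:00Z"),
     }))

    with pact:
        # Consumer code runs against the mock server; interactions are verified on exit
        result = reserve(pact.uri, "sku-001", 2, "order-88421")
        assert result["quantity"] == 2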

# 2. Provider side: verify pact file (runs in CI)
# pact-verifier --provider-base-url=http://localhost:8081 \
#               --pact-broker-url=https://pact.acme.com \
#               --provider=InventoryService \
#               --publish-verification-results \
#               --provider-app-version=$(git rev-parse HEAD)

From domain to service candidates (workflow)

  [ Event storming / domain narrative ]
        │
        ▼
  ┌─────────────────┐    Output: verb commands, nouns, hot terms
  │Identify         │
  │subdomains       │
  └─────────────────┘
        │
        ▼
  ┌─────────────────┐    Each BC: language table, aggregate sketch,
  │Draw bounded     │    outward capabilities
  │contexts         │
  └─────────────────┘
        │
        ▼
  ┌─────────────────┐    Label: CS / ACL / OHS / Shared Kernel …
  │ Context map     │
  └─────────────────┘
        │
        ▼
  ┌─────────────────┐    Table: service | data owned | API/events | risk
  │Candidates &     │──── Defer: strong-consistency chains, batch coupling,
  │ deferrals       │    tech debt, integration hell
  └─────────────────┘

Deliverables should include: context list, map (text or Mermaid), service candidate table, and explicit “do not split” or “merge” items with triggers (e.g. call-depth threshold, org topology change).
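
For the Mermaid form of the map, a short generator keeps the diagram in sync with a reviewed list of relationships. This is a sketch under assumptions: the `Relationship` record and the context names are hypothetical, and edges are drawn from upstream to downstream with the mapping pattern as the label.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relationship:
    upstream: str
    downstream: str
    pattern: str  # e.g. "ACL", "OHS/PL", "Customer-Supplier"

def to_mermaid(relationships: list[Relationship]) -> str:
    """Render upstream -> downstream edges labelled with the mapping pattern."""
    lines = ["flowchart LR"]
    for rel in relationships:
        # Quote the label so characters such as "/" are treated as plain text
        lines.append(f'    {rel.upstream} -->|"{rel.pattern}"| {rel.downstream}')
    return "\n".join(lines)

context_map = [
    Relationship("ordering", "inventory", "Customer-Supplier"),
    Relationship("payment_gateway", "billing", "ACL"),
]
diagram = to_mermaid(context_map)
print(diagram)
```

The output can be pasted into any Mermaid renderer; keeping the relationship list as data also lets the same source feed the service candidate table.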

Public API surface and sync chains

Each service should expose stable, versioned contracts (REST/GraphQL/gRPC/message contracts); implementation details do not cross the boundary. Deep synchronous stacks signal boundaries that are too fine or mis-assigned—prefer orchestration, async messaging, or BFF read aggregation.

  • Reads may compose via BFF or API gateway; writes keep a single owner to avoid dual writes.
  • Contract tests and consumer-driven contracts: document owners on the map to support independent deploys.
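
Deep synchronous stacks can be spotted mechanically once the synchronous calls between services are written down as a graph. The sketch below finds the longest call chain from an entry point; the graph, the service names, and the `MAX_SYNC_HOPS` threshold are assumptions, and the real threshold is a team decision.

```python
MAX_SYNC_HOPS = 3  # assumed review threshold, not a universal rule

def longest_sync_chain(calls: dict[str, list[str]], start: str,
                       _path: tuple[str, ...] = ()) -> list[str]:
    """Return the longest synchronous call path beginning at start."""
    path = _path + (start,)
    best = list(path)
    for callee in calls.get(start, []):
        if callee in path:  # skip cycles instead of recursing forever
            continue
        candidate = longest_sync_chain(calls, callee, path)
        if len(candidate) > len(best):
            best = candidate
    return best

# Hypothetical synchronous call graph: caller -> callees
sync_calls = {
    "bff": ["order"],
    "order": ["inventory", "pricing"],
    "pricing": ["catalog"],
    "catalog": ["tax"],
}
chain = longest_sync_chain(sync_calls, "bff")
hops = len(chain) - 1
print(" -> ".join(chain), f"({hops} hops)")
if hops > MAX_SYNC_HOPS:
    print("Review boundaries: consider orchestration, async messaging, or BFF read aggregation")
```

A chain that exceeds the threshold is a prompt for review rather than proof of a bad boundary, since some deep paths are rare or latency-tolerant.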

Anti-patterns and Conway’s law

Distributed monolith: many small services but deploys, releases, and schema changes stay tightly coupled—integration cost no better than a monolith. Layer-only services (pure DAO or pure gateway) often amplify fan-out and latency.
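
One rough signal of a distributed monolith is how often two services ship in the same release. The sketch below computes, for each pair, the share of releases touching either service that touched both; the release history and service names are invented for illustration, and what ratio counts as "too coupled" is left to the team.

```python
from collections import Counter
from itertools import combinations

def co_release_ratio(releases: list[set[str]]) -> dict[tuple[str, str], float]:
    """For each service pair: releases containing both / releases containing either."""
    together: Counter = Counter()
    either: Counter = Counter()
    services = sorted(set().union(*releases)) if releases else []
    for a, b in combinations(services, 2):
        for release in releases:
            if a in release or b in release:
                either[(a, b)] += 1
                if a in release and b in release:
                    together[(a, b)] += 1
    return {pair: together[pair] / either[pair] for pair in either}

# Hypothetical release history: each set lists the services deployed together
history = [
    {"order", "inventory"},
    {"order", "inventory"},
    {"billing"},
    {"order", "inventory", "billing"},
]
ratios = co_release_ratio(history)
for pair, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(pair, f"{ratio:.2f}")
```

Here `order` and `inventory` never ship apart, which suggests either a missing contract between them or a boundary that should be merged.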

Conway’s law: system structure tends to mirror org communication. When boundaries and team topology diverge, either realign teams (feature teams to contexts) or admit intentional boundary breaches and record governance (owners, review bars) in the SKILL.

Defer-split checklist

A SKILL should list signals to defer splits or merge services, avoiding “microservices for microservices' sake.”

  • Cross-aggregate flows cannot accept eventual consistency in the business sense, and there is no reliable Saga/compensation design yet.
  • Batch, reporting, or migration jobs are tightly bound to online paths in the same transaction or lock granularity.
  • Two candidate services are owned by the same small team with daily integration and no independent release need.
  • Risky split without contract tests, feature flags, observability, or rollback strategy.
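
The four signals above can be turned into a small review helper that returns the reasons to defer, so the decision is recorded rather than argued from memory. This is a sketch: the `SplitAssessment` fields and the `REQUIRED_SAFETY_NETS` names are assumptions that mirror the checklist wording.

```python
from dataclasses import dataclass

REQUIRED_SAFETY_NETS = {"contract tests", "feature flags", "observability", "rollback"}

@dataclass
class SplitAssessment:
    # Field names are illustrative; each mirrors one checklist signal
    strong_consistency_required: bool
    saga_design_ready: bool
    batch_coupled_to_online_path: bool
    same_small_team: bool
    independent_release_needed: bool
    safety_nets: set[str]

def defer_reasons(a: SplitAssessment) -> list[str]:
    """Return the reasons to defer a split; an empty list means no blocker was found."""
    reasons = []
    if a.strong_consistency_required and not a.saga_design_ready:
        reasons.append("strong consistency required but no Saga/compensation design")
    if a.batch_coupled_to_online_path:
        reasons.append("batch or reporting jobs share transactions or locks with online paths")
    if a.same_small_team and not a.independent_release_needed:
        reasons.append("same small team and no independent release need")
    missing = sorted(REQUIRED_SAFETY_NETS - a.safety_nets)
    if missing:
        reasons.append("missing safety nets: " + ", ".join(missing))
    return reasons

assessment = SplitAssessment(
    strong_consistency_required=True, saga_design_ready=False,
    batch_coupled_to_online_path=False,
    same_small_team=True, independent_release_needed=False,
    safety_nets={"contract tests", "observability"},
)
reasons = defer_reasons(assessment)
print("DEFER" if reasons else "PROCEED", reasons)
```
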
# Istio VirtualService traffic rule example (service mesh)
# Scenario: payment-service v2 canary release, 5% traffic to new version
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: payment-service
  namespace: production
spec:
  hosts:
    - payment-service
  http:
    # Canary: header-based routing (internal testers go to v2)
    - match:
        - headers:
            x-canary-user:
              exact: "true"
      route:
        - destination:
            host: payment-service
            subset: v2
    # Traffic split: 5% to v2, 95% to stable
    - route:
        - destination:
            host: payment-service
            subset: v1
          weight: 95
        - destination:
            host: payment-service
            subset: v2
          weight: 5
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: "5xx,connect-failure,reset"
      timeout: 10s
---
# Saga vs 2PC selection criteria:
# 2PC (two-phase commit):
#   Good for: same DB / XA-capable resource managers; acceptable latency
#   Cost: coordinator is a single point of failure; long lock hold times; poor fit across heterogeneous systems
# Saga:
#   Good for: cross-service / heterogeneous DBs; eventual consistency acceptable
#   Cost: requires compensation logic design; partial commit intermediate states possible
# Decision rule: cross-service → Saga; same DB strong consistency → 2PC or local transaction
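
The compensation cost of a Saga is easier to judge with a concrete shape in mind. The following in-process sketch runs steps in order and, on failure, undoes the completed ones in reverse; the `run_saga` helper and the step names are hypothetical, and a real Saga would also persist its state and retry compensations.

```python
from typing import Callable

# (name, action, compensation)
Step = tuple[str, Callable[[], None], Callable[[], None]]

def run_saga(steps: list[Step]) -> tuple[bool, list[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse order."""
    log: list[str] = []
    completed: list[tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            for done_name, done_compensate in reversed(completed):
                done_compensate()
                log.append(f"{done_name}: compensated")
            return False, log
        completed.append((name, compensate))
        log.append(f"{name}: done")
    return True, log

reserved = {"sku-001": 0}

def reserve_stock() -> None:
    reserved["sku-001"] += 2

def release_stock() -> None:
    reserved["sku-001"] -= 2

def charge_payment() -> None:
    raise RuntimeError("card declined")  # simulated downstream failure

ok, log = run_saga([
    ("reserve-inventory", reserve_stock, release_stock),
    ("charge-payment", charge_payment, lambda: None),
])
print(ok, log, reserved)
```

Between the reservation and its compensation, other readers can observe the reserved stock; that visible intermediate state is the cost the comparison above refers to.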

Context map draft lab

Pick a classic mapping relationship, fill upstream/downstream context names and optional notes; use “Append row” repeatedly to accumulate text below for design docs or SKILL appendices.