Caching strategy

Layer browser, CDN, app, and distributed caches, and define keyspace, TTL, and active/passive invalidation for each layer; call out consistency costs explicitly. Includes a read-path overview and a key/TTL draft tool.

Start from freshness needs. Strongly consistent data either skips cross-layer caching or uses short TTLs plus versioning; eventually consistent data can use stale-while-revalidate, async refresh, and singleflight backfill.
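
As a rough illustration of the eventual-consistency options above, the sketch below serves a stale value while a single background thread refreshes it. The class name `SWRCache`, its `fresh_for`/`stale_for` parameters, and the `loader` callback are hypothetical names chosen for this example; it is an in-process sketch, not a distributed implementation.

```python
import threading
import time
from typing import Any, Callable, Dict, Tuple


class SWRCache:
    """In-process stale-while-revalidate cache (illustrative only)."""

    def __init__(self, fresh_for: float, stale_for: float) -> None:
        self.fresh_for = fresh_for      # seconds a value counts as fresh
        self.stale_for = stale_for      # extra seconds a stale value may be served
        self._data: Dict[str, Tuple[Any, float]] = {}   # key -> (value, stored_at)
        self._refreshing: set = set()   # keys with a refresh in flight
        self._lock = threading.Lock()

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        now = time.monotonic()
        with self._lock:
            entry = self._data.get(key)
            if entry is not None:
                value, stored_at = entry
                age = now - stored_at
                if age <= self.fresh_for:
                    return value                      # fresh hit
                if age <= self.fresh_for + self.stale_for:
                    if key not in self._refreshing:   # one refresh per key at a time
                        self._refreshing.add(key)
                        threading.Thread(
                            target=self._refresh, args=(key, loader), daemon=True
                        ).start()
                    return value                      # serve stale, refresh in background
        # Miss or too stale: load synchronously on the caller's thread.
        value = loader()
        with self._lock:
            self._data[key] = (value, time.monotonic())
        return value

    def _refresh(self, key: str, loader: Callable[[], Any]) -> None:
        try:
            value = loader()
            with self._lock:
                self._data[key] = (value, time.monotonic())
        finally:
            with self._lock:
                self._refreshing.discard(key)
```

Only the background refresh is deduplicated here; concurrent cold misses each call `loader`, so a real deployment would likely pair this with a singleflight or mutex on the miss path.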

The SKILL must list invalidation triggers: which keys to delete after writes, whether delayed double-delete is used, and when to force read-primary or reject stale versions if the event bus reorders or lags.

  • Keys include tenant, resource type, locale, and schema version; no implicit global keys.
  • Hot keys: sharding, local secondary buffers, rate limits; cross-link the breaker/limiter docs.

One-page summary

  • Layers: browser → CDN → in-process/app → Redis-class → DB; each layer states whether caching is allowed and max staleness.
  • Keys: predictable namespace + version segment; document write amplification and key-count caps.
  • Invalidation: write path maintains delete lists or version bumps; delayed double-delete and read-primary fallbacks when needed.
  • Risks: penetration (null cache/Bloom), breakdown (mutex/singleflight), avalanche (TTL jitter + warmup + degradation).
  • Metrics: hit rate, origin rate, latency percentiles, alerts for miss storms.

# Cache-Aside pattern (Python + Redis)
# db_get_product is the application's own DB accessor (not shown here).
import json
import random

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_product(product_id: int) -> dict:
    key = f"v2:product:{product_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)           # Cache hit (a cached "null" decodes to None)

    product = db_get_product(product_id)   # Cache miss -> fall back to DB
    if product:
        # Base TTL 300s + random jitter +-30s to prevent avalanche
        ttl = 300 + random.randint(-30, 30)
        r.setex(key, ttl, json.dumps(product))
    else:
        # Null cache to prevent penetration; short TTL (30s)
        r.setex(key, 30, json.dumps(None))
    return product

# Five core Redis data types and typical cache use cases
# String:     serialize JSON objects (single entity cache)    r.setex("user:1", 300, json_str)
# Hash:       field-level updates (counters, config)          r.hset("session:abc", "uid", 42)
# List:       recent N records, message queue                 r.lpush("recent:views", prod_id); r.ltrim("recent:views", 0, 99)
# Set:        dedup tags, blocklist                           r.sadd("blacklist:ip", "1.2.3.4")
# Sorted Set: leaderboard, scored priority queue              r.zadd("leaderboard", {uid: score})

Layered read path

  [ Request in: Accept-Language / tenant context ]
                    │
        ┌───────────┴───────────┐
        ▼                       ▼
  [ Browser private cache ]   [ CDN / edge ]
  Cache-Control private      SWR, stale-if-error (if used)
        │                       │
        └───────────┬───────────┘
                    ▼
         [ In-process: LRU / local map ]
         Note: multi-instance—prefer short TTL or event invalidation
                    │
                    ▼
         [ Distributed: Redis / Memcached… ]
         Singleflight backfill, pipelining, serialization contract
                    │
           ┌────────┴────────┐
           ▼                 ▼
  [ Hit: return + backfill layers ]   [ Miss: origin DB / service ]
           │                 │
           └────────┬────────┘
                    ▼
         [ Optional: async write-back + staleness metadata ]

Call out which layers may be skipped: e.g. strongly consistent account balances rarely belong on CDN; read-mostly hot lists can pair long CDN TTL with short app TTL.
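
The lower half of the diagram (in-process, then distributed, then origin, with backfill on the way out) can be sketched as follows. `TTLStore` and `layered_get` are hypothetical helpers written for this example; `TTLStore` stands in for both cache layers so the sketch runs without Redis, and the default TTLs are assumed values.

```python
import time
from typing import Any, Dict, Optional, Tuple


class TTLStore:
    """Tiny TTL map standing in for either cache layer (hypothetical helper)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[Any, float]] = {}   # key -> (value, expires_at)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= time.monotonic():
            self._data.pop(key, None)   # drop expired entries lazily
            return None
        return entry[0]

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._data[key] = (value, time.monotonic() + ttl)


def layered_get(key, local, shared, origin, local_ttl=5.0, shared_ttl=300.0):
    """Read local -> shared -> origin, backfilling upper layers on the way out.

    Returns (value, name_of_layer_that_answered).
    """
    value = local.get(key)
    if value is not None:
        return value, "local"
    value = shared.get(key)
    if value is not None:
        local.set(key, value, local_ttl)    # backfill the in-process layer
        return value, "shared"
    value = origin(key)                     # miss everywhere: go to origin
    if value is not None:
        shared.set(key, value, shared_ttl)
        local.set(key, value, local_ttl)
    return value, "origin"
```

The short `local_ttl` reflects the diagram's note that in-process caches on multiple instances drift apart unless they expire quickly or are invalidated by events.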

Layer roles & cache headers

Browser / CDN

  • Cache-Control: private / public, max-age, s-maxage.
  • ETag / Last-Modified with conditional requests to save bandwidth.
  • Authenticated or PII-heavy responses default private to avoid shared-cache leaks.
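
A minimal sketch of the header rules above, assuming a content-hash ETag and a single-value If-None-Match comparison. The function names `cache_headers` and `respond` and the default ages are illustrative; real servers also handle weak validators and comma-separated ETag lists.

```python
import hashlib
from typing import Dict, Optional, Tuple


def cache_headers(body: bytes, authenticated: bool,
                  max_age: int = 60, s_maxage: int = 300) -> Dict[str, str]:
    """Build Cache-Control and ETag headers for a response body."""
    if authenticated:
        # Authenticated or PII-heavy responses stay out of shared caches.
        cache_control = f"private, max-age={max_age}"
    else:
        cache_control = f"public, max-age={max_age}, s-maxage={s_maxage}"
    # Strong ETag derived from the content (truncated hash, assumed format).
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return {"Cache-Control": cache_control, "ETag": etag}


def respond(body: bytes, headers: Dict[str, str],
            if_none_match: Optional[str]) -> Tuple[int, bytes]:
    """Return (status, body); 304 with an empty body when the client's ETag matches."""
    if if_none_match is not None and if_none_match == headers["ETag"]:
        return 304, b""
    return 200, body
```

For example, `cache_headers(b"...", authenticated=True)` yields `private, max-age=60`, so a CDN will not store the response.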

App / distributed

  • Serialization: JSON, MessagePack, compression; chunk large values or offload to object storage.
  • Connections/timeouts: do not let the cache tier stall the critical path; document behavior when breakers open.
  • Multi-tenant: key-prefix isolation; avoid production SCAN-style debugging.

Keyspace, TTL & jitter

Example shapes: {tenant}:{schemaVer}:{entity}:{id} or {svc}:{cacheVer}:{natural-key-hash}; never rely on an implicit default tenant in the SKILL.

  • TTL: tier by SLA; align with downstream refresh jobs to avoid perpetual staleness.
  • Jitter: add random spread (e.g. ±10%) around base TTL to avoid synchronized expiry.
  • Negative cache: short-lived placeholders for legitimately missing rows to cut penetration (mind auth boundaries).

Write path & invalidation list

After a successful write, delete related keys synchronously or asynchronously; “delete-then-write” widens read windows—mitigate with versions or delayed double-delete.

  • Maintain entity → affected-key sets (table or codegen) so list caches are not forgotten.
  • On bus lag: define read-primary after a timeout, or return explicitly stale responses marked with an X-Cache-Stale header (agree the behavior with product).
  • Bulk import/migration: namespace flush or version bump, with online traffic impact called out.
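
One way to keep the entity → affected-key mapping explicit is a small registry, sketched below. The `AFFECTED_KEYS` table, the key shapes, and the function names are hypothetical; the cache is a plain dict here so the example runs standalone.

```python
from typing import Callable, Dict, List

# entity type -> functions that derive every cache key affected by a write
AFFECTED_KEYS: Dict[str, List[Callable[[dict], str]]] = {
    "product": [
        lambda e: f"{e['tenant']}:v2:product:{e['id']}",
        lambda e: f"{e['tenant']}:v2:product-list:category:{e['category']}",
    ],
}


def keys_to_invalidate(entity_type: str, entity: dict) -> List[str]:
    """Expand one written entity into the full list of keys to delete."""
    builders = AFFECTED_KEYS.get(entity_type)
    if builders is None:
        # Fail loudly so a new entity type cannot ship without an invalidation list.
        raise KeyError(f"no invalidation list registered for {entity_type!r}")
    return [build(entity) for build in builders]


def invalidate(cache: dict, entity_type: str, entity: dict) -> int:
    """Delete affected keys from a dict-like cache; returns how many were present."""
    removed = 0
    for key in keys_to_invalidate(entity_type, entity):
        if cache.pop(key, None) is not None:
            removed += 1
    return removed
```

With redis-py the same key list could be passed to `r.delete(*keys)`; the point of the registry is that list caches are derived next to the entity cache and are harder to forget.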

Penetration, breakdown, avalanche

  • Penetration: sparse/malicious keys hammer origins—combine null caching, Bloom filters, input validation, rate limits.
  • Breakdown: hot key expiry stampede—mutex, singleflight, “logical infinite” TTL + async refresh.
  • Avalanche: mass expiry or cluster outage—TTL jitter, warmup, tiered degradation, queued smoothing.
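
# Breakdown defense: in-process mutex with double-checked read
# (threading.Lock dedups within one process; a cross-process lock would need Redis SET NX EX)
import json
import random
import threading
import time

import redis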

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
_local_locks: dict[str, threading.Lock] = {}

def get_with_mutex(key: str, loader, ttl: int = 300):
    """Singleflight: within one process, only one thread fetches origin for the same key"""
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    # Get or create local lock (dedup within same process)
    lock = _local_locks.setdefault(key, threading.Lock())
    with lock:
        # Double check: re-read after acquiring lock
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)

        value = loader()  # only one thread hits origin
        ttl_jitter = ttl + random.randint(-int(ttl * 0.1), int(ttl * 0.1))
        r.setex(key, ttl_jitter, json.dumps(value))
        return value

# Penetration defense: Redis Bloom Filter (Redis Stack BF commands)
def bloom_get_user(user_id: int) -> dict | None:
    # 1. Check the Bloom filter first (no false negatives; the false-positive
    #    rate is whatever was configured via BF.RESERVE, e.g. 0.001)
    exists = r.execute_command("BF.EXISTS", "users:bloom", user_id)
    if not exists:
        return None          # definitely doesn't exist, skip DB

    # 2. Exists (may be false positive) -> proceed with normal cache logic
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)

    user = db_get_user(user_id)
    if user:
        r.setex(key, 300, json.dumps(user))
    return user

# Write bloom filter when creating new users
def create_user(user_id: int, data: dict):
    db_insert_user(user_id, data)
    r.execute_command("BF.ADD", "users:bloom", user_id)

# Distributed cache consistency
# Pattern A (recommended): update DB first, then delete cache (Cache-Aside + delayed double-delete)
def update_product_a(product_id: int, data: dict):
    db_update_product(product_id, data)     # 1. update DB first
    cache_key = f"v2:product:{product_id}"
    r.delete(cache_key)                     # 2. delete cache (primary delete)
    # Delayed double-delete: delete again after 500ms to evict any stale value
    # re-populated by a read that raced with the write. Running it on a timer
    # thread keeps the write path from blocking for the delay.
    timer = threading.Timer(0.5, r.delete, args=(cache_key,))
    timer.daemon = True                     # a pending delete is lost if the process exits
    timer.start()

# Pattern B: delete cache first then update DB (not recommended)
# Problem: concurrent reads between delete and update will re-populate stale value

# Trade-offs:
# - Pattern A: tiny window (between DB update and cache delete) may serve stale cache
# - Pattern A with double-delete: guards against replica-lag stale re-population
# - Stronger guarantee: async deletion via message queue or Binlog CDC (eventual consistency)

Consistency & degradation

For each API in the SKILL, state expected consistency: linearizable reads, causal, or eventual with a max staleness bound.

  • Across layers, lower-layer invalidation must propagate visibility upward, or absorb drift with very short TTLs.
  • Degradation: on cache failure, go to origin directly and tighten rate limits; no unbounded retries into the database.
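
The degradation rule can be sketched as a single-attempt fallback guarded by a token bucket. `TokenBucket`, `OriginOverloaded`, and `get_degraded` are hypothetical names for this example; the bucket is not thread-safe, and the broad `except Exception` is a simplification (with redis-py you would more likely catch `redis.exceptions.ConnectionError` and `redis.exceptions.TimeoutError`).

```python
import time


class TokenBucket:
    """Simple token bucket used to cap origin traffic while the cache is down."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class OriginOverloaded(Exception):
    """Raised when the degraded path sheds load instead of hitting the database."""


def get_degraded(key, cache_get, origin, limiter):
    """Try the cache; on cache failure go to origin once, under a rate limit."""
    try:
        cached = cache_get(key)
    except Exception:
        # Cache tier is failing: no retries, go straight to origin if allowed.
        if not limiter.allow():
            raise OriginOverloaded(key)
        return origin(key)
    if cached is not None:
        return cached
    return origin(key)      # ordinary miss: normal origin read
```

Shedding load with an explicit error keeps a cache outage from turning into a database outage; callers can map `OriginOverloaded` to a 503 or a stale fallback.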

When “finance-grade” consistency meets homepage-scale traffic, split paths: critical reads/writes on a short primary path; presentation data on tolerably stale caches.

Observability & alerts

  • Metrics: hit rate, miss QPS, origin rate, cache op latency (p95/p99), memory/connections.
  • Tracing: tag hit/miss and redacted key prefixes to debug bad invalidation or hotspots.
  • Alerts: sudden hit-rate drops, per-key QPS spikes, origin timeouts, cache error rates tied to runbooks.
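
As a small sketch of the hit-rate metric and a hit-rate-drop alert, assuming in-memory counters rather than a real metrics client. `CacheMetrics`, `instrumented_get`, and the thresholds are hypothetical names and values for this example.

```python
from collections import Counter


class CacheMetrics:
    """Counts hits and misses and derives the hit rate."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, outcome: str) -> None:
        self.counts[outcome] += 1       # outcome is "hit" or "miss"

    def hit_rate(self) -> float:
        lookups = self.counts["hit"] + self.counts["miss"]
        return self.counts["hit"] / lookups if lookups else 0.0

    def should_alert(self, min_hit_rate: float, min_lookups: int = 100) -> bool:
        """True when enough traffic was seen and the hit rate fell below the floor."""
        lookups = self.counts["hit"] + self.counts["miss"]
        return lookups >= min_lookups and self.hit_rate() < min_hit_rate


def instrumented_get(key, cache: dict, origin, metrics: CacheMetrics):
    """Cache-aside read over a dict-like cache that records hit/miss outcomes."""
    value = cache.get(key)
    if value is not None:
        metrics.record("hit")
        return value
    metrics.record("miss")
    value = origin(key)
    cache[key] = value
    return value
```

The `min_lookups` guard avoids alerting on a handful of requests after a deploy, when a low hit rate is expected.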

Key & TTL draft tool

The tool drafts layered cache keys and TTL bands per freshness tier. The bands are illustrative; align them with your team before shipping. Jitter is drawn from a uniform spread.

Segments are slugged to alphanumerics plus _ - .; empty segments become unknown. Add region, locale, etc. in real deployments.
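
The slugging and TTL drafting described above might look like the sketch below. The tier names, the TTL bands, the choice of the band midpoint as the base TTL, and the decision to drop (rather than replace) disallowed characters are all assumptions made for illustration.

```python
import random
import re
from typing import Dict, Tuple

# Illustrative TTL bands in seconds per freshness tier (assumed values).
TTL_BANDS: Dict[str, Tuple[int, int]] = {
    "realtime": (1, 5),
    "near-realtime": (30, 120),
    "relaxed": (300, 3600),
}


def slug(segment: str) -> str:
    """Keep alphanumerics plus _ - . ; an empty result becomes 'unknown'."""
    cleaned = re.sub(r"[^A-Za-z0-9_.\-]+", "", segment.strip())
    return cleaned or "unknown"


def draft_key(tenant, schema_ver, entity, entity_id) -> str:
    """Join slugged segments as {tenant}:{schemaVer}:{entity}:{id}."""
    return ":".join(slug(str(s)) for s in (tenant, schema_ver, entity, entity_id))


def draft_ttl(tier: str, jitter: float = 0.10, rng=random) -> int:
    """Pick a TTL uniformly within +/- jitter of the tier's band midpoint."""
    low, high = TTL_BANDS[tier]
    base = (low + high) / 2
    spread = base * jitter
    return max(1, round(rng.uniform(base - spread, base + spread)))
```

For instance, `draft_key("", "v2", "pro duct", "4/2")` produces `unknown:v2:product:42`, which makes a missing tenant visible in the key instead of silently falling back to a default.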

---
name: caching-strategy
description: Design layered caches and invalidation for read/write paths
tags: [caching, redis, performance, consistency]
---
# Layered Strategy
- Browser: Cache-Control private/public + max-age; force private for PII responses
- CDN: public + s-maxage + stale-while-revalidate; authenticated content bypasses CDN
- App process: short TTL LRU (~60s); multi-instance consistency via natural TTL expiry
- Redis: Cache-Aside pattern; TTL = base + random jitter (+-10%); null cache 30s for penetration

# Redis Data Structure Selection
- String: single-object JSON serialization (user info, product details)
- Hash: session fields, config (supports single-field HSET updates)
- List: recent N records (LPUSH + LTRIM to maintain fixed length)
- Set: dedup tags, blocklist (O(1) SISMEMBER)
- Sorted Set: leaderboard, scored priority queue (ZADD + ZRANGEBYSCORE)

# Defense Patterns (code level)
- Breakdown: threading.Lock + Double Check -> only one thread hits origin within process
- Penetration: BF.EXISTS (Redis Bloom Filter) -> return None immediately for definite misses
- Avalanche: TTL = base + random.randint(-int(base*0.1), int(base*0.1))

# Write-Path Consistency
- Recommended: update DB first -> delete cache -> delayed delete after 500ms (guard replica lag)
- Stronger: message queue / Binlog CDC drives async delete + eventual consistency fallback
- Forbidden: two services writing the same cache key without coordination
