Caching strategy
Layer browser, CDN, app, and distributed caches with keyspace, TTL, and active/passive invalidation—call out consistency costs explicitly. Includes a read-path overview and a key/TTL draft tool.
Start from freshness needs: strong consistency avoids cross-layer cache or uses short TTLs plus versioning; eventual consistency can use stale-while-revalidate, async refresh, and singleflight backfill.
The SKILL must list invalidation triggers: which keys to delete after writes, delayed double-delete if used, and when to force read-primary or reject stale versions on bus reordering/lag.
- Keys include tenant, resource type, locale, schema version—no implicit global keys.
- Hot keys: sharding, local secondary buffers, rate limits—cross-link breaker/limiter docs.
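The "reject stale versions on bus reordering/lag" rule above can be sketched as a version check on every cache write and read. This is a minimal in-memory illustration, not a Redis implementation: `VersionedCache`, `put_if_newer`, and `min_version` are hypothetical names, and the version is assumed to be a monotonically increasing number assigned by the primary store.

```python
from typing import Optional


class VersionedCache:
    """Toy in-memory cache that stores (version, value) per key."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[int, object]] = {}

    def put_if_newer(self, key: str, version: int, value: object) -> bool:
        """Store the value only if its version is newer than the cached one."""
        current = self._entries.get(key)
        if current is not None and current[0] >= version:
            return False  # late or duplicate event from the bus: reject it
        self._entries[key] = (version, value)
        return True

    def get(self, key: str, min_version: int = 0) -> Optional[object]:
        """Return the cached value, or None if missing or older than min_version."""
        entry = self._entries.get(key)
        if entry is None or entry[0] < min_version:
            return None  # caller should fall back to read-primary
        return entry[1]


cache = VersionedCache()
cache.put_if_newer("acme:v2:product:42:en", 2, {"price": 10})
cache.put_if_newer("acme:v2:product:42:en", 1, {"price": 9})  # reordered event, ignored
```

A caller that has just written version N can pass `min_version=N`; a `None` result is then the signal to read from the primary instead of serving an older cached copy.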
One-page summary
- Layers: browser → CDN → in-process/app → Redis-class → DB; each layer states whether caching is allowed and max staleness.
- Keys: predictable namespace + version segment; document write amplification and key-count caps.
- Invalidation: write path maintains delete lists or version bumps; delayed double-delete and read-primary fallbacks when needed.
- Risks: penetration (null cache/Bloom), breakdown (mutex/singleflight), avalanche (TTL jitter + warmup + degradation).
- Metrics: hit rate, origin rate, latency percentiles, alerts for miss storms.
# Cache-Aside pattern (Python + Redis)
import json, random
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def get_product(product_id: int) -> dict | None:
    key = f"v2:product:{product_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)  # Cache hit (a cached "null" decodes to None)
    # Cache miss -> fall back to DB (db_get_product is the app's accessor, defined elsewhere)
    product = db_get_product(product_id)
    if product:
        # Base TTL 300s + random jitter +-30s to prevent avalanche
        ttl = 300 + random.randint(-30, 30)
        r.setex(key, ttl, json.dumps(product))
    else:
        # Null cache to prevent penetration; short TTL (30s)
        r.setex(key, 30, json.dumps(None))
    return product
# Five core Redis data structures and typical cache use cases
# String: serialize JSON objects (single entity cache)
r.setex("user:1", 300, json_str)
# Hash: field-level updates (counters, config)
r.hset("session:abc", "uid", 42)
# List: recent N records, message queue
r.lpush("recent:views", prod_id); r.ltrim("recent:views", 0, 99)
# Set: dedup tags, blocklist
r.sadd("blacklist:ip", "1.2.3.4")
# Sorted Set: leaderboard, scored priority queue
r.zadd("leaderboard", {uid: score})
Layered read path
          [ Request in: Accept-Language / tenant context ]
                                  │
                      ┌───────────┴───────────┐
                      ▼                       ▼
          [ Browser private cache ]    [ CDN / edge ]
            Cache-Control private      SWR, stale-if-error (if used)
                      │                       │
                      └───────────┬───────────┘
                                  ▼
                   [ In-process: LRU / local map ]
    Note: multi-instance; prefer short TTL or event invalidation
                                  │
                                  ▼
                 [ Distributed: Redis / Memcached… ]
      Singleflight backfill, pipelining, serialization contract
                                  │
                 ┌────────────────┴────────────────┐
                 ▼                                 ▼
 [ Hit: return + backfill layers ]   [ Miss: origin DB / service ]
                 │                                 │
                 └────────────────┬────────────────┘
                                  ▼
         [ Optional: async write-back + staleness metadata ]
Call out which layers may be skipped: e.g. strongly consistent account balances rarely belong on CDN; read-mostly hot lists can pair long CDN TTL with short app TTL.
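The server-side part of the read path (in-process, then distributed, then origin) can be sketched as a nearest-first lookup that backfills the layers in front of a hit. This is a toy illustration under stated assumptions: `DictLayer` and `layered_get` are hypothetical names, the layers are plain dicts standing in for an LRU and a Redis client, and TTLs are left out to keep the flow visible.

```python
from typing import Callable, Optional


class DictLayer:
    """Toy cache layer backed by a dict; stands in for an LRU or a Redis client."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def set(self, key: str, value: str) -> None:
        self.data[key] = value


def layered_get(
    key: str, layers: list[DictLayer], origin: Callable[[str], str]
) -> tuple[str, str]:
    """Return (value, source). Checks layers nearest-first, then the origin."""
    for depth, layer in enumerate(layers):
        value = layer.get(key)
        if value is not None:
            # Backfill only the layers that sit in front of the one that hit.
            for nearer in layers[:depth]:
                nearer.set(key, value)
            return value, layer.name
    # Every layer missed: load from the origin and populate all layers.
    value = origin(key)
    for layer in layers:
        layer.set(key, value)
    return value, "origin"
```

Skipping a layer for a given API then amounts to leaving it out of the `layers` list for that call.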
Layer roles & cache headers
Browser / CDN
- Cache-Control: private/public, max-age, s-maxage.
- ETag/Last-Modified with conditional requests to save bandwidth.
- Authenticated or PII-heavy responses default to private to avoid shared-cache leaks.
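The header choices above can be sketched as two small helpers: one that builds Cache-Control plus an ETag, and one that decides whether a conditional request can be answered with 304. The function names and the truncated SHA-256 ETag are assumptions made for illustration; a real service would normally rely on its web framework for this.

```python
import hashlib
from typing import Optional


def cache_headers(
    body: bytes, *, shared: bool, max_age: int, s_maxage: Optional[int] = None
) -> dict[str, str]:
    """Build Cache-Control and an ETag for a response body."""
    directives = ["public" if shared else "private", f"max-age={max_age}"]
    if shared and s_maxage is not None:
        # s-maxage only matters to shared caches, so skip it for private responses.
        directives.append(f"s-maxage={s_maxage}")
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return {"Cache-Control": ", ".join(directives), "ETag": etag}


def is_not_modified(if_none_match: Optional[str], etag: str) -> bool:
    """True when If-None-Match lists the current ETag (or '*'), so 304 can be sent."""
    if not if_none_match:
        return False
    candidates = [c.strip() for c in if_none_match.split(",")]
    # If-None-Match uses weak comparison, so a leading W/ prefix is ignored.
    return "*" in candidates or etag in [c.removeprefix("W/") for c in candidates]
```

For an authenticated response, calling `cache_headers(body, shared=False, max_age=0)` yields `private, max-age=0`, which keeps the payload out of shared caches while still allowing ETag revalidation.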
App / distributed
- Serialization: JSON, MessagePack, compression; chunk large values or offload to object storage.
- Connections/timeouts: do not let the cache tier stall the critical path; document behavior when breakers open.
- Multi-tenant: key-prefix isolation; avoid production SCAN-style debugging.
Keyspace, TTL & jitter
Example shapes: {tenant}:{schemaVer}:{entity}:{id} or {svc}:{cacheVer}:{natural-key-hash}; never rely on an implicit default tenant in the SKILL.
- TTL: tier by SLA; align with downstream refresh jobs to avoid perpetual staleness.
- Jitter: add random spread (e.g. ±10%) around base TTL to avoid synchronized expiry.
- Negative cache: short-lived placeholders for legitimately missing rows to cut penetration (mind auth boundaries).
Write path & invalidation list
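After a successful write, delete related keys synchronously or asynchronously. Deleting the cache before the database write ("delete-then-write") widens the window in which a concurrent read can repopulate the old value; mitigate with versions or delayed double-delete.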
- Maintain entity → affected-key sets (table or codegen) so list caches are not forgotten.
- On bus lag: define read-primary after timeout, or explicit stale responses with X-Cache-Stale (product-aligned).
- Bulk import/migration: namespace flush or version bump, with online traffic impact called out.
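The entity → affected-key mapping can be sketched as a registry that the write path consults after each successful write. Everything here is illustrative: the `AFFECTED_KEYS` table, the key shapes, and the `category` field are assumptions, and `cache` is any object with a redis-py style `delete(*keys)` method.

```python
from typing import Callable

# Hypothetical registry: entity type -> builders for every key a write can affect.
AFFECTED_KEYS: dict[str, list[Callable[[dict], str]]] = {
    "product": [
        lambda e: f"{e['tenant']}:v2:product:{e['id']}",
        lambda e: f"{e['tenant']}:v2:product-list:category:{e['category']}",
    ],
}


def keys_to_invalidate(entity_type: str, entity: dict) -> list[str]:
    """Expand a written entity into the cache keys its write path must delete."""
    try:
        builders = AFFECTED_KEYS[entity_type]
    except KeyError:
        # Fail loudly: an unregistered entity means a forgotten invalidation rule.
        raise LookupError(f"no invalidation rules for {entity_type!r}") from None
    return [build(entity) for build in builders]


def invalidate(cache, entity_type: str, entity: dict) -> int:
    """Delete every affected key and return how many deletes were issued or applied."""
    return cache.delete(*keys_to_invalidate(entity_type, entity))
```

Raising on an unknown entity type is a deliberate choice in this sketch: a silent no-op is how list caches end up forgotten.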
Penetration, breakdown, avalanche
- Penetration: sparse/malicious keys hammer origins—combine null caching, Bloom filters, input validation, rate limits.
- Breakdown: hot key expiry stampede—mutex, singleflight, “logical infinite” TTL + async refresh.
- Avalanche: mass expiry or cluster outage—TTL jitter, warmup, tiered degradation, queued smoothing.
# Breakdown defense: per-key mutex (in-process threading.Lock; a cross-process lock would use Redis SET NX)
import json, random, threading, time
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
# Note: grows with the number of distinct keys; bound or prune it in long-running processes
_local_locks: dict[str, threading.Lock] = {}

def get_with_mutex(key: str, loader, ttl: int = 300):
    """Singleflight: within one process, only one thread fetches origin for the same key"""
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    # Get or create local lock (dedup within same process)
    lock = _local_locks.setdefault(key, threading.Lock())
    with lock:
        # Double check: re-read after acquiring lock
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
        value = loader()  # only one thread hits origin
        spread = int(ttl * 0.1)  # +-10% jitter around the base TTL
        ttl_jitter = ttl + random.randint(-spread, spread)
        r.setex(key, ttl_jitter, json.dumps(value))
        return value
# Penetration defense: Redis Bloom Filter (Redis Stack BF commands)
def bloom_get_user(user_id: int) -> dict | None:
    # 1. Check bloom filter first (no false negatives; the false-positive rate,
    #    e.g. 0.1%, is whatever was configured when the filter was created with BF.RESERVE)
    exists = r.execute_command("BF.EXISTS", "users:bloom", user_id)
    if not exists:
        return None  # definitely doesn't exist, skip DB
    # 2. Exists (may be false positive) -> proceed with normal cache logic
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached:
        return json.loads(cached)
    user = db_get_user(user_id)  # app's DB accessor, defined elsewhere
    if user:
        r.setex(key, 300, json.dumps(user))
    return user

# Write bloom filter when creating new users
def create_user(user_id: int, data: dict):
    db_insert_user(user_id, data)
    r.execute_command("BF.ADD", "users:bloom", user_id)
# Distributed cache consistency: update DB first then delete cache (recommended)
# Pattern A: update DB first, then delete cache (Cache-Aside + delayed double delete)
def update_product_a(product_id: int, data: dict):
    db_update_product(product_id, data)  # 1. update DB first
    cache_key = f"v2:product:{product_id}"
    r.delete(cache_key)  # 2. delete cache (primary delete)
    # Delayed double-delete: delete again after 500ms to evict any stale value that a
    # concurrent reader re-populated during the read-write window.
    # time.sleep blocks the caller; production code would usually schedule this asynchronously.
    time.sleep(0.5)
    r.delete(cache_key)

# Pattern B: delete cache first then update DB (not recommended)
# Problem: concurrent reads between delete and update will re-populate stale value
# Trade-offs:
# - Pattern A: tiny window (between DB update and cache delete) may serve stale cache
# - Pattern A with double-delete: guards against replica-lag stale re-population
# - Stronger guarantee: async deletion via message queue or Binlog CDC (eventual consistency)
Consistency & degradation
For each API in the SKILL, state expected consistency: linearizable reads, causal, or eventual with a max staleness bound.
- Across layers, lower-layer invalidation must propagate visibility upward, or absorb drift with very short TTLs.
- Degradation: on cache failure, go to origin directly and tighten rate limits; never allow unbounded retries against the database.
When “finance-grade” consistency meets homepage-scale traffic, split paths: critical reads/writes on a short primary path; presentation data on tolerably stale caches.
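The degradation rule above (go to origin on cache failure, but under a tighter limit and without retries) can be sketched with a token bucket guarding the fallback path. All names here are hypothetical. The sketch catches the built-in `ConnectionError` to stay dependency-free; with redis-py the equivalent would be that library's own connection and timeout exceptions.

```python
import time


class TokenBucket:
    """Simple token bucket limiting how many degraded calls reach the origin."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic) -> None:
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class OriginOverloaded(Exception):
    """Raised instead of retrying when the degraded path is out of budget."""


def get_degraded(key, cache_get, origin_get, bucket: TokenBucket):
    """Read through the cache; if the cache errors, go to origin under a rate limit."""
    try:
        cached = cache_get(key)
        if cached is not None:
            return cached
    except ConnectionError:
        # Cache tier is down: fall through to origin only within the budget.
        if not bucket.allow():
            raise OriginOverloaded(key) from None
    return origin_get(key)
```

Callers would typically map `OriginOverloaded` to a fast failure or a degraded response rather than retrying, which keeps a cache outage from turning into a database outage.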
Observability & alerts
- Metrics: hit rate, miss QPS, origin rate, cache op latency (p95/p99), memory/connections.
- Tracing: tag hit/miss and redacted key prefixes to debug bad invalidation or hotspots.
- Alerts: sudden hit-rate drops, per-key QPS spikes, origin timeouts, cache error rates tied to runbooks.
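As a rough illustration of the metrics listed above, the sketch below wraps a cache lookup to record hit/miss/error counts and latency, then derives hit rate and a nearest-rank percentile. `CacheStats` and `instrumented_get` are hypothetical names; a production setup would export these to its metrics system rather than keep them in memory.

```python
import math
import time
from collections import Counter


class CacheStats:
    """In-memory counters for cache outcomes and lookup latency."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.latencies_ms: list[float] = []

    def record(self, outcome: str, elapsed_ms: float) -> None:
        self.counts[outcome] += 1
        self.latencies_ms.append(elapsed_ms)

    def hit_rate(self) -> float:
        lookups = self.counts["hit"] + self.counts["miss"]
        return self.counts["hit"] / lookups if lookups else 0.0

    def percentile(self, p: float) -> float:
        """Nearest-rank percentile over recorded latencies (p in 0..100)."""
        ordered = sorted(self.latencies_ms)
        if not ordered:
            return 0.0
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]


def instrumented_get(cache_get, key: str, stats: CacheStats):
    """Call cache_get(key) and record the outcome and elapsed time."""
    start = time.perf_counter()
    try:
        value = cache_get(key)
    except Exception:
        stats.record("error", (time.perf_counter() - start) * 1000)
        raise
    stats.record("hit" if value is not None else "miss", (time.perf_counter() - start) * 1000)
    return value
```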
Key & TTL draft tool
Fill in the fields to draft layered cache keys and TTL bands per freshness tier (illustrative; align with your team before shipping). Jitter uses a uniform spread example.
Segments are slugged to alphanumerics plus _ - .; empty segments become unknown. Add region, locale, etc. in real deployments.
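The drafting rules described above can be approximated in a few lines. This is a sketch, not the tool's actual logic: the tier names and TTL values in `TTL_BANDS` are made-up placeholders, and dropping disallowed characters (rather than replacing them) is an assumption about how slugging behaves.

```python
import random
import re

# Illustrative TTL bands in seconds per freshness tier; these numbers are assumptions.
TTL_BANDS = {"realtime": 5, "near": 60, "relaxed": 300, "static": 3600}


def slug(segment: str) -> str:
    """Keep alphanumerics plus _ - . and drop the rest; empty becomes 'unknown'."""
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "", segment.strip())
    return cleaned or "unknown"


def draft_key(tenant, schema_ver, entity, entity_id) -> str:
    """Build a {tenant}:{schemaVer}:{entity}:{id} key from slugged segments."""
    return ":".join(slug(str(s)) for s in (tenant, schema_ver, entity, entity_id))


def draft_ttl(tier: str, jitter: float = 0.10, rng=random) -> int:
    """Base TTL for the tier with a uniform +-jitter spread, never below 1 second."""
    base = TTL_BANDS[tier]
    spread = int(base * jitter)
    return max(1, base + rng.randint(-spread, spread))


print(draft_key("Acme Corp", "v2", "product", 42))  # AcmeCorp:v2:product:42
print(draft_key("", "v2", "user", "a/b"))           # unknown:v2:user:ab
```

Passing a seeded `random.Random` instance as `rng` makes the jitter reproducible in tests.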
---
name: caching-strategy
description: Design layered caches and invalidation for read/write paths
tags: [caching, redis, performance, consistency]
---
# Layered Strategy
- Browser: Cache-Control private/public + max-age; force private for PII responses
- CDN: public + s-maxage + stale-while-revalidate; authenticated content bypasses CDN
- App process: short TTL LRU (~60s); multi-instance consistency via natural TTL expiry
- Redis: Cache-Aside pattern; TTL = base + random jitter (+-10%); null cache 30s for penetration
# Redis Data Structure Selection
- String: single-object JSON serialization (user info, product details)
- Hash: session fields, config (supports single-field HSET updates)
- List: recent N records (LPUSH + LTRIM to maintain fixed length)
- Set: dedup tags, blocklist (O(1) SISMEMBER)
- Sorted Set: leaderboard, scored priority queue (ZADD + ZRANGEBYSCORE)
# Defense Patterns (code level)
- Breakdown: threading.Lock + Double Check -> only one thread hits origin within process
- Penetration: BF.EXISTS (Redis Bloom Filter) -> return None immediately for definite misses
- Avalanche: TTL = base + random.randint(-int(base*0.1), int(base*0.1))
# Write-Path Consistency
- Recommended: update DB first -> delete cache -> delayed delete after 500ms (guard replica lag)
- Stronger: message queue / Binlog CDC drives async delete + eventual consistency fallback
- Forbidden: two services writing the same cache key without coordination