Memory leak debugging

Separate true leaks from cache bloat, pooled objects that are never returned, and listeners that are never removed. First establish whether the growth is on the GC-managed heap or outside it, then compare snapshots and profiles to find the growing types and write reference-chain memos that collaborators can act on.

The SKILL guides the agent to confirm symptoms first: whether the growth shows up in RSS, heap, or off-heap memory; what the container cgroup limit is; and whether native memory or metaspace is growing. Only then does it pick tools (heap dump, allocation profiling, async-profiler).

Symptoms and distinctions

A monotonically rising resident-memory curve is not always a leak: it can be a large but bounded cache, pre-allocated pools, or garbage that the collector has not reclaimed yet. Split heap object growth from off-heap / stacks / metaspace, and rule out “legitimate retention from higher traffic.”
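As a rough first split in Node.js, two process.memoryUsage() samples can be compared to see which component accounts for most of the growth. This is a minimal sketch: attributeGrowth is a hypothetical helper, and the "other" bucket is only an approximation, because RSS counts resident pages rather than allocations.

```javascript
// Hypothetical helper: attribute growth between two process.memoryUsage() samples.
function attributeGrowth(before, after) {
  const heap = after.heapUsed - before.heapUsed;      // JS objects on the V8 heap
  const external = after.external - before.external;  // memory bound to JS objects, e.g. Buffers
  const rss = after.rss - before.rss;                 // total resident growth
  // RSS growth not explained by heap or external memory (native code, stacks, ...).
  const other = rss - heap - external;
  const parts = { heap, external, other };
  // Pick the component with the largest growth.
  const dominant = Object.keys(parts).reduce((a, b) => (parts[b] > parts[a] ? b : a));
  return { rss, heap, external, other, dominant };
}

// Usage: take a baseline, run the suspicious workload, then compare.
const baseline = process.memoryUsage();
// ... workload under investigation would run here ...
console.log(attributeGrowth(baseline, process.memoryUsage()));
```

If "external" or "other" dominates, a heap snapshot alone is unlikely to show the culprit.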

  • Avoid full production dumps of huge heaps; prefer sampling, short windows, or isolated replicas.
  • Distinguish leaks from missing backpressure: an unbounded queue that builds up under congestion can look like a leak.
  • Output should name suspicious types, holder summaries, and suggested change locations.
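To illustrate the queue case, here is a minimal sketch of a bounded queue that rejects new work instead of growing without limit. BoundedQueue and its method names are hypothetical, not taken from any library; a real service would also need to signal the producer to slow down.

```javascript
// Hypothetical bounded queue: memory use is capped by maxSize.
class BoundedQueue {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.items = [];
    this.rejected = 0; // count of refused items, worth exporting as a metric
  }

  // Returns false instead of buffering when the queue is full.
  offer(item) {
    if (this.items.length >= this.maxSize) {
      this.rejected += 1;
      return false;
    }
    this.items.push(item);
    return true;
  }

  // Removes and returns the oldest item, or undefined when empty.
  poll() {
    return this.items.shift();
  }

  get size() {
    return this.items.length;
  }
}

const queue = new BoundedQueue(2);
const accepted = ["a", "b", "c"].map((job) => queue.offer(job));
console.log(accepted, queue.size, queue.rejected);
```

A rising rejected count points at congestion, while a rising heap with an empty queue points back at retention.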

Three common leak patterns (Node.js / browser):

// ❌ Pattern 1: closure captures large object
function makeLeakyHandler(bigData) {
  // bigData (possibly MB) is held by the closure; survives as long as handler does
  return (event) => console.log(bigData.payload, event.type);
}
// ✅ Fix: keep only what you need
function makeHandler(bigData) {
  const id = bigData.id; // copy the small field; bigData itself can be collected
  return (event) => console.log(id, event.type);
}

// ❌ Pattern 2: event listener never removed (browser / Node EventEmitter)
class Widget {
  mount() {
    this._handler = this.onResize.bind(this);
    window.addEventListener("resize", this._handler);
  }
  onResize() { /* re-layout */ }
  // If destroy() is missing or never called, window keeps _handler,
  // which holds this → Widget cannot be GC'd
  destroy() {
    window.removeEventListener("resize", this._handler); // ✅ required
  }
}

// ❌ Pattern 3: global Map/object grows without bound
const reqCache = new Map();
app.use((req, res, next) => {
  reqCache.set(req.id, req.body); // never deleted; memory monotonically increases
  next();
});
// ✅ Fix: replace the Map with a bounded cache (LRU + TTL)
const { LRUCache } = require("lru-cache");
const boundedCache = new LRUCache({ max: 1000, ttl: 5 * 60 * 1000 }); // at most 1000 entries, 5 min each

Heap and GC

JVM: watch old gen / G1 Humongous, Metaspace (classloader leaks), direct memory (NIO, Netty). If the heap does not drop after a full GC, it is more likely a true leak or a permanent live set. Objects reachable only via WeakReference chains can be collected; if the incoming paths are all weak, that is usually not the leak root.

Node / V8: young-gen Scavenge vs old-gen Mark-Compact differ in rhythm; Buffer and TypedArray may be off-heap. Long-lived Maps, closures capturing request context, and missing removeListener are frequent sources.
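A quick way to see the off-heap point for Buffers is to allocate one and compare process.memoryUsage() before and after. This sketch assumes a Node version that reports the arrayBuffers field; the 64 MiB size is arbitrary.

```javascript
// Allocate a large Buffer and see which counters move.
const before = process.memoryUsage();
const chunk = Buffer.alloc(64 * 1024 * 1024); // 64 MiB, zero-filled, stored outside the V8 heap
const after = process.memoryUsage();

const toMiB = (bytes) => Math.round(bytes / 1024 / 1024);
const heapDeltaMiB = toMiB(after.heapUsed - before.heapUsed);
const arrayBuffersDeltaMiB = toMiB(after.arrayBuffers - before.arrayBuffers);

// heapUsed barely moves, while arrayBuffers grows by roughly the Buffer size.
console.log({ heapDeltaMiB, arrayBuffersDeltaMiB, bufferBytes: chunk.length });
```

A heap snapshot shows such a Buffer as a small object, so growth of this kind is easier to spot in the external / arrayBuffers metrics than in retained-size views.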

Browser: detached DOM nodes, uncancelled requestAnimationFrame callbacks, and listeners on global singletons show up as retained subtrees in heap snapshots taken from the DevTools Memory panel.
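One way to make listener cleanup hard to forget is to register every listener with a shared AbortSignal and abort it on teardown. The sketch below assumes a runtime whose addEventListener accepts the signal option (modern browsers and Node's built-in EventTarget); AbortableWidget is a hypothetical class, and a plain EventTarget stands in for window.

```javascript
// Hypothetical widget that ties all of its listeners to one AbortController.
class AbortableWidget {
  constructor(target) {
    this.target = target; // e.g. window in a browser
    this.controller = new AbortController();
    this.resizeCount = 0;
  }

  mount() {
    // The listener is removed automatically when the signal is aborted.
    this.target.addEventListener(
      "resize",
      () => { this.resizeCount += 1; },
      { signal: this.controller.signal }
    );
  }

  destroy() {
    this.controller.abort(); // detaches every listener registered with this signal
  }
}

const target = new EventTarget();
const widget = new AbortableWidget(target);
widget.mount();
target.dispatchEvent(new Event("resize")); // handled
widget.destroy();
target.dispatchEvent(new Event("resize")); // ignored: the listener is gone
console.log(widget.resizeCount);
```

After destroy(), the target no longer references the arrow function, so the widget can be collected once nothing else holds it.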

Node.js memory metrics collection and heapdump trigger:

// 1. Print memory metrics in real time (RSS / heapUsed / heapTotal / external)
function logMemory(label) {
  const m = process.memoryUsage();
  console.log(`[${label}]`, {
    rss:       (m.rss       / 1024 / 1024).toFixed(1) + " MB",  // process total RSS
    heapTotal: (m.heapTotal / 1024 / 1024).toFixed(1) + " MB",  // V8 allocated heap
    heapUsed:  (m.heapUsed  / 1024 / 1024).toFixed(1) + " MB",  // objects in use
    external:  (m.external  / 1024 / 1024).toFixed(1) + " MB",  // Buffer/TypedArray
  });
}
setInterval(() => logMemory("tick"), 10_000);

// 2. --expose-gc + manually trigger GC and compare
// Start: node --expose-gc server.js
if (global.gc) {
  logMemory("before-gc");
  global.gc();
  logMemory("after-gc"); // if heapUsed is still high, suspect a leak
}

// 3. Generate heap snapshot (v8 built-in, Node 11.13+); synchronous, blocks the event loop while writing
const v8 = require("v8");
const path = require("path");
app.get("/debug/heap", (req, res) => {
  const file = v8.writeHeapSnapshot(
    path.join("/tmp", `heap-${Date.now()}.heapsnapshot`)
  );
  res.json({ file });  // download and analyze in Chrome DevTools
});

// 4. Alert thresholds (adjust to actual RSS limit)
// heapUsed > heapTotal * 0.85 → alert
// RSS > container_limit * 0.80  → alert
// GC CPU > 20%                  → alert (performance issue)

Profiling and snapshots

Heap snapshot diff: capture before and after the same business steps; cluster by class or package and inspect delta and retained size; expand incoming references on the largest growth to the path from GC roots—“who still holds this?”
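The delta step can be sketched outside any tool: given per-class instance counts exported from two snapshots, compute the difference and sort by growth. diffClassCounts and the sample counts are hypothetical, and real snapshot tools also report retained size, which this sketch omits.

```javascript
// Hypothetical helper: diff two { className: instanceCount } summaries.
function diffClassCounts(before, after) {
  const names = new Set([...Object.keys(before), ...Object.keys(after)]);
  return [...names]
    .map((name) => {
      const b = before[name] ?? 0; // class absent in the first snapshot
      const a = after[name] ?? 0;  // class absent in the second snapshot
      return { name, before: b, after: a, delta: a - b };
    })
    .filter((row) => row.delta !== 0)
    .sort((x, y) => y.delta - x.delta); // largest growth first
}

// Illustrative numbers only.
const s1 = { Session: 100, Buffer: 20, Timer: 5 };
const s3 = { Session: 600, Buffer: 22, Closure: 40 };
const rows = diffClassCounts(s1, s3);
console.log(rows);
```

The top rows are the candidates whose retainers are worth expanding first.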

Allocation profiling: an allocation profiler (for example async-profiler in alloc mode) points at “who allocated many short-lived objects quickly.” It complements leak analysis, which is about long-lived retention; sometimes you must remove allocation hotspots before the curve stabilizes.

OS level: combine pmap, container memory metrics, and native tracing to see if the issue is off-heap or JNI.

Chrome DevTools three-snapshot method (browser / Node Inspector):

// === Chrome DevTools three-snapshot steps ===
// 1. Open DevTools → Memory → Heap Snapshot
// 2. Take snapshot S1 (baseline)
// 3. Perform suspicious operation (e.g. navigate to page → back, repeat 5x)
// 4. Take snapshot S2
// 5. Repeat operation again (5 more times)
// 6. Take snapshot S3
// 7. Select S3 → switch top dropdown to "Comparison" mode, base = S1
// 8. Sort by #Delta descending → expand the class with largest growth
// 9. Select instance → bottom "Retainers" panel → trace to GC Root

// Node.js: use --inspect to open Chrome remote debugging:
// node --inspect server.js
// Browser: chrome://inspect → "Open dedicated DevTools for Node"

// === Quick CLI diff (heapdump) ===
// npm install heapdump   (install locally so require("heapdump") resolves)
const heapdump = require("heapdump");
// Signal-triggered snapshot (recommended in production)
process.on("SIGUSR2", () => {
  heapdump.writeSnapshot("/tmp/heap-" + Date.now() + ".heapsnapshot",
    (err, file) => console.log("snapshot:", file));
});
// Trigger: kill -USR2 <pid>   (<pid> = process ID of the Node process)

// === Key terms in the Retainers panel ===
// system / Context  → closure scope
// (global)          → global variable
// EventListener     → unreleased listener
// Detached ...      → removed from DOM but still held by JS

Debug flow

  [ Confirm: RSS / heap / off-heap / metaspace / cgroup limit ]
                    │
                    ▼
         [ Distinguish: true leak vs cache·pool·legitimate traffic growth ]
                    │
                    ▼
              [ Two snapshots or time-series trend ]
                    │
                    ▼
         [ Sort by retained / delta → expand incoming ]
                    │
           ┌────────┴────────┐
           ▼                 ▼
    [ Trace to root holder ]   [ Allocation profile finds allocators ]
           │                 │
           └────────┬────────┘
                    ▼
         [ Fix: dispose / weak refs / TTL cache / backpressure ]
                    │
                    ▼
            [ Load or soak → flat curve + alerts ]

Working with agents

Context for the agent should include runtime (JVM / Node version), sampling window, and concrete class names and retained numbers from two snapshots or metric screenshots. Ask it to restate the reference chain before patching—avoid evidence-free “maybe it’s a cache.”

Post-fix validation: time-bounded soak, compare instance counts before/after; add heap usage and GC frequency alerts, not only RSS.
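Whether a soak curve is "flat" can be judged with a least-squares slope over the collected samples rather than by eye. This is a minimal sketch: growthRate, isFlat, the sample shape, and the threshold are all assumptions, and the samples are illustrative.

```javascript
// samples: [{ t: seconds since start, bytes: heapUsed at that time }]
// Returns the least-squares slope in bytes per second.
function growthRate(samples) {
  if (samples.length < 2) return 0;
  const n = samples.length;
  const meanT = samples.reduce((sum, p) => sum + p.t, 0) / n;
  const meanB = samples.reduce((sum, p) => sum + p.bytes, 0) / n;
  let num = 0;
  let den = 0;
  for (const p of samples) {
    num += (p.t - meanT) * (p.bytes - meanB);
    den += (p.t - meanT) ** 2;
  }
  return den === 0 ? 0 : num / den;
}

// Hypothetical acceptance check: growth must stay under the given rate.
function isFlat(samples, maxBytesPerSecond) {
  return growthRate(samples) <= maxBytesPerSecond;
}

const leaking = [
  { t: 0, bytes: 100 }, { t: 10, bytes: 120 }, { t: 20, bytes: 140 }, { t: 30, bytes: 160 },
];
const stable = leaking.map((p) => ({ t: p.t, bytes: 100 }));
console.log(growthRate(leaking), isFlat(leaking, 1), isFlat(stable, 1));
```

Sampling after a forced GC, where that is possible, reduces sawtooth noise in the slope.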

Memory metrics monitoring and alerting example (Prometheus + Node.js):

// Prometheus metrics (prom-client)
const client = require("prom-client");
const heapUsedGauge = new client.Gauge({
  name: "nodejs_heap_used_bytes",
  help: "V8 heap used",
});
const rssGauge = new client.Gauge({
  name: "nodejs_rss_bytes",
  help: "Resident Set Size",
});
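// heapTotal gauge: the HeapUsageHigh rule below divides by this metric.
// It shares its name with a prom-client default metric; drop it if collectDefaultMetrics() is enabled.
const heapTotalGauge = new client.Gauge({
  name: "nodejs_heap_size_total_bytes",
  help: "V8 heap total",
});
setInterval(() => {
  const m = process.memoryUsage();
  heapUsedGauge.set(m.heapUsed);
  heapTotalGauge.set(m.heapTotal);
  rssGauge.set(m.rss);
}, 5000);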

// Prometheus alerting rule
// groups:
//   - name: nodejs-memory
//     rules:
//       - alert: HeapUsageHigh
//         expr: nodejs_heap_used_bytes / nodejs_heap_size_total_bytes > 0.85
//         for: 5m
//         annotations:
//           summary: "Heap usage > 85% for 5 min"
//       - alert: RSSGrowthSteady
//         expr: delta(nodejs_rss_bytes[30m]) > 100 * 1024 * 1024   # delta() for gauges; increase() is for counters
//         annotations:
//           summary: "RSS grew > 100 MB in 30 min — possible leak"

// Metric definitions:
// rss          = process total physical memory (heap + stack + code + off-heap Buffer)
// heapTotal    = V8 heap allocated from OS (includes free pages)
// heapUsed     = V8 objects currently allocated (key metric to watch)
// external     = off-heap memory for Buffer / TypedArray at C++ level
// arrayBuffers = ArrayBuffer / SharedArrayBuffer size (included in external)

Reference-chain memo builder

After copying key nodes from MAT, Chrome Heap Snapshot, YourKit, etc., assemble them into a pasteable memo: runtime, sampling window, suspicious type with its delta and retained size, the reference chain from the GC root, and the suggested change location.
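A memo of this kind can be produced with a few lines of code. The sketch below is hypothetical: buildRetainerMemo, its field names, and all sample values are illustrative assumptions, not the format of any particular tool.

```javascript
// Hypothetical memo builder; every field name here is an assumption.
function buildRetainerMemo({ runtime, samplingWindow, type, delta, retainedBytes, chain, suggestion }) {
  const retainedMiB = (retainedBytes / 1024 / 1024).toFixed(1);
  const lines = [
    `Runtime: ${runtime}`,
    `Sampling window: ${samplingWindow}`,
    `Suspicious type: ${type} (delta +${delta} instances, retained ${retainedMiB} MiB)`,
    "Reference chain (GC root first):",
    // Indent each node one level deeper than its holder.
    ...chain.map((node, depth) => `${"  ".repeat(depth + 1)}${node}`),
    `Suggested change: ${suggestion}`,
  ];
  return lines.join("\n");
}

// Illustrative values only.
const memo = buildRetainerMemo({
  runtime: "Node.js 20",
  samplingWindow: "30 min soak, snapshots S1 and S3",
  type: "Session",
  delta: 500,
  retainedBytes: 50 * 1024 * 1024,
  chain: ["(global) reqCache", "Map", "Session"],
  suggestion: "bound reqCache with LRU + TTL",
});
console.log(memo);
```

Keeping the chain in root-first order lets a reviewer restate "who still holds this?" without opening the snapshot.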


            
---
name: memory-leak
description: Memory leak investigation: heap snapshots, reference chains, and monitoring
model: claude-sonnet-4-5
---

# Prerequisites
- Runtime version (Node.js / JVM) and container cgroup memory limit
- Confirm it is heap growth, not off-heap Buffer, metaspace, or JNI
- Rule out "legitimate retention from increased traffic"

# Tool selection
- Node.js: node --expose-gc + v8.writeHeapSnapshot() or heapdump
- Browser: Chrome DevTools → Memory → Heap Snapshot
- JVM: jmap -dump / async-profiler -e alloc / MAT

# Three-snapshot method steps
1. Snapshot S1 (baseline)
2. Perform suspicious business operation x5
3. Snapshot S2
4. Repeat operation x5
5. Snapshot S3 → Comparison against S1
6. Sort by #Delta descending → expand largest growth → trace Retainers to GC Root

# Common root cause patterns
- Closure captures large object (keep only necessary fields)
- Event listener missing removeListener (clean up on component unmount)
- Global Map/object grows without bound (replace with LRU + TTL)
- setInterval / RAF not cancelled

# Alert thresholds
- heapUsed / heapTotal > 85% for 5min → alert
- RSS grows > 100 MB in 30min → alert
- GC CPU > 20% → performance alert

# Fix validation
- Soak test 30min; compare instance counts before/after fix
- Confirm curve flattens in Prometheus/Grafana
