Performance profiling
Under reproducible load, capture flame graphs and allocation traces; separate profiler overhead from real hotspots and follow with targeted micro-benchmarks.
The SKILL should name the profiler (async-profiler, pprof, Chrome Performance, etc.), the sampling duration, and the production safety switches; avoid rewriting hot paths before measuring.
For GC, lock, and I/O waits, cross-check metrics (latency percentiles, saturation) with stacks so you don’t mistake symptoms for root causes.
Hypothesis → repro → sample → flame graph → verify
[ Performance hypothesis / user-visible symptom ]
│
▼
┌─────────────┐ Pin: version, data size, hardware, concurrency, cache state
│ Repro load │──── Cold vs steady-state runs—don’t mix JIT warmup conclusions
└─────────────┘
│
▼
┌─────────────┐ Periodic stack samples (CPU) or events (alloc, lock, I/O)
│ Profiler │──── Long enough for multiple requests/iterations; throttle in prod
└─────────────┘
│
▼
┌─────────────┐ Fold identical stacks; width = share of samples on that path
│ Flame graph │──── Self time vs children; watch inlining / missing native frames
└─────────────┘
│
▼
┌─────────────┐ Micro-benchmark or A/B; keep numbers + regression tests
│ Verify │──── Document why this path is hot and the rollback plan
└─────────────┘
Sampling is statistical: too few samples make bar widths noisy. Before raising the sampling frequency, assess its impact on production traffic.
Sample rate, duration, and overhead
CPU sampling: periodic stack grabs (e.g. every 1 to 10 ms; 10 ms, i.e. 100 Hz, is the default for Go's pprof and for async-profiler's CPU mode). Denser intervals collect more samples per second but increase interrupt and buffer cost. Prefer a long enough window at moderate frequency over ultra-short high-frequency bursts.
Allocation, lock, and wall-clock profiles: often event- or timer-driven; document the allocation sampling settings, whether user and kernel frames are included, and whether CI uses smaller datasets for smoke profiles.
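How many samples is enough? A back-of-the-envelope sketch, assuming samples are independent draws (a binomial model, which real profiles only approximate because consecutive stacks are correlated): a frame whose true share of samples is p, observed over n samples, has a standard error of roughly sqrt(p(1 - p) / n). The `samplingStats` helper and its parameter names are illustrative, not part of any profiler.

```javascript
// Estimate how many samples a CPU profile collects and how noisy a frame's
// observed share is. Binomial model: assumes independent samples and one
// thread that stays on-CPU for the whole window (an optimistic simplification).
function samplingStats({ intervalMs, durationSec, share }) {
  const samples = Math.floor((durationSec * 1000) / intervalMs);
  // Standard error of the observed share for a frame whose true share is `share`.
  const stdErr = Math.sqrt((share * (1 - share)) / samples);
  return { samples, stdErr, relativeErr: stdErr / share };
}

// A frame at 2% of samples, sampled every 10 ms: short vs long window.
console.log(samplingStats({ intervalMs: 10, durationSec: 5, share: 0.02 }));
console.log(samplingStats({ intervalMs: 10, durationSec: 60, share: 0.02 }));
```

With a 10 ms interval, a frame at 2% of samples carries roughly ±31% relative error after 5 s (500 samples) but roughly ±9% after 60 s (6,000 samples), which is why a longer window at moderate frequency usually beats a short, dense burst.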
- Separate cold-start vs steady-state captures to avoid JIT confusion.
- Label dataset size and hardware for large inputs.
- Redact profiles before sharing or archiving.
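A minimal way to keep cold-start and steady-state numbers apart in a quick timing run, assuming a synchronous function under test. `measure`, its default iteration counts, and the `sink` variable are illustrative choices; a library such as benchmark.js handles statistics far more carefully.

```javascript
const { performance } = require('node:perf_hooks');

let sink; // keeps results observable, making it harder for the JIT to drop the calls

// Time the first `warmupIters` calls (cold: interpreter, JIT compilation) and
// the following `measuredIters` calls (steady state) as separate numbers.
function measure(fn, { warmupIters = 1000, measuredIters = 10000 } = {}) {
  const meanMsPerCall = (iters) => {
    const start = performance.now();
    for (let i = 0; i < iters; i++) sink = fn(i);
    return (performance.now() - start) / iters;
  };
  const coldMsPerCall = meanMsPerCall(warmupIters);
  const steadyMsPerCall = meanMsPerCall(measuredIters);
  return { coldMsPerCall, steadyMsPerCall };
}

console.log(measure((i) => JSON.parse(`{"n":${i}}`)));
```

Report both numbers rather than one blended average; if only the cold path matters (for example CLI startup), profile that phase on its own.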
Node.js CPU profiling: capture V8 tick logs with --prof, then turn them into a text summary or a flame graph:
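# 1. Start Node.js with --prof (writes an isolate-*.log tick log to the working directory)
node --prof app.js &
APP_PID=$!
sleep 1  # give the server a moment to start listening
# Apply load (autocannon, wrk, or similar)
npx autocannon -c 50 -d 30 http://localhost:3000/api/heavy
kill $APP_PID
wait $APP_PID  # make sure the process has exited before reading its log
# 2. Convert the tick log into a human-readable summary
#    (--prof-process reads one log; if several isolate-*.log files exist, name one explicitly)
node --prof-process isolate-*.log > processed.txt
# 3. Alternatively, use 0x to generate an interactive flame graph HTML in one step
npx 0x -- node app.js
# Apply load from another terminal, then stop with Ctrl+C; 0x writes the flame graph to ./{pid}.0x/
# 4. Use the clinic.js suite (friendlier diagnostic tooling)
npm install -g clinic
clinic doctor -- node app.js      # comprehensive diagnosis
clinic flame -- node app.js       # CPU flame graph
clinic bubbleprof -- node app.js  # async call bubble chart
// 5. Micro-benchmark a specific code path with benchmark.js (bench.js; requires npm install benchmark)
const Benchmark = require('benchmark');
// Hypothetical stand-ins for the two implementations being compared
const slowMethod = () => [...Array(1000).keys()].reduce((a, b) => a + b, 0);
const fastMethod = () => { let s = 0; for (let i = 0; i < 1000; i++) s += i; return s; };
const suite = new Benchmark.Suite();
suite
  .add('method A', () => { slowMethod(); })
  .add('method B', () => { fastMethod(); })
  .on('cycle', (e) => console.log(String(e.target)))
  .on('complete', function () {
    // A regular function is needed here: Benchmark.js binds `this` to the suite
    console.log('Fastest: ' + this.filter('fastest').map('name'));
  })
  .run({ async: true });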
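To capture a profile from inside a running process (for example around one handler) instead of restarting it with --prof, Node's built-in node:inspector module can drive the V8 CPU profiler and save a .cpuprofile JSON file that Chrome DevTools can load. A minimal sketch: `profileToFile`, `busyWork`, the 1000 µs interval, and the output file name are illustrative choices. Profiler.setSamplingInterval takes microseconds and must be called before Profiler.start, which ties the capture directly to the sampling-rate trade-off discussed earlier.

```javascript
const inspector = require('node:inspector');
const fs = require('node:fs');

// Profile a workload (sync, or returning a promise) with V8's sampling CPU
// profiler and write the result as a .cpuprofile file.
function profileToFile(workload, outFile, intervalUs = 1000) {
  const session = new inspector.Session();
  session.connect();
  // Promise wrapper around session.post(method, params, callback).
  const post = (method, params = {}) =>
    new Promise((resolve, reject) =>
      session.post(method, params, (err, result) => (err ? reject(err) : resolve(result))));

  return post('Profiler.enable')
    .then(() => post('Profiler.setSamplingInterval', { interval: intervalUs }))
    .then(() => post('Profiler.start'))
    .then(() => workload())
    .then(() => post('Profiler.stop'))
    .then(({ profile }) => {
      fs.writeFileSync(outFile, JSON.stringify(profile));
      return profile;
    })
    .finally(() => session.disconnect());
}

// Hypothetical CPU-bound workload that spins for about 300 ms.
function busyWork() {
  let x = 0;
  const end = Date.now() + 300;
  while (Date.now() < end) x += Math.sqrt(x + 1);
  return x;
}

const done = profileToFile(busyWork, 'busy.cpuprofile').then((p) => {
  console.log(`${p.samples.length} samples, ${p.nodes.length} call-tree nodes`);
  return p;
});
```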
Reading flame graphs
Classic CPU flame graphs: horizontal width shows a path's relative share of samples, not chronological order (frames are usually sorted alphabetically); vertical depth is call depth, usually from the entry point at the bottom up to the hot leaf frames.
Look here first
- Widest “plateaus”: wide frames with little stacked on top are spending that time in their own code; among sibling callees, width shows the dominant contributor.
- Sudden widening vs baseline: pin regressions to the new path.
- Deep recursion or repeating motifs: N+1 queries, blocking, or leaky abstractions.
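Comparing against a baseline is easiest on folded (collapsed) stacks: one `frame;frame;leaf count` line per unique stack, the format produced by Brendan Gregg's stackcollapse scripts and by async-profiler's collapsed output. A minimal sketch, similar in spirit to FlameGraph's difffolded.pl but not a port of it: it normalizes each profile to shares so that captures of different lengths are comparable, then ranks stacks by how much their share grew. The helper names and sample data are illustrative.

```javascript
// Parse folded stacks: each line is "frameA;frameB;leaf <count>".
function parseFolded(text) {
  const counts = new Map();
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    const sep = trimmed.lastIndexOf(' '); // count follows the last space
    if (sep < 0) continue;
    const n = Number(trimmed.slice(sep + 1));
    if (!Number.isFinite(n)) continue;
    const stack = trimmed.slice(0, sep);
    counts.set(stack, (counts.get(stack) || 0) + n);
  }
  return counts;
}

// Convert sample counts into shares of the profile's total.
function toShares(counts) {
  let total = 0;
  for (const n of counts.values()) total += n;
  const shares = new Map();
  for (const [stack, n] of counts) shares.set(stack, n / total);
  return shares;
}

// Rank stacks by change in share from baseline to candidate (largest growth first).
function diffFolded(baselineText, candidateText, top = 5) {
  const before = toShares(parseFolded(baselineText));
  const after = toShares(parseFolded(candidateText));
  const stacks = new Set([...before.keys(), ...after.keys()]);
  return [...stacks]
    .map((stack) => ({ stack, delta: (after.get(stack) || 0) - (before.get(stack) || 0) }))
    .sort((a, b) => b.delta - a.delta)
    .slice(0, top);
}

const baseline = 'main;handle;parse 40\nmain;handle;render 60\n';
const candidate = 'main;handle;parse 70\nmain;handle;render 60\n';
console.log(diffFolded(baseline, candidate, 2));
```

Note that `render` loses share even though its sample count is unchanged: shares are relative, so confirm with absolute counts or wall time before concluding that a path got faster.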
Common misreads
- Being on the stack is not the same as high self time: a frame can be wide only because of its callees, so check self time in folded views or self-time metrics.
- Inlining and tail calls: symbols may appear under caller names.
- Heavy GC is easy to misread in CPU-only views; add allocation or heap profiles to see what drives it.
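The self-time versus total-time distinction can be computed directly from folded stacks (frames separated by `;`, sample count after the last space). A minimal sketch; `selfAndTotal` and the sample data are illustrative, not part of any flame graph tool's API.

```javascript
// Per-function self and total sample counts from folded stacks.
// Self  = samples where the function is the leaf (top of the stack).
// Total = samples where the function appears anywhere on the stack,
//         counted once per stack so recursion is not double-counted.
function selfAndTotal(foldedText) {
  const stats = new Map();
  const get = (fn) => {
    if (!stats.has(fn)) stats.set(fn, { self: 0, total: 0 });
    return stats.get(fn);
  };
  for (const line of foldedText.split('\n')) {
    const trimmed = line.trim();
    const sep = trimmed.lastIndexOf(' ');
    if (sep < 0) continue;
    const n = Number(trimmed.slice(sep + 1));
    if (!Number.isFinite(n)) continue;
    const frames = trimmed.slice(0, sep).split(';');
    get(frames[frames.length - 1]).self += n;
    for (const fn of new Set(frames)) get(fn).total += n;
  }
  return [...stats]
    .map(([name, s]) => ({ name, ...s }))
    .sort((a, b) => b.self - a.self);
}

const folded = [
  'main;handle;serialize 10',
  'main;handle;db.query 5',
  'main;handle 2',
].join('\n');
console.table(selfAndTotal(folded));
```

Here `handle` spans all 17 samples, so its bar is as wide as `main`, yet only 2 of those samples are its own work; the time is really in `serialize` and `db.query`.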
Cross-checking metrics and stacks
After optimizations, keep before/after numbers and regression tests; use feature flags when rolling out; document why the hotspot mattered for future maintainers.
- Latency percentiles vs widest flame paths—same subsystem?
- Saturation (CPU, disk, network) vs off-CPU / wall profiles—consistent story?
- Lock waits: thread dumps + lock profiles aligned with traces in time.
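For the percentile side of these cross-checks, compute percentiles from raw latency samples rather than averaging averages. A minimal nearest-rank sketch (other interpolation methods exist and give slightly different values); the latency data are made up for illustration.

```javascript
// Nearest-rank percentile: the smallest value such that at least p% of the
// samples are <= it. `values` need not be sorted; p is in (0, 100].
function percentile(values, p) {
  if (values.length === 0) throw new Error('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

function summarize(latenciesMs) {
  return {
    p50: percentile(latenciesMs, 50),
    p95: percentile(latenciesMs, 95),
    p99: percentile(latenciesMs, 99),
    max: Math.max(...latenciesMs),
  };
}

// Illustrative data: 95 fast requests and 5 slow ones.
const latencies = [...Array(95).fill(20), 180, 220, 250, 400, 900];
console.log(summarize(latencies));
```

In this data p95 stays at 20 ms while p99 is 400 ms: the tail lives in 5% of requests, which a CPU flame graph aggregated over all requests may barely show. That mismatch is exactly when to ask whether the percentiles and the widest flame paths point at the same subsystem.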
Web Vitals measurement (LCP, INP, CLS; FID is deprecated in favor of INP) and Lighthouse CI integration:
// Web Vitals measurement (web-vitals v3)
import { onLCP, onFID, onCLS, onINP, onFCP, onTTFB } from 'web-vitals';
function sendToAnalytics(metric) {
const body = JSON.stringify({
name: metric.name,
value: metric.value, // raw value (ms or score)
rating: metric.rating, // 'good' | 'needs-improvement' | 'poor'
delta: metric.delta,
id: metric.id,
navigationType: metric.navigationType,
});
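// sendBeacon survives page unload; fall back to fetch with keepalive if it is unavailable or fails
(navigator.sendBeacon && navigator.sendBeacon('/analytics', body)) ||
fetch('/analytics', { body, method: 'POST', keepalive: true });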
}
onLCP(sendToAnalytics); // Largest Contentful Paint: good ≤ 2500 ms
onFID(sendToAnalytics); // First Input Delay: good ≤ 100 ms (deprecated, prefer INP)
onINP(sendToAnalytics); // Interaction to Next Paint: good ≤ 200 ms
onCLS(sendToAnalytics); // Cumulative Layout Shift: good ≤ 0.1
onFCP(sendToAnalytics); // First Contentful Paint: good ≤ 1800 ms
onTTFB(sendToAnalytics); // Time to First Byte: good ≤ 800 ms
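On the receiving side, Core Web Vitals are assessed at the 75th percentile of page loads. A minimal Node.js sketch that rates beacons sent to /analytics as above, using the published good/poor boundaries for LCP, INP, and CLS. How beacons are received, parsed, and stored is assumed; `rateField` and the sample beacons are illustrative.

```javascript
// Published "good" / "poor" boundaries (LCP and INP in ms, CLS unitless).
const THRESHOLDS = {
  LCP: { good: 2500, poor: 4000 },
  INP: { good: 200, poor: 500 },
  CLS: { good: 0.1, poor: 0.25 },
};

// Nearest-rank 75th percentile.
function p75(values) {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.ceil((75 * sorted.length) / 100) - 1];
}

// beacons: parsed bodies such as { name: 'LCP', value: 2100, ... }.
function rateField(beacons) {
  const byName = new Map();
  for (const b of beacons) {
    if (!THRESHOLDS[b.name]) continue; // FCP, TTFB, FID are not rated here
    if (!byName.has(b.name)) byName.set(b.name, []);
    byName.get(b.name).push(b.value);
  }
  const report = {};
  for (const [name, values] of byName) {
    const value = p75(values);
    const t = THRESHOLDS[name];
    const rating = value <= t.good ? 'good' : value <= t.poor ? 'needs-improvement' : 'poor';
    report[name] = { p75: value, samples: values.length, rating };
  }
  return report;
}

const beacons = [
  { name: 'LCP', value: 1800 }, { name: 'LCP', value: 2200 },
  { name: 'LCP', value: 2600 }, { name: 'LCP', value: 4100 },
  { name: 'CLS', value: 0.02 }, { name: 'CLS', value: 0.3 },
];
console.log(rateField(beacons));
```

With so few samples the p75 is dominated by single page loads; real dashboards usually also segment by device type before rating.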
Lighthouse CI configuration and GitHub Actions integration:
// lighthouserc.js — Lighthouse CI config
module.exports = {
ci: {
collect: {
url: ['http://localhost:3000/', 'http://localhost:3000/about'],
numberOfRuns: 3, // multiple runs for median
startServerCommand: 'npm run start',
startServerReadyPattern: 'listening on',
},
assert: {
preset: 'lighthouse:recommended',
assertions: {
'categories:performance': ['error', { minScore: 0.9 }],
'categories:accessibility': ['warn', { minScore: 0.95 }],
'largest-contentful-paint': ['error', { maxNumericValue: 2500 }],
'cumulative-layout-shift': ['error', { maxNumericValue: 0.1 }],
'interactive': ['warn', { maxNumericValue: 3800 }], // TTI: still reported, but no longer weighted in the performance score since Lighthouse 10
},
},
upload: {
target: 'temporary-public-storage', // or configure LHCI server
},
},
};
// Integrate Lighthouse CI in GitHub Actions
// - name: Run Lighthouse CI
// run: |
// npm install -g @lhci/cli
// lhci autorun
// env:
// LHCI_GITHUB_APP_TOKEN: ${{ secrets.LHCI_GITHUB_APP_TOKEN }}
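The assertions above enforce absolute budgets. To also catch regressions relative to the main branch, one option is to compare two Lighthouse JSON reports (LHR). A minimal sketch that reads only `categories.performance.score` and `audits[id].numericValue` from each report; the tolerance values, the metric list, and the report file names are illustrative assumptions.

```javascript
// Compare a candidate Lighthouse report (LHR JSON) against a baseline report.
// Tolerances are illustrative; tune them to your run-to-run noise.
function compareLhr(baseline, candidate, { scoreDrop = 0.05, metricRise = 0.1 } = {}) {
  const problems = [];
  const before = baseline.categories.performance.score; // 0..1
  const after = candidate.categories.performance.score;
  if (before - after > scoreDrop) {
    problems.push(`performance score ${before} -> ${after}`);
  }
  const metrics = ['largest-contentful-paint', 'cumulative-layout-shift', 'total-blocking-time'];
  for (const id of metrics) {
    const b = baseline.audits[id]?.numericValue;
    const c = candidate.audits[id]?.numericValue;
    // Skip missing audits and zero baselines (relative change is undefined).
    if (b === undefined || c === undefined || b === 0) continue;
    const rise = (c - b) / b; // relative increase; lower is better for these metrics
    if (rise > metricRise) problems.push(`${id}: ${b} -> ${c} (+${(rise * 100).toFixed(0)}%)`);
  }
  return problems;
}

// Usage (file names are placeholders for your baseline and PR reports):
// const fs = require('node:fs');
// const problems = compareLhr(
//   JSON.parse(fs.readFileSync('baseline.report.json', 'utf8')),
//   JSON.parse(fs.readFileSync('candidate.report.json', 'utf8')));
// if (problems.length) { console.error(problems.join('\n')); process.exit(1); }
```

Because single Lighthouse runs are noisy, compare median runs (numberOfRuns above) rather than one-off reports.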
Tooling cheat sheet
- JVM: async-profiler (CPU/alloc) and JFR; async-profiler can write collapsed stacks or flame graph HTML directly, and JFR recordings can be converted to flame graphs.
- Go / native: pprof (cpu, heap, mutex profiles); go tool pprof with the -http flag serves interactive flame graphs.
- Front end: Chrome Performance panel and Lighthouse (long tasks, layout, script self time).
Profiling session draft
Use the draft below as a “sampling + flame graph” checklist to paste into tickets or SKILL appendices (illustrative; align it with your runbooks).
Prefer your team’s collapsed stack format for exports; when comparing runs, pin binaries and input datasets so bar width changes aren’t environmental drift.
---
name: performance-profiling
description: Use profilers to find CPU/memory hotspots with verifiable follow-ups
tags: [performance, profiling, web-vitals, lighthouse]
---
# Profiling Methodology
- Start with a hypothesis: user-visible symptom + concrete path (URL/endpoint/code path)
- Reproducible load: pin version, data size, hardware, concurrency, cache warm/cold state
- Tool selection: Node.js uses --prof / 0x / clinic; frontend uses Chrome DevTools + Lighthouse
# Node.js CPU Profiling
- --prof writes a V8 tick log and --prof-process summarizes it; 0x generates flame graph HTML in one step
- clinic doctor for comprehensive diagnosis; clinic flame for CPU; clinic bubbleprof for async
- Micro-benchmarks with benchmark.js; keep before/after numbers and regression test cases
# Frontend Performance
- LCP ≤ 2500 ms; INP ≤ 200 ms; CLS ≤ 0.1 ("good" thresholds, assessed at the 75th percentile of page loads)
- web-vitals library collects real-user data in production and reports to analytics
- Lighthouse CI: lighthouserc.js configures assertions; blocks non-compliant PRs in CI
# Flame Graph Interpretation
- Wide bars = that path accounts for a large share of samples (not absolute time)
- Look first at the widest "plateaus", then drill down to leaf functions
- Compare CPU and allocation flame graphs side by side to distinguish CPU hotspots from GC/memory pressure
# Regression Prevention
- Performance baselines in CI: run benchmarks and compare against main branch
- Lighthouse CI minScore/maxNumericValue gates block merges on regression
- Feature flags to control rollout of optimizations; maintain rollback capability
- Document hotspot causes and optimization rationale for future maintainers