API Rate Limiting

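Rate-limit at the gateway or application layer by user, tenant, IP, or API key; choose an algorithm and a store (in-process or Redis), and standardize the 429 response and its Retry-After header.

This page gives an Agent a complete implementation reference for API rate limiting: a distributed Redis implementation of the token bucket algorithm, a sliding window algorithm, per-endpoint configuration (login vs. query), the conventional response headers (X-RateLimit-*), and client-side exponential backoff code.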

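The SKILL should explain the trade-offs between token bucket (allows bursts), leaky bucket (smooths output), and sliding window. In distributed deployments, run the check-and-update inside a Redis Lua script so that it executes atomically; a separate INCR followed by EXPIRE is racy, because a caller that fails between the two commands leaves behind a counter that never expires.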
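The "smooths output" property of a leaky bucket is easiest to see in code. The following is a minimal in-memory sketch, not part of the original reference: the class name `LeakyBucketQueue`, its parameters, and the injectable clock are illustrative assumptions. It returns how long a request should wait so that requests leave at a constant rate, and it rejects once more than `capacity` requests would be waiting.

```typescript
// Leaky bucket used as a queue: requests leave at a fixed interval.
// Hypothetical helper for illustration; single process only, no Redis.
class LeakyBucketQueue {
  private capacity: number
  private intervalMs: number
  private now: () => number
  private nextFreeMs = 0 // earliest time the next request may leave

  constructor(capacity: number, ratePerSec: number, now: () => number = Date.now) {
    this.capacity = capacity
    this.intervalMs = 1000 / ratePerSec
    this.now = now
  }

  // Returns the delay (ms) before the request should be forwarded,
  // or null when the queue is full and the caller should answer 429.
  schedule(): number | null {
    const t = this.now()
    const startMs = Math.max(t, this.nextFreeMs)
    const waitMs = startMs - t
    if (waitMs > this.capacity * this.intervalMs) return null
    this.nextFreeMs = startMs + this.intervalMs
    return waitMs
  }
}
```

A token bucket with the same rate would let a burst through immediately; this variant spaces the same burst out over time, which is the trade-off the SKILL is expected to describe.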

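Client documentation should spell out exponential backoff, jitter, and idempotency keys. Keep rate limiting in a separate layer from circuit breaking and degradation: rate limiting protects resources, circuit breaking isolates failures.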
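Idempotency keys only help if every retry of one logical operation sends the same key. The sketch below shows one way a client might enforce that. The helper name `withIdempotencyKey` is hypothetical, and the `Idempotency-Key` header name is an assumption about what the server honors; it is a common convention rather than something this page defines.

```typescript
type HeaderMap = Record<string, string>

// Returns a copy of `headers` carrying `key` as Idempotency-Key.
// An existing key (any letter case) is kept, so every retry reuses the first one.
function withIdempotencyKey(headers: HeaderMap, key: string): HeaderMap {
  const hasKey = Object.keys(headers).some(
    (name) => name.toLowerCase() === 'idempotency-key'
  )
  return hasKey ? { ...headers } : { ...headers, 'Idempotency-Key': key }
}

// Usage: build the headers once per logical operation, then reuse them for each attempt.
const payHeaders = withIdempotencyKey(
  { 'Content-Type': 'application/json' },
  'order-123-payment' // illustrative key; a UUID per operation is typical
)
```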

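Main rate-limiting decision flow

  [ Request arrives: resolve subject key user / tenant / ip / api_key ]
                    │
                    ▼
     [ Choose store: in-process counter / Redis / gateway plugin ]
                    │
                    ▼
  [ Algorithm: token bucket (bursty) / leaky bucket (smooth) / sliding or fixed window ]
                    │
           ┌────────┴─────────────────┐
           ▼                          ▼
  [ Allow: deduct quota ]   [ Reject: 429 Too Many Requests ]
           │                          │
           │                          ├── Retry-After: seconds or HTTP-date
           │                          ├── RateLimit-* / X-RateLimit-* (per team convention)
           │                          └── body: machine-readable code + human-readable message
           ▼
  [ Record metrics: rejection rate, hot keys, sampled remaining quota ]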

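When implementing or reviewing, the Agent should put "how the key is computed" and "what a rejection returns" in one shared specification, so that the gateway and the application layer do not count the same request under inconsistent rules.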
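One way to keep "how the key is computed" in a single place is a small pure function that both the gateway plugin and the application import. This is a sketch under assumptions: the `Principal` shape, the `rl:` prefix layout, and the precedence order (user, then API key, then tenant, then IP) are illustrative choices, not rules stated by this page.

```typescript
interface Principal {
  userId?: string
  apiKeyId?: string // an identifier for the key, not the secret itself
  tenantId?: string
  ip: string
}

// Picks the most specific identity available so that users behind one
// shared IP are not lumped together once they are authenticated.
function rateLimitKey(scope: string, p: Principal): string {
  if (p.userId) return `rl:${scope}:user:${p.userId}`
  if (p.apiKeyId) return `rl:${scope}:key:${p.apiKeyId}`
  if (p.tenantId) return `rl:${scope}:tenant:${p.tenantId}`
  return `rl:${scope}:ip:${p.ip}`
}
```

Including the identity type in the key (`user`, `key`, `tenant`, `ip`) avoids collisions between, say, a user ID and an IP that happen to share a string form.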

Redis implementations of token bucket and sliding window

Token bucket algorithm (Redis Lua script, executed atomically):

// ratelimit/tokenBucket.ts
import { Redis } from 'ioredis'

const TOKEN_BUCKET_LUA = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])    -- bucket capacity (maximum burst)
local rate = tonumber(ARGV[2])        -- refill rate in tokens per second
local now = tonumber(ARGV[3])         -- current timestamp (ms), supplied by the caller
local cost = tonumber(ARGV[4])        -- tokens consumed by this request

local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(data[1]) or capacity
local last_refill = tonumber(data[2]) or now

-- Refill for the elapsed time; clamp at 0 so clock skew between callers never removes tokens
local elapsed = math.max(0, now - last_refill) / 1000
tokens = math.min(capacity, tokens + elapsed * rate)

local allowed = 0
local value = math.ceil((cost - tokens) / rate * 1000)  -- ms until enough tokens exist
if tokens >= cost then
  allowed = 1
  tokens = tokens - cost
  value = math.floor(tokens)  -- whole tokens left (Redis truncates Lua numbers in replies)
end

redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
-- Expire idle buckets once they would have refilled completely
redis.call('EXPIRE', key, math.ceil(capacity / rate) + 1)
return {allowed, value}
`

export async function tokenBucketAllow(
  redis: Redis,
  key: string,
  options: { capacity: number; rate: number; cost?: number }
): Promise<{ allowed: boolean; remaining: number; retryAfterMs: number }> {
  const [allowed, value] = await redis.eval(
    TOKEN_BUCKET_LUA, 1, key,
    options.capacity, options.rate, Date.now(), options.cost ?? 1
  ) as [number, number]
  return {
    allowed: allowed === 1,
    remaining: allowed === 1 ? value : 0,
    retryAfterMs: allowed === 0 ? value : 0,
  }
}
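The refill arithmetic in the Lua script can be checked without Redis. The function below is an in-memory sketch that applies the same formula (refill by elapsed time, cap at capacity, then deduct); the names `BucketState`, `TakeResult`, and `takeToken` are hypothetical and exist only for this illustration.

```typescript
interface BucketState { tokens: number; lastRefillMs: number }
interface TakeResult { allowed: boolean; state: BucketState; retryAfterMs: number }

function takeToken(
  prev: BucketState | undefined,
  capacity: number,
  ratePerSec: number,
  nowMs: number,
  cost = 1
): TakeResult {
  // A missing bucket starts full, as in the Lua script.
  const start = prev ?? { tokens: capacity, lastRefillMs: nowMs }
  const elapsedSec = Math.max(0, nowMs - start.lastRefillMs) / 1000
  const tokens = Math.min(capacity, start.tokens + elapsedSec * ratePerSec)

  if (tokens < cost) {
    // Time until the missing tokens have been refilled.
    const retryAfterMs = Math.ceil(((cost - tokens) / ratePerSec) * 1000)
    return { allowed: false, state: { tokens, lastRefillMs: nowMs }, retryAfterMs }
  }
  return {
    allowed: true,
    state: { tokens: tokens - cost, lastRefillMs: nowMs },
    retryAfterMs: 0,
  }
}
```

With `capacity = 5` and `ratePerSec = 1`, five calls at the same instant succeed, the sixth is rejected with a 1000 ms hint, and two seconds later two tokens are available again.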

Sliding window algorithm (Redis Sorted Set implementation):

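// ratelimit/slidingWindow.ts
import { Redis } from 'ioredis'

export async function slidingWindowAllow(
  redis: Redis,
  key: string,
  limit: number,       // maximum requests per window
  windowMs: number     // window size (milliseconds)
): Promise<{ allowed: boolean; remaining: number }> {
  const now = Date.now()
  const windowStart = now - windowMs
  const member = `${now}-${Math.random()}`  // unique ID for this request

  // MULTI/EXEC so the four commands run without interleaving from other clients
  const results = await redis
    .multi()
    .zremrangebyscore(key, '-inf', windowStart)  // drop entries older than the window
    .zcard(key)                                  // count requests still in the window
    .zadd(key, now, member)                      // record this request (score = timestamp)
    .pexpire(key, windowMs)                      // let idle keys clean themselves up
    .exec()

  if (!results) throw new Error('Redis transaction aborted')
  const [countErr, count] = results[1] as [Error | null, number]
  if (countErr) throw countErr

  if (count >= limit) {
    // Over the limit: remove exactly the entry added above.
    // ZPOPMAX could evict a concurrent request's entry instead.
    await redis.zrem(key, member)
    return { allowed: false, remaining: 0 }
  }
  return { allowed: true, remaining: limit - count - 1 }
}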
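The sorted-set approach can also be done in a single Lua script, so that a rejected request is never written in the first place. The following is a sketch under assumptions: `EvalClient` is a minimal interface invented here (an ioredis client is expected to fit it, but that is not verified), and `slidingWindowAllowAtomic` is a hypothetical name.

```typescript
// Minimal client shape this sketch needs; an assumption, not a type from any library.
interface EvalClient {
  eval(script: string, numKeys: number, ...args: Array<string | number>): Promise<unknown>
}

const SLIDING_WINDOW_LUA = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
redis.call('ZREMRANGEBYSCORE', key, '-inf', now - window)
local count = redis.call('ZCARD', key)
if count >= limit then
  return {0, 0}
end
redis.call('ZADD', key, now, ARGV[4])
redis.call('PEXPIRE', key, window)
return {1, limit - count - 1}
`

async function slidingWindowAllowAtomic(
  client: EvalClient,
  key: string,
  limit: number,
  windowMs: number,
  nowMs: number = Date.now()
): Promise<{ allowed: boolean; remaining: number }> {
  const member = `${nowMs}-${Math.random()}` // unique ID for this request
  const reply = (await client.eval(
    SLIDING_WINDOW_LUA, 1, key, nowMs, windowMs, limit, member
  )) as [number, number]
  return { allowed: reply[0] === 1, remaining: reply[1] }
}
```

Because the count and the insert happen inside one script, concurrent callers cannot both observe the last free slot.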

Per-endpoint rate limit configuration:

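// middleware/rateLimit.ts
import { Request, Response, NextFunction } from 'express'
import { Redis } from 'ioredis'
import { tokenBucketAllow } from '../ratelimit/tokenBucket'

// Assumption: one shared client configured from REDIS_URL; substitute the project's own connection.
const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379')

// capacity = burst size, rate = tokens refilled per second.
// windowMs documents the intended window; tokenBucketAllow() does not read it.
const RATE_LIMIT_CONFIGS = {
  // Login: strict, to resist brute force (5 attempts per 15 minutes sustained)
  login: { capacity: 5, rate: 5 / (15 * 60), windowMs: 15 * 60_000 },
  // Query: loose, bursts allowed
  query: { capacity: 100, rate: 10, windowMs: 60_000 },
  // Writes: moderate
  write: { capacity: 30, rate: 5, windowMs: 60_000 },
  // File upload: strict
  upload: { capacity: 10, rate: 1/60, windowMs: 60_000 },
}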

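export function rateLimit(type: keyof typeof RATE_LIMIT_CONFIGS) {
  return async (req: Request, res: Response, next: NextFunction) => {
    const config = RATE_LIMIT_CONFIGS[type]
    // req.user is assumed to be set by an earlier auth middleware; fall back to the IP otherwise
    const userId = (req as Request & { user?: { id: string } }).user?.id
    const key = `rl:${type}:${userId ?? req.ip}`

    let result
    try {
      result = await tokenBucketAllow(redis, key, config)
    } catch (err) {
      return next(err)  // Express 4 does not catch rejections from async middleware
    }

    // Reset = when a retry can succeed (rejected) or when the bucket is full again (allowed)
    const resetMs = result.allowed
      ? ((config.capacity - result.remaining) / config.rate) * 1000
      : result.retryAfterMs
    res.set({
      'X-RateLimit-Limit': String(config.capacity),
      'X-RateLimit-Remaining': String(result.remaining),
      'X-RateLimit-Reset': String(Math.ceil((Date.now() + resetMs) / 1000)),
    })

    if (!result.allowed) {
      const retryAfterSec = Math.ceil(result.retryAfterMs / 1000)
      res.set('Retry-After', String(retryAfterSec))
      return res.status(429).type('application/problem+json').json({
        type: 'https://api.example.com/problems/rate-limited',
        title: 'Too Many Requests',
        status: 429,
        retryAfter: retryAfterSec,
      })
    }
    next()
  }
}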

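Rate limit response headers and client backoff

Conventional response headers (the X-RateLimit-* family, a de facto convention rather than a formal standard):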

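# Example 429 response (with the full set of rate limit headers)
HTTP/1.1 429 Too Many Requests
Content-Type: application/problem+json
X-RateLimit-Limit: 100        # quota ceiling for the window
X-RateLimit-Remaining: 0      # requests left
X-RateLimit-Reset: 1744380060 # Unix timestamp at which the window resets
Retry-After: 47               # suggested wait in seconds (RFC 9110, formerly RFC 7231)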

{
  "type": "https://api.example.com/problems/rate-limited",
  "title": "Too Many Requests",
  "status": 429,
  "detail": "API rate limit exceeded. Limit: 100 requests per minute.",
  "retryAfter": 47
}
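To keep these headers consistent across endpoints, the values can be produced by one pure function that is easy to unit-test. This is a sketch: `rateLimitHeaders` and its parameter order are hypothetical, and it assumes `X-RateLimit-Reset` carries a Unix timestamp in seconds, as in the example above.

```typescript
function rateLimitHeaders(
  limit: number,
  remaining: number,
  resetEpochSec: number,
  allowed: boolean,
  nowMs: number = Date.now()
): Record<string, string> {
  const headers: Record<string, string> = {
    'X-RateLimit-Limit': String(limit),
    'X-RateLimit-Remaining': String(Math.max(0, Math.floor(remaining))),
    'X-RateLimit-Reset': String(resetEpochSec),
  }
  if (!allowed) {
    // Retry-After is a delay in seconds; never advertise less than 1.
    const waitSec = Math.max(1, resetEpochSec - Math.floor(nowMs / 1000))
    headers['Retry-After'] = String(waitSec)
  }
  return headers
}
```

Deriving `Retry-After` from the same reset timestamp prevents the two headers from disagreeing.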

Client-side exponential backoff code (with jitter):

// utils/fetchWithRetry.ts
interface RetryOptions {
  maxRetries?: number
  baseDelayMs?: number
  maxDelayMs?: number
  jitter?: boolean
}

async function fetchWithRetry(
  url: string,
  init?: RequestInit,
  options: RetryOptions = {}
): Promise<Response> {
  const { maxRetries = 3, baseDelayMs = 1000, maxDelayMs = 30000, jitter = true } = options

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, init)

    if (response.status !== 429) return response
    if (attempt === maxRetries) return response

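    // Prefer the server's Retry-After header (delay-seconds or HTTP-date)
    const retryAfter = response.headers.get('Retry-After')
    const retryAfterDate = retryAfter ? Date.parse(retryAfter) : NaN
    let delayMs: number
    let fromServer = true

    if (retryAfter && /^\d+$/.test(retryAfter)) {
      delayMs = parseInt(retryAfter, 10) * 1000
    } else if (!Number.isNaN(retryAfterDate)) {
      delayMs = Math.max(0, retryAfterDate - Date.now())
    } else {
      // Exponential backoff: 2^attempt * baseDelay, capped at maxDelayMs
      delayMs = Math.min(Math.pow(2, attempt) * baseDelayMs, maxDelayMs)
      fromServer = false
    }

    // Random jitter to avoid synchronized retry storms: ±25% for computed delays,
    // +0..25% for server-provided ones so a retry never lands before Retry-After elapses
    if (jitter) {
      delayMs = delayMs * (fromServer ? 1 + Math.random() * 0.25 : 0.75 + Math.random() * 0.5)
    }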

    console.warn(`Rate limited. Retrying in ${Math.round(delayMs)}ms (attempt ${attempt + 1}/${maxRetries})`)
    await new Promise(resolve => setTimeout(resolve, delayMs))
  }
  throw new Error('Max retries exceeded')
}
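The ±25% jitter above keeps retries close to the exponential schedule. A commonly used alternative is "full jitter", where the delay is drawn uniformly between zero and the exponential ceiling, which spreads simultaneous clients more widely. The sketch below shows that variant; `backoffDelayMs` is a hypothetical helper, and the injectable `random` parameter exists only to make it testable.

```typescript
function backoffDelayMs(
  attempt: number,
  baseDelayMs = 1000,
  maxDelayMs = 30000,
  random: () => number = Math.random
): number {
  // Exponential ceiling, capped so late attempts do not wait unboundedly.
  const ceilingMs = Math.min(maxDelayMs, baseDelayMs * Math.pow(2, attempt))
  // Full jitter: anywhere in [0, ceiling).
  return Math.floor(random() * ceilingMs)
}
```

Full jitter trades a shorter average wait for a wider spread; it applies to computed delays only, since a server-provided Retry-After should be treated as a minimum.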

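HTTP 429 and 503 (overload or maintenance) mean different things: 429 says this client exceeded its quota, 503 says the service is temporarily unavailable for everyone.
Proxies and CDNs may strip Retry-After, so verify it end to end in integration tests.
Annotate quotas and the 429 response body schema in OpenAPI, in line with the "API contract" skill.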
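Both 429 and 503 responses may carry Retry-After, in either delay-seconds or HTTP-date form, and the header may be missing if an intermediary removed it. A client-side parser that covers all three cases might look like the sketch below; `parseRetryAfterMs` is a hypothetical name, and returning `null` for "no usable hint" is a design choice of this example.

```typescript
// Returns the suggested wait in milliseconds, or null when the header is
// absent or unparseable (for example, stripped by a proxy).
function parseRetryAfterMs(value: string | null, nowMs: number = Date.now()): number | null {
  if (value === null) return null
  const trimmed = value.trim()
  if (/^\d+$/.test(trimmed)) return parseInt(trimmed, 10) * 1000
  const dateMs = Date.parse(trimmed)
  if (isNaN(dateMs)) return null
  // A date in the past means the caller may retry immediately.
  return Math.max(0, dateMs - nowMs)
}
```

An integration test can call this against the real response headers and assert that the result is not `null`, which catches a proxy or CDN that drops the header.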

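Fairness and distributed-deployment notes

Shared NAT / corporate egress: IP-only limits easily penalize innocent users; combine them with a cookie, JWT, API key, or login session.
Paid vs. free tiers: use tiered quotas, and audit allowlisted operations endpoints so that internal interfaces do not become a bypass channel.
Clients: exponential backoff + a global limiter + jitter, paired with idempotency keys; unbounded retry storms are prohibited.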
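The "global limiter" on the client side can be as simple as a shared gate: once any request receives a 429, all other requests from the same process wait until the hinted time has passed, instead of each discovering the limit separately. The class below is a minimal sketch; `RetryGate` and its method names are hypothetical.

```typescript
class RetryGate {
  private blockedUntilMs = 0
  private now: () => number

  constructor(now: () => number = Date.now) {
    this.now = now
  }

  // Call when any request receives a 429 with a wait hint.
  noteRateLimited(retryAfterMs: number): void {
    // Keep the latest deadline; a shorter hint must not shorten the pause.
    this.blockedUntilMs = Math.max(this.blockedUntilMs, this.now() + retryAfterMs)
  }

  // How long a new request should wait before being sent (0 = send now).
  waitMs(): number {
    return Math.max(0, this.blockedUntilMs - this.now())
  }
}
```

Each outgoing call would check `waitMs()` before sending and add its own jitter on top, so the paused requests do not all resume at the same instant.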


---
name: api-rate-limiting
description: Rate limit dimensions, algorithms, and 429 semantics
---
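# Rules
- Key limits by authenticated user ID (preferred) or IP, to avoid penalizing users behind shared NAT
- Algorithms: token bucket (allows bursts) via an atomic Redis Lua script; sliding window via ZSET
- Login/registration endpoints: strict (5 per 15 min); query endpoints: loose (100 per min)
- Response headers: X-RateLimit-Limit / Remaining / Reset + Retry-After (seconds)
- 429 response body uses the problem+json format (RFC 9457, which obsoletes RFC 7807)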

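# Steps
1. Confirm the limit dimensions (user_id / api_key / ip) and the quota policy for each endpoint
2. Implement the middleware: call tokenBucketAllow() or slidingWindowAllow()
3. Mount order: IP-keyed limits before body parsing and authentication (saves resources); user-keyed limits after authentication, since they need the user ID
4. Set the response headers: X-RateLimit-* and Retry-After
5. Client documentation: exponential backoff (2^n * base) + jitter + a maximum retry count
6. Monitoring: rejection rate, hot keys, P99 error ratio per endpoint
7. Load-test: confirm the limits behave as expected at 10x traffic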
