Kubernetes Manifests

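Have the Agent write workload manifests with consistent labels and selectors, sensible probes, resources, and securityContext, together with conventions for referencing ConfigMaps and Secrets. This page proceeds in order: workflow → workloads → networking → probes → resources → security → policies → validation → snippet lab.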

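The SKILL defines the namespace, the standard labels (app, version, team), and the Deployment rolling-update strategy; it also spells out which Service type to use and when a headless Service is appropriate.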

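Configure liveness, readiness, and startup probes so that they do not conflict with slow startup; list security defaults such as readOnlyRootFilesystem, dropped capabilities, and runAsNonRoot according to cluster policy.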

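If HPA, PDB, and NetworkPolicy are maintained in the same repository, document the order in which they are referenced and give a minimal example; note per-environment differences in image pull policy and imagePullSecrets.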

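Horizontal scaling: criteria for choosing among replicas, anti-affinity, and topology spread.
Storage: where the boundary lies between a standalone PVC and a StatefulSet.
Validation: where kubectl apply --dry-run=server or kubeconform sits in CI.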
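
To make the anti-affinity versus topology-spread choice concrete, here is a minimal sketch of a soft pod anti-affinity rule. It assumes the Pods carry the label app: myapp and that the fragment sits under a Pod template's spec; the weight is illustrative, not a recommendation.

```yaml
# Fragment for spec.template.spec of a workload (assumes Pods are labeled app: myapp).
affinity:
  podAntiAffinity:
    # Soft rule: prefer placing replicas on different nodes,
    # but still schedule the Pod if that is not possible.
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: myapp
```

Anti-affinity only expresses "not together", while a topologySpreadConstraints entry bounds the skew across zones or nodes. The latter is usually the better fit once replicas outnumber the available domains, which is why a team would normally pick one of the two rather than stack both.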

From repository draft to schedulable in the cluster (workflow)

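  [ Manifest directory / Kustomize or Helm ]
        │
        ▼
  ┌──────────────────┐     Consistency: labels and selector, name prefix, standard annotations
  │ Workload         │──── Deployment / StatefulSet / DaemonSet: strategy, revisionHistoryLimit
  └──────────────────┘
        │
        ▼
  ┌──────────────────┐     Type: ClusterIP / LoadBalancer / headless; port and targetPort
  │ Service          │──── Align with Pod port names; avoid scattered magic numbers
  └──────────────────┘
        │
        ▼
  ┌──────────────────┐     startup → readiness → liveness; resources required
  │ Pod template     │──── securityContext; probe timeouts and thresholds are explainable
  └──────────────────┘
        │
        ▼
  ┌──────────────────┐     dry-run=server / kubeconform; namespace and quota
  │ Validate & merge │──── Change notes: image tag, resource changes, probe adjustments
  └──────────────────┘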

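Merge order: first make sure the selector and the Service ports are consistent, then tune probes and resources. Slow-starting workloads should always get a startupProbe so that liveness does not kill them prematurely.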

Labels, Deployment, and rolling updates

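spec.selector.matchLabels must match template.metadata.labels: every label in the selector has to appear on the Pod template. The Deployment's own metadata.labels should be kept consistent with them by convention, but they are not used for selection. The version label supports observability and rollback; avoid relying on latest alone.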

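strategy.type: with RollingUpdate, state maxSurge / maxUnavailable and the team's upper bounds.
revisionHistoryLimit caps the buildup of old ReplicaSets; if canary releases are handled by a gateway or Argo, state in the SKILL where that ends and native rolling updates begin.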

A Deployment YAML with the recommended fields (probes are covered in their own section):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
  labels:
    app: myapp
    version: "1.2.3"
    team: platform
    environment: production
spec:
  replicas: 3
  revisionHistoryLimit: 5
  selector:
    matchLabels:
      app: myapp          # immutable; must also appear in template.metadata.labels
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # at most 1 extra Pod during an update
      maxUnavailable: 0   # always keep `replicas` Pods available during an update
  template:
    metadata:
      labels:
        app: myapp
        version: "1.2.3"
    spec:
      terminationGracePeriodSeconds: 30
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: myapp
      containers:
        - name: myapp
          image: myregistry/myapp:1.2.3-abc1234
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 3000
              protocol: TCP
          env:
            - name: NODE_ENV
              value: production
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: myapp-secrets
                  key: db-password
          resources:
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 1Gi
          securityContext:
            runAsNonRoot: true
            runAsUser: 1001
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: tmp
          emptyDir: {}

Service and headless

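ClusterIP is the default; for external entry use LoadBalancer or Ingress, as fixed by the platform documentation. targetPort should preferably reference the container port by name; port itself stays numeric.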

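clusterIP: None is used for StatefulSet member discovery or when clients do their own load balancing; the SKILL should state the DNS form and what the client is responsible for.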
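
As an illustration of the headless case and of where a StatefulSet takes over from a standalone PVC, the following is a minimal sketch. The names mydb and mydb-headless, the image, the port, and the storage size are all hypothetical placeholders.

```yaml
# Headless Service: no virtual IP; DNS returns the Pod IPs directly.
apiVersion: v1
kind: Service
metadata:
  name: mydb-headless            # hypothetical name
  namespace: production
spec:
  clusterIP: None
  selector:
    app: mydb
  ports:
    - name: tcp
      port: 5432
      targetPort: tcp            # container port referenced by name
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mydb
  namespace: production
spec:
  serviceName: mydb-headless     # must name the headless Service above
  replicas: 3
  selector:
    matchLabels:
      app: mydb
  template:
    metadata:
      labels:
        app: mydb
    spec:
      containers:
        - name: mydb
          image: myregistry/mydb:1.0.0   # placeholder image
          ports:
            - name: tcp
              containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/data
  volumeClaimTemplates:          # one PVC per replica: data-mydb-0, data-mydb-1, ...
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

With the default cluster domain, each member would resolve as mydb-0.mydb-headless.production.svc.cluster.local and so on. Because the Service has no virtual IP, picking a member and retrying against another one is the client's job.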

A Service + Ingress configuration example:

---
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: production
  labels:
    app: myapp
spec:
  type: ClusterIP
  selector:
    app: myapp              # matches the Deployment's template.metadata.labels
  ports:
    - name: http
      port: 80
      targetPort: http      # references the container port name instead of a magic number
      protocol: TCP

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
    # cert-manager issues the TLS certificate automatically
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - myapp.example.com
      secretName: myapp-tls
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  name: http

liveness / readiness / startup probes

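readiness decides whether the Pod receives traffic; a liveness failure restarts the container; startup applies only during the startup phase, and liveness and readiness checks are held off until it succeeds.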

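Recommended

Keep the HTTP health path consistent with the business meaning of "able to serve"; avoid putting heavy dependencies into readiness, which can remove every replica from service at once.
Slow startup: add a startup probe first, then tighten the liveness initialDelaySeconds.
Align timeoutSeconds, periodSeconds, and failureThreshold with the SLO and document them in comments.

Avoid

Making liveness and readiness identical and overly strict, which causes flapping restarts or long-lived NotReady states.
Calling downstream dependencies from a probe, which leads to cascading timeouts (unless the team explicitly requires it).
exec probe scripts that have no timeout control or that launch a heavyweight shell.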
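
The guidance above can be sketched as a container fragment. The /healthz and /ready paths, the port name http, and every timing value are assumptions chosen for illustration; they should be derived from the service's measured startup time and SLO.

```yaml
# Fragment for one entry in spec.template.spec.containers[].
# Assumes the app serves /healthz and /ready on the port named http (hypothetical paths).
startupProbe:
  httpGet:
    path: /healthz
    port: http
  periodSeconds: 5
  failureThreshold: 24      # 24 * 5s = up to 120s allowed for startup
readinessProbe:
  httpGet:
    path: /ready            # "can serve traffic"; should not fan out to heavy dependencies
    port: http
  periodSeconds: 5
  timeoutSeconds: 2
  failureThreshold: 3       # marked NotReady after about 15s of consecutive failures
livenessProbe:
  httpGet:
    path: /healthz          # "process is alive"; deliberately looser than readiness
    port: http
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3       # restarted after about 30s of consecutive failures
```

Giving liveness a longer period than readiness means a brief stall takes the Pod out of rotation before it triggers a restart.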

requests, limits, and QoS

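requests drive scheduling and serve as the baseline for HPA utilization; limits cap usage. A Pod with requests but no limits is Burstable; a Pod is Guaranteed only when every container sets CPU and memory requests equal to its limits (see the official documentation for the exact per-container rules).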

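Avoid BestEffort in production; leave headroom for off-heap memory and thread stacks for runtimes such as Java and Go.
When changing CPU/memory, describe the latency, GC, and OOM risk in the PR; for large-dataset jobs consider a separate workload class.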
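
To make the QoS classes concrete, the two container-level fragments below contrast a Burstable and a Guaranteed configuration. The numbers are illustrative only, and the resulting class also depends on every other container in the Pod.

```yaml
# Burstable: requests are lower than limits.
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: "1"
    memory: 1Gi
---
# Guaranteed: CPU and memory requests equal their limits
# (and the same must hold for every container in the Pod).
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: 500m
    memory: 512Mi
```

A container that sets no requests and no limits at all contributes to BestEffort, the class to avoid in production.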

securityContext and volume mounts

Combine Pod-level and container-level securityContext: runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation: false, and capabilities.drop: ["ALL"], adding back only an allowlist of capabilities as needed.

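ConfigMap/Secret: use a uniform read-only mount path; document defaultMode for sensitive volumes and who is allowed by RBAC to apply them.
Differentiate imagePullPolicy and imagePullSecrets by environment (CI, staging, production).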
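
A minimal sketch of the mount and pull conventions follows. The names myapp-config and regcred and both mount paths are hypothetical, and the defaultMode value is one plausible choice rather than a prescribed one.

```yaml
# Fragment for spec.template.spec; myapp-config, regcred, and the paths are placeholders.
imagePullSecrets:
  - name: regcred                  # assumed to exist in the namespace; may differ per environment
containers:
  - name: myapp
    image: myregistry/myapp:1.2.3-abc1234
    imagePullPolicy: IfNotPresent  # immutable tags make IfNotPresent safe
    volumeMounts:
      - name: config
        mountPath: /etc/myapp/config
        readOnly: true
      - name: secrets
        mountPath: /etc/myapp/secrets
        readOnly: true
volumes:
  - name: config
    configMap:
      name: myapp-config
  - name: secrets
    secret:
      secretName: myapp-secrets
      defaultMode: 0440            # owner/group read-only; a non-root container
                                   # typically also needs fsGroup set to read it
```

Mounting secrets as files keeps them out of the environment of child processes, whereas the secretKeyRef approach in the Deployment example is simpler when the application only reads environment variables.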

HPA、PDB、NetworkPolicy

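Write the HPA metric source (CPU, custom metrics, KEDA) and the minimum replica count into the SKILL; review the PDB's minAvailable / maxUnavailable together with the release strategy.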

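NetworkPolicy: whether the default is deny or allow is decided by the platform; when the Agent generates a policy it must list the namespace and label selectors explicitly.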
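
As a minimal sketch of explicit selectors, the pair of policies below assumes a default-deny posture and an ingress controller running in a namespace named ingress-nginx; both are assumptions the platform team would need to confirm.

```yaml
# Default-deny ingress for every Pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}                # empty selector = all Pods in the namespace
  policyTypes:
    - Ingress
---
# Allow the ingress controller namespace to reach myapp on its http port.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: myapp-allow-ingress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # assumed namespace name
      ports:
        - protocol: TCP
          port: http             # named container port, consistent with the Service
```

NetworkPolicy only takes effect when the cluster's network plugin enforces it, so it is worth recording in the SKILL which plugin is in use.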

HPA configuration example (CPU plus a custom metric) with a PDB:

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # target CPU utilization, as a percentage of requests
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # custom metric (requires custom.metrics.k8s.io)
        target:
          type: AverageValue
          averageValue: "100"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # scale-down stabilization window: prevents flapping
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Pods
          value: 4
          periodSeconds: 30

---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp
  namespace: production
spec:
  minAvailable: 2     # or maxUnavailable: 1; tie this to the replica count and rolling-update strategy
  selector:
    matchLabels:
      app: myapp

---
# Full securityContext example
# In the Deployment's spec.template.spec:
# securityContext (Pod level)
#   runAsNonRoot: true
#   runAsUser: 1001
#   runAsGroup: 1001
#   fsGroup: 1001
#   seccompProfile:
#     type: RuntimeDefault
# securityContext (container level)
#   allowPrivilegeEscalation: false
#   readOnlyRootFilesystem: true
#   capabilities:
#     drop: ["ALL"]
#     add: ["NET_BIND_SERVICE"]    # add back as needed, only when listening on ports below 1024

Validation and CI

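Use kubectl apply --dry-run=server or kubectl diff; keep the Kubernetes version they run against consistent with the schema version used by kubeconform/kubeval.
Policy admission failures (OPA Gatekeeper, Kyverno) should map back to manifest field names.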
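
One possible placement of these checks in CI is sketched below as a GitHub Actions workflow. The CI system, the k8s/ directory, the pinned version 1.29.0, and the presence of kubeconform and a configured kubectl on the runner are all assumptions, not something the source prescribes.

```yaml
# .github/workflows/manifests.yaml (hypothetical path and job layout)
name: validate-manifests
on:
  pull_request:
    paths:
      - "k8s/**"
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Offline schema check; assumes kubeconform is already on the runner's PATH.
      # Pin -kubernetes-version to the version the target cluster runs.
      - name: kubeconform
        run: |
          kubeconform -strict -summary -kubernetes-version 1.29.0 k8s/
      # Server-side checks; assume kubectl has credentials for a non-production cluster.
      - name: server dry-run
        run: |
          kubectl apply --dry-run=server --recursive -f k8s/
      - name: diff
        # kubectl diff exits 1 when differences exist, so only fail on exit codes above 1.
        run: |
          kubectl diff --recursive -f k8s/ || [ $? -eq 1 ]
```

Running the offline schema check first gives fast feedback without cluster access; the server-side dry run then catches what only admission control can see, such as quota and policy violations.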

Probe and resources snippet lab

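Given a port, a probe type, and whether startup/readiness probes are enabled, generate a YAML snippet that can be pasted into containers[] (with ports, probes, and optional resources); before merging, tune the parameters against the real health check and capacity.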

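When startupProbe is included, the generated result already reflects the common practice of setting initialDelaySeconds to 0 for liveness/readiness; adjust failureThreshold and periodSeconds to the runtime's actual startup time.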
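
For a non-HTTP service, such a snippet might look like the sketch below, which uses tcpSocket probes. The container name, image, port 6379, and all thresholds are hypothetical and only show the shape of the output.

```yaml
# One containers[] entry for a TCP service (hypothetical name, image, and port).
- name: myapp
  image: myregistry/myapp:1.2.3-abc1234
  ports:
    - name: tcp
      containerPort: 6379
  startupProbe:
    tcpSocket:
      port: tcp
    periodSeconds: 5
    failureThreshold: 12          # 12 * 5s = 60s startup budget
  readinessProbe:
    tcpSocket:
      port: tcp
    initialDelaySeconds: 0        # startupProbe already covers the startup window
    periodSeconds: 5
  livenessProbe:
    tcpSocket:
      port: tcp
    initialDelaySeconds: 0
    periodSeconds: 10
    failureThreshold: 3
  resources:
    requests:
      cpu: 100m
      memory: 128Mi
    limits:
      memory: 256Mi
```

A tcpSocket probe only confirms that the port accepts connections, so it is a weaker signal than an HTTP readiness endpoint and fits best where no health endpoint exists.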

Further reading

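Assigning CPU and memory resources (Kubernetes documentation): requests, limits, and scheduling behavior
Container probes (Kubernetes documentation): the three probe types and their fields
Pod Security Standards (Kubernetes documentation): the baseline to align securityContext with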

---
name: kubernetes-manifests
description: Write production-ready K8s manifests such as Deployment and Service
tags: [kubernetes, k8s, devops, deployment]
---
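# Workloads
1. Keep labels and selector consistent: spec.selector must match template.labels, and metadata.labels should follow the same convention
2. Rolling update: maxSurge=1, maxUnavailable=0 to keep the service continuous
3. revisionHistoryLimit: 5 caps the buildup of old ReplicaSets
4. topologySpreadConstraints spreads Pods across availability zones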

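# Networking
5. Service targetPort references the container port name (no magic numbers)
6. Ingress with TLS, certificates issued automatically by cert-manager, ssl-redirect: true
7. ClusterIP is the default; LoadBalancer is fixed by the platform documentation; for headless, state the DNS form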

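# Probes
8. startupProbe protects slow-starting containers (failureThreshold * periodSeconds >= longest startup time)
9. readinessProbe decides whether traffic is received; only a liveness failure restarts the container; the two should not be identical
10. The probe HTTP path matches the business meaning of "ready"; do not call heavy dependencies from readiness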

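# Resources and security
11. requests + limits are required; Guaranteed QoS (requests=limits) suits production-critical services
12. securityContext: runAsNonRoot, allowPrivilegeEscalation:false, readOnlyRootFilesystem
13. capabilities.drop: ["ALL"], adding back only the minimum capabilities needed
14. Mount an emptyDir at /tmp to cover temporary-file needs under readOnlyRootFilesystem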

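# Elasticity and validation
15. HPA minReplicas/maxReplicas plus a scaleDown stabilizationWindow to prevent flapping
16. PDB minAvailable is tied to the release strategy so the service stays up during node maintenance
17. Validation: kubectl apply --dry-run=server + kubeconform in CI, with pinned versions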
