Part 14: HPA, PDB, and rollout safety

HPA, Horizontal Pod Autoscaler

Mechanism

HPA watches a metric, compares it with the target, and adjusts the replica count of a Deployment or StatefulSet.


  flowchart TB
    MS[metrics-server<br/>provides metrics] --> HPA[HPA<br/>adjusts replicas toward the target]
    HPA --> Deployment
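
The comparison step follows the formula given in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below is a simplified model of that rule, not the controller's code. The function name and its parameters are illustrative assumptions, and it ignores pod readiness, missing metrics, and behavior policies.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified model of the HPA scaling formula (hypothetical helper)."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the bounds set by minReplicas / maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 90% CPU against a 70% target -> ceil(4 * 90 / 70) = 6
print(desired_replicas(4, 90, 70, min_replicas=2, max_replicas=20))
# 4 pods averaging 73% CPU: ratio ~1.04 is inside the tolerance, stay at 4
print(desired_replicas(4, 73, 70, min_replicas=2, max_replicas=20))
```

When several metrics are configured, the controller evaluates each one separately and uses the largest resulting replica count.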

HPA spec

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # Scale up when average CPU exceeds 70% of requests
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300 # Scale down only to the highest recommendation of the last 5 minutes
      policies:
        - type: Percent
          value: 10 # Remove at most 10% of replicas per period
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0 # Scale up immediately
      policies:
        - type: Percent
          value: 100 # At most double the replicas per period
          periodSeconds: 60

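HPA needs requests

Important: averageUtilization: 70 means 70% of the pods' resource requests, not 70% of node capacity. If a container has no request set for the metric's resource, HPA cannot compute utilization for that pod at all. Wrong requests produce wrong scaling:

Request too low → utilization always looks high → HPA scales up until it hits maxReplicas.
Request too high → utilization always looks low → HPA never scales up.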
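
To make the effect concrete, here is a small worked example. It is a sketch under assumptions: the usage figure and request sizes are made up, the helper names are hypothetical, and tolerance and min/max clamping are left out. The same real CPU usage per pod produces very different scaling decisions depending only on the request.

```python
import math


def utilization_percent(usage_millicores, request_millicores):
    """CPU utilization as HPA sees it: usage relative to the pod's request."""
    return 100.0 * usage_millicores / request_millicores


def desired_replicas(current_replicas, utilization, target_utilization):
    """Core HPA formula, without tolerance or min/max clamping."""
    return math.ceil(current_replicas * utilization / target_utilization)


usage = 350  # millicores actually consumed per pod (made-up number)
for request in (250, 500, 2000):
    util = utilization_percent(usage, request)
    replicas = desired_replicas(4, util, 70)
    print(f"request={request}m utilization={util:.1f}% desired={replicas}")
```

With a 250m request the 4 pods appear 140% utilized and the formula asks for 8 replicas. With a 2000m request the same load appears as 17.5% and the formula asks for 1, which minReplicas would then raise to the configured floor.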

Custom metrics

metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 1000
  - type: External
    external:
      metric:
        name: sqs_queue_length
        selector:
          matchLabels:
            queue: orders
      target:
        type: AverageValue
        averageValue: 30

HPA reads these metrics through the custom and external metrics APIs, so you need an adapter such as Prometheus Adapter or KEDA to serve them.
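
For an AverageValue target, the replica count follows from dividing the metric total by the per-pod target. The sketch below models that arithmetic with a hypothetical helper. The numbers are invented, and min/max clamping and behavior policies are omitted.

```python
import math


def replicas_for_average_value(metric_total, target_average, current_replicas,
                               tolerance=0.1):
    """Replica count that brings metric_total / replicas down to target_average."""
    ratio = metric_total / (target_average * current_replicas)
    # Close enough to the target already: no change.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(metric_total / target_average)


# 300 queued messages, target of 30 per pod, currently 4 pods -> 10 pods
print(replicas_for_average_value(300, 30, current_replicas=4))
# 4 pods serving 5200 req/s in total, target 1000 req/s per pod -> 6 pods
print(replicas_for_average_value(5200, 1000, current_replicas=4))
```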

KEDA, event-driven autoscaling

KEDA (Kubernetes Event-Driven Autoscaling) extends HPA:

Scales from 0 to N (native HPA does not scale to 0 by default).
60+ scalers: Kafka lag, SQS queue, Prometheus query, cron schedule.
Use it for: workers/consumers that scale with queue depth.

PDB, PodDisruptionBudget

Problem

When a node is drained (OS upgrade, cluster scale-down), its pods are evicted through the Eviction API. If too many pods of one service are evicted at once, the service loses capacity or quorum and goes down.

PDB spec

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-api-pdb
spec:
  selector:
    matchLabels:
      app: my-api
  minAvailable: 2 # Always keep at least 2 pods available
  # Or:
  # maxUnavailable: 1              # At most 1 pod unavailable at a time

minAvailable vs maxUnavailable

	`minAvailable: 2`	`maxUnavailable: 1`
3 replicas	Only 1 eviction at a time	Only 1 eviction at a time
5 replicas	Up to 3 evictions at once	Only 1 eviction at a time
After HPA scales down	Can block drains (0 allowed once replicas = minAvailable)	Still allows 1 eviction

Recommendation: use maxUnavailable for dynamically scaled workloads (HPA), and minAvailable when you need an absolute minimum (quorum-based systems: a 3-node etcd needs ≥ 2).
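
The table can be reproduced with a few lines of arithmetic. The sketch below is a simplified model of how ALLOWED DISRUPTIONS is derived (healthy pods minus the number that must stay healthy). The helper name and signature are hypothetical, and percentages are rounded up as the Kubernetes documentation describes.

```python
import math


def allowed_disruptions(expected_pods, healthy_pods,
                        min_available=None, max_unavailable=None):
    """Simplified model of a PDB's ALLOWED DISRUPTIONS (hypothetical helper).

    Values may be absolute integers or percentage strings such as "50%".
    """
    def resolve(value):
        if isinstance(value, str) and value.endswith("%"):
            # Percentages are taken of the expected pod count, rounded up.
            return math.ceil(expected_pods * int(value[:-1]) / 100)
        return int(value)

    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available / max_unavailable")
    if min_available is not None:
        desired_healthy = resolve(min_available)
    else:
        desired_healthy = expected_pods - resolve(max_unavailable)
    return max(0, healthy_pods - desired_healthy)


for replicas in (2, 3, 5):
    by_min = allowed_disruptions(replicas, replicas, min_available=2)
    by_max = allowed_disruptions(replicas, replicas, max_unavailable=1)
    print(f"{replicas} replicas: minAvailable=2 -> {by_min}, maxUnavailable=1 -> {by_max}")
```

At 2 replicas, minAvailable: 2 leaves 0 allowed disruptions while maxUnavailable: 1 still leaves 1, which is why the latter suits workloads whose replica count moves with HPA.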

PDB only protects against voluntary disruption

PDB does not prevent:

Node crash (involuntary).
OOM kill.
Container crash.

PDB limits:

kubectl drain (node upgrade, scale-down).
Cluster Autoscaler removing a node.
Spot instance termination (only if a termination handler drains the node gracefully).

PDB blocking a drain

If the PDB does not allow another eviction, kubectl drain keeps retrying and appears to hang until enough other pods become Ready (or until its --timeout expires, if one is set).

# Check PDB status
kubectl get pdb
# NAME         MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
# my-api-pdb   2               N/A               1                     5m

# ALLOWED DISRUPTIONS = 0 → the drain is blocked

Combining HPA + PDB + Rolling Update


  flowchart TB
    HPA[HPA<br/>scales 2–20 on CPU] --> Deployment[Deployment<br/>RollingUpdate: maxUnavailable 0<br/>maxSurge 25%]
    Deployment --> PDB[PDB<br/>maxUnavailable: 1]
    PDB --> Drain[kubectl drain<br/>evicts at most 1 pod at a time]

Safe deploy checklist

Deployment: maxUnavailable: 0, maxSurge: 25% (or 1).
Readiness probe: the app must pass it before receiving traffic.
PDB: maxUnavailable: 1 (or minAvailable: N-1).
HPA: minReplicas ≥ 2 (avoids a single point of failure).
preStop: sleep 5, so endpoint removal can propagate to kube-proxy before the container stops.
terminationGracePeriodSeconds: long enough for the app to clean up.
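
Part of this checklist can be checked mechanically. The following is a minimal sketch, assuming the Deployment manifest has already been loaded into a Python dict (for example from YAML or JSON). The function name is hypothetical, and it covers only the rollout strategy, probes, preStop hook, and HPA floor, not the PDB or grace period.

```python
def check_rollout_safety(deployment, hpa_min_replicas):
    """Return a list of checklist violations for a Deployment manifest dict."""
    problems = []
    spec = deployment.get("spec", {})
    rolling = spec.get("strategy", {}).get("rollingUpdate", {})
    # An unset maxUnavailable falls back to the default of 25%, so flag it too.
    if rolling.get("maxUnavailable") not in (0, "0", "0%"):
        problems.append("strategy.rollingUpdate.maxUnavailable should be 0")
    pod_spec = spec.get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        if "readinessProbe" not in container:
            problems.append(f"container {name}: missing readinessProbe")
        if "preStop" not in container.get("lifecycle", {}):
            problems.append(f"container {name}: missing preStop hook")
    if hpa_min_replicas < 2:
        problems.append("HPA minReplicas should be at least 2")
    return problems


# Illustrative manifest fragment: only the fields the checker looks at.
good = {
    "spec": {
        "strategy": {"rollingUpdate": {"maxUnavailable": 0, "maxSurge": "25%"}},
        "template": {"spec": {"containers": [{
            "name": "my-api",
            "readinessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
            "lifecycle": {"preStop": {"exec": {"command": ["sleep", "5"]}}},
        }]}},
    }
}
bare = {"spec": {"template": {"spec": {"containers": [{"name": "my-api"}]}}}}

print(check_rollout_safety(good, hpa_min_replicas=2))
print(check_rollout_safety(bare, hpa_min_replicas=1))
```

The probe path and port in the sample manifest are placeholders, not values from this series.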

Pitfall: HPA + manual scale

# ❌ WRONG: scale manually and let HPA override it
kubectl scale deploy my-api --replicas=10
# HPA recomputes replicas from the target utilization on its next sync → the count may drop again

# ✅ RIGHT: raise HPA minReplicas if you need a higher floor
kubectl patch hpa my-api-hpa -p '{"spec":{"minReplicas":10}}'

Node drain and upgrade flow

# 1. Cordon the node (stop scheduling new pods; kubectl drain also cordons, so this step is optional)
kubectl cordon node-1

# 2. Drain (evict pods, respecting PDBs)
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data

# 3. Upgrade the node (OS patch, kubelet upgrade...)

# 4. Uncordon
kubectl uncordon node-1

--ignore-daemonsets: DaemonSet pods are not evicted (they stop on their own when the node shuts down).

What to remember when operating Kubernetes

HPA: scales replicas by metric (CPU, memory, custom). It is based on requests, so wrong requests mean wrong scaling.
PDB: protects availability and quorum during drains and upgrades. It covers voluntary disruption only.
Combined: HPA minReplicas ≥ 2 + PDB maxUnavailable 1 + RollingUpdate maxUnavailable 0 = safe deploys.
KEDA for scaling from 0 and event-driven scaling (queue, cron).

Frequently asked questions

HPA scales up quickly but scales down very slowly. Is that normal?

Answer: Yes, by design. stabilizationWindowSeconds (300s by default for scale-down) prevents flapping: a short load spike triggers a scale-up, the load drops, and HPA does not scale down right away, so it avoids having to scale up again moments later. Tune behavior.scaleDown if you need it faster.
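
The stabilization window can be pictured as a sliding maximum: the controller keeps recent replica recommendations and scales down only to the highest one still inside the window. The class below is a simplified sketch of that idea with hypothetical names. It uses plain numeric timestamps in seconds and does not model policies or the separate scale-up window.

```python
from collections import deque


class ScaleDownStabilizer:
    """Keeps recent recommendations and applies the highest one in the window."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.history = deque()  # (timestamp, recommended_replicas)

    def stabilize(self, now, recommended_replicas):
        self.history.append((now, recommended_replicas))
        # Drop recommendations older than the stabilization window.
        while self.history and self.history[0][0] < now - self.window_seconds:
            self.history.popleft()
        return max(replicas for _, replicas in self.history)


stabilizer = ScaleDownStabilizer(window_seconds=300)
for t, recommendation in [(0, 10), (60, 4), (300, 4), (361, 4)]:
    print(t, stabilizer.stabilize(t, recommendation))
```

The spike at t=0 keeps the replica count at 10 until that recommendation ages out of the 300-second window. Only then does the lower recommendation of 4 take effect.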

Is PDB `minAvailable: 100%` allowed?

Answer: Syntactically yes, but it blocks every voluntary disruption: kubectl drain hangs indefinitely and Cluster Autoscaler cannot remove any node that runs these pods. Reserve it for services that must never be interrupted. Most services should accept maxUnavailable: 1.

Can HPA and VPA run at the same time?

Answer: With care. HPA scales horizontally (replicas), VPA scales vertically (requests/limits). If both act on the CPU metric they conflict, because VPA changes the requests that HPA's utilization is computed from. Options: run VPA in Off mode (recommendations only) and let HPA do the actual scaling, or let VPA handle memory while HPA scales on a custom metric.

Next article (Stage V): Observability and debugging: events, logs, and metrics, and when and where to look at each.