Part 13: Requests, limits, QoS, and OOM

Deep dive: the article Kubernetes requests/limits: CPU throttling and OOMKilled goes into more detail on the kernel cgroup mechanism, how to measure the metrics, and strategies for each language (Go, JVM, Node.js, Python).

Requests vs limits: two numbers, two roles

	requests	limits
Used by	Scheduler	Kernel (cgroup)
Purpose	Choose which node the pod is placed on	Cap resource usage at runtime
When exceeded	The pod can use more than its request if the node has spare capacity	CPU: throttled. Memory: OOMKilled

resources:
  requests:
    cpu: "200m" # Scheduler: needs 0.2 core
    memory: "256Mi" # Scheduler: needs 256 MiB
  limits:
    memory: "512Mi" # Kernel: kills the container if it exceeds 512 MiB
    # cpu limit: usually NOT set (see the reasons below)
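
To make the units in the manifest above concrete, here is a minimal Python sketch that converts quantity strings such as "200m" and "256Mi" into plain numbers. The helper names parse_cpu and parse_memory are hypothetical, and the sketch only covers a subset of the Kubernetes quantity suffixes.

```python
# Hypothetical helpers: they handle only the common suffixes, not the full
# Kubernetes quantity grammar (no exponents, no milli-bytes).
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_cpu(quantity: str) -> float:
    """Return CPU in cores: '200m' means 200 millicores, i.e. 0.2 core."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Return memory in bytes: 'Mi' is 2**20 bytes, 'M' is 10**6 bytes."""
    for suffix, factor in _BINARY.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    for suffix, factor in _DECIMAL.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # a bare number is already in bytes


if __name__ == "__main__":
    print(parse_cpu("200m"), parse_memory("256Mi"), parse_memory("512Mi"))
```

Note the difference between "Mi" (binary) and "M" (decimal): 256Mi is about 4.9% larger than 256M, which matters when a limit is set close to real usage.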

CPU throttling

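A CPU limit is enforced through the kernel's CFS quota (cpu.cfs_quota_us in cgroup v1, cpu.max in cgroup v2). Once a container uses up its quota within a period (100 ms by default), it is paused until the next period begins.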
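
The quota arithmetic can be sketched in a few lines of Python. This is a simplified model, not the kernel's implementation: it assumes every thread is CPU-bound for the whole period and has a free core to run on. The function names are made up for illustration.

```python
def cfs_quota_us(cpu_limit_cores: float, period_us: int = 100_000) -> int:
    """CPU time (in microseconds) the container may consume per period."""
    return int(round(cpu_limit_cores * period_us))


def throttled_us_per_period(
    cpu_limit_cores: float, busy_threads: int, period_us: int = 100_000
) -> float:
    """Wall-clock time per period spent paused, under the model above."""
    quota = cfs_quota_us(cpu_limit_cores, period_us)
    # N busy threads burn the shared quota N times faster than wall-clock time.
    runnable_wall_us = quota / busy_threads
    return max(0.0, period_us - runnable_wall_us)


if __name__ == "__main__":
    # Limit 500m with 4 busy threads: quota is gone after 12.5 ms of wall time.
    print(cfs_quota_us(0.5))                # 50000
    print(throttled_us_per_period(0.5, 4))  # 87500.0
```

In this model a container with a 500m limit and 4 busy threads runs for 12.5 ms and then sits paused for 87.5 ms of every 100 ms period, which is how a multi-threaded app can be throttled heavily even though its average CPU usage looks modest.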

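Symptoms: P99 latency spikes while average CPU utilization stays low, and container_cpu_cfs_throttled_seconds_total keeps increasing.

Recommendation: always set a CPU request; usually leave the CPU limit unset (or make it very generous). Reason: CPU is a compressible resource, so a shortage only slows the app down rather than crashing it. The request, which the scheduler uses for placement and the kernel uses as a CPU weight, keeps sharing fair under contention.

Exceptions where a limit is needed: multi-tenant clusters, and batch workloads that need predictable runtimes.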

# Detect throttling
sum by (pod) (
  rate(container_cpu_cfs_throttled_seconds_total{pod=~"my-app-.*"}[5m])
)

Memory OOMKilled

Memory is incompressible: a process cannot “run slower to use less RAM”. When a container exceeds its limit, the kernel's cgroup OOM killer sends SIGKILL, and the container exits with code 137 (128 + 9).

kubectl describe pod my-app
# Last State: Terminated
#   Reason: OOMKilled
#   Exit Code: 137

# Debug:
# 1. Check the container_memory_working_set_bytes metric in the period before the OOM
# 2. Does the app have a memory leak? Take a heap dump (Go pprof, JVM -XX:+HeapDumpOnOutOfMemoryError)
# 3. Raise the limit or fix the leak? Raising the limit on a leaking app only delays the next OOM
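
Exit code 137 follows the usual shell convention of 128 plus the signal number. A small sketch of decoding it, assuming a POSIX system where SIGKILL is signal 9 (describe_exit_code is a hypothetical helper):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a container exit code using the 128 + signal convention."""
    if code > 128:
        sig = signal.Signals(code - 128)  # raises ValueError for unknown numbers
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited on its own with status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
    print(describe_exit_code(143))
```

Exit code 137 alone only says the process received SIGKILL. It is the Reason: OOMKilled field in the pod status that confirms the kill came from the OOM killer.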

Recommendation: always set a memory limit, so that a leak in one pod cannot spill over and starve other pods. A reasonable starting point is limit ≈ request × 1.5–2.

QoS class

Class	Condition	Eviction priority
Guaranteed	Every container has requests = limits (for both CPU and memory)	Evicted last (most protected)
Burstable	At least one container has a request or limit, but the pod does not meet the Guaranteed criteria	In between
BestEffort	No container has any request or limit	Evicted first (least protected)
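
The rules in the table can be expressed as a short classifier. This is a sketch of the documented rules, not the kubelet's code: it compares quantity strings literally (so "1" and "1000m" would count as different), and the qos_class name and input shape are assumptions made for this example.

```python
from typing import Dict, List

# One entry per container: {"requests": {...}, "limits": {...}}, both optional.
Resources = Dict[str, Dict[str, str]]


def qos_class(containers: List[Resources]) -> str:
    """Classify a pod as Guaranteed, Burstable, or BestEffort."""
    has_any = False
    guaranteed = True
    for container in containers:
        limits = container.get("limits", {})
        # When only a limit is given, Kubernetes defaults the request to it.
        requests = {**limits, **container.get("requests", {})}
        for resource in ("cpu", "memory"):
            if resource in requests:
                has_any = True
            if resource not in limits or requests[resource] != limits[resource]:
                guaranteed = False
    if not has_any:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


if __name__ == "__main__":
    pod = [{"requests": {"cpu": "200m", "memory": "256Mi"},
            "limits": {"memory": "512Mi"}}]
    print(qos_class(pod))  # Burstable
```

The manifest from the start of this part (CPU request without a CPU limit, memory limit above the request) therefore lands in Burstable.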

Practical impact

When a node is under memory pressure (available memory falls below the evictionHard threshold):

BestEffort pods are evicted first.
Burstable pods using more than their requests are evicted next.
Guaranteed pods are evicted last.

The node-level OOM killer uses oom_score_adj: 1000 for BestEffort, -997 for Guaranteed, and a value in between for Burstable.

Critical workloads (the main API) → Guaranteed. Background workers → Burstable is acceptable.

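Node scheduling and capacity

For each node, the scheduler checks that the requests of the pods already placed there plus the new pod's requests stay within capacity: sum(pod.requests) on the node ≤ node.allocatable.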

# View allocatable
kubectl describe node <name> | grep -A6 Allocatable

# View how much has already been requested
kubectl describe node <name> | grep -A6 "Allocated resources"

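A pod stuck in Pending with “Insufficient cpu” means no node has enough unreserved CPU: on every candidate node, the existing requests plus the pod's request would exceed allocatable. Fixes: (1) lower the requests; (2) add nodes; (3) find which pods hold the largest requests.

Request too large for the node

Example: a pod requests 8Gi of memory, but node allocatable is 7.5Gi → the pod will never be scheduled on that node. Check the pod's Events.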
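
The fit check itself is simple arithmetic on requests, never on actual usage. A minimal sketch, with CPU in millicores and memory in bytes; insufficient_resources is a hypothetical name, and the real scheduler also considers many other constraints (taints, affinity, and so on) that are ignored here.

```python
from typing import Dict, List

GIB = 2**30


def insufficient_resources(
    allocatable: Dict[str, int],
    scheduled_requests: List[Dict[str, int]],
    pod_request: Dict[str, int],
) -> List[str]:
    """Return the resources for which the pod does not fit on the node."""
    lacking = []
    for resource, capacity in allocatable.items():
        already_requested = sum(r.get(resource, 0) for r in scheduled_requests)
        if already_requested + pod_request.get(resource, 0) > capacity:
            lacking.append(resource)
    return lacking


if __name__ == "__main__":
    node = {"cpu": 4000, "memory": int(7.5 * GIB)}
    # The 8Gi pod from the example: too big even for an empty node.
    print(insufficient_resources(node, [], {"cpu": 200, "memory": 8 * GIB}))
```

An empty list means the pod fits. For the 8Gi example the result contains "memory" even with no other pods on the node, which is why scaling out with more nodes of the same size does not help.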

Set values based on data

Don't guess. The process:

Measure a baseline: run the app under realistic load for a few days.
Metrics: P95 of container_memory_working_set_bytes, and P95 of the CPU usage rate, e.g. rate(container_cpu_usage_seconds_total[5m]), since that metric is a cumulative counter.
CPU request ≈ P95 CPU × 1.2.
Memory request ≈ P95 working set × 1.2.
Memory limit ≈ request × 1.5.
CPU limit: omit it, or set it to 4× the request.
Review after 1-2 weeks with fresh data.
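
The sizing steps above can be sketched as follows. The sample lists stand in for data exported from your metrics system, the nearest-rank percentile is one of several reasonable definitions, and the function names are hypothetical. Integer arithmetic (millicores, bytes, percentages) is used to avoid floating-point rounding surprises.

```python
from typing import Dict, List


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def percentile(samples: List[int], pct: int) -> int:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, ceil_div(pct * len(ordered), 100))
    return ordered[rank - 1]


def recommend(
    cpu_millicores: List[int],
    working_set_bytes: List[int],
    headroom_pct: int = 120,   # the "× 1.2" from the steps above
    mem_limit_pct: int = 150,  # the "× 1.5" from the steps above
) -> Dict[str, int]:
    cpu_request = ceil_div(percentile(cpu_millicores, 95) * headroom_pct, 100)
    mem_request = ceil_div(percentile(working_set_bytes, 95) * headroom_pct, 100)
    return {
        "cpu_request_m": cpu_request,
        "memory_request_bytes": mem_request,
        "memory_limit_bytes": ceil_div(mem_request * mem_limit_pct, 100),
    }


if __name__ == "__main__":
    mib = 2**20
    print(recommend(list(range(10, 210, 10)), [200 * mib] * 20))
```

With a P95 of 190 millicores and a 200 MiB working set, this yields a 228m CPU request, a 240 MiB memory request, and a 360 MiB memory limit.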

VPA (Vertical Pod Autoscaler)

VPA recommends requests/limits based on historical usage. For production-critical workloads, run it in Off mode (recommendations only, nothing applied automatically).

Quick checklist

Every container has requests (CPU + memory).
Memory has a limit, set not too close to the request.
CPU limit: not set (or very generous), unless the cluster is multi-tenant.
QoS matches the workload: main API → Guaranteed, workers → Burstable is fine.
Dashboards include panels for throttling and OOM events.
LimitRange + ResourceQuota on shared namespaces.
Review requests/limits every quarter.

Summary

requests are for the scheduler and HPA; limits are for the kernel cgroup.
CPU throttling causes latency spikes; usually don't set a CPU limit.
Exceeding the memory limit gets the container OOMKilled, with no warning; always set a limit.
QoS class determines eviction order: BestEffort is evicted first, then Burstable, then Guaranteed.
Set values from real metrics and review them regularly.

For more detail: Kubernetes requests/limits: CPU throttling and OOMKilled.

Frequently asked questions

Why does a Guaranteed pod still get OOMKilled?

Answer: Guaranteed only gives priority during node-pressure eviction (when the kubelet chooses which pods to evict). A cgroup OOM kill happens when a container exceeds its own limit, and the QoS class plays no part in that. Raise the limit or fix the leak.

Does a 100m CPU request mean the app can only use 10% of a core?

Answer: No. Requests do not cap runtime usage. 100m means the pod is guaranteed at least 10% of a core's time when the node is contended. If the node has idle CPU, the pod can use more, even multiple cores (unless a CPU limit is set).
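
To illustrate why a request is a weight rather than a cap, here is a rough sketch of proportional sharing. It assumes the cgroup v1 conversion commonly described for Kubernetes (shares ≈ millicores × 1024 / 1000, with a floor of 2); on cgroup v2 the value is mapped to cpu.weight instead, and real scheduling is more nuanced than this model. Both function names are made up for this example.

```python
from typing import Dict


def cpu_shares(request_millicores: int) -> int:
    """Approximate cgroup v1 cpu.shares for a CPU request (assumed formula)."""
    return max(2, request_millicores * 1024 // 1000)


def contended_split(requests_m: Dict[str, int], node_cores: float) -> Dict[str, float]:
    """Cores each pod gets if ALL pods are CPU-bound at the same time."""
    total = sum(cpu_shares(r) for r in requests_m.values())
    return {
        name: node_cores * cpu_shares(r) / total
        for name, r in requests_m.items()
    }


if __name__ == "__main__":
    # Two pods fighting over a single core: roughly a 1:3 split.
    print(contended_split({"small": 100, "large": 300}, node_cores=1.0))
```

The split only applies while every pod is busy. If the "large" pod goes idle, nothing stops the "small" pod from using the whole core.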

Should requests = limits be set for every pod?

Answer: Setting them equal gives QoS Guaranteed, the most protected class during eviction. But if every pod is Guaranteed, the cluster cannot overcommit and needs more nodes. In practice: main API = Guaranteed, worker/batch = Burstable to make better use of resources.

Next article (Phase V): HPA, PDB, and safe rollouts: autoscaling and disruption budgets.