Kubernetes High Availability: HPA, Pod Disruption Budgets, Multi-AZ
Build resilient Kubernetes applications with Horizontal Pod Autoscaler, Pod Disruption Budgets, and multi-availability zone deployments for production workloads.
Production workloads need to stay available during node failures, cluster maintenance, and traffic spikes. Kubernetes provides mechanisms to handle these scenarios: Horizontal Pod Autoscaler (HPA) scales pods based on demand, Pod Disruption Budgets (PDB) ensure minimum availability during voluntary disruptions, and multi-AZ deployments protect against datacenter failures.
This post covers building resilient applications on Kubernetes using these tools and practices.
For Kubernetes basics, see the Kubernetes fundamentals post. For advanced scheduling, see the Advanced Pod Scheduling post.
When to Use / When Not to Use
HPA suits variable traffic well
Web APIs, user-facing services, anything where load is unpredictable. If you have metrics that correlate with demand, HPA can react faster than you can.
Custom metrics unlock more. Queue depth for worker systems, request latency for latency-sensitive services, business metrics like active users. The autoscaler scales on what matters for your system.
When PDBs matter
Stateful applications need PDBs because their failure modes are harsher. A database losing quorum mid-request corrupts data. Stateless services restart cleanly.
If you need to drain nodes for maintenance without service blips, PDBs are essential. Cluster upgrades require draining nodes, and without PDBs you can temporarily lose quorum for stateful workloads.
Multi-AZ when single-datacenter is not enough
If your SLA is 99.99%, a single AZ failure takes you offline. Multi-AZ deployments ensure an AZ outage is invisible to users.
Compliance sometimes demands it. Data residency regulations may require geographic separation.
HPA is not always the answer
Batch jobs have a beginning and an end. They scale up at the start, run to completion, and scale down. HPA oscillating during a long job wastes resources.
Some services have latency requirements that autoscale cannot satisfy. If scale-to-zero latency is unacceptable, keep minimum replicas running.
Decision Summary
Use HPA when: variable traffic, meaningful metrics available, automatic capacity management needed.
Use PDBs when: stateful apps, cluster maintenance without downtime, protecting critical services during upgrades.
Use multi-AZ when: SLA demands it, zone failures must be invisible, compliance requires it.
Skip HPA for: batch jobs, latency-critical services requiring fixed capacity.
HA Architecture Flow
flowchart LR
User --> LB[Load Balancer]
LB --> HPA[HPA: Scales pods<br/>based on metrics]
HPA --> AZ1[AZ-1 Pod]
HPA --> AZ2[AZ-2 Pod]
HPA --> AZ3[AZ-3 Pod]
subgraph PDB[Pod Disruption Budget]
PDB1[minAvailable: 2]
end
AZ1 --> PDB1
AZ2 --> PDB1
AZ3 --> PDB1
HPA scales pods horizontally across availability zones. PDB ensures at least 2 replicas stay up during voluntary disruptions. Together they handle traffic spikes and cluster maintenance without downtime.
HPA Configuration and Scaling Behavior
The Horizontal Pod Autoscaler automatically adjusts the number of pod replicas based on CPU utilization, memory usage, or custom metrics.
Basic HPA configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-frontend-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-frontend
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
This HPA targets 70% average CPU utilization and 80% average memory utilization, scaling between 3 and 20 replicas. The behavior section controls scaling speed:
- Scale-down stabilization window of 300 seconds prevents rapid flapping
- Scale-down limits to 10% of pods per minute
- Scale-up allows doubling pods in 15 seconds for rapid response
Custom metrics HPA
For metrics beyond CPU and memory, use the custom.metrics.k8s.io API:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: External
    external:
      metric:
        name: queue_depth
        selector:
          matchLabels:
            queue_name: order-processing
      target:
        type: AverageValue
        averageValue: "100"
This scales based on message queue depth. When 100 messages per pod accumulate on average, the HPA adds more pods.
Checking HPA status
kubectl get hpa -n production
kubectl describe hpa web-frontend-hpa -n production
The describe output shows current metrics, replica counts, and scaling events.
HPA Scaling Policies Trade-off Table
| Scaling Policy | Behavior | Best For | Risk |
|---|---|---|---|
| Aggressive (scale-up) | Fast scale-up, slow scale-down | Traffic spikes, flash sales | Over-provisioning, higher costs |
| Conservative (scale-down) | Slow scale-up, fast scale-down | Stable workloads, cost-sensitive | Under-provisioning during growth |
| Stable (long stabilization window) | Prevents flapping | Predictable traffic, stateful apps | Slower response to load changes |
| Mixed (separate up/down policies) | Tune each direction independently | Most production workloads | More configuration complexity |
The default HPA behavior is relatively aggressive on scale-up and conservative on scale-down. For stateful services with database connections, use a longer stabilization window to avoid connection churn during brief load fluctuations.
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0   # Immediately scale up
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
  scaleDown:
    stabilizationWindowSeconds: 300 # 5 minute cooldown
    policies:
    - type: Percent
      value: 10
      periodSeconds: 60
Pod Disruption Budgets for Safe Evictions
Pod Disruption Budgets (PDB) ensure minimum availability during voluntary disruptions. Voluntary disruptions include node drain operations for cluster upgrades and autoscaler scale-down events.
PDB definition
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-frontend
This PDB ensures at least 2 web-frontend pods are available during disruptions. If 5 pods exist and you drain a node, Kubernetes evicts only 3 pods, leaving 2 running.
Using maxUnavailable instead
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb
  namespace: production
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: web-frontend
maxUnavailable: 1 allows at most 1 pod to be unavailable. This is often clearer than minAvailable when you know your replica count.
Multiple PDBs for complex applications
# PDB for API servers
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  minAvailable: 3
  selector:
    matchLabels:
      tier: api
---
# PDB for frontend
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      tier: frontend
You can have multiple PDBs for different parts of an application.
Checking PDB status
kubectl get pdb -n production
kubectl describe pdb web-frontend-pdb -n production
Pod Priority and Preemption
Pod priority affects scheduling order and eviction decisions during resource pressure. Higher priority pods preempt lower priority pods when the cluster is full.
Priority class definition
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: production-high
value: 1000
globalDefault: false
description: "Production workloads with high priority"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: production-medium
value: 500
globalDefault: true
description: "Standard production workloads"
globalDefault: true means pods without an explicit priority class use production-medium by default.
Assigning priority to pods
spec:
  priorityClassName: production-high
Critical workloads like payment processing use high priority. Background batch jobs use lower priority and get preempted when needed.
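In a Deployment, priorityClassName belongs in the pod template spec. A minimal sketch, assuming the production-high class above exists (the payment-api names and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api          # illustrative name
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-api
  template:
    metadata:
      labels:
        app: payment-api
    spec:
      priorityClassName: production-high  # these pods preempt lower-priority pods under pressure
      containers:
      - name: payment-api
        image: payment-api:1.0            # illustrative image
```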
Multi-AZ Deployment Strategies
Distributing pods across availability zones protects against single-datacenter failures. Kubernetes nodes typically run in multiple zones within a region.
Zone labels
Nodes have topology labels:
topology.kubernetes.io/zone: us-east-1a
topology.kubernetes.io/region: us-east-1
Pod anti-affinity for zone spreading
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - web-frontend
        topologyKey: topology.kubernetes.io/zone
This spreads web-frontend pods across zones. If you have 3 replicas and 3 zones, each zone gets one pod.
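Note that required anti-affinity leaves a fourth replica unschedulable when only three zones exist. topologySpreadConstraints express the same intent more flexibly; a sketch using the same web-frontend labels:

```yaml
spec:
  topologySpreadConstraints:
  - maxSkew: 1                            # zones may differ by at most one pod
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway     # prefer spreading, but never block scheduling
    labelSelector:
      matchLabels:
        app: web-frontend
```

Use whenUnsatisfiable: DoNotSchedule if a strict guarantee matters more than schedulability.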
Storage considerations
Persistent volumes with cloud provider storage may have zone constraints. EBS volumes exist in a single availability zone. If you schedule a pod in us-east-1b but the EBS volume is in us-east-1a, the pod cannot start.
Use volumeBindingMode: WaitForFirstConsumer in your StorageClass to delay volume binding until scheduler placement:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
The volume is then provisioned in the zone where the scheduler places the pod.
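A PVC using this class stays Pending until a pod consumes it, and provisioning then happens in that pod's zone. A minimal sketch (names and size illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc             # illustrative name
  namespace: production
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: ebs-sc   # the WaitForFirstConsumer class above
  resources:
    requests:
      storage: 20Gi
```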
StatefulSet with zone awareness
spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: postgres
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: postgres
        image: postgres:15
StatefulSets with zone-spread requirements may fail to schedule all replicas if zones are unavailable. Consider whether your application can operate with reduced replica count.
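For a quorum-based StatefulSet like this, pair the zone spread with a PDB that preserves quorum — for 3 replicas, allow at most one voluntary disruption at a time. A sketch assuming the postgres labels above:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: production
spec:
  maxUnavailable: 1          # never lose more than one replica to a voluntary disruption
  selector:
    matchLabels:
      app: postgres
```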
Cluster Federation Basics
Federation manages multiple Kubernetes clusters as a single logical cluster. You deploy workloads to a federated control plane that distributes them across member clusters.
Federation v2 (KubeFed) provides:
- Cross-cluster scheduling: Deploy pods to multiple clusters
- Cross-cluster service discovery: Access services across clusters
- Replica placement: Distribute workloads based on geography
Federation architecture
┌─────────────────────────────────────────┐
│        Federated Control Plane          │
│  ┌─────────────┐  ┌──────────────────┐  │
│  │   KubeFed   │  │  Federated API   │  │
│  └─────────────┘  └──────────────────┘  │
└─────────────────────────────────────────┘
          │                    │
          ▼                    ▼
┌───────────────────┐  ┌───────────────────┐
│ Cluster us-east-1 │  │ Cluster eu-west-1 │
└───────────────────┘  └───────────────────┘
Federation is complex and requires careful planning. For most use cases, simpler approaches like GitOps with multiple clusters work better.
Failure Simulation Testing
Testing failure scenarios validates your HA configuration. Tools like chaoskube and Litmus simulate failures to verify resilience.
Using chaoskube
helm install chaoskube chaoskube/chaoskube \
  --set namespaces={production} \
  --set interval=5m \
  --set replicas=1
Chaoskube kills a random pod in the production namespace every 5 minutes. If your PDB is configured correctly, your application stays available.
Manual node drain testing
kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data --force
Draining a node simulates cluster maintenance. Verify:
- PDBs are respected
- Pods reschedule to other nodes
- Application remains available behind its Service
Load testing
Verify HPA responds correctly under load:
kubectl run -it load-generator \
  --image=busybox \
  --restart=Never \
  -- /bin/sh -c "while true; do wget -q -O- http://web-frontend; done"
Monitor HPA behavior during the test:
watch kubectl get hpa -n production
Production Failure Scenarios
HPA Flapping
HPA that scales up and down too quickly is worse than no HPA. Pods getting created and destroyed constantly burn resources and generate logs.
This happens when the stabilization window is too short or the scaling threshold is too tight. A deployment that spikes to 20 replicas, immediately triggers scale-down, then bounces back up is not doing anyone favors.
Set stabilizationWindowSeconds on scale-down to give the system time to settle. Five minutes is usually enough.
PDB Blocking Cluster Upgrades
A PDB with minAvailable set to the replica count blocks all drains. If you have 5 pods and minAvailable: 5, no pod ever gets evicted.
The kubectl drain command hangs and cluster upgrades stall. The eviction API returns "Cannot evict pod as it would violate the pod's disruption budget."
Set minAvailable to the minimum your service actually needs, leaving headroom for eviction. For stateless services, percentage-based PDBs like minAvailable: 50% scale with the replica count and are more practical.
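A percentage-based PDB stays valid as HPA changes the replica count. A sketch:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb
  namespace: production
spec:
  minAvailable: 50%          # rounds up: with 5 replicas, 3 must stay available
  selector:
    matchLabels:
      app: web-frontend
```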
All Replicas in One AZ
If you deploy 3 replicas without any topology awareness and one AZ fails, all 3 replicas disappear simultaneously. Your service goes down even though you thought you had redundancy.
Use topologySpreadConstraints or podAntiAffinity with topology.kubernetes.io/zone. Explicitly distribute across zones.
Anti-Patterns
minReplicas: 1
A single replica has no availability during restarts, upgrades, or failures. A routine rolling update or node drain alone takes your service down.
Set minReplicas to at least 2 for anything you care about.
PDBs Set Too Aggressively
A PDB requiring all replicas to stay up blocks cluster maintenance entirely. You cannot drain nodes, you cannot upgrade the cluster, you cannot rotate infrastructure.
Set PDBs to the minimum your service needs, not the maximum.
HPA Without PDB
During a node drain, HPA sees reduced replicas and tries to scale up. Without a PDB, the newly created pods can get evicted too, creating a thrashing situation.
Pair HPA with PDBs for any production workload.
Vertical Pod Autoscaler Integration
VPA adjusts pod resource requests automatically based on actual usage, complementing HPA which adjusts replica counts. Use VPA for workloads where CPU/memory sizing is tricky and you want the scheduler to optimize resource allocation.
VPA Recommendation Modes
| Mode | Behavior | Use Case |
|---|---|---|
| Off | No action, only publishes recommendations | Testing VPA before enabling |
| Initial | Sets resources only at pod creation | New deployments |
| Recreate | Evicts pods to apply updated requests | Willing to tolerate restarts |
| Auto | Currently equivalent to Recreate (reserved for in-place updates) | Mature workloads that tolerate evictions |
VPA Configuration
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-backend
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 4
        memory: 8Gi
VPA with HPA
VPA handles vertical scaling (resource requests) while HPA handles horizontal scaling (replica count). They can run together:
# VPA for resource sizing
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-backend
  updatePolicy:
    updateMode: "Auto"
---
# HPA for replica scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-backend
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
Note: do not let VPA and HPA act on the same resource metric for the same workload. If VPA resizes CPU requests while HPA scales on CPU utilization, each controller's changes invalidate the other's calculations and they fight. When VPA manages CPU and memory, drive HPA from custom or external metrics instead.
Observability Hooks for HA
Key Metrics to Track
# HPA able to scale
kube_horizontalpodautoscaler_status_condition{condition="AbleToScale"}
# HPA current vs desired replicas
kube_horizontalpodautoscaler_status_current_replicas
kube_horizontalpodautoscaler_status_desired_replicas
# PDB eviction headroom (0 means drains are blocked)
kube_poddisruptionbudget_status_pod_disruptions_allowed
# Pod restarts by container (evictions vs crashes)
kube_pod_container_status_restarts_total
# Node readiness (join with node zone labels for a per-zone view)
kube_node_status_condition{condition="Ready",status="true"}
Key Events to Log
- HPA scale-up and scale-down events (check kubectl get events --watch)
- PDB eviction blocks (pod cannot be evicted due to minAvailable)
- VPA resource recommendations applied
- Pod preemption events (higher priority pods evict lower ones)
Key Alerts to Configure
| Alert | Condition | Severity |
|---|---|---|
| HPA at max replicas | kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_spec_max_replicas for >5min | Warning |
| HPA flapping | Scale direction changes 4+ times in 10min | Warning |
| PDB blocking evictions | kube_poddisruptionbudget_status_pod_disruptions_allowed == 0 for >15min | Critical |
| Pod preemption events | Preemption events increasing in the event stream | Info |
| VPA recommendations ignored | Pod OOMKilled despite VPA recommendations | Warning |
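With the Prometheus Operator, the first alert can be expressed as a PrometheusRule. A sketch assuming kube-state-metrics is scraped (rule and label names are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hpa-alerts           # illustrative name
  namespace: production
spec:
  groups:
  - name: hpa
    rules:
    - alert: HPAAtMaxReplicas
      expr: |
        kube_horizontalpodautoscaler_status_current_replicas
          == kube_horizontalpodautoscaler_spec_max_replicas
      for: 5m                # sustained for 5 minutes before firing
      labels:
        severity: warning
      annotations:
        summary: "HPA {{ $labels.horizontalpodautoscaler }} pinned at max replicas"
```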
Debug Commands
# Check HPA current state
kubectl get hpa -n production
# Watch HPA scaling decisions
kubectl get events --watch --field-selector involvedObject.kind=HorizontalPodAutoscaler
# Check PDB status
kubectl get pdb -n production
kubectl describe pdb web-frontend-pdb -n production
# Check VPA recommendations (without applying)
kubectl get vpa -n production -o yaml | grep -A 50 recommendation
# Check for preemption events
kubectl get events --sort-by='.lastTimestamp' | grep -i preempt
Quick Recap Checklist
- minReplicas set to at least 2 for production stateless services
- PDB configured with minimum viable availability
- topologySpreadConstraints distributing replicas across zones
- HPA stabilization window preventing flapping
- PDB behavior tested with simulated node drains in staging
- HPA scaling on appropriate metrics (CPU for stateless, memory for caches)
- HPA scaling events monitored in production
- Application available during cluster upgrades verified
Interview Questions
Q: Your multi-AZ Kubernetes cluster lost one availability zone. Walk through what happens and how you recover.
A: Nodes in the lost AZ stop reporting. The node lifecycle controller in kube-controller-manager marks their Ready condition Unknown and taints them node.kubernetes.io/unreachable; once the pods' toleration period (default 300 seconds) expires, pods on those nodes are marked for eviction. Deployment and ReplicaSet controllers recreate the pods on surviving nodes, and HPA scales up if per-pod load rises. StatefulSet pods may not reschedule immediately: the controller guarantees at-most-one pod per ordinal, so a pod stuck on an unreachable node must be force-deleted (or its Node object deleted) before a replacement is created, and PersistentVolumes bound to the lost AZ remain unavailable until the zone returns. Recovery steps: verify remaining capacity absorbed the load, force-delete pods stuck in Terminating where safe (kubectl delete pod --grace-period=0 --force), check PersistentVolume claims, and confirm replica counts return to desired once the AZ recovers.
Q: You have a 3-node etcd cluster and one node fails. What happens and how do you recover?
A: A 3-node etcd cluster tolerates 1 member failure: quorum is 2, so the cluster continues serving reads and writes with 2 healthy members. The failed member's raft log falls behind while it is down. Recovery depends on whether the member is recoverable: if the node comes back quickly, etcd rejoins it automatically and it catches up from the leader. If the node is permanently lost, remove it from the cluster with etcdctl member remove, then add a replacement with etcdctl member add. For managed Kubernetes (GKE, EKS), the control plane handles this automatically — managed etcd redundancy is one of the main benefits of managed clusters. For self-managed clusters: run etcd on dedicated nodes with proper monitoring of etcd cluster health metrics.
Q: How do you design for zero-downtime upgrades of the Kubernetes control plane?
A: For managed clusters, use the platform's rolling upgrade feature — GKE, EKS, and AKS upgrade the control plane without cluster downtime because it runs multiple replicas behind a load balancer, upgraded one at a time. For self-managed: upgrade etcd first (since the API server depends on it), then kube-apiserver, then controller-manager and scheduler. Use the kubectl cordon and drain approach on control plane nodes — treat them like worker nodes for upgrade purposes. Take etcd backups before any control plane changes using etcdctl snapshot save. The golden rule: never upgrade multiple control plane components simultaneously, and always verify each component is healthy before proceeding to the next. For single-node development clusters, downtime is unavoidable — schedule upgrades during maintenance windows.
Q: A pod with a PVC is stuck in Pending after a node was cordoned. Why?
A: The pod's PVC may be bound to a PersistentVolume in a zone with no schedulable nodes, or the pod's affinity/anti-affinity rules prevent it from scheduling elsewhere. Check kubectl describe pod for the pending reason — likely "volume node affinity conflict" or "didn't match pod affinity/anti-affinity rules." If the PVC uses a storage class with WaitForFirstConsumer binding mode, the volume is not provisioned until a pod actually consumes the claim, which can cause confusion when inspecting the PVC. If the PV is zone-specific and nodes in that zone are cordoned or full, the pod cannot schedule. Solutions: uncordon or add capacity in the volume's zone, use storage that spans zones (regional persistent disks in GKE) or a replicated storage layer, and spread StatefulSet replicas across zones so a single-zone problem affects only one replica.
Conclusion
High availability on Kubernetes requires multiple layers of protection. HPA handles traffic spikes by scaling pods horizontally. Pod Disruption Budgets ensure minimum availability during cluster operations. Multi-AZ deployments protect against datacenter failures.
Configure HPA with appropriate min and max replica counts and tuning parameters for scale-up and scale-down behavior. Set PDBs for all production workloads. Distribute StatefulSets and Deployments across availability zones using pod anti-affinity rules.
Test your HA setup with chaos engineering tools. Simulate node failures, pod evictions, and traffic spikes to verify your configuration handles real-world scenarios.
For more on advanced Kubernetes patterns, see the Advanced Kubernetes post.