Kubernetes High Availability: HPA, Pod Disruption Budgets, Multi-AZ

Build resilient Kubernetes applications with Horizontal Pod Autoscaler, Pod Disruption Budgets, and multi-availability zone deployments for production workloads.



Production workloads need to stay available during node failures, cluster maintenance, and traffic spikes. Kubernetes provides mechanisms to handle these scenarios: Horizontal Pod Autoscaler (HPA) scales pods based on demand, Pod Disruption Budgets (PDB) ensure minimum availability during voluntary disruptions, and multi-AZ deployments protect against datacenter failures.

This post covers building resilient applications on Kubernetes using these tools and practices.

For Kubernetes basics, see the Kubernetes fundamentals post. For advanced scheduling, see the Advanced Pod Scheduling post.

When to Use / When Not to Use

HPA suits variable traffic well

Web APIs, user-facing services, anything where load is unpredictable. If you have metrics that correlate with demand, HPA can react faster than you can.

Custom metrics unlock more. Queue depth for worker systems, request latency for latency-sensitive services, business metrics like active users. The autoscaler scales on what matters for your system.

When PDBs matter

Stateful applications need PDBs because their failure modes are harsher. A database losing quorum mid-request corrupts data. Stateless services restart cleanly.

If you need to drain nodes for maintenance without service blips, PDBs are essential. Cluster upgrades require draining nodes, and without PDBs you can temporarily lose quorum for stateful workloads.

Multi-AZ when single-datacenter is not enough

If your SLA is 99.99%, a single AZ failure takes you offline. Multi-AZ deployments ensure an AZ outage is invisible to users.
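A back-of-envelope check shows why (assuming independent AZ failures and roughly 99.9% availability per zone — illustrative numbers, not any provider's SLA):

```shell
# If each of three AZs is independently down 0.1% of the time,
# all three are down simultaneously only 0.001^3 of the time.
awk 'BEGIN { printf "%.9f\n", 0.001 ^ 3 }'   # prints 0.000000001
```

Replicas alone do not buy this — they have to actually land in different zones, which is what zone-spreading rules enforce.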

Compliance sometimes demands it. Data residency regulations may require geographic separation.

HPA is not always the answer

Batch jobs have a beginning and an end. They scale up at the start, run, and scale down when done. An HPA oscillating during a long job wastes resources.

Some services have latency requirements that autoscaling cannot satisfy. If the cold-start delay of scaling up from zero or near zero is unacceptable, keep minimum replicas running.

Decision Summary

Use HPA when: variable traffic, meaningful metrics available, automatic capacity management needed.

Use PDBs when: stateful apps, cluster maintenance without downtime, protecting critical services during upgrades.

Use multi-AZ when: SLA demands it, zone failures must be invisible, compliance requires it.

Skip HPA for: batch jobs, latency-critical services requiring fixed capacity.

HA Architecture Flow

flowchart LR
    User --> LB[Load Balancer]
    LB --> HPA[HPA: Scales pods<br/>based on metrics]
    HPA --> AZ1[AZ-1 Pod]
    HPA --> AZ2[AZ-2 Pod]
    HPA --> AZ3[AZ-3 Pod]
    subgraph PDB[Pod Disruption Budget]
        PDB1[minAvailable: 2]
    end
    AZ1 --> PDB1
    AZ2 --> PDB1
    AZ3 --> PDB1

HPA scales pods horizontally across availability zones. PDB ensures at least 2 replicas stay up during voluntary disruptions. Together they handle traffic spikes and cluster maintenance without downtime.

HPA Configuration and Scaling Behavior

The Horizontal Pod Autoscaler automatically adjusts the number of pod replicas based on CPU utilization, memory usage, or custom metrics.

Basic HPA configuration

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-frontend-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-frontend
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15

This HPA targets 70% average CPU utilization and 80% average memory utilization, scaling between 3 and 20 replicas. The behavior section controls scaling speed:

  • Scale-down stabilization window of 300 seconds prevents rapid flapping
  • Scale-down limits to 10% of pods per minute
  • Scale-up allows doubling pods in 15 seconds for rapid response
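Under the hood, the HPA computes the desired replica count from the ratio of the observed metric to the target: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A quick sketch of the math (simplified — the real controller also applies a tolerance band and the stabilization windows above):

```shell
# Simplified HPA formula: desired = ceil(current * metric / target),
# done here with integer arithmetic.
current_replicas=5
current_cpu=90    # observed average utilization, percent
target_cpu=70     # averageUtilization target from the HPA spec
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired=$desired"   # desired=7, i.e. ceil(5 * 90 / 70) = ceil(6.43)
```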

Custom and external metrics HPA

For metrics beyond CPU and memory, use the custom.metrics.k8s.io or external.metrics.k8s.io APIs. The example below uses an External metric, which requires a metrics adapter (such as prometheus-adapter or KEDA) to serve it:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: queue_depth
          selector:
            matchLabels:
              queue_name: order-processing
        target:
          type: AverageValue
          averageValue: "100"

This scales based on message queue depth. When 100 messages per pod accumulate on average, the HPA adds more pods.

Checking HPA status

kubectl get hpa -n production
kubectl describe hpa web-frontend-hpa -n production

The describe output shows current metrics, replica counts, and scaling events.

HPA Scaling Policies Trade-off Table

| Scaling Policy | Behavior | Best For | Risk |
| --- | --- | --- | --- |
| Aggressive (scale-up) | Fast scale-up, slow scale-down | Traffic spikes, flash sales | Over-provisioning, higher costs |
| Conservative (scale-down) | Slow scale-up, fast scale-down | Stable workloads, cost-sensitive | Under-provisioning during growth |
| Stable (long stabilization window) | Prevents flapping | Predictable traffic, stateful apps | Slower response to load changes |
| Mixed (separate up/down policies) | Tune each direction independently | Most production workloads | More configuration complexity |

The default HPA behavior is relatively aggressive on scale-up and conservative on scale-down. For stateful services with database connections, use a longer stabilization window to avoid connection churn during brief load fluctuations.

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0 # Immediately scale up
    policies:
      - type: Percent
        value: 100
        periodSeconds: 15
  scaleDown:
    stabilizationWindowSeconds: 300 # 5 minute cooldown
    policies:
      - type: Percent
        value: 10
        periodSeconds: 60

Pod Disruption Budgets for Safe Evictions

Pod Disruption Budgets (PDB) ensure minimum availability during voluntary disruptions. Voluntary disruptions include node drain operations for cluster upgrades and autoscaler scale-down events.

PDB definition

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-frontend

This PDB ensures at least 2 web-frontend pods stay available during voluntary disruptions. With 5 replicas, a node drain can evict pods only as long as at least 2 remain up; evictions that would drop below that are refused until replacement pods are ready.

Using maxUnavailable instead

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb
  namespace: production
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: web-frontend

maxUnavailable: 1 allows at most 1 pod to be unavailable. This is often clearer than minAvailable when you know your replica count.

Multiple PDBs for complex applications

# PDB for API servers
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  minAvailable: 3
  selector:
    matchLabels:
      tier: api
---
# PDB for frontend
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      tier: frontend

You can have multiple PDBs for different parts of an application.

Checking PDB status

kubectl get pdb -n production
kubectl describe pdb web-frontend-pdb -n production

Pod Priority and Preemption

Pod priority affects scheduling order and eviction decisions during resource pressure. Higher priority pods preempt lower priority pods when the cluster is full.

Priority class definition

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: production-high
value: 1000
globalDefault: false
description: "Production workloads with high priority"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: production-medium
value: 500
globalDefault: true
description: "Standard production workloads"

globalDefault: true means pods without an explicit priority class use production-medium by default.

Assigning priority to pods

spec:
  priorityClassName: production-high

Critical workloads like payment processing use high priority. Background batch jobs use lower priority and get preempted when needed.
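As a sketch, a low-priority class paired with a batch workload might look like this (the batch-low class name, its value, and the nightly-report job are illustrative, not from any standard):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-low        # illustrative name
value: 100               # well below production-medium (500)
globalDefault: false
description: "Background batch jobs, safe to preempt"
---
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report   # hypothetical batch job
spec:
  template:
    spec:
      priorityClassName: batch-low
      restartPolicy: Never
      containers:
        - name: report
          image: busybox
          command: ["sh", "-c", "echo generating report"]
```

Under resource pressure, the scheduler preempts batch-low pods to make room for production-high workloads.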

Multi-AZ Deployment Strategies

Distributing pods across availability zones protects against single-datacenter failures. Kubernetes nodes typically run in multiple zones within a region.

Zone labels

Nodes have topology labels:

topology.kubernetes.io/zone: us-east-1a
topology.kubernetes.io/region: us-east-1

Pod anti-affinity for zone spreading

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: "app"
                operator: In
                values:
                  - web-frontend
          topologyKey: topology.kubernetes.io/zone

This spreads web-frontend pods across zones. If you have 3 replicas and 3 zones, each zone gets one pod.

Storage considerations

Persistent volumes with cloud provider storage may have zone constraints. EBS volumes exist in a single availability zone. If you schedule a pod in us-east-1b but the EBS volume is in us-east-1a, the pod cannot start.

Use volumeBindingMode: WaitForFirstConsumer in your StorageClass to delay volume binding until scheduler placement:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer

With delayed binding, the volume is provisioned in whatever zone the scheduler places the pod, so pod and volume always end up co-located.

StatefulSet with zone awareness

spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: postgres
          topologyKey: topology.kubernetes.io/zone
  template:
    spec:
      containers:
        - name: postgres
          image: postgres:15

StatefulSets with zone-spread requirements may fail to schedule all replicas if zones are unavailable. Consider whether your application can operate with reduced replica count.

Cluster Federation Basics

Federation manages multiple Kubernetes clusters as a single logical cluster. You deploy workloads to a federated control plane that distributes them across member clusters.

Federation v2 (KubeFed) provides:

  • Cross-cluster scheduling: Deploy pods to multiple clusters
  • Cross-cluster service discovery: Access services across clusters
  • Replica placement: Distribute workloads based on geography

Federation architecture

┌─────────────────────────────────────────┐
│         Federated Control Plane         │
│  ┌─────────────┐  ┌──────────────────┐  │
│  │ KubeFed     │  │ Federated API    │  │
│  └─────────────┘  └──────────────────┘  │
└─────────────────────────────────────────┘
          │                    │
          ▼                    ▼
┌───────────────────┐  ┌───────────────────┐
│ Cluster us-east-1 │  │ Cluster eu-west-1 │
└───────────────────┘  └───────────────────┘

Federation is complex and requires careful planning. For most use cases, simpler approaches like GitOps with multiple clusters work better.

Failure Simulation Testing

Testing failure scenarios validates your HA configuration. Tools like chaoskube and Litmus simulate failures to verify resilience.

Using chaoskube

helm install chaoskube chaoskube/chaoskube \
  --set namespaces={production} \
  --set schedule="*/5 * * * *" \
  --set replicas=1

Chaoskube kills a random pod in the production namespace every 5 minutes. If your PDB is configured correctly, your application stays available.

Manual node drain testing

kubectl drain node-1 --ignore-daemonsets --delete-emptydir-data --force

Draining a node simulates cluster maintenance. Verify:

  • PDBs are respected
  • Pods reschedule to other nodes
  • Application remains available behind its Service

Load testing

Verify HPA responds correctly under load:

kubectl run -it load-generator \
  --image=busybox \
  --restart=Never \
  -- /bin/sh -c "while true; do wget -q -O- http://web-frontend; done"

Monitor HPA behavior during the test:

watch kubectl get hpa -n production

Production Failure Scenarios

HPA Flapping

An HPA that scales up and down too quickly is worse than no HPA. Pods constantly being created and destroyed churn resources, break in-flight connections, and flood the event log.

This happens when the stabilization window is too short or the scaling threshold is too tight. A deployment that jumps to 20 replicas, triggers scale-down, and bounces back is not doing anyone favors.

Set stabilizationWindowSeconds on scale-down to give the system time to settle. Five minutes is usually enough.

PDB Blocking Cluster Upgrades

A PDB with minAvailable set to the replica count blocks all drains. If you have 5 pods and minAvailable: 5, no pod ever gets evicted.

The kubectl drain command hangs. Cluster upgrades stall. Eviction requests fail with an error that removing the pod would violate its disruption budget.

Set minAvailable to the minimum your service actually needs, leaving room for at least one eviction. For stateless services, percentage-based PDBs like minAvailable: 50% are often more practical.
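For example, a percentage-based budget (a sketch, assuming the service tolerates losing half its pods during maintenance):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb
  namespace: production
spec:
  minAvailable: 50%      # rounded up against the current replica count
  selector:
    matchLabels:
      app: web-frontend
```

As the deployment scales up or down, the budget scales with it, so drains never stall on a stale absolute number.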

All Replicas in One AZ

If you deploy 3 replicas without any topology awareness and one AZ fails, all 3 replicas disappear simultaneously. Your service goes down even though you thought you had redundancy.

Use topologySpreadConstraints or podAntiAffinity with topology.kubernetes.io/zone. Explicitly distribute across zones.
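A minimal sketch of the topologySpreadConstraints approach — maxSkew and the hard/soft choice are the knobs to tune:

```yaml
spec:
  topologySpreadConstraints:
    - maxSkew: 1                          # pod counts may differ by at most 1 across zones
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule    # hard requirement; ScheduleAnyway makes it a preference
      labelSelector:
        matchLabels:
          app: web-frontend
```

Unlike required anti-affinity, spread constraints still allow multiple pods per zone once replicas outnumber zones, which usually scales better with HPA.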

Anti-Patterns

minReplicas: 1

A single replica has no availability during restarts, upgrades, or failures. A routine node reboot or a rolling update alone takes down your service.

Set minReplicas to at least 2 for anything you care about.

PDBs Set Too Aggressively

A PDB requiring all replicas to stay up blocks cluster maintenance entirely. You cannot drain nodes, you cannot upgrade the cluster, you cannot rotate infrastructure.

Set PDBs to the minimum your service needs, not the maximum.

HPA Without PDB

During a node drain, HPA sees reduced replicas and tries to scale up. Without a PDB, the newly created pods can get evicted too, creating a thrashing situation.

Pair HPA with PDBs for any production workload.

Vertical Pod Autoscaler Integration

VPA adjusts pod resource requests automatically based on actual usage, complementing HPA which adjusts replica counts. Use VPA for workloads where CPU/memory sizing is tricky and you want the scheduler to optimize resource allocation.

VPA Recommendation Modes

| Mode | Behavior | Use Case |
| --- | --- | --- |
| Off | No action, just shows recommendations | Testing VPA before enabling |
| Initial | Sets resources only at pod creation | New deployments |
| Auto | Updates resources dynamically | Mature workloads in staging |
| Recreate | Updates and evicts pods to apply | Willing to tolerate restarts |

VPA Configuration

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-backend
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 4
          memory: 8Gi

VPA with HPA

VPA handles vertical scaling (resource requests) while HPA handles horizontal scaling (replica count). They can run together:

# VPA for resource sizing
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-backend
  updatePolicy:
    updateMode: "Auto"
---
# HPA for replica scaling
apiVersion: autoscaling.k8s.io/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-backend
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60

Note: Be careful running VPA in Auto mode alongside an HPA that scales on CPU or memory, as above: when VPA changes resource requests, the utilization ratio HPA scales on shifts, and the two controllers can fight each other. The safer combination is HPA on a custom or external metric with VPA managing resource requests.

Observability Hooks for HA

Key Metrics to Track

# HPA able to scale (kube-state-metrics)
kube_horizontalpodautoscaler_status_condition{condition="AbleToScale"}

# HPA current vs desired replicas
kube_horizontalpodautoscaler_status_current_replicas / kube_horizontalpodautoscaler_status_desired_replicas

# PDB eviction headroom (0 means evictions are blocked)
kube_poddisruptionbudget_status_pod_disruptions_allowed

# Pod restarts by deployment (evictions vs crashes)
kube_pod_container_status_restarts_total

# Node availability by zone
kube_node_status_condition{condition="Ready"}

Key Events to Log

  • HPA scale-up and scale-down events (check kubectl get events --watch)
  • PDB eviction blocks (pod cannot be evicted due to minAvailable)
  • VPA resource recommendations applied
  • Pod preemption events (higher priority pods evict lower ones)

Key Alerts to Configure

| Alert | Condition | Severity |
| --- | --- | --- |
| HPA at max replicas | kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_spec_max_replicas for >5min | Warning |
| HPA flapping | Scale direction changes 4+ times in 10min | Warning |
| PDB blocking evictions | kube_poddisruptionbudget_status_pod_disruptions_allowed == 0 | Critical |
| Pod preemption events | Preemption event count increasing | Info |
| VPA recommendations ignored | Pod OOMKilled despite VPA recommendations | Warning |
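The first alert can be expressed as a Prometheus alerting rule (a sketch — assumes kube-state-metrics is installed; the group and alert names are illustrative):

```yaml
groups:
  - name: hpa-alerts                # illustrative rule group
    rules:
      - alert: HPAAtMaxReplicas
        expr: >
          kube_horizontalpodautoscaler_status_current_replicas
          >= kube_horizontalpodautoscaler_spec_max_replicas
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "HPA {{ $labels.horizontalpodautoscaler }} pinned at max replicas"
```

An HPA sitting at maxReplicas means you have lost autoscaling headroom — raise the ceiling or investigate the load.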

Debug Commands

# Check HPA current state
kubectl get hpa -n production

# Watch HPA scaling decisions
kubectl get events --watch --field-selector involvedObject.kind=HorizontalPodAutoscaler

# Check PDB status
kubectl get pdb -n production
kubectl describe pdb web-frontend-pdb -n production

# Check VPA recommendations (without applying)
kubectl get vpa -n production -o yaml | grep -A 50 recommendation

# Check for preemption events
kubectl get events --sort-by='.lastTimestamp' | grep -i preempt

Quick Recap Checklist

  • minReplicas set to at least 2 for production stateless services
  • PDB configured with minimum viable availability
  • topologySpreadConstraints distributing replicas across zones
  • HPA stabilization window preventing flapping
  • PDB behavior tested with simulated node drains in staging
  • HPA scaling on appropriate metrics (CPU for stateless, memory for caches)
  • HPA scaling events monitored in production
  • Application available during cluster upgrades verified

Interview Questions

Q: Your multi-AZ Kubernetes cluster lost one availability zone. Walk through what happens and how you recover. A: Nodes in the lost AZ become unreachable. The node controller marks their Ready condition as Unknown, and the cloud provider's node controller removes node objects for terminated instances. Pods on those nodes are marked for deletion and, after the eviction timeout, the ReplicaSet or Deployment controller recreates them on remaining nodes. Note that PDBs do not protect against this — an AZ failure is an involuntary disruption. StatefulSets with volumes may not reschedule immediately since their PersistentVolumes in the lost AZ are unavailable; the StatefulSet controller waits for the volumes or for the stuck pods to be force-deleted. Recovery steps: verify remaining capacity absorbed the load, check PersistentVolumeClaims, and manually delete any pods stuck in Terminating (kubectl delete pod --grace-period=0 --force). Once the AZ returns, verify nodes rejoin and pods are back to the desired count.

Q: You have a 3-node etcd cluster and one node fails. What happens and how do you recover? A: A 3-member etcd cluster tolerates 1 member failure: quorum (2 of 3) holds, and the cluster keeps serving reads and writes. While the member is down, its log falls behind. Recovery depends on whether the node is recoverable: if it comes back quickly, etcd automatically catches it up from the leader. If it is permanently lost, remove it with etcdctl member remove and add a replacement with etcdctl member add. For managed Kubernetes (GKE, EKS), the control plane handles this automatically — managed etcd redundancy is one of the main benefits of managed clusters. For self-managed clusters: run etcd on dedicated nodes and monitor etcd cluster health metrics.

Q: How do you design for zero-downtime upgrades of the Kubernetes control plane? A: For managed clusters, use the platform's rolling upgrade feature — GKE, EKS, and AKS upgrade control plane instances one at a time behind a load balancer, so the API stays available. For self-managed: upgrade etcd first (the API server depends on it), then kube-apiserver, then controller-manager and scheduler, one instance at a time. Use kubectl cordon and drain on control plane nodes — treat them like worker nodes for upgrade purposes. Take an etcd backup before any control plane change using etcdctl snapshot save. The golden rule: never upgrade multiple control plane components simultaneously, and verify each component is healthy before proceeding to the next. For single-node development clusters, downtime is unavoidable — schedule upgrades during maintenance windows.

Q: A pod with a PVC is stuck in Pending after a node was cordoned. Why? A: The pod's PVC is likely bound to a PersistentVolume in a zone where no schedulable node remains, or the pod's affinity/anti-affinity rules prevent it from scheduling elsewhere. Check kubectl describe pod for the pending reason — typically "volume node affinity conflict" or "node(s) didn't match pod affinity rules." If the PVC uses a StorageClass with volumeBindingMode: WaitForFirstConsumer, the volume is not provisioned until the pod is scheduled, which can cause confusion. If the PV is zonal and the matching zone is cordoned or overcommitted, the pod cannot schedule. Solutions: uncordon a node in the volume's zone, use zone-spanning storage where the platform offers it (regional persistent disks on GKE), or spread StatefulSet replicas across zones so the application survives losing one zone's volumes.

Conclusion

High availability on Kubernetes requires multiple layers of protection. HPA handles traffic spikes by scaling pods horizontally. Pod Disruption Budgets ensure minimum availability during cluster operations. Multi-AZ deployments protect against datacenter failures.

Configure HPA with appropriate min and max replica counts and tuning parameters for scale-up and scale-down behavior. Set PDBs for all production workloads. Distribute StatefulSets and Deployments across availability zones using pod anti-affinity rules.

Test your HA setup with chaos engineering tools. Simulate node failures, pod evictions, and traffic spikes to verify your configuration handles real-world scenarios.

For more on advanced Kubernetes patterns, see the Advanced Kubernetes post.


Related Posts

Health Checks: Liveness, Readiness, and Service Availability

Master health check implementation for microservices including liveness probes, readiness probes, and graceful degradation patterns.

#microservices #health-checks #kubernetes

Container Security: Image Scanning and Vulnerability Management

Implement comprehensive container security: from scanning images for vulnerabilities to runtime security monitoring and secrets protection.

#container-security #docker #kubernetes

Deployment Strategies: Rolling, Blue-Green, and Canary Releases

Compare and implement deployment strategies—rolling updates, blue-green deployments, and canary releases—to reduce risk and enable safe production releases.

#deployment #devops #kubernetes