Kubernetes Resource Limits: CPU, Memory, and Quality of Service
Configure CPU and memory requests and limits to ensure fair scheduling, prevent resource starvation, and achieve predictable performance in Kubernetes clusters.
Kubernetes clusters have finite resources. Nodes have CPU and memory that pods consume. Without resource constraints, one pod can starve others, degrade cluster stability, and make scheduling unpredictable. Kubernetes provides mechanisms to declare resource requirements, set hard limits, and classify pods by their importance.
This post explains requests vs limits, Quality of Service classes, and the namespace-level constraints that keep clusters healthy.
If you need Kubernetes fundamentals first, see the Kubernetes fundamentals post. For advanced scheduling patterns, see the Advanced Kubernetes post.
Requests vs Limits Explained
Every container in a pod can specify resource requests and resource limits:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-app
  namespace: production
spec:
  containers:
  - name: web-app
    image: nginx:1.25
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"
```
Requests declare what a container needs. The scheduler uses requests for placement: a node is only eligible if its unallocated capacity covers the pod's requests.
Limits cap what a container can consume. A container that exceeds its memory limit is killed and restarted. A container that hits its CPU limit is throttled, not killed.
CPU representation
CPU is measured in cores. You can express it as a whole number (1 CPU = 1 core) or millicores (1000m = 1 CPU). Common values:
- `100m` = 0.1 CPU (one-tenth of a core)
- `250m` = 0.25 CPU
- `500m` = 0.5 CPU
- `1000m` = 1 CPU
CPU is compressible. If a container exceeds its CPU limit, Kubernetes throttles it. The container does not get killed for CPU alone.
Memory representation
Memory is measured in bytes. You can use suffixes: Ki (kibibytes), Mi (mebibytes), Gi (gibibytes).
- `128Mi` = 128 mebibytes (~134 MB)
- `256Mi` = 256 mebibytes (~268 MB)
- `1Gi` = 1 gibibyte (~1.07 GB)
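The binary suffixes are powers of 1024, which is why `256Mi` is slightly more than 256 MB:

```shell
# Mi = mebibytes = 1024^2 bytes; a decimal "M" suffix would mean 1000^2
echo $(( 256 * 1024 * 1024 ))   # 268435456 bytes (~268 MB)
```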
Memory is not compressible. If a container exceeds its memory limit, Kubernetes terminates it with an OOMKilled status.
What happens when limits are exceeded
```
State:       Terminated
  Reason:    OOMKilled
  Exit Code: 137
```
Exit code 137 (128 + 9, i.e. SIGKILL) indicates the container was killed by the kernel's OOM (Out of Memory) killer. Frequent OOMKilled events mean you need to increase the memory limit or reduce the application's memory usage.
QoS Classes
Kubernetes assigns pods to Quality of Service classes based on their resource requests and limits:
| QoS Class | Criteria | Behavior |
|---|---|---|
| Guaranteed | requests == limits for all containers | Last to be evicted |
| Burstable | requests < limits (or some limits not set) | Evicted after Guaranteed |
| BestEffort | No requests or limits set | First to be evicted |
When to Use Each QoS Class
| QoS Class | When to Use | When NOT to Use |
|---|---|---|
| Guaranteed | Critical infrastructure pods, databases, licensed software with strict resource needs | When you do not need eviction priority guarantees |
| Burstable | Most application workloads that benefit from burst capacity | When every pod needs identical resource guarantees |
| BestEffort | Batch jobs, environments where pods are truly disposable | Production workloads, anything that matters |
Rule of thumb: Most workloads should be Burstable. Set Guaranteed for workloads that must not be evicted under any circumstances. Never run production workloads as BestEffort.
QoS Decision Flow
```mermaid
flowchart TD
    A[Pod submitted] --> B{Requests == Limits\nfor all containers?}
    B -->|Yes| G[Guaranteed QoS]
    B -->|No| C{Any requests\nor limits set?}
    C -->|Yes| Bu[Burstable QoS]
    C -->|No| BE[BestEffort QoS]
    Bu --> D{Node under\nmemory pressure?}
    BE --> D
    G --> D
    D -->|Guaranteed| L1[Last to evict]
    D -->|Burstable| L2[Middle eviction]
    D -->|BestEffort| L3[First to evict]
```
Guaranteed pods
```yaml
containers:
- name: database
  image: postgres:15
  resources:
    limits:
      memory: "2Gi"
      cpu: "2000m"
    requests:
      memory: "2Gi"
      cpu: "2000m"
```
Pods with identical requests and limits get the highest QoS. Use this for critical workloads that should not be evicted.
BestEffort pods
```yaml
containers:
- name: batch-job
  image: my-batch-job
  resources: {}
```
No resource specifications means BestEffort. These pods are first in line for eviction when the node runs low on resources.
Burstable pods
```yaml
containers:
- name: web-app
  image: nginx:1.25
  resources:
    requests:
      memory: "128Mi"
      cpu: "100m"
    limits:
      memory: "512Mi"
      cpu: "500m"
```
Most pods fall into Burstable: some resources are guaranteed, but the pod can burst above its requests when spare capacity is available. You can confirm a pod's class with `kubectl get pod web-app -o jsonpath='{.status.qosClass}'`.
LimitRange for Namespace Quotas
A LimitRange sets default, minimum, and maximum resource limits for pods and containers in a namespace. Without it, pods without resource specs become BestEffort.
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - max:
      memory: "4Gi"
      cpu: "2000m"
    min:
      memory: "64Mi"
      cpu: "50m"
    default:
      memory: "256Mi"
      cpu: "250m"
    defaultRequest:
      memory: "128Mi"
      cpu: "100m"
    type: Container
```
This LimitRange:
- Sets maximum memory and CPU per container
- Sets minimum memory and CPU per container
- Applies default limits when containers specify no limits
- Applies default requests when containers specify no requests
Without this LimitRange, a container with no resource specs gets no guaranteed resources and can be evicted first.
Applying LimitRange
```shell
kubectl apply -f limitrange.yaml
kubectl describe limitrange default-limits -n production
```
The output shows the actual limits applied:
```
Type       Resource  Min   Max  Default Request  Default Limit
Container  cpu       50m   2    100m             250m
Container  memory    64Mi  4Gi  128Mi            256Mi
```
ResourceQuota for Cluster-Wide Limits
A ResourceQuota limits total resource consumption in a namespace. Use it to prevent any single namespace from consuming all cluster resources.
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
    persistentvolumeclaims: "10"
```
This quota limits the entire production namespace to 10 CPU requests, 20Gi memory requests, 20 CPU limits, 40Gi memory limits, 50 pods, and 10 persistent volume claims.
Viewing quota usage
```shell
kubectl describe resourcequota production-quota -n production
```
The output shows current usage against the hard limits. When a quota is exhausted, Kubernetes rejects new resource creation in that namespace.
Pod Resource Testing and Tuning
Finding the right requests and limits takes measurement. Kubernetes lets you profile pod behavior before setting values in production.
kubectl run with resource specs
```shell
# kubectl 1.21+ removed the --requests/--limits flags;
# resource specs now go through --overrides
kubectl run -it --rm load-generator \
  --image=busybox \
  --restart=Never \
  --overrides='{"spec":{"containers":[{"name":"load-generator","image":"busybox",
    "stdin":true,"tty":true,"command":["sh"],
    "resources":{"requests":{"cpu":"500m","memory":"128Mi"},
                 "limits":{"cpu":"1000m","memory":"256Mi"}}}]}}'
```
Use this to run temporary pods and observe resource consumption with monitoring tools.
Vertical Pod Autoscaler
The Vertical Pod Autoscaler (VPA) analyzes historical resource usage and recommends or automatically applies better resource requests:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
```
In Auto mode, VPA evicts and reschedules pods with updated resource specs. In Off mode, it only provides recommendations.
VPA helps you find baseline resource requirements without manual profiling.
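To collect recommendations without any automatic eviction, the same object can be deployed with `updateMode: "Off"`; the suggested requests then appear in the VPA's status and can be read with `kubectl describe vpa web-app-vpa`:

```yaml
# Recommendation-only VPA: never evicts pods, only reports suggested requests
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"
```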
Horizontal Pod Autoscaler
The Horizontal Pod Autoscaler (HPA) scales pod replicas based on CPU, memory, or custom metrics:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
The HPA scales between 3 and 10 replicas to maintain 70% average CPU utilization. For memory-based scaling:
```yaml
metrics:
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 80
```
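The `autoscaling/v2` API also supports a `behavior` stanza for tuning how fast the HPA scales in each direction. A sketch that dampens scale-down (the window length here is an illustrative assumption, not a recommendation):

```yaml
# Illustrative: wait 5 minutes of sustained low utilization before scaling down,
# which avoids replica-count flapping on bursty traffic
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
```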
Conclusion
Resource requests and limits are essential for stable Kubernetes clusters. Requests tell the scheduler what a pod needs. Limits cap what a pod can consume. QoS classes determine eviction order when nodes run out of resources.
Use LimitRange to enforce defaults and prevent BestEffort pods from consuming everything. Use ResourceQuota to cap total namespace consumption. Profile your applications with VPA or manual testing before setting production resource specs.
Balanced resource configuration leads to predictable scheduling, stable node behavior, and fair resource sharing across workloads. For more on scheduling and high availability, see the Advanced Kubernetes post.
Production Failure Scenarios
BestEffort Pods Evicted Under Memory Pressure
Pods without resource requests (BestEffort QoS) are the first to be evicted when a node runs low on memory. In a cluster with many BestEffort pods, eviction events can be frequent.
Symptoms: Frequent pod evictions in kubectl get events, pods restart unexpectedly.
Mitigation: Set resource requests for all production pods. Use LimitRange to enforce minimum resource requests per namespace.
OOMKilled Pods from Memory Limit Too Low
When a container exceeds its memory limit, Kubernetes kills it with OOMKilled status. This is one of the most common production issues I see. The fix is usually raising the memory limit, though if usage climbs without bound the real fix is the memory leak.
Check actual memory usage first:
```shell
kubectl describe pod <pod-name>   # Look for OOMKilled in state
kubectl top pod <pod-name>        # Check actual memory usage
```
Set limits about 20-30% above what you see in staging to account for traffic spikes.
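The headroom rule is simple arithmetic. A sketch, where the 380Mi observed peak is a made-up staging value:

```shell
# Hypothetical peak memory observed in staging
observed_mi=380
# Add ~30% headroom for traffic spikes
limit_mi=$(( observed_mi * 130 / 100 ))
echo "${limit_mi}Mi"   # prints 494Mi; round up to 512Mi in practice
```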
CPU Throttling Impacting Latency
CPU limits throttle containers even when the node has free CPU. For latency-sensitive services, this creates annoying tail latency spikes.
High p99 latency with low average CPU usage usually means CPU throttling. For these workloads, either remove CPU limits or set them equal to requests for a Guaranteed QoS pod.
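Throttling can be confirmed from the container's cgroup counters: on cgroup v2, `/sys/fs/cgroup/cpu.stat` inside the container exposes `nr_periods` and `nr_throttled`. A sketch that computes the throttled fraction from sample values (the numbers in the mock file are made up):

```shell
# Sample cpu.stat contents (cgroup v2); inside a pod you would read
# /sys/fs/cgroup/cpu.stat instead of this mock file
cat > cpu.stat <<'EOF'
usage_usec 1234567
nr_periods 1000
nr_throttled 240
throttled_usec 890000
EOF

# Fraction of CFS periods in which the container was throttled
awk '/^nr_periods/{p=$2} /^nr_throttled/{t=$2} END{printf "%.1f%%\n", 100*t/p}' cpu.stat
```

A throttled fraction above a few percent on a latency-sensitive service is a strong signal that the CPU limit is too tight.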
Anti-Patterns
Setting Identical Requests and Limits for All Pods
Treating every pod the same wastes resources. A web server handling 1000 req/s has different needs than a batch job processing queues.
Profile each application type separately and set appropriate resource specs.
Not Setting Memory Limits
Memory limits prevent runaway processes from consuming all node memory and causing node-level OOM events. Always set memory limits, especially for applications that can experience memory leaks.
Over-Provisioning CPU Limits
Setting CPU limits very high (like 4 cores for a simple web server) defeats the purpose of limits. The scheduler uses requests for node allocation decisions, not limits.
Set CPU limits based on actual expected peak load, not theoretical maximum.
Quick Recap Checklist
Use this checklist when configuring Kubernetes resource limits:
- Set resource requests for all production containers (scheduler uses these for placement)
- Set memory limits to prevent runaway processes from impacting node stability
- Set CPU limits based on actual application needs, not maximum theoretical values
- Use LimitRange to enforce default requests/limits per namespace
- Use ResourceQuota to cap total namespace resource consumption
- Profile applications with VPA or monitoring tools before setting production values
- Monitor OOMKilled events and adjust memory limits accordingly
- Monitor CPU throttling metrics and adjust CPU limits if latency issues appear
- Set Guaranteed QoS (requests == limits) for critical infrastructure pods
- Avoid BestEffort for any production workload