Kubernetes: Container Orchestration for Microservices
Kubernetes has become the default platform for running containers in production. It handles scheduling, scaling, networking, and failure recovery so you do not have to. If you run microservices at scale, Kubernetes is worth understanding.
This post covers the core concepts: pods, services, deployments, ingress, Helm, and autoscaling. By the end, you will have a mental model for how these pieces fit together.
What is Kubernetes?
Kubernetes (K8s) is an open-source container orchestration platform. It manages where containers run across a cluster of machines, keeps them healthy, scales them based on load, and handles networking between them.
You define the desired state in configuration files. Kubernetes continuously works to make reality match your desired state. If a container crashes, Kubernetes restarts it. If a node fails, Kubernetes reschedules its containers elsewhere.
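The desired-state loop can be sketched as toy code. This is illustrative only, not the real controller logic; the function and state shapes are invented for the example:

```python
# Toy version of the reconcile idea: compare desired state with observed
# state and emit the actions needed to converge (names are illustrative).
def reconcile(desired: dict, observed: dict) -> list:
    actions = []
    for name, want in desired.items():
        have = observed.get(name, 0)
        if have < want:
            actions.append(("create", name, want - have))
        elif have > want:
            actions.append(("delete", name, have - want))
    return actions

# One pod of "api" crashed: observed drops to 2, so the loop creates one.
print(reconcile({"api": 3}, {"api": 2}))  # [('create', 'api', 1)]
```

Real controllers run this comparison continuously against the cluster's actual state, which is what makes Kubernetes self-healing.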
```mermaid
graph TD
    subgraph Cluster
        subgraph Node1
            P1[Pod] --> P2[Pod]
        end
        subgraph Node2
            P3[Pod] --> P4[Pod]
        end
        subgraph Node3
            P5[Pod]
        end
    end
    K8s[Kubernetes Control Plane] --> Node1
    K8s --> Node2
    K8s --> Node3
```
The control plane makes scheduling decisions, manages node membership, and exposes the API you interact with. Nodes run the actual workloads.
Core Concepts
Pods
A pod is the smallest deployable unit in Kubernetes, representing a single instance of a running process. It may contain one or more containers that share network and storage.
Most of the time, you run one container per pod. Some patterns use sidecars (helper containers in the same pod) for logging, proxying, or synchronization.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server
spec:
  containers:
    - name: api
      image: my-api:v1
      ports:
        - containerPort: 8080
```
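The sidecar pattern mentioned above can be sketched as a pod with a helper container sharing a volume with the main container. The log-shipper image name here is hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-with-logging
spec:
  containers:
    - name: api
      image: my-api:v1
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
    - name: log-shipper          # sidecar; image name is hypothetical
      image: log-shipper:v1
      volumeMounts:
        - name: logs
          mountPath: /var/log/app
          readOnly: true
  volumes:
    - name: logs
      emptyDir: {}
```

Both containers share the pod's network namespace and the `logs` volume, so the sidecar can read what the application writes.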
Services
A service provides a stable network endpoint for a set of pods. Pods are ephemeral; their IPs change when they restart. Services abstract this away.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-service
spec:
  selector:
    app: api-server
  ports:
    - port: 80
      targetPort: 8080
  type: ClusterIP
```
Common service types:
- ClusterIP: Internal cluster IP (default). Only reachable from within the cluster.
- NodePort: Exposes the service on each node’s IP at a static port.
- LoadBalancer: Provisions an external load balancer (in cloud environments).
- Headless (a ClusterIP service with clusterIP: None): No cluster IP. DNS returns pod IPs directly.
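A headless variant of the earlier service only needs clusterIP: None; a sketch:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: api-headless
spec:
  clusterIP: None        # headless: DNS returns the pod IPs directly
  selector:
    app: api-server
  ports:
    - port: 8080
```

Headless services are common for stateful workloads where clients need to address individual pods rather than a load-balanced virtual IP.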
Deployments
A deployment manages replicas of a pod. It handles rolling updates and rollbacks. You declare how many replicas you want, and Kubernetes maintains that count.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: my-api:v1
```
If you update the image version, the deployment rolls out the change gradually, replacing pods one by one to maintain availability.
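The pace of that rollout is tunable via the deployment's update strategy. A sketch of the relevant fields:

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most one pod above the desired replica count
      maxUnavailable: 0  # never dip below the desired replica count
```

During a rollout, kubectl rollout status deployment/api-server watches progress, and kubectl rollout undo deployment/api-server reverts to the previous ReplicaSet.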
ReplicaSets
ReplicaSets ensure a specified number of pod replicas are running. Deployments manage ReplicaSets. You usually work with Deployments, not ReplicaSets directly.
Networking
Kubernetes networking has a few important rules:
- Every pod gets a unique IP across the cluster
- Containers within a pod share that IP
- Pods can communicate with all other pods without NAT
- Services get a stable virtual IP that load-balances to pods
```mermaid
graph LR
    PodA[Pod A] --> Svc[Service]
    Svc --> PodB[Pod B]
    Svc --> PodC[Pod C]
```
The service selector matches pod labels. When you call the service IP, Kubernetes load-balances across all matching pods.
Ingress
Ingress manages external HTTP/HTTPS access to services within the cluster. It provides routing based on host and path, SSL termination, and name-based virtual hosting.
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: user-service
                port:
                  number: 80
          - path: /products
            pathType: Prefix
            backend:
              service:
                name: product-service
                port:
                  number: 80
```
An ingress controller (like nginx-ingress or Istio gateway) implements the Ingress resource. Without a controller, Ingress resources do nothing.
Helm: Kubernetes Package Manager
Helm templatizes Kubernetes manifests. Instead of repeating YAML for each environment, you define templates with placeholders. A values file fills in the placeholders for dev, staging, and production.
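A sketch of the idea (the chart layout and value names are illustrative): a template references placeholders that the values file fills in.

```yaml
# templates/deployment.yaml (fragment)
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
        - name: api
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
---
# values.yaml
replicaCount: 3
image:
  repository: my-api
  tag: v1
```

Per-environment values files (values-staging.yaml, values-prod.yaml) then override only what differs between environments.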
```bash
# Install a chart
helm install my-release bitnami/wordpress

# Upgrade
helm upgrade my-release bitnami/wordpress --set resources.limits.memory=2Gi

# Roll back to revision 1
helm rollback my-release 1
```
Helm charts package everything needed to run an application: manifests, templates, default values, and metadata.
Scaling
Kubernetes scales workloads in two directions: horizontally (more pod replicas) and vertically (more resources per pod).
Horizontal Pod Autoscaler
The HPA scales the number of pod replicas based on CPU utilization or custom metrics.
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
When average CPU across replicas exceeds 70%, Kubernetes adds replicas up to the maximum. When it drops, it removes replicas down to the minimum.
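The scaling rule the HPA applies is roughly desired = ceil(current * currentMetric / targetMetric), clamped to the min/max bounds. A sketch using the limits from the example above:

```python
import math

# Sketch of the HPA scaling rule:
#   desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
# clamped to [minReplicas, maxReplicas] (2 and 10, as in the example above).
def desired_replicas(current, utilization, target, min_r=2, max_r=10):
    desired = math.ceil(current * utilization / target)
    return max(min_r, min(max_r, desired))

# 3 replicas averaging 90% CPU against the 70% target scales to 4 replicas
print(desired_replicas(3, 90, 70))  # 4
```

The real controller also applies tolerances and stabilization windows to avoid flapping, which this sketch omits.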
Cluster Autoscaler
The cluster autoscaler adjusts the number of nodes in a cluster. When pods cannot be scheduled due to resource shortages, the cluster autoscaler adds nodes. When nodes are underutilized, it removes them.
ConfigMaps and Secrets
Configuration data lives in ConfigMaps. Sensitive data lives in Secrets. Both inject into containers as environment variables or files.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-config
data:
  DATABASE_HOST: "db.example.com"
  LOG_LEVEL: "info"
---
apiVersion: v1
kind: Secret
metadata:
  name: api-secrets
type: Opaque
stringData:
  DATABASE_PASSWORD: "supersecret"
```
Inject into pods via environment variables or mounted volumes. Secrets are base64 encoded, not encrypted by default. For real secrets, use a secrets manager integration (Vault, AWS Secrets Manager, GCP Secret Manager).
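A sketch of the environment-variable route, wiring the ConfigMap and Secret above into a container:

```yaml
# Pod spec fragment: every key in the ConfigMap and Secret
# becomes an environment variable in the container
spec:
  containers:
    - name: api
      image: my-api:v1
      envFrom:
        - configMapRef:
            name: api-config
        - secretRef:
            name: api-secrets
```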
Namespaces
Namespaces partition a Kubernetes cluster into virtual clusters. They provide scope for names, resource quotas, and access control.
```bash
kubectl get namespaces
kubectl create namespace my-app
kubectl config set-context --current --namespace=my-app
```
Common namespaces: default, kube-system (cluster components), kube-public (publicly readable resources).
Resource Management
Pods consume CPU and memory. You set resource requests (the minimum guaranteed) and limits (the maximum allowed).
```yaml
resources:
  requests:
    memory: "128Mi"
    cpu: "250m"
  limits:
    memory: "256Mi"
    cpu: "500m"
```
If a pod exceeds its memory limit, it gets OOM-killed. If it exceeds its CPU limit, it is throttled.
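The units above are worth decoding: "250m" means 250 millicores (a quarter of a core), and "128Mi" uses binary suffixes (Ki, Mi, Gi). Illustrative helpers, not any real Kubernetes API:

```python
# Illustrative conversions for the resource units used above.
def cpu_to_cores(value: str) -> float:
    # "250m" = 250 millicores = 0.25 cores; a bare number is whole cores
    return float(value[:-1]) / 1000 if value.endswith("m") else float(value)

def memory_to_bytes(value: str) -> int:
    # Binary suffixes: Ki = 1024, Mi = 1024^2, Gi = 1024^3
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}
    for suffix, factor in units.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)  # bare numbers are plain bytes

print(cpu_to_cores("250m"))      # 0.25
print(memory_to_bytes("128Mi"))  # 134217728
```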
Microservices on Kubernetes
Kubernetes works well for microservices because each service runs in its own deployment with independent scaling. Services communicate over the internal network via services. Deployments handle rolling updates without downtime. Namespaces isolate teams and environments. RBAC controls access between services.
A service mesh like Istio adds mTLS, traffic routing, and observability on top. See the Service Mesh and Istio and Envoy posts for how they combine with Kubernetes.
When to Use / When Not to Use Kubernetes
Use Kubernetes when:
- You are running multiple services that need independent scaling and deployment
- You need automated failure recovery (self-healing for containerized workloads)
- You require traffic management (ingress, load balancing, canary deployments)
- Your team has or is building Kubernetes expertise
- You need consistent behavior across development, staging, and production environments
- You want infrastructure-as-code with declarative configuration
Probably not the right choice when:
- You have a small number of services (fewer than 5) with simple workloads
- Your team lacks DevOps capacity to manage cluster operations
- Your workload is primarily stateless but simple (consider managed container services instead)
- You need to move fast with minimal infrastructure overhead
- Cost of running a cluster outweighs benefits for your use case
Trade-off Table
| Factor | With Kubernetes | Without Kubernetes |
|---|---|---|
| Latency | Baseline (no added hops) | Baseline |
| Consistency | Declarative config, same behavior everywhere | Configuration varies by environment |
| Cost | Control plane + worker nodes overhead | Lower (fewer components) |
| Complexity | Steeper learning curve; cluster management | Simpler direct deployment |
| Operability | Centralized cluster management; unified monitoring | Per-server management |
| Scalability | Auto-scaling built-in; handles thousands of pods | Manual scaling |
| Reliability | Self-healing, automatic restarts, load balancing | Requires external tools |
| Flexibility | Runs anywhere (cloud, on-prem, hybrid) | Tied to specific infrastructure |
Production Failure Scenarios
| Failure | Impact | Mitigation |
|---|---|---|
| Node goes down | Pods on that node become unavailable | Run multiple replicas; use PodDisruptionBudgets; cluster autoscaler provisions replacement nodes |
| Pod OOM killed (memory limit exceeded) | Application crashes and restarts | Set appropriate memory requests/limits; monitor memory usage; investigate memory leaks |
| Pod throttled (CPU limit exceeded) | Application latency increases | Set appropriate CPU requests/limits; profile application CPU usage |
| Image pull failure | Pods stuck in ImagePullBackOff state | Use private registries with credentials configured; cache images on nodes; use image pull secrets |
| Volume mount failure | Pod cannot start or crashes | Validate PersistentVolumeClaims; check storage class availability; monitor volume capacity |
| Deployment rollback required | Bad deployment causes failures | Use RollingUpdate strategy with maxSurge/maxUnavailable; test rollbacks in staging; keep previous working ReplicaSet |
| Namespace deletion accident | All resources in namespace deleted | Use ResourceQuota to limit scope; implement namespace protection; regular backups of cluster state |
| etcd data loss | Cluster state corrupted; may require full rebuild | Use etcd backups; run etcd in HA mode; monitor disk I/O on etcd nodes |
Observability Checklist
Metrics
- Node CPU and memory utilization
- Pod CPU and memory requests vs actual usage
- Pod restart count and reason
- Deployment rollout progress
- HPA status (current replicas vs desired)
- Persistent volume capacity and usage
- Network policies in effect
- API server request latency and error rate
Logs
- Container logs captured (stdout/stderr)
- Kubernetes events logged (pod scheduling, volume mounts, image pulls)
- Node-level logs for kubelet and container runtime
- Audit logs for API server access (who did what when)
- Include labels and selectors in log context for filtering
Alerts
- Alert when node memory/CPU exceeds 85% utilization
- Alert when pod restart count exceeds threshold in short window
- Alert when deployment fails to make progress
- Alert when HPA is at max replicas (may need to adjust)
- Alert on persistent volume capacity approaching limit
- Alert when etcd disk I/O is high (could indicate problems)
- Alert on unauthorized API server access attempts
Security Checklist
- RBAC configured with least-privilege principle (avoid cluster-admin where possible)
- Service accounts use bound service account tokens; avoid default tokens
- NetworkPolicy restricts traffic between namespaces (default deny)
- PodSecurityPolicy or Pod Security Standards enforced
- Secrets not stored in etcd in plain text (use encryption at rest)
- Container images scanned for vulnerabilities (use image policy)
- RunAsNonRoot and readOnlyRootFilesystem enforced where possible
- Kubernetes API server not exposed publicly; use authentication
- Regular Kubernetes version updates for security patches
Common Pitfalls / Anti-Patterns
Misconfigured resource requests and limits: Setting requests too low lets pods land on overcommitted nodes and starve under contention; setting limits too low causes CPU throttling and OOM kills. Profile your application under realistic load and set appropriate values with some headroom.
Ignoring pod disruption budgets: Without PDBs, a node drain can take down too many replicas simultaneously. Always set PDBs for stateful or high-availability workloads.
Using the latest tag for images: latest points to different images at different times. Always pin image tags to specific versions to ensure reproducible deployments.
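As an illustration (the image name is hypothetical and the digest is a placeholder), a tag pin versus the stricter digest pin looks like:

```yaml
containers:
  - name: api
    # Pin to an immutable version tag rather than latest
    image: my-api:v1.4.2
    # Stricter alternative: pin to a content digest, which can never
    # be re-pointed (placeholder digest shown)
    # image: my-api@sha256:<digest>
```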
Not planning for capacity: Running nodes at high utilization leaves no headroom for spikes. Cluster autoscaler helps but is not instantaneous. Plan for ~70% average utilization.
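The 70% rule of thumb gives a quick sizing calculation. A sketch (the numbers are illustrative):

```python
import math

# Rough cluster sizing for a target average utilization (~70% rule of thumb):
# enough nodes that total pod requests fill each node only to the target.
def nodes_needed(total_request_cores: float, node_cores: float,
                 target_utilization: float = 0.70) -> int:
    return math.ceil(total_request_cores / (node_cores * target_utilization))

# 40 cores of pod requests on 8-core nodes at 70% target utilization
print(nodes_needed(40, 8))  # 8
```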
Overly permissive RBAC: Giving developers cluster-admin “because it is easier” creates security and stability risks. Use namespace-scoped roles and RoleBindings.
Not using labels and selectors consistently: Labels are the primary way to select pods for services, deployments, and network policies. Inconsistent labeling breaks routing and isolation.
Ignoring Kubernetes events: Events are often the first indicator of problems (ImagePullBackOff, FailedScheduling, etc.). Monitor and alert on events, not just metrics.
Operational Runbook: Common Debugging Tasks
When a pod misbehaves, the debugging approach depends on the symptom. Here are the three most common issues and how to diagnose them.
CrashLoopBackOff
The container starts, crashes, Kubernetes restarts it, crashes again — a loop.
```bash
# Check pod status and recent events
kubectl get pod myapp-xxx -n production
kubectl describe pod myapp-xxx -n production

# View logs from the previous (crashed) container instance
kubectl logs myapp-xxx -n production --previous

# Interpret the last exit code from `kubectl describe pod`:
#   1:   application error (check the logs)
#   137: SIGKILL, usually OOMKilled (increase the memory limit)
#   143: SIGTERM (the app was terminated gracefully)
```
Common causes: application bug on startup, missing environment variable, failed health check, out-of-memory. Fix the root cause before deploying again.
ImagePullBackOff
Kubernetes cannot pull the container image and is backing off before retrying.
```bash
# Check image pull status
kubectl describe pod myapp-xxx -n production | grep -A5 "Events"

# Common failure messages in the events:
#   "manifest unknown" / "not found": tag or digest does not exist in the registry
#   ErrImagePull: authentication failure, network issue, or missing image
#   "no such host": wrong registry URL
```
Fixes: verify the image tag exists in the registry, check image pull secrets, ensure the registry is reachable from the cluster, validate the full image URL in the pod spec.
Pending Pods
The pod has been accepted by the API server but cannot be scheduled or started. It is stuck waiting for resources or unmet conditions.
```bash
# Check why the pod is not running
kubectl describe pod myapp-xxx -n production | grep -A10 "Events"

# Check node resource availability
kubectl describe nodes | grep -A5 "Allocated resources"
kubectl top nodes

# Check whether a PVC is stuck pending
kubectl get pvc -n production
kubectl describe pvc my-volume-claim -n production

# Check for taints blocking scheduling
kubectl get nodes -o json | jq '.items[].spec.taints'
```
Pending pods usually mean: insufficient CPU/memory requests, storage claim not bound, node selector/affinity not matching, or taints preventing scheduling. Match the pod spec to your cluster’s available resources and taints.
Quick Debug Checklist
```bash
# 1. Get current pod state
kubectl get pod myapp-xxx -n production -o wide

# 2. Check events on the pod
kubectl describe pod myapp-xxx -n production

# 3. View application logs
kubectl logs myapp-xxx -n production --tail=100

# 4. Check if the container is crashing
kubectl logs myapp-xxx -n production --previous

# 5. Check resource usage on the pod and its node
kubectl top pod myapp-xxx -n production
kubectl describe node $(kubectl get pod myapp-xxx -n production -o jsonpath='{.spec.nodeName}')

# 6. Check for finalizer issues (stuck in Terminating)
kubectl get pod myapp-xxx -n production -o jsonpath='{.metadata.finalizers}'
```
Quick Recap
```mermaid
graph TD
    subgraph Cluster
        subgraph Node
            P1[Pod] --> S1[Service]
            P2[Pod] --> S1
        end
        K8s[Control Plane] --> Node
    end
    External[External Traffic] --> Ingress
    Ingress --> S1
```
Key Points
- Pods are the smallest deployable unit; usually one container per pod
- Services provide stable networking for ephemeral pods
- Deployments manage rolling updates and maintain replica count
- Ingress handles external HTTP/HTTPS traffic routing
- Helm simplifies managing manifests across environments
- Resource requests and limits prevent resource contention
Production Checklist
```markdown
# Kubernetes Production Readiness
- [ ] Resource requests and limits set for all pods
- [ ] PodDisruptionBudgets configured for HA workloads
- [ ] Multiple replicas for stateless services
- [ ] Image tags pinned to specific versions
- [ ] RBAC configured with least privilege
- [ ] NetworkPolicy with default deny enforced
- [ ] PodSecurityStandards or PodSecurityPolicy enforced
- [ ] etcd backed up regularly
- [ ] Cluster autoscaler configured
- [ ] HPA configured for scalable workloads
- [ ] Monitoring and alerting for node and pod metrics
- [ ] Audit logging enabled
- [ ] Kubernetes version kept up to date
```
Interview Questions
Q: A pod is stuck in Pending state. How do you diagnose it?
A: Check kubectl describe pod for events. Common causes: insufficient cluster resources (CPU/memory), no matching node selectors or affinity rules, a PVC that is not bound, or taints without matching tolerations. Check node capacity with kubectl describe nodes, PVC status with kubectl get pvc, and recent events with kubectl get events --sort-by='.lastTimestamp'.
Q: Your Service cannot reach pods. Walk through the debugging steps.
A: Start at the Service level: verify the selector matches pod labels (kubectl get svc -o wide, kubectl get pods --show-labels). Check endpoints exist (kubectl get endpoints <svc>). If no endpoints, the selector mismatch is the culprit. If endpoints exist, check whether pods are actually running and listening on the target port. Use kubectl exec into a pod and curl the target port directly. Check network policies that might be blocking traffic.
Q: How do you upgrade a Kubernetes cluster with zero downtime?
A: For managed clusters (GKE, EKS, AKS), use the managed upgrade path which handles node draining and replacement automatically. For self-managed: use kubectl drain --ignore-daemonsets --delete-emptydir-data to safely evict pods, then upgrade the node. For StatefulSets, use PodDisruptionBudgets to ensure minimum availability during upgrades. Always test in a non-production environment first. Use blue-green node pool strategies where you spin up new nodes, migrate workloads, then terminate old nodes.
Q: What is the difference between a Deployment and a StatefulSet? When would you use each?
A: Deployments manage stateless applications with interchangeable pods — Kubernetes freely schedules, scales, and replaces pods. StatefulSets manage stateful applications requiring stable identity, stable storage, and ordered deployment/scaling — pods have persistent identifiers and ordered graceful deployment. Use Deployments for web servers, APIs, and most application workloads. Use StatefulSets for databases, Kafka, ZooKeeper, and any workload that needs persistent identity.
Q: A container is OOMKilled but the pod’s memory limit seems high enough. What could be happening?
A: Check actual usage with kubectl top pod. Possible explanations: the value you read from the spec was the request rather than the limit; the application has a memory leak or bursts faster than your metrics sampling interval; the node itself is under memory pressure (check kubectl describe node), in which case pods can be evicted even below their limits; or the container spawns many child processes whose combined usage exceeds the limit. Also verify the reason in kubectl describe pod: a container-level OOM (limit exceeded) is different from a node-level OOM (node pressure eviction).
Q: How do you handle secrets in Kubernetes securely?
A: Never store secrets in etcd as plaintext — enable encryption at rest. Use a secrets management tool: HashiCorp Vault with ESO (External Secrets Operator) syncs secrets from Vault into Kubernetes secrets, or AWS Secrets Manager with the Secrets Store CSI driver. Avoid mounting secrets as environment variables when possible — env vars persist in process memory and appear in logs more easily. Use short-lived tokens and workload identity where possible. Audit access to secrets with Kubernetes audit logging.
Conclusion
Kubernetes handles the hard parts of running containers at scale: scheduling, healing, scaling, and networking. You describe your desired state; Kubernetes makes it happen.
The concepts are straightforward: pods wrap containers, services provide stable networking, deployments manage updates, and ingress handles external traffic. Helm simplifies managing manifests across environments.
The operational burden is real. Kubernetes clusters require attention: upgrades, monitoring, resource management, and access control. For small teams or simple workloads, the complexity may not pay off. For production microservices at scale, Kubernetes is usually worth the investment.