Advanced Kubernetes: Controllers, Operators, RBAC, and Production Patterns
Kubernetes has become the de facto standard for container orchestration. If you have been running clusters for a while, you have likely encountered scenarios that basic Kubernetes resources do not handle well. This is where custom controllers, operators, and advanced security patterns become essential.
This guide assumes you already know Kubernetes basics. If you are just starting, our Docker Fundamentals guide covers containers first, which is essential groundwork before tackling Kubernetes.
The Control Plane Architecture
Before diving into advanced topics, let us review how Kubernetes control plane components work together.
```mermaid
graph TB
    subgraph "Control Plane"
        A[API Server] --> B[etcd]
        A --> C[Controller Manager]
        A --> D[Scheduler]
        C --> E[Controllers]
        E --> A
    end
    subgraph "Worker Nodes"
        F[Kubelet] --> A
        G[Container Runtime] --> F
        H[Kube Proxy] --> F
    end
```
The API server is the gateway to everything. All cluster operations go through it, and it validates configurations before persisting to etcd. Controllers watch the API server for changes and reconcile actual state toward desired state.
Custom Resource Definitions
CRDs extend the Kubernetes API to define new resource types. They let you create domain-specific objects that Kubernetes can manage like native resources.
Defining a CRD
```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  names:
    kind: Database
    plural: databases
    shortNames:
      - db
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:
                  type: string
                  enum: [postgresql, mysql, mongodb]
                version:
                  type: string
                replicas:
                  type: integer
                  minimum: 1
                storage:
                  type: object
                  properties:
                    size:
                      type: string
                    storageClass:
                      type: string
            status:
              type: object
              properties:
                phase:
                  type: string
                endpoint:
                  type: string
```
After applying this CRD, you can create Database objects just like built-in resources:
```bash
kubectl apply -f database-crd.yaml
kubectl get databases
```
CRD Versioning
Kubernetes supports multiple versions of a CRD simultaneously. The storage flag indicates which version persists to etcd. This enables zero-downtime migrations when you need to change your schema.
```yaml
versions:
  - name: v1
    served: true
    storage: true
  - name: v1beta1
    served: true
    storage: false
```
Clients request a specific version through the API path (for example, /apis/example.com/v1beta1/databases), and the API server converts between served versions. This gives you flexibility during rolling upgrades.
Custom Controllers
Controllers are control loops that watch resources and take action to achieve desired state. The Kubernetes ecosystem is built on this pattern, and you can extend it with custom controllers.
The Controller Pattern
A controller follows a reconcile loop:
- Watch for changes to resources
- Fetch current state
- Compare current state with desired state
- Take action to reconcile the difference
- Update status
- Repeat
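The steps above can be sketched as a plain Go function with no Kubernetes dependencies; the `State` struct and replica counts here are illustrative stand-ins for a real spec and status, not part of any API:

```go
package main

import "fmt"

// State is an illustrative stand-in for a resource's spec (desired)
// and status (current).
type State struct {
	Replicas int
}

// reconcile compares current state with desired state and returns the
// actions needed to converge, mirroring steps 2-4 of the loop above.
func reconcile(desired, current State) []string {
	var actions []string
	switch {
	case current.Replicas < desired.Replicas:
		for i := current.Replicas; i < desired.Replicas; i++ {
			actions = append(actions, "create replica")
		}
	case current.Replicas > desired.Replicas:
		for i := desired.Replicas; i < current.Replicas; i++ {
			actions = append(actions, "delete replica")
		}
	}
	return actions // empty means state has already converged
}

func main() {
	actions := reconcile(State{Replicas: 3}, State{Replicas: 1})
	fmt.Println(len(actions)) // 2
}
```

A real controller runs this comparison every time the watched object changes, which is why reconcile logic must be idempotent: running it twice against a converged state must produce no actions.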
Writing a Basic Controller
The controller-runtime library simplifies controller development:
```go
package controller

import (
	"context"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/log"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	examplev1 "github.com/example/database-operator/api/v1"
)

type DatabaseReconciler struct {
	client.Client
	Scheme *runtime.Scheme
}

func (r *DatabaseReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	logger := log.FromContext(ctx)
	logger.Info("reconciling Database", "namespace", req.Namespace, "name", req.Name)

	// Fetch the Database instance
	db := &examplev1.Database{}
	if err := r.Get(ctx, req.NamespacedName, db); err != nil {
		return reconcile.Result{}, client.IgnoreNotFound(err)
	}

	// Create or update the StatefulSet
	statefulSet := r.buildStatefulSet(db)
	if err := r.createOrUpdate(ctx, statefulSet); err != nil {
		return reconcile.Result{}, err
	}

	// Update status; surface the error rather than silently dropping it
	db.Status.Phase = "Running"
	db.Status.Endpoint = fmt.Sprintf("%s.%s.svc.cluster.local", db.Name, db.Namespace)
	if err := r.Status().Update(ctx, db); err != nil {
		return reconcile.Result{}, err
	}

	// Requeue periodically rather than immediately to avoid a hot loop
	return reconcile.Result{RequeueAfter: 30 * time.Second}, nil
}
```
Controllers run as part of a manager that handles caching, client connections, and leader election. This makes them robust in production environments with multiple replicas.
Operators: Domain-Specific Automation
Operators are custom controllers with domain-specific knowledge baked in. They encode operational expertise into software that handles complex, stateful applications.
The key difference from generic controllers is that operators understand the application they manage. They know how to handle backups, upgrades, failover, and other operational tasks.
Building an Operator with Operator SDK
Operator SDK provides scaffolding and best practices for building operators:
```bash
# Install operator-sdk
brew install operator-sdk

# Create a new operator
operator-sdk init --domain example.com --repo github.com/example/database-operator

# Create the API and controller
operator-sdk create api --group database --version v1 --kind Database --resource --controller
```
Defining the Operator API
```go
// api/v1/database_types.go
package v1

type DatabaseSpec struct {
	Engine       string            `json:"engine,omitempty"`
	Version      string            `json:"version,omitempty"`
	Replicas     int32             `json:"replicas,omitempty"`
	Storage      StorageSpec       `json:"storage,omitempty"`
	BackupConfig *BackupConfigSpec `json:"backupConfig,omitempty"`
}

type StorageSpec struct {
	Size         string `json:"size"`
	StorageClass string `json:"storageClass,omitempty"`
}

type BackupConfigSpec struct {
	Schedule string `json:"schedule"`
	Bucket   string `json:"bucket"`
}

type DatabaseStatus struct {
	Phase    string `json:"phase,omitempty"`
	Endpoint string `json:"endpoint,omitempty"`
}
```
Implementing Reconcile Logic
```go
func (r *DatabaseReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := r.Log.WithValues("database", req.NamespacedName)
	logger.Info("reconciling")

	// Fetch the Database instance
	db := &databasev1.Database{}
	if err := r.Get(ctx, req.NamespacedName, db); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Create or update StatefulSet
	statefulSet, err := r.desiredStatefulSet(db)
	if err != nil {
		return ctrl.Result{}, err
	}
	if err := r.createOrUpdate(ctx, statefulSet); err != nil {
		return ctrl.Result{}, err
	}

	// Handle backups if configured
	if db.Spec.BackupConfig != nil {
		if result, err := r.reconcileBackups(ctx, db); err != nil {
			return result, err
		}
	}

	// Update status and surface any conflict errors for retry
	db.Status.Phase = "Running"
	db.Status.Endpoint = fmt.Sprintf("%s.%s.svc.cluster.local", db.Name, db.Namespace)
	if err := r.Status().Update(ctx, db); err != nil {
		return ctrl.Result{}, err
	}

	return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}
```
Practical Operator Examples
Operators shine for managing stateful applications:
- Prometheus Operator manages Prometheus deployments and monitoring configurations
- Velero Operator handles backup and restore of Kubernetes resources and volumes
- Cert Manager automates certificate management with Let’s Encrypt
When building your own operator, ask yourself whether the application has complex lifecycle requirements that generic Kubernetes resources cannot handle.
Role-Based Access Control
RBAC restricts who can perform operations in the cluster. It combines four key concepts: subjects (who), verbs (what actions), resources (which objects), and API groups; Roles scope these rules to a namespace, while ClusterRoles apply cluster-wide.
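The authorizer's rule matching can be modeled in a few lines of Go. This toy `allowed` function is a simplification of what the API server does (real RBAC also handles resource names, subresources, and nonResourceURLs), but it captures the key property that rules are purely additive:

```go
package main

import "fmt"

// Rule mirrors the fields of an RBAC PolicyRule.
type Rule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// contains reports whether list includes v; "*" matches anything.
func contains(list []string, v string) bool {
	for _, item := range list {
		if item == v || item == "*" {
			return true
		}
	}
	return false
}

// allowed reports whether any rule grants the verb on the resource.
// RBAC has no deny rules: one matching rule is enough.
func allowed(rules []Rule, apiGroup, resource, verb string) bool {
	for _, r := range rules {
		if contains(r.APIGroups, apiGroup) &&
			contains(r.Resources, resource) &&
			contains(r.Verbs, verb) {
			return true
		}
	}
	return false
}

func main() {
	rules := []Rule{
		{APIGroups: []string{"apps"}, Resources: []string{"deployments"}, Verbs: []string{"get", "list", "update"}},
		{APIGroups: []string{""}, Resources: []string{"pods"}, Verbs: []string{"get", "list"}},
	}
	fmt.Println(allowed(rules, "apps", "deployments", "update")) // true
	fmt.Println(allowed(rules, "", "pods", "delete"))            // false
}
```

Because there is no deny, tightening access always means removing or narrowing a rule, never adding one.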
Roles and RoleBindings
Role and RoleBinding are namespace-scoped:
```yaml
# Role definition
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: deployment-manager
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
---
# RoleBinding - grants the Role to subjects
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-manager-binding
  namespace: production
subjects:
  - kind: User
    name: alice@example.com
    apiGroup: rbac.authorization.k8s.io
  - kind: Group
    name: developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: deployment-manager
  apiGroup: rbac.authorization.k8s.io
```
ClusterRoles and ClusterRoleBindings
ClusterRoles and ClusterRoleBindings work cluster-wide:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-viewer
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-viewer-binding
subjects:
  - kind: ServiceAccount
    name: metrics-collector
    namespace: monitoring
roleRef:
  kind: ClusterRole
  name: node-viewer
  apiGroup: rbac.authorization.k8s.io
```
ServiceAccount Usage
Pods use ServiceAccounts to authenticate to the API server:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: production
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  namespace: production
spec:
  serviceAccountName: my-app-sa
  containers:
    - name: app
      image: my-app:latest
```
Your application reads the token file mounted at /var/run/secrets/kubernetes.io/serviceaccount/token and presents it as a bearer token when calling the API server.
Network Policies
Network policies restrict traffic between pods. By default, all pods can reach all other pods and services in a cluster. Network policies let you implement defense in depth.
Basic Network Policy
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-isolation
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
    - from:
        - namespaceSelector:
            matchLabels:
              name: monitoring
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: database
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: TCP
          port: 53
        - protocol: UDP
          port: 53
```
This policy restricts the API pod to receive traffic only from frontend pods and monitoring namespaces, and allows it to send traffic only to the database and DNS.
DNS Egress
Almost every pod needs DNS resolution. Make sure your egress policies include port 53 on both TCP and UDP, or your applications will fail to resolve service names.
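Because policies are additive, a connection is permitted if any policy selecting the target allows it, and pods selected by no policy are open by default. A minimal model of that ingress evaluation (labels as flat maps, with ports and namespaces ignored for brevity):

```go
package main

import "fmt"

type Labels map[string]string

// matches reports whether labels satisfy every key/value in selector.
// An empty selector matches all pods, like podSelector: {}.
func matches(selector, labels Labels) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// Policy selects target pods and lists label selectors allowed as sources.
type Policy struct {
	PodSelector Labels
	AllowFrom   []Labels
}

// ingressAllowed: pods selected by no policy are open by default; once
// any policy selects a pod, at least one policy must allow the source.
func ingressAllowed(policies []Policy, target, source Labels) bool {
	selected := false
	for _, p := range policies {
		if !matches(p.PodSelector, target) {
			continue
		}
		selected = true
		for _, from := range p.AllowFrom {
			if matches(from, source) {
				return true
			}
		}
	}
	return !selected
}

func main() {
	policies := []Policy{{
		PodSelector: Labels{"app": "api"},
		AllowFrom:   []Labels{{"app": "frontend"}},
	}}
	fmt.Println(ingressAllowed(policies, Labels{"app": "api"}, Labels{"app": "frontend"})) // true
	fmt.Println(ingressAllowed(policies, Labels{"app": "api"}, Labels{"app": "batch"}))    // false
}
```

The default-open behavior for unselected pods is why teams apply a default-deny policy per namespace: it flips every pod into the "selected" state so traffic must be explicitly allowed.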
Storage Classes and Persistent Volumes
Dynamic provisioning of persistent storage requires StorageClasses. They define how storage is provisioned when a PersistentVolumeClaim requests it.
Defining a StorageClass
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-storage
# Note: the in-tree gce-pd provisioner is deprecated; on current GKE
# clusters, use the CSI driver pd.csi.storage.gke.io instead.
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
  replication-type: regional-pd
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```
The WaitForFirstConsumer binding mode delays volume binding until a pod actually uses the claim. This allows the scheduler to co-locate volumes with pods in the same zone.
Using PersistentVolumes in Pods
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-storage
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-storage
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: database
  namespace: production
spec:
  containers:
    - name: db
      image: postgres:15-alpine
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: database-storage
```
Resource Quotas and Limits
Namespaces let you partition the cluster, but ResourceQuotas enforce resource limits within namespaces.
Setting Quotas
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 100Gi
    limits.cpu: "40"
    limits.memory: 200Gi
    pods: "50"
    services: "10"
    persistentvolumeclaims: "20"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: production-limits
  namespace: production
spec:
  limits:
    - max:
        cpu: "8"
        memory: 32Gi
      min:
        cpu: 100m
        memory: 128Mi
      default:
        cpu: 500m
        memory: 1Gi
      defaultRequest:
        cpu: 200m
        memory: 256Mi
      type: Container
```
The LimitRange sets default requests and limits for containers that do not specify them, while ResourceQuota caps total resource usage per namespace.
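The defaulting step can be sketched as a pure function. CPU values here are plain millicore integers rather than Kubernetes resource.Quantity values, and only defaulting (not min/max validation) is modeled:

```go
package main

import "fmt"

// Resources holds container CPU in millicores; zero means unset.
type Resources struct {
	RequestCPU int
	LimitCPU   int
}

// applyDefaults fills unset request/limit values from a LimitRange's
// defaultRequest and default fields, mirroring admission behavior.
func applyDefaults(c Resources, defaultRequest, defaultLimit int) Resources {
	if c.RequestCPU == 0 {
		c.RequestCPU = defaultRequest
	}
	if c.LimitCPU == 0 {
		c.LimitCPU = defaultLimit
	}
	return c
}

func main() {
	// A container that specifies nothing gets both defaults (200m/500m)
	out := applyDefaults(Resources{}, 200, 500)
	fmt.Println(out.RequestCPU, out.LimitCPU) // 200 500

	// An explicit request is preserved; only the limit is defaulted
	out = applyDefaults(Resources{RequestCPU: 300}, 200, 500)
	fmt.Println(out.RequestCPU, out.LimitCPU) // 300 500
}
```

The point to notice is that defaults apply per container at admission time, while the ResourceQuota is checked against the sum across the namespace.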
Pod Disruption Budgets
When performing cluster maintenance, PDBs ensure minimum availability for your applications.
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb
  namespace: production
spec:
  maxUnavailable: "25%"
  selector:
    matchLabels:
      app: frontend
```
The first PDB ensures at least 2 API pods are running during disruptions. The second allows up to 25% of frontend pods to be unavailable simultaneously.
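The number of evictions a PDB permits at any moment follows directly from these fields. A simplified version of the arithmetic (the real disruption controller's percentage rounding differs slightly; integer truncation here errs on the conservative side):

```go
package main

import "fmt"

// allowedDisruptions returns how many pods may be voluntarily evicted
// right now. minAvailable is an absolute count; pass -1 to use
// maxUnavailablePct (a percentage of desired replicas) instead.
func allowedDisruptions(healthy, desired, minAvailable, maxUnavailablePct int) int {
	var allowed int
	if minAvailable >= 0 {
		allowed = healthy - minAvailable
	} else {
		maxUnavailable := desired * maxUnavailablePct / 100 // truncates
		allowed = maxUnavailable - (desired - healthy)
	}
	if allowed < 0 {
		return 0
	}
	return allowed
}

func main() {
	// api-pdb: minAvailable 2, with 3 healthy pods -> 1 eviction allowed
	fmt.Println(allowedDisruptions(3, 3, 2, 0)) // 1

	// frontend-pdb: maxUnavailable 25% of 8 pods, all healthy -> 2 evictions
	fmt.Println(allowedDisruptions(8, 8, -1, 25)) // 2
}
```

This also shows the failure mode from the pitfalls table: if minAvailable equals the replica count, allowed disruptions is permanently zero and node drains block.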
Advanced Scheduling
Node Affinity and Anti-Affinity
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: database
  namespace: production
spec:
  containers:
    - name: db
      image: postgres:15-alpine
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app
                  operator: In
                  values:
                    - database
            topologyKey: topology.kubernetes.io/zone
```
This schedules database pods on nodes with SSD storage and tries to spread them across availability zones.
Taints and Tolerations
Taints repel pods from nodes unless the pods have matching tolerations:
```bash
# Taint a node to repel non-critical workloads
kubectl taint nodes node1 dedicated=ml-workloads:NoSchedule
```

```yaml
# Pod that tolerates the taint
apiVersion: v1
kind: Pod
metadata:
  name: ml-job
spec:
  tolerations:
    - key: dedicated
      operator: Equal
      value: ml-workloads
      effect: NoSchedule
  containers:
    - name: ml
      image: ml-training:latest
```
This pattern is useful for reserving nodes for specific workloads like ML training or stateful services.
When to Use / When Not to Use
Understanding when these advanced patterns apply helps you avoid over-engineering.
Custom Controllers and Operators
Use when:
- You manage stateful applications with complex lifecycle requirements
- You need to encode domain-specific operational knowledge into automated workflows
- You want to reduce manual intervention for recurring operational tasks
- You are building a platform that other teams will consume
When not to use:
- Your application is stateless and scales horizontally without special handling
- You only need basic Kubernetes primitives like Deployments and Services
- The operational complexity of building an operator exceeds the manual effort it would save
- You are in early stages and requirements are still changing rapidly
Decision Tree: Controllers vs Operators vs Native Resources
Use this flowchart to determine which approach fits your use case:
```mermaid
flowchart TD
    A[What are you trying to manage?] --> B{Is it a built-in K8s resource?}
    B -->|Yes| C[Use native resource<br/>Deployment, StatefulSet, Service, etc.]
    B -->|No| D{Does the app have a complex lifecycle?}
    D -->|No - stateless, simple scale| E[Use native resources + Helm/Kustomize]
    D -->|Yes - backups, upgrades, failover| F{Is it a well-known off-the-shelf app?}
    F -->|Yes - Prometheus, cert-manager, Velero| G[Install existing Operator<br/>via Helm or OperatorHub]
    F -->|No - custom domain app| H{Can existing controllers handle it?}
    H -->|Yes - CRD + standard reconciliation| I[Write Custom Controller<br/>with controller-runtime]
    H -->|No - app-specific domain logic| J[Build an Operator<br/>with Operator SDK]
    I --> K{Does it need Helm-style packaging?}
    J --> K
    K -->|Yes| L[Package as Operator with OLM]
    K -->|No| M[Deploy controller directly<br/>via YAML]
```
Quick reference:
| Approach | Complexity | Best For |
|---|---|---|
| Native resources | Lowest | Deployments, Services, ConfigMaps, vanilla stateful apps |
| Helm/Kustomize | Low | Package and configure standard apps, no custom logic |
| Custom Controller | Medium | CRDs with standard reconcile loops, no app-specific domain |
| Existing Operator | Low-Medium | Prometheus, cert-manager, Velero, databases, message queues |
| Custom Operator | Highest | Complex domain logic, specialized stateful apps, internal platforms |
RBAC and Network Policies
Use when:
- Multiple teams share the same cluster
- You need to enforce least-privilege access
- Security compliance requires network segmentation
- You want defense in depth beyond pod-level security
When not to use:
- Single-tenant clusters with trusted users
- Development or test environments without sensitive workloads
- Network policies are handled by a higher-level service mesh
Storage Classes and Persistent Volumes Details
Use when:
- Stateful workloads require persistent storage
- You need dynamic provisioning based on application needs
- You want to separate storage tiers (SSD vs HDD)
When not to use:
- Stateless applications that store no persistent data
- Caches or temporary data that can be lost without consequences
Production Failure Scenarios
Understanding real failure modes helps you prepare better.
| Failure | Impact | Mitigation |
|---|---|---|
| etcd quorum loss | Cluster becomes read-only or unavailable | Maintain at least 3 etcd nodes, regular backups, separate etcd disks |
| API server overload | All cluster operations fail | Implement proper rate limiting, optimize client code, scale API server |
| Kubelet failure | All pods on node become unhealthy | Use pod disruption budgets, set pod priority classes, monitor node health |
| Storage class deletion with active PVCs | Pods cannot start, data loss potential | Restrict delete permissions on StorageClasses via RBAC; never delete a class with bound PVCs |
| RBAC misconfiguration | Users cannot perform needed operations | Use kubectl auth can-i for verification, audit role bindings regularly |
| Network policy misconfiguration | Application pods cannot communicate | Test in staging first, remember policies are additive (there is no ordering or deny), always allow DNS egress |
| Controller reconciliation loops | High API server load, degraded cluster performance | Implement proper reconciliation with exponential backoff |
| PDB too restrictive | Cluster upgrades blocked | Set realistic minAvailable values, test disruption scenarios |
Observability Checklist
Comprehensive monitoring helps you catch issues before they become outages.
Metrics to Collect
```mermaid
graph LR
    A[Control Plane Metrics] --> B[API Server]
    A --> C[etcd]
    A --> D[Controller Manager]
    A --> E[Scheduler]
    F[Node Metrics] --> G[Kubelet]
    F --> H[Container Runtime]
    F --> I[Kube Proxy]
    J[Workload Metrics] --> K[Pod CPU Memory]
    J --> L[Deployment Replicas]
    J --> M[PV Usage]
```
Control plane metrics:
- API server request latency and error rates
- etcd disk I/O and WAL fsync latency
- Controller reconciliation duration and error counts
- Scheduler pod placement latency
Prometheus queries for control plane health:
```promql
# API server request error rate (5xx errors)
sum(rate(apiserver_request_total{job="apiserver",code=~"5.."}[5m]))
  / sum(rate(apiserver_request_total{job="apiserver"}[5m]))

# etcd WAL fsync latency (p99)
histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket{job="etcd"}[5m]))

# Controller workqueue processing duration (p99)
histogram_quantile(0.99, rate(workqueue_work_duration_seconds_bucket{job="kube-controller-manager"}[5m]))

# Scheduler pod placement latency (p99)
histogram_quantile(0.99, rate(scheduler_pod_scheduling_duration_seconds_bucket{job="kube-scheduler"}[5m]))

# API server request latency by verb (p99)
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{job="apiserver"}[5m])) by (le, verb))

# etcd compaction duration
rate(etcd_debugging_compaction_duration_seconds_sum{job="etcd"}[5m])

# Leader election churn (a high rate indicates instability)
rate(etcd_server_leader_changes_seen_total{job="etcd"}[5m])
```
Node metrics:
- Kubelet working set and eviction thresholds
- Container runtime CPU and memory usage
- Network bytes sent/received per pod
Application metrics:
- Pod CPU and memory actual usage vs requests
- Deployment replica count vs desired
- Persistent volume usage percentage
- Custom resource status conditions
Logs to Capture
```mermaid
graph TD
    A[Application Logs] --> D[Stdout/stderr]
    B[System Logs] --> E[Kubelet]
    B --> F[Container Runtime]
    C[Kubernetes Logs] --> G[API Server]
    C --> H[Controller Manager]
    C --> I[Scheduler]
    J[Audit Logs] --> K[API Requests by User]
    J --> L[Policy Violations]
```
- Aggregate all logs to a central location (Loki, ELK, CloudWatch)
- Include Kubernetes metadata: namespace, pod name, container name
- Capture Kubernetes events for resource lifecycle changes
- Store audit logs for compliance and security investigations
Alerts to Configure
Critical (immediate response required):
- API server unavailable for more than 1 minute
- etcd high latency or leadership elections
- Node not ready for more than 2 minutes
- Pod evictions occurring due to resource pressure
Warning (investigate soon):
- Pod restart loop (CrashLoopBackOff)
- Deployment replica count below desired
- Persistent volume usage above 80%
- Certificate expiration within 30 days
Security Checklist
RBAC Security
- Review all ClusterRoleBindings and RoleBindings quarterly
- Use ServiceAccounts instead of user credentials for workloads
- Implement least privilege: grant only the permissions required
- Use `kubectl auth can-i --list` to audit effective permissions
- Rotate ServiceAccount tokens regularly
Network Security
- Apply default-deny NetworkPolicies in each namespace
- Explicitly allow only required traffic paths
- Always include DNS egress in network policies
- Use Kubernetes DNS for service discovery (not hardcoded IPs)
- Consider a service mesh for mTLS between services
Pod Security
```mermaid
graph LR
    A[Pod Security] --> B[Run as non-root]
    A --> C[Read-only root filesystem]
    A --> D[Drop all capabilities]
    A --> E[No privileged containers]
    A --> F[Resource limits set]
```
- Run containers as a non-root user (securityContext.runAsNonRoot: true)
- Use a read-only root filesystem when possible (securityContext.readOnlyRootFilesystem: true)
- Drop all capabilities and add back only those required (securityContext.capabilities.drop)
- Set resource requests and limits to prevent resource starvation
- Disable host PID and network namespaces (hostPID: false, hostNetwork: false)
- Use Pod Security Standards or OPA Gatekeeper for policy enforcement
Secret Management
- Never put sensitive values in ConfigMaps; Secret objects are only base64-encoded, not encrypted, by default
- Use external secrets solutions (External Secrets Operator, HashiCorp Vault)
- Enable encryption at rest for etcd
- Rotate secrets regularly and have a revocation plan
Common Pitfalls / Anti-Patterns
Controller Pitfalls
**Reconciliation without backoff:** A controller that continuously requeues without exponential backoff will overwhelm the API server and cause cascading failures. Always implement retry logic with increasing delays.

**Ignoring status updates:** Controllers that fail to update status leave users blind to their resource state. Status conditions should reflect actual observed state.

**Not handling deletion:** Controllers must handle deletion (typically with finalizers) and clean up dependent resources. Orphaned resources cause ghost deployments and confusion.
RBAC Pitfalls
**Using the default ServiceAccount:** Workloads should use dedicated ServiceAccounts with scoped permissions. The default ServiceAccount's token is shared by every pod in the namespace, so any permission granted to it leaks to all workloads.

**Granting cluster-admin broadly:** Reserve cluster-admin for break-glass scenarios. Use namespace-scoped roles for daily operations.

**Forgetting to audit:** RBAC configurations drift over time. Regular audits catch permission creep.
Network Policy Pitfalls
**Forgetting DNS:** DNS uses port 53 on both TCP and UDP. Without DNS egress, applications cannot resolve service names and all external calls fail.

**Too-permissive policies:** Using podSelector: {} matches all pods in the namespace. Be specific about source and destination pods.

**Assuming policies have an order:** NetworkPolicies are additive and have no evaluation order or deny rules; if any policy allows a connection, it is allowed. Design selectors so unwanted traffic is never matched, rather than trying to override an allow with a stricter policy.
Storage Pitfalls
**Deleting a StorageClass accidentally:** Never delete a StorageClass that has active claims. The deletion does not block, but dependent pods cannot recover.

**Not monitoring volume capacity:** Running out of PV capacity blocks new PVC claims. Monitor available capacity and plan expansion.

**Using ReadWriteMany incorrectly:** Not all volume plugins support ReadWriteMany. Using it with unsupported backends causes mount failures.
Quick Recap
Key Takeaways
- Custom controllers and operators encode operational expertise but add complexity only justified for stateful, domain-specific applications
- RBAC follows the principle of least privilege: grant only the permissions actually needed
- Network policies implement defense in depth; always include DNS egress rules
- StorageClasses enable dynamic provisioning but require careful capacity planning
- Pod disruption budgets protect availability during voluntary disruptions
- Taints and tolerations control pod placement across node types
- Observability across metrics, logs, and alerts is essential for production reliability
Production Readiness Checklist
```bash
# RBAC
kubectl get rolebindings,clusterrolebindings -A | grep -v system:
kubectl auth can-i --list --as=system:serviceaccount:production:my-app-sa

# Network Policies
kubectl get networkpolicies -A
kubectl describe networkpolicy <name> -n production

# Storage
kubectl get pvc -A | grep -v Bound
kubectl get storageclass

# Pod Security
kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.securityContext.runAsNonRoot}{"\n"}{end}' -n production

# Controllers
kubectl get events --sort-by='.lastTimestamp' -n production | tail -50
```
Trade-off Summary
| Pattern | Best For | Complexity | Operational Burden |
|---|---|---|---|
| Built-in controllers | Standard workloads | Low | Minimal |
| Custom controllers | Domain-specific automation | Medium | Medium |
| Operators (kubebuilder) | Complex lifecycle management | High | High |
| Operators (operator-sdk) | Existing Go projects | High | High |
| RBAC only | Simple permissions | Low | Minimal |
| OPA Gatekeeper | Policy enforcement | Medium | Medium |
| Kyverno | Policy as YAML | Low | Low |
Conclusion
Advanced Kubernetes topics build on the fundamentals of containers and orchestration. Custom controllers and operators let you encode domain knowledge into automated workflows. RBAC and network policies enforce security boundaries. Storage classes and resource quotas ensure predictable cluster operation.
These patterns emerge from real production experience. Start with the basics of your applications, understand the failure modes, and apply the patterns that solve your specific problems.
For packaging your Kubernetes applications, the Helm Charts guide covers templating and release management. If you are building observability into your cluster, check out our Distributed Tracing and Prometheus & Grafana guides for monitoring setup.