Building Custom Kubernetes Controllers and Operators
Extend Kubernetes with custom controllers and operators to automate management of complex stateful applications beyond built-in workload types.
Kubernetes ships with built-in controllers for Deployments, StatefulSets, DaemonSets, and Jobs. These cover many use cases, but sometimes you need custom behavior. Maybe you want to manage a database that requires coordinated initialization, or an external service that needs lifecycle management. Custom controllers and operators let you extend Kubernetes to handle these scenarios.
This post explains the controller pattern, Custom Resource Definitions (CRDs), and how to build operators using controller-runtime.
If you are new to Kubernetes, start with the Kubernetes fundamentals post. For core workload types, see the Kubernetes Workload Resources post.
When to Use / When Not to Use
Custom controllers solve real problems, but they add operational overhead. Here is when they make sense and when they do not.
When to reach for custom controllers
Reach for a custom controller when you need to manage something Kubernetes cannot natively track. External databases that require coordinated initialization, certificate authorities that need renewal workflows, or multi-tenant platforms where each tenant needs isolated resources all fit this category. If you find yourself running kubectl exec into pods to manually handle state that should be automated, a controller can encode that operational knowledge.
Platform teams building internal developer platforms often need this. Rather than expecting every developer to know how to provision a database correctly, you give them a Database CRD and let the controller handle the details.
When to use operators specifically
An operator is a controller with domain knowledge baked in. Use one when your application has operational procedures that should be automated but currently live in runbooks or wikis. Database failover logic, schema migration pipelines, backup orchestration with retention policies: these are all good candidates. The key question is whether your app has stateful operational knowledge worth encoding.
When to skip them
Do not reach for controllers just because they are interesting. A Deployment with proper configuration handles most stateless workloads fine. If you need something done once, a Job or CronJob is simpler and has fewer moving parts. And if your “automation” is really just running a Helm template with pre-determined values, a controller adds unnecessary complexity.
The operational burden is real. Controllers run somewhere, need monitoring, can have bugs, and require updates when Kubernetes APIs change. Make sure the benefit justifies this before adding a custom controller to your cluster.
Reconciliation Loop Flow
flowchart TD
A[Watch API Server<br/>for custom resource changes] --> B[Get current state<br/>of managed resources]
B --> C{Desired state<br/>== Current state?}
C -->|Yes| A
C -->|No| D[Reconcile:<br/>Create/Update/Delete]
D --> E[Update resource status<br/>and conditions]
E --> A
Controllers follow a declarative reconciliation loop: watch for changes, compare desired vs actual state, act to bring actual toward desired, update status, repeat.
Kubernetes Controller Pattern
A controller is a loop that watches the desired state and reconciles the actual state toward it. Kubernetes ships with many built-in controllers. The Deployment controller watches Deployments and creates ReplicaSets. The ReplicaSet controller creates Pods. The scheduler places Pods onto nodes. The kubelet on each node ensures containers are running.
The reconciliation loop follows this pattern:
watch(current_state) -> compare(desired, current) -> act(bring current to desired)
Controllers use the Kubernetes API to watch resources and create or modify other resources. If a Deployment requests 3 replicas, the controller ensures 3 pods exist. If a pod dies, the controller notices the difference and creates a replacement.
Custom Resources and CRDs
A Custom Resource Definition (CRD) extends the Kubernetes API with new resource types. Once you define a CRD, you can create instances of your custom resource just like built-in resources.
Defining a CRD
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  names:
    kind: Database
    plural: databases
    shortNames:
      - db
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      subresources:
        status: {} # enables /status, used by the controller's Status().Update calls
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:
                  type: string
                  enum: ["postgres", "mysql", "mongodb"]
                version:
                  type: string
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 5
              required:
                - engine
                - version
                - replicas
            status:
              type: object
              properties:
                phase:
                  type: string
                endpoint:
                  type: string
After applying this CRD, you can create Database resources:
apiVersion: example.com/v1
kind: Database
metadata:
  name: my-postgres
spec:
  engine: postgres
  version: "15"
  replicas: 3
Without a controller, these resources sit idle. You need a controller to watch Database resources and do something with them.
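Even before the controller exists, the CRD makes the new type a first-class citizen of the CLI. Assuming the manifests above are saved as `database-crd.yaml` and `my-postgres.yaml` (filenames are illustrative):

```shell
# Register the CRD, then create an instance of the new type
kubectl apply -f database-crd.yaml
kubectl apply -f my-postgres.yaml

# The short name from the CRD works too
kubectl get db
kubectl describe database my-postgres

# Status fields appear once a controller updates them
kubectl get database my-postgres -o jsonpath='{.status.phase}'
```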
Controller-Runtime Library
controller-runtime is the standard library for building Kubernetes controllers. It handles API client setup, informer caching, and reconciliation loops.
Project setup
mkdir my-operator && cd my-operator
go mod init my-operator
go get sigs.k8s.io/controller-runtime@v0.17.0
Main entry point
package main

import (
	"log"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"

	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/manager/signals"

	examplev1 "my-operator/api/v1" // generated Database types (path is project-specific)
)

func main() {
	// Register built-in and custom types so the manager's client understands them
	scheme := runtime.NewScheme()
	_ = clientgoscheme.AddToScheme(scheme)
	_ = examplev1.AddToScheme(scheme)

	// Load kubeconfig, falling back to in-cluster config when running as a pod
	mgr, err := manager.New(config.GetConfigOrDie(), manager.Options{Scheme: scheme})
	if err != nil {
		log.Fatal(err)
	}

	// Watch Database resources (and the StatefulSets they own) and route
	// events to the reconciler
	err = builder.ControllerManagedBy(mgr).
		Named("database-controller").
		For(&examplev1.Database{}).
		Owns(&appsv1.StatefulSet{}).
		Complete(&ReconcileDatabase{client: mgr.GetClient(), scheme: mgr.GetScheme()})
	if err != nil {
		log.Fatal(err)
	}

	log.Fatal(mgr.Start(signals.SetupSignalHandler()))
}
Reconciliation Loops and Idempotency
The reconciler compares desired state with actual state and takes action. Reconcilers must be idempotent: applying the same reconciliation multiple times produces the same result.
Reconcile implementation
type ReconcileDatabase struct {
	client client.Client
	scheme *runtime.Scheme
}

func (r *ReconcileDatabase) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
	log.Printf("Reconciling Database %s/%s", req.Namespace, req.Name)

	// Fetch the Database instance
	db := &examplev1.Database{}
	err := r.client.Get(ctx, req.NamespacedName, db)
	if err != nil {
		// Deleted between enqueue and reconcile: nothing to do
		return reconcile.Result{}, client.IgnoreNotFound(err)
	}

	// Create or update backend resources based on spec
	if db.Spec.Replicas > 0 {
		result, err := r.ensureStatefulSet(ctx, db)
		if err != nil {
			return result, err
		}
	}

	// Update status
	db.Status.Phase = "Running"
	db.Status.Endpoint = fmt.Sprintf("%s.%s.svc.cluster.local:5432", db.Name, db.Namespace)
	err = r.client.Status().Update(ctx, db)
	if err != nil {
		return reconcile.Result{}, err
	}

	return reconcile.Result{}, nil
}
The reconciler fetches the Database resource, ensures a StatefulSet exists with the right spec, and updates the status. If the spec changes, the next reconciliation pass updates the StatefulSet.
Idempotency in practice
If the StatefulSet already exists with the correct spec, the reconcile loop does nothing. If it needs updating, the controller updates it. If it does not exist, the controller creates it. Running reconciliation 100 times produces the same result as running it once.
Operator Pattern for Stateful Apps
An operator extends the controller pattern with domain knowledge. It encodes operational procedures for managing a specific application. The operator pattern combines CRDs with custom controllers to handle application lifecycle events like backups, failover, and upgrades.
Database operator example
func (r *ReconcileDatabase) ensureStatefulSet(ctx context.Context, db *examplev1.Database) (reconcile.Result, error) {
	ss := &appsv1.StatefulSet{}
	err := r.client.Get(ctx, types.NamespacedName{
		Name:      db.Name,
		Namespace: db.Namespace,
	}, ss)
	if apierrors.IsNotFound(err) {
		// Create new StatefulSet
		ss = r.buildStatefulSet(db)
		err = r.client.Create(ctx, ss)
		return reconcile.Result{}, err
	}
	if err != nil {
		return reconcile.Result{}, err
	}

	// Update if the replica count drifted from the spec. Compare values,
	// not pointers: reflect.DeepEqual on mismatched pointer types would
	// report a diff on every pass and trigger endless updates.
	desired := int32(db.Spec.Replicas)
	if ss.Spec.Replicas == nil || *ss.Spec.Replicas != desired {
		ss.Spec.Replicas = &desired
		err = r.client.Update(ctx, ss)
		return reconcile.Result{}, err
	}

	return reconcile.Result{}, nil
}
func (r *ReconcileDatabase) buildStatefulSet(db *examplev1.Database) *appsv1.StatefulSet {
	replicas := int32(db.Spec.Replicas)
	return &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{
			Name:      db.Name,
			Namespace: db.Namespace,
			// Owning the StatefulSet means Kubernetes garbage-collects it
			// when the Database is deleted
			OwnerReferences: []metav1.OwnerReference{
				*metav1.NewControllerRef(db, examplev1.GroupVersion.WithKind("Database")),
			},
		},
		Spec: appsv1.StatefulSetSpec{
			Replicas: &replicas,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": db.Name},
			},
			ServiceName: db.Name,
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{
					Labels: map[string]string{"app": db.Name},
				},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{
						{
							Name:  "database",
							Image: fmt.Sprintf("%s:%s", db.Spec.Engine, db.Spec.Version),
							Ports: []corev1.ContainerPort{
								{ContainerPort: 5432},
							},
						},
					},
				},
			},
		},
	}
}
The operator uses OwnerReferences to link the StatefulSet to the Database resource. When the Database is deleted, Kubernetes garbage collects the StatefulSet automatically.
Client-Go Basics
client-go is the Go client library for Kubernetes. controller-runtime uses client-go under the hood, but you may need client-go directly for more control or for operators not using controller-runtime.
Direct client usage
import (
	"context"
	"fmt"
	"log"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	kubeconfig := os.Getenv("KUBECONFIG")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		log.Fatal(err)
	}

	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	// List pods
	pods, err := clientset.CoreV1().Pods("default").List(context.Background(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, pod := range pods.Items {
		fmt.Printf("Pod: %s\n", pod.Name)
	}
}
client-go provides typed clients for all built-in Kubernetes resource types. clientset.CoreV1().Pods(namespace) returns an interface for pod operations in that namespace.
Choosing Your Approach: Trade-off Comparison
| Approach | Control | Complexity | Best For |
|---|---|---|---|
| controller-runtime | Medium | Low-Medium | Most custom controllers, standard reconciliation |
| client-go directly | High | High | Very specific needs, learning K8s internals |
| Operator SDK | Medium | Medium | Production operators with OLM packaging |
| Kubebuilder | Medium | Medium | controller-runtime projects with best practices |
controller-runtime is the default choice for most controllers. It handles caching, watching, and retry logic that you would otherwise have to write yourself. client-go directly gives you more control but more boilerplate. Operator SDK adds scaffolding on top of controller-runtime for production operators.
Production Failure Scenarios
Reconciliation Loop Falling Behind
Under heavy cluster load, the controller can fall behind its watch on the API server. This means spec changes take longer to propagate to actual resources.
The tell is resources converging slowly after a change, sometimes several minutes. You might also see the controller pod consuming more CPU than usual.
Fix this by watching your controller's work-queue depth (controller-runtime exports it as the workqueue_depth metric). If it keeps growing, raise the number of concurrent reconcile workers or add field indexes so lookups do not scan the whole cache.
API Server Timeout During Reconciliation
Network blips or API server overload cause reconciliation operations to time out. The error message is context deadline exceeded.
This can leave resources partially updated. A pod might exist but not have its final labels set.
Use exponential backoff on retry. Do not just requeue immediately with the same delay.
Leader Election Failures
In HA setups with multiple controller replicas, leader election failures cause two controllers to think they are both in charge. Both then try to manage the same resources.
The symptom is duplicate resources or conflicting updates appearing in quick succession.
controller-runtime has built-in leader election. Make sure your lease duration is long enough for your typical restart time.
Anti-Patterns
Ignoring Deletion
If your reconcile only handles creates and updates, deleted custom resources leave their child resources behind. The controller never cleans up.
Add finalizers to your resources. In the reconcile delete handling, remove the finalizer last after cleaning up dependents.
Skipping Status Updates
Your users have no idea what the controller actually did. Did it create the StatefulSet? Is it still trying? Did it fail?
Update .status on every reconciliation pass. Use conditions to communicate transient states.
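As a sketch of what that might look like on the Database resource (field values here are illustrative), using the standard condition shape of type, status, reason, and message:

```yaml
status:
  phase: Provisioning
  conditions:
    - type: Ready
      status: "False"
      reason: StatefulSetNotReady
      message: "2 of 3 replicas available"
      lastTransitionTime: "2024-01-15T10:00:00Z"
    - type: Progressing
      status: "True"
      reason: ScalingUp
      message: "Waiting for replica 3 to join"
      lastTransitionTime: "2024-01-15T10:00:00Z"
```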
Missing Exponential Backoff
When the API server is overloaded and your controller keeps retrying immediately, you make the problem worse. Every failing controller adds load.
Set up exponential backoff with a reasonable initial interval (seconds, not milliseconds). Let the API server recover.
Security Checklist
RBAC for Controllers
Controllers need permissions to manage the resources they watch. Apply least privilege:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: database-controller-role
rules:
  - apiGroups: ["example.com"]
    resources: ["databases", "databases/status"] # status subresource needed for Status().Update
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: ["apps"]
    resources: ["statefulsets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
Never give a controller cluster-admin. If your controller only manages resources in one namespace, use a Role and RoleBinding instead of ClusterRole and ClusterRoleBinding.
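For a controller scoped to a single namespace, the equivalent namespaced Role might look like this (namespace name is illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: database-controller-role
  namespace: databases
rules:
  - apiGroups: ["example.com"]
    resources: ["databases", "databases/status"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: ["apps"]
    resources: ["statefulsets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```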
Service Account for the Controller
Run your controller with a dedicated ServiceAccount:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: database-controller-sa
  namespace: controllers
---
# ClusterRoleBinding, because this controller watches Databases in all namespaces
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: database-controller-rolebinding
subjects:
  - kind: ServiceAccount
    name: database-controller-sa
    namespace: controllers
roleRef:
  kind: ClusterRole
  name: database-controller-role
  apiGroup: rbac.authorization.k8s.io
Kubernetes mounts the ServiceAccount token into the controller pod automatically, and in-cluster client configuration (rest.InClusterConfig, or controller-runtime's config.GetConfig) picks it up without any extra kubeconfig setup.
General Security Practices
- Run controllers in a dedicated namespace, isolated from application workloads
- Do not let controllers manage their own deployment (circular dependency risk)
- Use readOnlyRootFilesystem: true in the controller pod security context where possible
- Enable RBAC audit logging to track controller permission usage
- Do not embed credentials in controller code — use ServiceAccounts or external secrets solutions
Testing Strategies
Unit Tests
Test reconciliation logic in isolation without a real API server:
func TestReconcileDatabaseCreatesStatefulSet(t *testing.T) {
	// Register both built-in and custom types with the scheme
	s := runtime.NewScheme()
	_ = clientgoscheme.AddToScheme(s)
	_ = examplev1.AddToScheme(s)

	// Setup fake client pre-loaded with a Database CR
	cl := fake.NewClientBuilder().
		WithScheme(s).
		WithStatusSubresource(&examplev1.Database{}).
		WithObjects(&examplev1.Database{
			ObjectMeta: metav1.ObjectMeta{
				Name:      "test-db",
				Namespace: "default",
			},
			Spec: examplev1.DatabaseSpec{
				Engine:   "postgres",
				Replicas: 3,
			},
		}).Build()

	r := &ReconcileDatabase{client: cl}
	req := reconcile.Request{
		NamespacedName: types.NamespacedName{
			Name:      "test-db",
			Namespace: "default",
		},
	}

	if _, err := r.Reconcile(context.Background(), req); err != nil {
		t.Fatalf("reconcile failed: %v", err)
	}

	// Verify the StatefulSet was created
	ss := &appsv1.StatefulSet{}
	if err := cl.Get(context.Background(), req.NamespacedName, ss); err != nil {
		t.Errorf("StatefulSet not created: %v", err)
	}
}
Use sigs.k8s.io/controller-runtime/pkg/reconcile for the reconcile interface and sigs.k8s.io/controller-runtime/pkg/client/fake for the fake client.
Integration Tests
Test against a real API server using envtest, Kubebuilder’s test environment:
var testEnv *envtest.Environment

func TestMain(m *testing.M) {
	testEnv = &envtest.Environment{
		// Point envtest at your CRD manifests so the API server knows the Database type
		CRDDirectoryPaths: []string{"config/crd"},
	}
	if _, err := testEnv.Start(); err != nil {
		log.Fatal(err)
	}

	// Do not defer the Stop before os.Exit: deferred calls never run after os.Exit
	code := m.Run()
	_ = testEnv.Stop()
	os.Exit(code)
}
func TestReconcileWithAPIServer(t *testing.T) {
	s := runtime.NewScheme()
	_ = clientgoscheme.AddToScheme(s)
	_ = examplev1.AddToScheme(s)

	cl, err := client.New(testEnv.Config, client.Options{Scheme: s})
	if err != nil {
		t.Fatal(err)
	}

	db := &examplev1.Database{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "integration-test",
			Namespace: "default",
		},
		Spec: examplev1.DatabaseSpec{
			Engine:   "postgres",
			Replicas: 1,
		},
	}
	if err := cl.Create(context.Background(), db); err != nil {
		t.Fatal(err)
	}

	r := &ReconcileDatabase{client: cl}
	_, err = r.Reconcile(context.Background(), reconcile.Request{
		NamespacedName: types.NamespacedName{
			Name:      "integration-test",
			Namespace: "default",
		},
	})
	if err != nil {
		t.Errorf("reconcile failed: %v", err)
	}
}
envtest starts a real kube-apiserver and etcd locally; no built-in controllers or kubelets run, so only your reconciler acts on the cluster. This catches bugs that unit tests with fake clients miss.
Advanced Scenarios
Leader Election
For HA deployments with multiple controller replicas, use leader election to ensure only one replica acts at a time:
func main() {
	mgr, err := manager.New(config.GetConfigOrDie(), manager.Options{
		LeaderElection:          true,
		LeaderElectionID:        "database-controller-leader",
		LeaderElectionNamespace: "controllers",
	})
	if err != nil {
		log.Fatal(err)
	}

	// With multiple replicas running, only the elected leader
	// processes reconciliation events
	log.Fatal(mgr.Start(signals.SetupSignalHandler()))
}
Set LeaseDuration, RenewDeadline, and RetryPeriod based on your expected restart time. If your controller typically restarts in 30 seconds, set LeaseDuration to 60 seconds and RenewDeadline to 15 seconds.
Finalizers
Finalizers block deletion of a resource until the controller has cleaned up dependent resources:
// Finalizer names must be qualified: domain/name
const finalizerName = "example.com/database-cleanup"

// Add the finalizer on first reconcile
if !controllerutil.ContainsFinalizer(db, finalizerName) {
	controllerutil.AddFinalizer(db, finalizerName)
	if err := r.client.Update(ctx, db); err != nil {
		return reconcile.Result{}, err
	}
	return reconcile.Result{}, nil
}

// Handle deletion
if !db.ObjectMeta.DeletionTimestamp.IsZero() {
	// Clean up external resources
	if err := r.cleanupExternalResources(ctx, db); err != nil {
		return reconcile.Result{}, err
	}
	// Remove the finalizer last so deletion can complete
	controllerutil.RemoveFinalizer(db, finalizerName)
	if err := r.client.Update(ctx, db); err != nil {
		return reconcile.Result{}, err
	}
	return reconcile.Result{}, nil
}
Without finalizers, deleting a Database custom resource would leave its StatefulSet and PVCs behind.
Owner References and Garbage Collection
Owner references tell Kubernetes to clean up child resources when the owner is deleted:
// ptr is k8s.io/utils/ptr
ownerRef := metav1.OwnerReference{
	APIVersion:         "example.com/v1",
	Kind:               "Database",
	Name:               db.Name,
	UID:                db.UID,
	Controller:         ptr.To(true),
	BlockOwnerDeletion: ptr.To(true),
}

ss := &appsv1.StatefulSet{
	ObjectMeta: metav1.ObjectMeta{
		Name:            db.Name,
		Namespace:       db.Namespace,
		OwnerReferences: []metav1.OwnerReference{ownerRef},
	},
	// ... spec ...
}
Set BlockOwnerDeletion: true if callers use foreground deletion: the owner then stays in a terminating state until this dependent is removed, rather than disappearing while children still exist.
Quick Recap Checklist
- Defined CRDs for your domain resources
- Reconciliation loop follows watch -> compare -> act
- Finalizers handle deletion of dependent resources
- Reconciliation is idempotent (calling it N times = once)
- Owner references link child resources for garbage collection
- Leader election set up for HA deployments
- Exponential backoff on reconciliation failures
- Status conditions reflect actual state
- Tested with simulated API server timeouts
- Controller queue depth monitored
Conclusion
Custom controllers extend Kubernetes beyond built-in workload types. CRDs define new resource types. Controllers watch those resources and reconcile actual state toward desired state. Operators encode domain knowledge to handle application-specific lifecycle management.
controller-runtime simplifies controller development by handling caching, API watching, and reconciliation loops. client-go provides the underlying client functionality for direct Kubernetes API access.
Building operators requires understanding of the controller pattern, Go, and Kubernetes internals. For teams running complex stateful applications on Kubernetes, custom operators can automate operational tasks that would otherwise require manual intervention.
For more advanced Kubernetes topics, see the Advanced Kubernetes post.