Building Custom Kubernetes Controllers and Operators

Extend Kubernetes with custom controllers and operators to automate management of complex stateful applications beyond built-in workload types.

Reading time: 14 min


Kubernetes ships with built-in controllers for Deployments, StatefulSets, DaemonSets, and Jobs. These cover many use cases, but sometimes you need custom behavior. Maybe you want to manage a database that requires coordinated initialization, or an external service that needs lifecycle management. Custom controllers and operators let you extend Kubernetes to handle these scenarios.

This post explains the controller pattern, Custom Resource Definitions (CRDs), and how to build operators using controller-runtime.

If you are new to Kubernetes, start with the Kubernetes fundamentals post. For core workload types, see the Kubernetes Workload Resources post.

When to Use / When Not to Use

Custom controllers solve real problems, but they add operational overhead. Here is when they make sense and when they do not.

When to reach for custom controllers

Reach for a custom controller when you need to manage something Kubernetes cannot natively track. External databases that require coordinated initialization, certificate authorities that need renewal workflows, or multi-tenant platforms where each tenant needs isolated resources all fit this category. If you find yourself running kubectl exec into pods to manually handle state that should be automated, a controller can encode that operational knowledge.

Platform teams building internal developer platforms often need this. Rather than expecting every developer to know how to provision a database correctly, you give them a Database CRD and let the controller handle the details.

When to use operators specifically

An operator is a controller with domain knowledge baked in. Use one when your application has operational procedures that should be automated but currently live in runbooks or wikis. Database failover logic, schema migration pipelines, and backup orchestration with retention policies are all good candidates. The key question is whether your app has stateful operational knowledge worth encoding.

When to skip them

Do not reach for controllers just because they are interesting. A Deployment with proper configuration handles most stateless workloads fine. If you need something done once, a Job or CronJob is simpler and has fewer moving parts. And if your “automation” is really just running a Helm template with pre-determined values, a controller adds unnecessary complexity.

The operational burden is real. Controllers run somewhere, need monitoring, can have bugs, and require updates when Kubernetes APIs change. Make sure the benefit justifies this before adding a custom controller to your cluster.

Reconciliation Loop Flow

flowchart TD
    A[Watch API Server<br/>for custom resource changes] --> B[Get current state<br/>of managed resources]
    B --> C{Desired state<br/>== Current state?}
    C -->|Yes| A
    C -->|No| D[Reconcile:<br/>Create/Update/Delete]
    D --> E[Update resource status<br/>and conditions]
    E --> A

Controllers follow a declarative reconciliation loop: watch for changes, compare desired vs actual state, act to bring actual toward desired, update status, repeat.

Kubernetes Controller Pattern

A controller is a loop that watches the desired state and reconciles the actual state toward it. Kubernetes ships with many built-in controllers. The Deployment controller watches Deployments and creates ReplicaSets. The ReplicaSet controller creates Pods. The scheduler places Pods onto nodes. The kubelet on each node ensures containers are running.

The reconciliation loop follows this pattern:

watch(current_state) -> compare(desired, current) -> act(bring current to desired)

Controllers use the Kubernetes API to watch resources and create or modify other resources. If a Deployment requests 3 replicas, the controller ensures 3 pods exist. If a pod dies, the controller notices the difference and creates a replacement.

Custom Resources and CRDs

A Custom Resource Definition (CRD) extends the Kubernetes API with new resource types. Once you define a CRD, you can create instances of your custom resource just like built-in resources.

Defining a CRD

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com
spec:
  group: example.com
  names:
    kind: Database
    plural: databases
    shortNames:
      - db
  scope: Namespaced
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                engine:
                  type: string
                  enum: ["postgres", "mysql", "mongodb"]
                version:
                  type: string
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 5
              required:
                - engine
                - version
                - replicas
            status:
              type: object
              properties:
                phase:
                  type: string
                endpoint:
                  type: string

After applying this CRD, you can create Database resources:

apiVersion: example.com/v1
kind: Database
metadata:
  name: my-postgres
spec:
  engine: postgres
  version: "15"
  replicas: 3

Without a controller, these resources sit idle. You need a controller to watch Database resources and do something with them.
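For example, assuming the two manifests above are saved locally as database-crd.yaml and my-postgres.yaml (hypothetical filenames):

```shell
kubectl apply -f database-crd.yaml
kubectl apply -f my-postgres.yaml

# The shortName from the CRD works like any built-in resource
kubectl get db
kubectl describe database my-postgres
```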

Controller-Runtime Library

controller-runtime is the standard library for building Kubernetes controllers. It handles API client setup, informer caching, and reconciliation loops.

Project setup

mkdir my-operator && cd my-operator
go mod init my-operator
go get sigs.k8s.io/controller-runtime@v0.17.0

Main entry point

package main

import (
    "log"

    appsv1 "k8s.io/api/apps/v1"
    "k8s.io/apimachinery/pkg/runtime"
    utilruntime "k8s.io/apimachinery/pkg/util/runtime"
    clientgoscheme "k8s.io/client-go/kubernetes/scheme"
    ctrl "sigs.k8s.io/controller-runtime"

    // Generated API types for the Database CRD; this assumes they
    // live under api/v1 in the my-operator module.
    examplev1 "my-operator/api/v1"
)

var scheme = runtime.NewScheme()

func main() {
    // Register built-in and custom types so clients can serialize them
    utilruntime.Must(clientgoscheme.AddToScheme(scheme))
    utilruntime.Must(examplev1.AddToScheme(scheme))

    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{Scheme: scheme})
    if err != nil {
        log.Fatal(err)
    }

    // Watch Database resources and the StatefulSets they own
    err = ctrl.NewControllerManagedBy(mgr).
        For(&examplev1.Database{}).
        Owns(&appsv1.StatefulSet{}).
        Complete(&ReconcileDatabase{
            Client: mgr.GetClient(),
            Scheme: mgr.GetScheme(),
        })
    if err != nil {
        log.Fatal(err)
    }

    log.Fatal(mgr.Start(ctrl.SetupSignalHandler()))
}

Reconciliation Loops and Idempotency

The reconciler compares desired state with actual state and takes action. Reconcilers must be idempotent: applying the same reconciliation multiple times produces the same result.

Reconcile implementation

import (
    "context"
    "fmt"
    "log"

    "k8s.io/apimachinery/pkg/runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/reconcile"

    examplev1 "my-operator/api/v1"
)

type ReconcileDatabase struct {
    Client client.Client
    Scheme *runtime.Scheme
}

func (r *ReconcileDatabase) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
    log.Printf("Reconciling Database %s/%s", req.Namespace, req.Name)

    // Fetch the Database instance; ignore not-found (it was deleted)
    db := &examplev1.Database{}
    if err := r.Client.Get(ctx, req.NamespacedName, db); err != nil {
        return reconcile.Result{}, client.IgnoreNotFound(err)
    }

    // Create or update backend resources based on spec
    if db.Spec.Replicas > 0 {
        if result, err := r.ensureStatefulSet(ctx, db); err != nil {
            return result, err
        }
    }

    // Update status through the status subresource
    db.Status.Phase = "Running"
    db.Status.Endpoint = fmt.Sprintf("%s.%s.svc.cluster.local:5432", db.Name, db.Namespace)
    if err := r.Client.Status().Update(ctx, db); err != nil {
        return reconcile.Result{}, err
    }

    return reconcile.Result{}, nil
}

The reconciler fetches the Database resource, ensures a StatefulSet exists with the right spec, and updates the status. If the spec changes, the next reconciliation pass updates the StatefulSet.

Idempotency in practice

If the StatefulSet already exists with the correct spec, the reconcile loop does nothing. If it needs updating, the controller updates it. If it does not exist, the controller creates it. Running reconciliation 100 times produces the same result as running it once.
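controller-runtime's controllerutil package wraps this create-or-update dance in a helper. A sketch, assuming the ReconcileDatabase type and examplev1 types from this post (ensureStatefulSetIdempotent is a hypothetical method name):

```go
import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

func (r *ReconcileDatabase) ensureStatefulSetIdempotent(ctx context.Context, db *examplev1.Database) error {
    ss := &appsv1.StatefulSet{
        ObjectMeta: metav1.ObjectMeta{Name: db.Name, Namespace: db.Namespace},
    }
    // CreateOrUpdate fetches ss, applies the mutate function, then
    // creates it if absent or updates it only if something changed
    _, err := controllerutil.CreateOrUpdate(ctx, r.Client, ss, func() error {
        replicas := int32(db.Spec.Replicas)
        ss.Spec.Replicas = &replicas
        return controllerutil.SetControllerReference(db, ss, r.Scheme)
    })
    return err
}
```

Because the mutate function expresses only the desired fields, calling this every reconciliation pass is naturally idempotent.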

Operator Pattern for Stateful Apps

An operator extends the controller pattern with domain knowledge. It encodes operational procedures for managing a specific application. The operator pattern combines CRDs with custom controllers to handle application lifecycle events like backups, failover, and upgrades.

Database operator example

func (r *ReconcileDatabase) ensureStatefulSet(ctx context.Context, db *examplev1.Database) (reconcile.Result, error) {
    ss := &appsv1.StatefulSet{}
    err := r.Client.Get(ctx, types.NamespacedName{
        Name:      db.Name,
        Namespace: db.Namespace,
    }, ss)

    if apierrors.IsNotFound(err) {
        // Create new StatefulSet
        ss = r.buildStatefulSet(db)
        err = r.Client.Create(ctx, ss)
        return reconcile.Result{}, err
    }

    if err != nil {
        return reconcile.Result{}, err
    }

    // Update if the replica count drifted from the spec
    desired := int32(db.Spec.Replicas)
    if ss.Spec.Replicas == nil || *ss.Spec.Replicas != desired {
        ss.Spec.Replicas = &desired
        err = r.Client.Update(ctx, ss)
        return reconcile.Result{}, err
    }

    return reconcile.Result{}, nil
}

func (r *ReconcileDatabase) buildStatefulSet(db *examplev1.Database) *appsv1.StatefulSet {
    replicas := int32(db.Spec.Replicas)
    return &appsv1.StatefulSet{
        ObjectMeta: metav1.ObjectMeta{
            Name:      db.Name,
            Namespace: db.Namespace,
            OwnerReferences: []metav1.OwnerReference{
                *metav1.NewControllerRef(db, examplev1.GroupVersion.WithKind("Database")),
            },
        },
        Spec: appsv1.StatefulSetSpec{
            Replicas: &replicas,
            Selector: &metav1.LabelSelector{
                MatchLabels: map[string]string{
                    "app": db.Name,
                },
            },
            ServiceName: db.Name,
            Template: corev1.PodTemplateSpec{
                ObjectMeta: metav1.ObjectMeta{
                    Labels: map[string]string{
                        "app": db.Name,
                    },
                },
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{
                        {
                            Name:  "database",
                            Image: fmt.Sprintf("%s:%s", db.Spec.Engine, db.Spec.Version),
                            Ports: []corev1.ContainerPort{
                                {ContainerPort: 5432},
                            },
                        },
                    },
                },
            },
        },
    }
}

The operator uses OwnerReferences to link the StatefulSet to the Database resource. When the Database is deleted, Kubernetes garbage collects the StatefulSet automatically.

Client-Go Basics

client-go is the Go client library for Kubernetes. controller-runtime uses client-go under the hood, but you may need client-go directly for more control or for operators not using controller-runtime.

Direct client usage

import (
    "context"
    "fmt"
    "log"
    "os"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    kubeconfig := os.Getenv("KUBECONFIG")
    config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
    if err != nil {
        log.Fatal(err)
    }

    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        log.Fatal(err)
    }

    // List pods
    pods, err := clientset.CoreV1().Pods("default").List(context.Background(), metav1.ListOptions{})
    if err != nil {
        log.Fatal(err)
    }

    for _, pod := range pods.Items {
        fmt.Printf("Pod: %s\n", pod.Name)
    }
}

client-go provides typed clients for all built-in Kubernetes resource types. clientset.CoreV1().Pods(namespace) returns an interface for pod operations in that namespace.
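For watch-style access without controller-runtime, client-go's shared informers maintain a local cache and invoke handlers on changes, avoiding a fresh API call per lookup. A sketch continuing from the clientset above:

```go
import (
    "fmt"
    "time"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/client-go/informers"
    "k8s.io/client-go/tools/cache"
)

// A shared informer factory with a 10-minute resync period
factory := informers.NewSharedInformerFactory(clientset, 10*time.Minute)
podInformer := factory.Core().V1().Pods().Informer()

podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
    AddFunc: func(obj interface{}) {
        pod := obj.(*corev1.Pod)
        fmt.Printf("Pod added: %s/%s\n", pod.Namespace, pod.Name)
    },
})

// Start all registered informers and wait for their caches to fill
stop := make(chan struct{})
defer close(stop)
factory.Start(stop)
factory.WaitForCacheSync(stop)
```

This informer machinery is exactly what controller-runtime sets up for you behind each Watch.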

Choosing Your Approach: Trade-off Comparison

Approach             Control   Complexity    Best For
controller-runtime   Medium    Low-Medium    Most custom controllers, standard reconciliation
client-go directly   High      High          Very specific needs, learning K8s internals
Operator SDK         Medium    Medium        Production operators with OLM packaging
Kubebuilder          Medium    Medium        controller-runtime projects with best practices

controller-runtime is the default choice for most controllers. It handles caching, watching, and retry logic that you would otherwise have to write yourself. client-go directly gives you more control but more boilerplate. Operator SDK adds scaffolding on top of controller-runtime for production operators.

Production Failure Scenarios

Reconciliation Loop Falling Behind

Under heavy cluster load, the controller can fall behind its watch on the API server. This means spec changes take longer to propagate to actual resources.

The tell is resources converging slowly after a change, sometimes several minutes. You might also see the controller pod consuming more CPU than usual.

Fix this by watching your controller queue depth. If it is growing, increase worker counts or add indexing for large clusters.
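With controller-runtime's builder, the worker count is set through controller.Options. A sketch, assuming the Database controller from earlier (mgr and reconciler already constructed):

```go
import (
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/controller"
)

// More workers drain a deep queue faster, at the cost of more
// reconciles (and API calls) in flight concurrently.
err := ctrl.NewControllerManagedBy(mgr).
    For(&examplev1.Database{}).
    WithOptions(controller.Options{
        MaxConcurrentReconciles: 4, // default is 1
    }).
    Complete(reconciler)
```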

API Server Timeout During Reconciliation

Network blips or API server overload cause reconciliation operations to time out. The error message is context deadline exceeded.

This can leave resources partially updated. A pod might exist but not have its final labels set.

Use exponential backoff on retry. Do not just requeue immediately with the same delay.

Leader Election Failures

In HA setups with multiple controller replicas, leader election failures cause two controllers to think they are both in charge. Both then try to manage the same resources.

The symptom is duplicate resources or conflicting updates appearing in quick succession.

controller-runtime has built-in leader election. Make sure your lease duration is long enough for your typical restart time.

Anti-Patterns

Ignoring Deletion

If your reconcile only handles creates and updates, deleted custom resources leave their child resources behind. The controller never cleans up.

Add finalizers to your resources. In the reconcile delete handling, remove the finalizer last after cleaning up dependents.

Skipping Status Updates

Your users have no idea what the controller actually did. Did it create the StatefulSet? Is it still trying? Did it fail?

Update .status on every reconciliation pass. Use conditions to communicate transient states.
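apimachinery ships helpers for the standard conditions convention. A sketch, assuming the Database status gains a Conditions []metav1.Condition field (the CRD schema above does not define one, so this is an extension):

```go
import (
    "k8s.io/apimachinery/pkg/api/meta"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// SetStatusCondition updates the matching condition in place and
// bumps LastTransitionTime only when the status value changes
meta.SetStatusCondition(&db.Status.Conditions, metav1.Condition{
    Type:               "Ready",
    Status:             metav1.ConditionFalse,
    Reason:             "StatefulSetNotReady",
    Message:            "waiting for replicas to become ready",
    ObservedGeneration: db.Generation,
})
if err := r.Client.Status().Update(ctx, db); err != nil {
    return reconcile.Result{}, err
}
```

Setting ObservedGeneration lets consumers tell whether the condition reflects the latest spec or a stale one.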

Missing Exponential Backoff

When the API server is overloaded and your controller keeps retrying immediately, you make the problem worse. Every failing controller adds load.

Set up exponential backoff with a reasonable initial interval (seconds, not milliseconds). Let the API server recover.

Security Checklist

RBAC for Controllers

Controllers need permissions to manage the resources they watch. Apply least privilege:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: database-controller-role
rules:
  - apiGroups: ["example.com"]
    resources: ["databases"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: ["apps"]
    resources: ["statefulsets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

Never give a controller cluster-admin. If your controller only manages resources in one namespace, use a Role and RoleBinding instead of ClusterRole and ClusterRoleBinding.

Service Account for the Controller

Run your controller with a dedicated ServiceAccount:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: database-controller-sa
  namespace: controllers
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: database-controller-rolebinding
subjects:
  - kind: ServiceAccount
    name: database-controller-sa
    namespace: controllers
roleRef:
  kind: ClusterRole
  name: database-controller-role
  apiGroup: rbac.authorization.k8s.io

Set serviceAccountName: database-controller-sa in the controller's Pod spec. Kubernetes mounts the token into the pod automatically, and in-cluster clients (for example via rest.InClusterConfig) pick it up without extra configuration.

General Security Practices

  • Run controllers in a dedicated namespace, isolated from application workloads
  • Do not let controllers manage their own deployment (circular dependency risk)
  • Use readOnlyRootFilesystem: true in the controller pod security context where possible
  • Enable RBAC audit logging to track controller permission usage
  • Do not embed credentials in controller code — use ServiceAccounts or external secrets solutions

Testing Strategies

Unit Tests

Test reconciliation logic in isolation without a real API server:

func TestReconcileDatabaseCreatesStatefulSet(t *testing.T) {
    // The fake client needs a scheme that knows about our custom types
    s := runtime.NewScheme()
    _ = examplev1.AddToScheme(s)
    _ = appsv1.AddToScheme(s)

    cl := fake.NewClientBuilder().
        WithScheme(s).
        WithObjects(&examplev1.Database{
            ObjectMeta: metav1.ObjectMeta{
                Name:      "test-db",
                Namespace: "default",
            },
            Spec: examplev1.DatabaseSpec{
                Engine:   "postgres",
                Replicas: 3,
            },
        }).Build()

    r := &ReconcileDatabase{Client: cl}
    req := reconcile.Request{
        NamespacedName: types.NamespacedName{
            Name:      "test-db",
            Namespace: "default",
        },
    }

    _, err := r.Reconcile(context.Background(), req)
    if err != nil {
        t.Fatalf("reconcile failed: %v", err)
    }

    // Verify StatefulSet was created
    ss := &appsv1.StatefulSet{}
    err = cl.Get(context.Background(), req.NamespacedName, ss)
    if err != nil {
        t.Errorf("StatefulSet not created: %v", err)
    }
}

Use sigs.k8s.io/controller-runtime/pkg/reconcile for the reconcile interface and sigs.k8s.io/controller-runtime/pkg/client/fake for the fake client.

Integration Tests

Test against a real API server using envtest, Kubebuilder's test environment:

var testEnv *envtest.Environment

func TestMain(m *testing.M) {
    testEnv = &envtest.Environment{}
    _, err := testEnv.Start()
    if err != nil {
        os.Exit(1)
    }
    // os.Exit skips deferred calls, so stop the environment explicitly
    code := m.Run()
    testEnv.Stop()
    os.Exit(code)
}

func TestReconcileWithAPIServer(t *testing.T) {
    cl, err := client.New(testEnv.Config, client.Options{})
    if err != nil {
        t.Fatal(err)
    }

    db := &examplev1.Database{
        ObjectMeta: metav1.ObjectMeta{
            Name:      "integration-test",
            Namespace: "default",
        },
        Spec: examplev1.DatabaseSpec{
            Engine:   "postgres",
            Replicas: 1,
        },
    }
    if err := cl.Create(context.Background(), db); err != nil {
        t.Fatal(err)
    }

    r := &ReconcileDatabase{Client: cl}
    _, err = r.Reconcile(context.Background(), reconcile.Request{
        NamespacedName: types.NamespacedName{
            Name:      "integration-test",
            Namespace: "default",
        },
    })

    if err != nil {
        t.Errorf("reconcile failed: %v", err)
    }
}

envtest starts a real API server and etcd locally (it does not run the scheduler or other controller-manager components). This catches API-validation and serialization bugs that unit tests with fake clients miss.

Advanced Scenarios

Leader Election

For HA deployments with multiple controller replicas, use leader election to ensure only one replica acts at a time:

func main() {
    mgr, err := manager.New(cfg, manager.Options{
        LeaderElection:          true,
        LeaderElectionID:        "database-controller-leader",
        LeaderElectionNamespace: "controllers",
    })
    if err != nil {
        log.Fatal(err)
    }

    // With multiple replicas, only the elected leader processes
    // reconciliation events; the others wait on standby
    log.Fatal(mgr.Start(context.Background()))
}

Set LeaseDuration, RenewDeadline, and RetryPeriod based on your expected restart time. If your controller typically restarts in 30 seconds, set LeaseDuration to 60 seconds and RenewDeadline to 15 seconds.
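These knobs live on manager.Options. A sketch with the values suggested above:

```go
import (
    "time"

    "sigs.k8s.io/controller-runtime/pkg/manager"
)

leaseDuration := 60 * time.Second // lease validity without renewal
renewDeadline := 15 * time.Second // leader must renew within this window
retryPeriod := 5 * time.Second    // how often candidates retry acquisition

mgr, err := manager.New(cfg, manager.Options{
    LeaderElection:          true,
    LeaderElectionID:        "database-controller-leader",
    LeaderElectionNamespace: "controllers",
    LeaseDuration:           &leaseDuration,
    RenewDeadline:           &renewDeadline,
    RetryPeriod:             &retryPeriod,
})
```

RenewDeadline must be shorter than LeaseDuration, or the leader can lose its lease before it has a chance to renew.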

Finalizers

Finalizers block deletion of a resource until the controller has cleaned up dependent resources:

import "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

// Finalizer names must be qualified (domain/name)
const finalizerName = "database.example.com/cleanup"

// Handle deletion first
if !db.ObjectMeta.DeletionTimestamp.IsZero() {
    if controllerutil.ContainsFinalizer(db, finalizerName) {
        // Clean up external resources before allowing deletion
        if err := r.cleanupExternalResources(ctx, db); err != nil {
            return reconcile.Result{}, err
        }
        // Remove the finalizer last; Kubernetes then completes the delete
        controllerutil.RemoveFinalizer(db, finalizerName)
        if err := r.Client.Update(ctx, db); err != nil {
            return reconcile.Result{}, err
        }
    }
    return reconcile.Result{}, nil
}

// Add the finalizer on first reconcile
if !controllerutil.ContainsFinalizer(db, finalizerName) {
    controllerutil.AddFinalizer(db, finalizerName)
    if err := r.Client.Update(ctx, db); err != nil {
        return reconcile.Result{}, err
    }
}

Without finalizers, deleting a Database custom resource would leave behind anything Kubernetes cannot garbage collect on its own, such as external cloud resources, DNS records, or backups.

Owner References and Garbage Collection

Owner references tell Kubernetes to clean up child resources when the owner is deleted:

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/utils/ptr"
)

ownerRef := metav1.OwnerReference{
    APIVersion:         "example.com/v1",
    Kind:               "Database",
    Name:               db.Name,
    UID:                db.UID,
    Controller:         ptr.To(true),
    BlockOwnerDeletion: ptr.To(true),
}

ss := &appsv1.StatefulSet{
    ObjectMeta: metav1.ObjectMeta{
        Name:            db.Name,
        Namespace:       db.Namespace,
        OwnerReferences: []metav1.OwnerReference{ownerRef},
    },
    // ... spec ...
}

Set BlockOwnerDeletion: true when the owner must not disappear before this dependent: with foreground deletion, the garbage collector waits for blocking dependents to be removed before finishing the owner's deletion.

Quick Recap Checklist

  • Defined CRDs for your domain resources
  • Reconciliation loop follows watch -> compare -> act
  • Finalizers handle deletion of dependent resources
  • Reconciliation is idempotent (calling it N times = once)
  • Owner references link child resources for garbage collection
  • Leader election set up for HA deployments
  • Exponential backoff on reconciliation failures
  • Status conditions reflect actual state
  • Tested with simulated API server timeouts
  • Controller queue depth monitored

Conclusion

Custom controllers extend Kubernetes beyond built-in workload types. CRDs define new resource types. Controllers watch those resources and reconcile actual state toward desired state. Operators encode domain knowledge to handle application-specific lifecycle management.

controller-runtime simplifies controller development by handling caching, API watching, and reconciliation loops. client-go provides the underlying client functionality for direct Kubernetes API access.

Building operators requires understanding of the controller pattern, Go, and Kubernetes internals. For teams running complex stateful applications on Kubernetes, custom operators can automate operational tasks that would otherwise require manual intervention.

For more advanced Kubernetes topics, see the Advanced Kubernetes post.


Related Posts

Container Security: Image Scanning and Vulnerability Management

Implement comprehensive container security: from scanning images for vulnerabilities to runtime security monitoring and secrets protection.

#container-security #docker #kubernetes

Deployment Strategies: Rolling, Blue-Green, and Canary Releases

Compare and implement deployment strategies—rolling updates, blue-green deployments, and canary releases—to reduce risk and enable safe production releases.

#deployment #devops #kubernetes

Developing Helm Charts: Templates, Values, and Testing

Create production-ready Helm charts with Go templates, custom value schemas, and testing using Helm unittest and ct.

#helm #kubernetes #devops