DNS-Based Service Discovery: Kubernetes, Consul, and etcd

Learn how DNS-based service discovery works in microservices platforms like Kubernetes, Consul, and etcd, including DNS naming conventions and SRV records.

Service discovery sits at the heart of any distributed system. Before a client can communicate with a service, it needs to find where that service lives on the network. DNS, the same protocol that translates domain names to IP addresses, has been stretched and adapted to solve this problem in modern microservices platforms.

This post covers how DNS-based service discovery works, the trade-offs involved, and how platforms like Kubernetes, Consul, and etcd each approach it.

How DNS Has Been Adapted for Service Discovery

Traditional DNS was designed for relatively static infrastructure. A server might change IP addresses once every few months, so TTLs (Time To Live) of hours or even days made sense. Microservices change constantly—pods get created and destroyed, containers scale up and down, services move between nodes.

DNS-based service discovery adapts the protocol in several ways:

Short TTLs: Service records expire quickly, often within 30 seconds or less. This allows clients to pick up changes rapidly without overwhelming the DNS infrastructure with queries.

Dynamic Registration: Services register themselves (or are registered by an agent) as they come online. When a service instance fails or is replaced, its DNS record is removed automatically.

SRV Records: Standard DNS A records map a name to an IP address. But services run on different ports. SRV records store both the target host and the port number, allowing complete endpoint information in DNS.

Multi-value Responses: A single DNS query can return multiple IP addresses. Load balancing becomes a matter of rotating through these values.
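The rotation a multi-value response enables can be sketched on the client side. `rrPicker` is a hypothetical helper, not a real library type; production clients would also re-resolve once the record's TTL expires:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// rrPicker rotates through the addresses a DNS query returned.
// Illustrative only: a real client re-resolves when the TTL expires.
type rrPicker struct {
	addrs []string
	next  uint64
}

// pick returns the next address in round-robin order, safe for
// concurrent use thanks to the atomic counter.
func (p *rrPicker) pick() string {
	n := atomic.AddUint64(&p.next, 1)
	return p.addrs[(n-1)%uint64(len(p.addrs))]
}

func main() {
	p := &rrPicker{addrs: []string{"10.0.1.10", "10.0.1.11", "10.0.1.12"}}
	for i := 0; i < 4; i++ {
		fmt.Println(p.pick()) // cycles through the three addresses
	}
}
```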

graph TD
    Client[Client Application] -->|Queries| DNS[DNS Server]
    DNS -->|Returns A/SRV records| Client

    subgraph "Service Instances"
        S1[Service-A:8080]
        S2[Service-A:8080]
        S3[Service-B:3000]
    end

    Registry[Service Registry] -->|Watches for changes| DNS
    S1 -->|Registers| Registry
    S2 -->|Registers| Registry
    S3 -->|Registers| Registry

    S1 -.->|Health check fails| Registry
    Registry -.->|Removes record| DNS

The diagram shows the basic pattern. Services register with a central registry. The registry pushes updates to DNS. Clients query DNS to discover endpoints. When health checks fail, records disappear.

Kubernetes DNS

Kubernetes operates its own internal DNS service for pod and service discovery. Understanding how this works helps you design better service communication patterns.

kube-dns and CoreDNS

Early Kubernetes versions shipped with kube-dns, which bundled SkyDNS. Modern clusters run CoreDNS instead: a modular DNS server written in Go that reached general availability in Kubernetes 1.11 and replaced kube-dns as the default in 1.13.

CoreDNS runs as a deployment in kube-system, usually with a couple replicas for HA. It watches the Kubernetes API for service and endpoint changes, rebuilding its zone data on every meaningful change.

The CoreDNS configuration lives in a ConfigMap fittingly named coredns. The default setup looks like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }

Service DNS Naming Conventions

Kubernetes services get DNS names that follow a predictable pattern:

<service-name>.<namespace>.svc.<cluster-domain>

So a service named “api-gateway” in the “production” namespace becomes:

api-gateway.production.svc.cluster.local

Same namespace? You can usually just use the service name. Different namespace? You need the fully qualified name.
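That naming rule is easy to capture in code. A minimal sketch, assuming the default cluster.local domain (clusters can configure a different one):

```go
package main

import "fmt"

// clusterFQDN builds the DNS name Kubernetes assigns to a Service.
// "cluster.local" is the default cluster domain; assume it here.
func clusterFQDN(service, namespace string) string {
	return fmt.Sprintf("%s.%s.svc.cluster.local", service, namespace)
}

func main() {
	fmt.Println(clusterFQDN("api-gateway", "production"))
	// api-gateway.production.svc.cluster.local
}
```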

Headless services behave differently. When you set clusterIP: None, CoreDNS skips the VIP entirely and returns the IPs of backing pods directly:

apiVersion: v1
kind: Service
metadata:
  name: stateful-service
spec:
  clusterIP: None # This makes it headless
  selector:
    app: stateful-app
  ports:
    - port: 8080
      targetPort: http

With a headless service, DNS returns individual pod IPs. Your application handles load balancing—which is exactly what you want for stateful services where clients need to reach specific pods directly.
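When the headless service backs a StatefulSet, each pod additionally gets a stable per-pod name of the form <pod-name>.<service>.<namespace>.svc.<cluster-domain>. A sketch that builds such names (the StatefulSet and service names are illustrative):

```go
package main

import "fmt"

// statefulPodFQDN builds the stable per-pod DNS name a headless Service
// gives each StatefulSet replica. StatefulSet pods are named
// <statefulset>-<ordinal>, and the default cluster domain is assumed.
func statefulPodFQDN(statefulSet string, ordinal int, service, namespace string) string {
	return fmt.Sprintf("%s-%d.%s.%s.svc.cluster.local",
		statefulSet, ordinal, service, namespace)
}

func main() {
	// e.g. the first replica of a hypothetical "db" StatefulSet
	fmt.Println(statefulPodFQDN("db", 0, "stateful-service", "default"))
}
```

Clients that must talk to a specific replica (a database primary, say) resolve these per-pod names instead of the service name.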

Consul DNS Interface

HashiCorp Consul takes a more traditional approach to service discovery. It runs a distributed, gossip-based cluster with agents on every node. Services register with local agents, which gossip information across the cluster.

The Consul DNS interface exposes everything through standard DNS queries. No API endpoint to query—just familiar DNS tools:

# Query for web service instances
dig @127.0.0.1 -p 8600 web.service.consul SRV

# Get just the IP addresses
dig @127.0.0.1 -p 8600 web.service.consul

Consul uses the .consul domain by default. Queries for web.service.consul return A records with the IP addresses of all healthy service instances.

DNS SRV Records for Port Discovery

SRV records become essential when services run on non-standard ports. Imagine a service catalog where different teams run their own instances on arbitrary ports. Clients do not hardcode port numbers—they discover them through DNS.

A Consul SRV response looks something like this:

;; ANSWER SECTION:
api.service.consul.    0   IN  SRV 1 1 8080 node1.node.dc1.consul.
api.service.consul.    0   IN  SRV 1 1 8081 node2.node.dc1.consul.

;; ADDITIONAL SECTION:
node1.node.dc1.consul.  0   IN  A    10.0.1.10
node2.node.dc1.consul.  0   IN  A    10.0.1.11

The SRV record tells you that two instances exist, on ports 8080 and 8081 respectively, running on nodes with those IP addresses.

Prepared Queries

Consul supports prepared queries—saved query templates on the server side. These enable advanced patterns like geo-based routing:

{
  "Name": "geo-routing",
  "Service": {
    "Service": "api-fleet",
    "ServiceMeta": {
      "version": "v2"
    }
  },
  "DNS": {
    "TTL": "10s"
  }
}

Clients then query geo-routing.query.consul and Consul returns instances based on the query definition.

etcd for Service Registration

etcd is the persistent store behind Kubernetes and many other distributed systems. It is not a DNS server, but it often sits underneath service registries that expose DNS interfaces.

Services store endpoint information in etcd’s hierarchical key-value space:

/services/api/10.0.1.10:8080
/services/api/10.0.1.11:8080

A separate component—etcd-watcher, a custom controller, whatever—watches these paths and updates DNS records when values change. Storage stays separate from DNS serving, which keeps things clean.
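A watcher of that kind needs to map keys back to services. A minimal sketch of that parsing, assuming the key layout shown above:

```go
package main

import (
	"fmt"
	"strings"
)

// parseRegistration splits a key like /services/api/10.0.1.10:8080
// into the service name and its endpoint. The three-segment layout
// is the hypothetical one sketched above, not an etcd convention.
func parseRegistration(key string) (service, endpoint string, ok bool) {
	parts := strings.Split(strings.TrimPrefix(key, "/"), "/")
	if len(parts) != 3 || parts[0] != "services" {
		return "", "", false
	}
	return parts[1], parts[2], true
}

func main() {
	svc, ep, _ := parseRegistration("/services/api/10.0.1.10:8080")
	fmt.Println(svc, ep)
}
```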

The advantage of etcd is its consistency and availability story. As a Raft-based consensus system, it handles network partitions gracefully and gives you strong consistency guarantees for registration data.

Watch operations in etcd notify listeners of changes immediately:

// cli is an established etcd *clientv3.Client
watchChan := cli.Watch(ctx, "/services/", clientv3.WithPrefix())
for resp := range watchChan {
    for _, ev := range resp.Events {
        switch ev.Type {
        case clientv3.EventTypePut:
            // instance registered or updated: upsert its DNS record
        case clientv3.EventTypeDelete:
            // instance deregistered: remove its DNS record
        }
    }
}

This reactive model works well for keeping DNS records current.

DNS Caching Challenges and TTL Considerations

Caching happens at multiple layers in the DNS resolution path. Each layer brings its own headaches for service discovery.

Application-Level Caching: Applications cache DNS lookups to avoid repeated queries. If your cached entry lives for 5 minutes but the service moved 1 minute ago, you are sending traffic to a dead address.

Operating System Caching: Most operating systems cache DNS responses based on the TTL in the record. Kubernetes DNS records typically have TTLs around 30 seconds, which most systems respect.

Load Balancer / Proxy Caching: If your service sits behind a proxy or load balancer, that component may cache DNS independently. Your 30-second TTL means nothing if the proxy cached the entry for 5 minutes.

The result: a window where traffic flows to addresses that no longer exist. Mitigation strategies include:

  • Readiness Probes: Kubernetes uses readiness probes to remove unhealthy pods from service endpoints immediately, regardless of DNS caching.

  • Connection Draining: Allow existing connections to complete while routing new traffic only to healthy instances.

  • Client-Side Re-resolution: Some clients re-resolve DNS periodically or on connection errors, rather than relying solely on cached entries.

  • Very Short TTLs: Some deployments use TTLs under 10 seconds, accepting the increased query load in exchange for faster convergence.

Headless Services in Kubernetes

Headless services change DNS semantics in ways that matter. When you set clusterIP: None, CoreDNS returns pod IPs directly instead of a service VIP.

This shows up in a few scenarios:

StatefulSets: Database clusters like MongoDB or Cassandra need pod-to-pod communication where clients connect to specific instances. Headless services let DNS resolve to individual pod IPs.

Custom Load Balancing: Some applications implement their own load balancing. They need to see all available pods and make their own routing decisions.

Service Mesh: With a service mesh like Istio, sidecar proxies often handle load balancing. They may need direct pod IPs for proper traffic management.

The trade-off is that your application takes on complexity the service proxy would normally handle. Without a VIP, failed pods mean failed connections unless your client implements retry logic.

Link-Local and Global DNS

Service discovery DNS typically stays local to your infrastructure. These addresses do not resolve on the public internet and do not need delegation to global DNS servers.

Link-local DNS (also called private DNS) operates within a bounded environment:

  • Kubernetes cluster DNS lives in cluster.local (or a custom domain)
  • Consul DNS lives under the .consul domain, with datacenter-qualified names like web.service.dc1.consul
  • VPC private DNS in AWS uses .compute.internal

These namespaces do not conflict with public DNS. You can have api.service.consul internally while someone else has api.com on the public internet.
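A client that needs to tell the two worlds apart can test for the private suffixes. A sketch with an illustrative, non-exhaustive suffix list:

```go
package main

import (
	"fmt"
	"strings"
)

// internalSuffixes are examples of discovery domains that only
// resolve inside the infrastructure; the list is illustrative.
var internalSuffixes = []string{".cluster.local", ".consul", ".compute.internal"}

// isInternalName reports whether a name belongs to one of the
// private discovery zones above.
func isInternalName(name string) bool {
	name = strings.TrimSuffix(name, ".") // tolerate trailing root dot
	for _, s := range internalSuffixes {
		if strings.HasSuffix(name, s) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isInternalName("api.service.consul")) // internal
	fmt.Println(isInternalName("api.com"))            // public
}
```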

When you need external access, you expose services through an ingress or gateway that bridges internal and external DNS. The external name points to a load balancer or reverse proxy that forwards traffic into your internal network.

Global DNS matters when you need geographic distribution of service discovery. A service registered in one datacenter should be discoverable from another. This requires replication mechanisms—Consul’s multi-datacenter support, for instance, replicates service registrations across datacenters so queries anywhere return consistent results.

Connecting the Patterns

DNS-based service discovery works well in many scenarios, but it has limits. When you need:

  • Strong consistency: DNS caching means some clients may see stale data. For leader election or configuration changes, you need a more consistent store.

  • Rich metadata: DNS records carry limited information. When you need health check details, latency metrics, or custom attributes, a service registry with an API works better.

  • Fine-grained routing: DNS operates at host/port level. When you need header-based routing, traffic splitting, or canary deployments, your service mesh or API gateway provides more control.

Most production systems use DNS for basic discovery, the service mesh for traffic management, and a service registry API for operational tooling.

For further reading, explore how Kubernetes implements service networking, or dig into the Microservices Architecture patterns that depend on this discovery layer.
