Load Balancing Algorithms: Round Robin, Least Connections, and Beyond

Explore load balancing algorithms used in microservices including round robin, least connections, weighted, IP hash, and adaptive algorithms.



Every request hitting your microservices deployment faces the same fundamental question: which backend service instance handles it? Someone has to make that call, and that someone is the load balancing algorithm. Get it right and your system hums along even under heavy traffic. Get it wrong and you will be debugging why one server is on fire while others sit idle.

Microservices complicate this decision. You might have dozens of instances spread across availability zones, each with different capacities, varying response times, and potentially different operational states. The algorithm has to navigate all of that while keeping response times low and routing around failures automatically.

This article walks through load balancing approaches from basic round robin to adaptive algorithms that watch real-time server health and adjust accordingly.

The Role of Load Balancing in Microservices

In a monolith, scaling means running more copies of the same application. Load balancing is simple: distribute requests across identical instances.

Microservices shift the picture. Each service runs multiple instances with different capacities. A payment service making synchronous database calls behaves completely differently from a caching service returning data from memory. A recommendation service might take 500ms while an inventory check finishes in 20ms. The load balancer has to account for all of this variation.

Beyond simple distribution, load balancers in microservices handle service discovery, health checking, circuit breaker integration, metrics collection, and SSL termination. The algorithm you choose affects all of these. Route based on real-time load and your circuit breakers stay quiet. Route poorly and circuit breakers work overtime protecting overloaded servers.

Static Algorithms

Static algorithms make routing decisions without considering current system state. They follow predetermined rules configured beforehand. The advantages are real: no state tracking overhead, predictable behavior, and straightforward debugging.

Round Robin

Round robin cycles through servers in order: Server 1, Server 2, Server 3, then back to Server 1. Each request goes to the next server in sequence.

The only state is a position counter, and each decision is trivial. This makes it extremely fast and memory-efficient. No tracking connection counts, no calculating server load.

Round robin works when all servers have identical capacity and similar request processing times. Perfect homogeneity rarely exists though. If Server 1 has twice the memory of Server 2, round robin still sends equal traffic to both. Server 1 sits underutilized while Server 2 struggles.

DNS-based load balancing often uses round robin. Each DNS response rotates through available server IPs. Simple, but lacks awareness of server health or current conditions. Fine for some scenarios, but production microservice deployments usually need more sophistication.

graph LR
    A[Request 1] --> B[Server 1]
    C[Request 2] --> D[Server 2]
    E[Request 3] --> F[Server 3]
    G[Request 4] --> B
    H[Request 5] --> D
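The rotation reduces to a counter and a modulo. A minimal sketch in Python (server names are placeholders; the lock stands in for the concurrency handling a real load balancer needs):

```python
import threading

class RoundRobin:
    """Cycle through servers in order; a single counter is the only state."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._index = 0
        self._lock = threading.Lock()  # requests arrive concurrently

    def next_server(self):
        with self._lock:
            server = self.servers[self._index % len(self.servers)]
            self._index += 1
        return server

lb = RoundRobin(["server1", "server2", "server3"])
print([lb.next_server() for _ in range(5)])
# → ['server1', 'server2', 'server3', 'server1', 'server2']
```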

Weighted Round Robin

Weighted round robin assigns a weight to each server based on capacity. Servers with higher weights get more traffic proportionally. If Server 1 has weight 3 and Server 2 has weight 1, Server 1 gets three requests for every one that goes to Server 2.

Weights typically reflect server specs: CPU cores, memory size, expected performance. A newer server with more resources handles heavier loads. An older server running background workloads gets lighter traffic.

The catch is keeping weights accurate. A server that suddenly gets busy still receives its configured share of new requests. Weights reflect theoretical capacity, not current load. Regular recalibration becomes necessary as workloads change.

This approach suits heterogeneous server pools with relatively stable load patterns. When capacities shift frequently, static weights become maintenance burdens.
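A naive sketch of the weighting, using the 3:1 example above (server names are placeholders):

```python
import itertools

def weighted_round_robin(weights):
    """Yield servers in proportion to their integer weights.
    weights: mapping of server name -> weight."""
    expanded = [server for server, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)

gen = weighted_round_robin({"server1": 3, "server2": 1})
print([next(gen) for _ in range(8)])
# → ['server1', 'server1', 'server1', 'server2',
#    'server1', 'server1', 'server1', 'server2']
```

Note that this expansion sends the heavy server its share in bursts; production implementations such as NGINX's "smooth weighted round robin" interleave picks to avoid that, at the cost of a slightly more involved algorithm.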

Random

Random routing distributes requests using a random number generator. Counterintuitively, random selection distributes load quite evenly under moderate to high traffic volumes.

With enough traffic, random selection approximates equal distribution naturally. The law of large numbers ensures convergence over time. For very high traffic systems where state management becomes expensive, random offers a simple alternative with no coordination overhead.

Variance is higher under low traffic. One server might get lucky while another receives fewer requests. Over time this evens out, but bursty traffic causes temporary imbalance.

Random works as a baseline algorithm. Some sophisticated approaches use random selection as a fallback or combine it with other methods.
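A sketch of stateless random selection, plus a quick check that uniform choice evens out at volume (server names are placeholders):

```python
import random
from collections import Counter

def pick(servers):
    """Stateless selection: each request independently picks a random server."""
    return random.choice(servers)

counts = Counter(pick(["server1", "server2", "server3"]) for _ in range(30000))
# At this volume each server lands within a few percent of 10000 requests.
```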

IP Hash

IP hash routes requests based on a hash of the client IP address. The same client IP always routes to the same backend server. This provides session affinity without cookies or tracking mechanisms.

The hash function maps client IPs to servers. A simple modulo of the IP address integer value by server count works, but causes massive redistribution when servers are added or removed. Consistent hashing reduces this reshuffling, keeping most clients with the same server even when the pool changes.

IP hash breaks down when many clients share the same source IP. Users behind corporate proxies or NAT gateways all appear as one IP, routing to the same server and potentially creating hotspots. It also has no awareness of server load, so a busy server still receives its hash-allocated share.

For simple session affinity where clients need to return to the same server, IP hash works. For more control, cookie-based sticky sessions or application-level routing work better.
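A minimal modulo-based sketch, with the redistribution drawback described above (IP and server names are placeholders; a real deployment would prefer consistent hashing):

```python
import hashlib

def ip_hash(client_ip, servers):
    """Map a client IP to a server with a stable hash.
    Simple modulo: adding or removing a server remaps most clients."""
    digest = hashlib.md5(client_ip.encode()).digest()
    return servers[int.from_bytes(digest, "big") % len(servers)]

servers = ["server1", "server2", "server3"]
# The same client IP always lands on the same backend:
assert ip_hash("203.0.113.7", servers) == ip_hash("203.0.113.7", servers)
```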

Dynamic Algorithms

Dynamic algorithms factor in current system state when making routing decisions. They adapt to actual server load, response times, and health status. This sophistication comes with tradeoffs: state tracking, constant calculations, and complexity that sometimes causes unexpected behavior.

Least Connections

Least connections routes new requests to the server with the fewest active connections. A server processing ten long-running requests might get a new request before a server that just started on an identical request. The algorithm adapts to current load rather than distributing evenly based on configuration.

This works well for workloads with variable request durations. A request holding a database connection for ten seconds should count differently than one returning cached data in milliseconds. Least connections captures these differences through active connection counts.

The algorithm requires tracking active connections for each backend in load balancer memory. This state updates with each request and response. Under very high traffic, the overhead of tracking and comparing connection counts adds up.

Least connections can cause thrashing under certain patterns. If many requests complete simultaneously, multiple new requests all see the same low count and flood the same server before it updates. Using a smoothed average rather than raw counts mitigates this.
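The bookkeeping amounts to a counter per backend. A sketch, with illustrative acquire/release names marking request start and completion:

```python
class LeastConnections:
    """Route each request to the backend with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}  # per-backend connection counts

    def acquire(self):
        server = min(self.active, key=self.active.get)  # ties break by insertion order
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1

lb = LeastConnections(["server1", "server2"])
first = lb.acquire()   # both idle, so the first backend wins the tie
second = lb.acquire()  # the other backend now has fewer connections
lb.release(first)      # first request finishes; that backend is the next target
```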

Least Response Time

Least response time routes to the server with the lowest combined metric of active connections and average response time. It combines load awareness with performance awareness in a single metric.

The calculation typically weights active connection count against recent response times. A server with fewer connections but much slower responses might not win. A moderately loaded server with fast responses wins.

AWS ALB uses least outstanding requests, focusing on how many requests are waiting versus actively processed. Google Cloud Load Balancing uses a similar model focused on minimizing latency.

This algorithm works well when response times vary significantly between requests and servers. A mix of fast cached responses and slow database queries benefits from response time awareness.
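One possible scoring sketch, assuming connections and latency are combined by multiplication (real implementations weight these differently; the numbers are illustrative):

```python
def score(active_connections, avg_response_ms):
    """Lower is better; +1 keeps idle servers from scoring zero regardless of latency."""
    return (active_connections + 1) * avg_response_ms

# server -> (active connections, recent average response time in ms)
metrics = {
    "server1": (2, 40.0),   # more connections, fast responses
    "server2": (1, 200.0),  # fewer connections, much slower
}
best = min(metrics, key=lambda s: score(*metrics[s]))
print(best)  # → server1: fast responses beat a lower connection count
```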

Resource-Based Routing

Resource-based routing makes decisions based on actual server resource utilization. The load balancer queries each server for current CPU, memory, or application-specific metrics before routing.

This requires agents on each server reporting metrics to the load balancer. The overhead of collecting and communicating metrics limits update frequency. The benefit is routing decisions that truly reflect server capacity rather than indirect signals like connection counts.

Some implementations use active reporting where servers push metrics. Others use passive monitoring where the load balancer tracks response times as a proxy for load. Active reporting is more accurate but adds complexity and network overhead.

Resource-based routing suits environments where server capacity varies significantly or where you want fine-grained control based on actual resource consumption.
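Assuming servers push a CPU utilization figure, the routing decision itself is small; the hard part is collecting fresh metrics. A sketch with illustrative values:

```python
def pick_least_loaded(cpu_by_server):
    """cpu_by_server: server -> last reported CPU utilization in [0.0, 1.0]."""
    return min(cpu_by_server, key=cpu_by_server.get)

# Routing reflects reported capacity directly, not indirect signals:
print(pick_least_loaded({"server1": 0.85, "server2": 0.30}))  # → server2
```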

Adaptive Algorithms

Adaptive algorithms go beyond simple metrics to make predictive routing decisions. They might watch trends in response time changes, error rates, or capacity utilization and route traffic before problems occur.

These algorithms often use machine learning to identify patterns. A server showing increasing response times might have traffic shifted away before it becomes critical. Error rate spikes trigger preemptive routing away from failing instances.

The complexity of adaptive algorithms makes them harder to debug and predict. The benefit is handling edge cases that rule-based algorithms miss. Production deployments often layer adaptive algorithms on top of simpler fallbacks.

Session Persistence and Sticky Sessions

Session persistence routes a particular user’s requests to the same backend server. Without it, a user who logs in on Server 1 might get routed to Server 2 on their next request, which has no memory of their session.

Sticky sessions create problems though. They complicate maintenance windows since taking down a server disconnects active users. They make horizontal scaling harder because load cannot be freely redistributed. A server getting stuck with long-running sessions might accumulate disproportionate load.

Sticky sessions matter most for applications that store session state locally rather than in distributed caches. Shopping carts, multi-step form wizard state, in-memory computation results might rely on server affinity. Most modern applications store session state externally in Redis or similar, reducing the need for sticky sessions.

When you do need sticky sessions, cookie-based affinity works better than IP hash. Cookies give more control and work correctly even when clients switch networks or share IPs. The load balancer reads a cookie to determine the target server.

Cookie-based sticky sessions insert a tracking cookie set by the load balancer. The first request gets routed normally, and the load balancer sets a cookie identifying the assigned server. Subsequent requests include the cookie, and the load balancer reads it to maintain affinity.
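That flow can be sketched as follows, assuming a hypothetical lb_server cookie name and any base algorithm for the first assignment:

```python
def route(cookies, servers, pick_next):
    """Return (server, cookie_to_set); pick_next is any base algorithm."""
    assigned = cookies.get("lb_server")
    if assigned in servers:
        return assigned, None             # valid affinity cookie: honor it
    server = pick_next()                  # first visit: route normally
    return server, ("lb_server", server)  # tell the client to pin to this server

servers = ["server1", "server2"]
# First request: no cookie, so the base algorithm assigns and a cookie is set.
server, cookie = route({}, servers, lambda: "server2")
assert cookie == ("lb_server", "server2")
# Later requests carry the cookie and stick to the assigned server.
server, cookie = route({"lb_server": "server2"}, servers, lambda: "server1")
assert server == "server2" and cookie is None
```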

Circuit Breaker Integration

Load balancing algorithms and circuit breakers work together. The load balancer distributes traffic, but when a service starts failing, the circuit breaker stops traffic to failing instances.

Poor load balancing forces circuit breakers to work harder. A server running hot with CPU maxed out receives requests that timeout, triggering circuit breaker opens for that instance. Better load balancing would have spread work more evenly, keeping the server from becoming overloaded in the first place.

Some load balancers integrate circuit breaking directly. When error rates exceed thresholds for a particular backend, the load balancer stops routing traffic. This happens without needing separate circuit breaker libraries in your application code.

The interaction between load balancing and circuit breaking matters most during recovery. When a circuit breaker closes and traffic resumes, the load balancer should ease traffic back gradually rather than flooding the recovering service.

Client-Side vs Server-Side Load Balancing

Traditional load balancing happens server-side: a dedicated load balancer sits between clients and servers, making routing decisions for all incoming traffic.

Client-side load balancing puts the routing logic in the client. The client maintains a list of available servers and picks which one to call. Netflix’s Ribbon library is an example of client-side load balancing for JVM applications.

Client-side balancing removes the load balancer as a single point of failure. The client directly picks a server, reducing network hops. The tradeoff is that server list management becomes the client’s responsibility. When servers scale up or down, clients need to know.

Service discovery integrates with both approaches. Server-side load balancers often query service registries directly. Client-side load balancers typically receive server lists from service discovery and cache them locally.

graph TD
    subgraph ServerSide["Server-Side Load Balancing"]
        Client1[Client] --> LB[Load Balancer]
        LB --> S1[Server 1]
        LB --> S2[Server 2]
        LB --> S3[Server 3]
    end
    subgraph ClientSide["Client-Side Load Balancing"]
        Client2[Client] --> CL[Client Library]
        CL --> S4[Server 1]
        CL --> S5[Server 2]
        CL --> S6[Server 3]
    end

Server-side load balancing works well when you want centralized control, easier configuration updates, and built-in infrastructure like health checking and circuit breaking. Client-side load balancing suits environments where you want to eliminate the load balancer hop and reduce infrastructure dependencies.

Examples from Real Systems

NGINX

NGINX supports multiple load balancing algorithms in its upstream configuration:

upstream backend {
    least_conn;  # Least connections algorithm
    server 192.168.1.10:8080 weight=3;
    server 192.168.1.11:8080 weight=1;
    server 192.168.1.12:8080 down;  # Marked as down
}

NGINX Plus adds least time and session persistence features. The free version provides round robin, least connections, and IP hash.

HAProxy

HAProxy offers sophisticated load balancing with clear configuration syntax:

backend servers
    balance leastconn  # pick one: roundrobin, leastconn, or source (only the last balance directive takes effect)
    server s1 192.168.1.10:8080 check inter 2000 fall 3
    server s2 192.168.1.11:8080 check inter 2000 fall 3
    server s3 192.168.1.12:8080 check inter 2000 fall 3

HAProxy’s source balance algorithm implements IP hash-like functionality. The check keyword enables health monitoring with configurable intervals and failure thresholds.

AWS ALB

AWS Application Load Balancer supports two routing algorithms per target group:

  • Round Robin - Default, cycles through targets in the target group
  • Least Outstanding Requests - Routes to the target with the fewest pending requests

Flow hash routing, which picks a target from the tuple of protocol, source IP, source port, destination IP, destination port, and TCP sequence number, belongs to the Network Load Balancer rather than ALB.

ALB integrates with Auto Scaling Groups, automatically distributing traffic across healthy instances as they scale.

Algorithm Comparison

| Algorithm | State Required | Adapts to Load | Session Affinity | Complexity | Best For |
| --- | --- | --- | --- | --- | --- |
| Round Robin | None | No | No | Low | Homogeneous servers, simple deployments |
| Weighted Round Robin | Server weights | No | No | Low | Heterogeneous servers with stable load |
| Random | None | No | No | Low | High traffic where simplicity matters |
| IP Hash | None | No | Yes | Low | Session affinity without cookies |
| Least Connections | Active connections | Yes | No | Medium | Variable request durations |
| Least Response Time | Connections + latency | Yes | No | Medium | Latency-sensitive applications |
| Resource-Based | Resource metrics | Yes | No | High | Fine-grained capacity routing |
| Adaptive | Multiple metrics | Yes | No | High | Complex deployments with trends |

Choosing the Right Algorithm

Algorithm selection depends on your workload characteristics and infrastructure. Here is what to think about:

Server homogeneity: If all servers have identical capacity and similar performance, round robin works fine. If servers vary significantly, use weighted variants.

Request characteristics: Do requests take roughly the same time, or do they vary widely? Long-running requests benefit from least connections. Fast, consistent requests work fine with round robin.

Session requirements: Do users need to return to the same server? Cookie-based sticky sessions or IP hash handle this. External session storage eliminates the need entirely.

Latency sensitivity: Are response times critical? Least response time or latency-based routing helps. Background tasks work fine with simple round robin.

Operational complexity: Sophisticated algorithms require more monitoring and tuning. Start simple and add complexity only when measurements show it is needed.

For most web applications, least connections or weighted round robin hits a good balance. These handle heterogeneous servers reasonably well and adapt to varying load without excessive complexity.

Conclusion

Load balancing algorithms run from trivially simple to sophisticated. Round robin and random need no state and distribute load evenly under high traffic. Weighted variants handle capacity differences. Least connections adapts to current load but adds complexity.

Latency and resource-based approaches provide more responsive routing but require additional infrastructure. IP hash offers session affinity at the cost of potential hotspots.

Honestly, the algorithm matters less than the fundamentals: health checking, appropriate server sizing, and not overloading any single instance. Pick something reasonable, monitor it, and adjust as needed.

For related reading, see my post on Load Balancing for fundamentals of load balancer architecture. To understand how load balancers integrate with API management, see API Gateway. For resilience patterns that work alongside load balancing, see Resilience Patterns.
