API Gateway: The Single Entry Point for Microservices Architecture

Learn how API gateways work, when to use them, architecture patterns, failure scenarios, and implementation strategies for production microservices.

Reading time: 15 min

An API gateway sits at the entrance of your backend services and handles everything that would otherwise clutter individual services or repeat across them: request routing, authentication, rate limiting, protocol translation. When a mobile app asks your product catalog for data, the gateway receives that request, checks the user’s session token, applies rate limiting rules, and forwards it to the product service—all in a single hop.

This centralized layer makes client code simpler since applications talk to one endpoint instead of keeping track of multiple service addresses. It also gives you a natural chokepoint for enforcing security policies, merging responses from different services, and gathering metrics about how your API gets used. For teams building microservices at scale, an API gateway is not optional—it is foundational infrastructure.


When to Use / When Not to Use

| Scenario | Recommendation |
| --- | --- |
| Multiple backend services need unified access control | Use API Gateway |
| Mobile, web, and third-party clients consume the same APIs | Use API Gateway |
| You need centralized rate limiting and throttling | Use API Gateway |
| Service aggregation is required for client convenience | Use API Gateway |
| Single monolithic application with no external clients | Do NOT use API Gateway |
| Services are tightly coupled and share a deployment unit | Do NOT use API Gateway |
| Ultra-low latency is critical (gateway adds ~1-3ms) | Consider alternatives |
| Simple CRUD application with one or two services | Consider direct service calls |

When TO Use an API Gateway

  • Unified client access: Your mobile app, web app, and third-party integrations all hit different services. Without a gateway, clients need to know about every service endpoint, certificate, and authentication mechanism.
  • Shared authentication and authorization: You want a single place to validate JWTs, check permissions, and reject unauthorized requests before they reach your services.
  • Rate limiting at the edge: You need to protect your services from traffic spikes, abusive clients, or accidental misconfiguration without adding this logic to every service.
  • Protocol translation: Your mobile clients use REST, but your internal services might use gRPC or WebSocket. The gateway translates between them.
  • Request aggregation: A mobile screen needs data from three different services. Without aggregation in the gateway, the client makes three separate calls with associated latency and complexity.
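The aggregation case above can be sketched as a small gateway-side helper. The three fetcher functions are placeholders for real backend calls (product, reviews, and inventory are assumed service names); the point is that the gateway fans the calls out in parallel so the client makes one request instead of three:

```javascript
// Aggregate three backend calls into one client-facing response.
// The fetchers are injected so this sketch stays independent of any
// particular HTTP client (in practice each would wrap an axios.get).
async function aggregateProductView(productId, { fetchProduct, fetchReviews, fetchInventory }) {
  // Fan out in parallel; one slow call no longer serializes the others.
  const [product, reviews, inventory] = await Promise.all([
    fetchProduct(productId),
    fetchReviews(productId),
    fetchInventory(productId),
  ]);
  return { product, reviews, inventory };
}
```

Note that `Promise.all` fails fast: if any one backend call rejects, the whole aggregate rejects, so partial-response behavior (e.g. `Promise.allSettled`) is a deliberate design choice, not a default.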

When NOT to Use an API Gateway

  • Adding unnecessary hops: If your system is a simple monolith or a handful of tightly coordinated services, the gateway introduces latency without meaningful benefit.
  • Internal service-to-service traffic: Services behind the gateway often need to call each other directly. Forcing that east-west traffic through the gateway turns it into a bottleneck rather than a helper.
  • Single-purpose applications: A data processing pipeline with no external clients does not need a gateway.
  • Latency-sensitive paths: Every request going through the gateway adds 1-3ms. For extremely latency-sensitive use cases, this matters.

Architecture Diagrams

Request Flow

sequenceDiagram
    participant Client
    participant Gateway as API Gateway
    participant Auth as Auth Service
    participant Catalog as Product Service
    participant Order as Order Service
    participant Cache as Redis Cache

    Client->>Gateway: GET /api/products/123
    Gateway->>Gateway: Extract JWT, Rate Limit Check
    Gateway->>Auth: Validate Token
    Auth-->>Gateway: Token Valid
    Gateway->>Cache: Check Cache
    Cache-->>Gateway: Cache Hit
    Gateway-->>Client: Product JSON

    Client->>Gateway: POST /api/orders
    Gateway->>Gateway: Extract JWT, Rate Limit Check
    Gateway->>Auth: Validate Token
    Auth-->>Gateway: Token Valid
    Gateway->>Catalog: Check Product Availability
    Catalog-->>Gateway: Available
    Gateway->>Order: Create Order
    Order-->>Gateway: Order Created
    Gateway-->>Client: Order Confirmation

Gateway Internal Components

graph TD
    A[Client Request] --> B[TLS Termination]
    B --> C[Authentication]
    C --> D[Authorization]
    D --> E[Rate Limiting]
    E --> F[Request Routing]
    F --> G[Service Discovery]
    G --> H[Backend Service]
    H --> F
    F --> I[Response Aggregation]
    I --> J[Metrics Collection]
    J --> K[Client Response]

    subgraph Security Layer
        C
        D
        E
    end

    subgraph Routing Layer
        F
        G
    end

Failure Flow

graph TD
    A[Client Request] --> B{Gateway Available?}
    B -->|No| C[Return 503 Service Unavailable]
    B -->|Yes| D{Auth Passed?}
    D -->|No| E[Return 401 Unauthorized]
    D -->|Yes| F{Rate Limit OK?}
    F -->|No| G[Return 429 Too Many Requests]
    F -->|Yes| H{Backend Service Available?}
    H -->|No| I[Return 502 Bad Gateway]
    H -->|Yes| J{Request Valid?}
    J -->|No| K[Return 400 Bad Request]
    J -->|Yes| L[Forward to Service]
    L --> M{Service Timeout?}
    M -->|Yes| N[Return 504 Gateway Timeout]
    M -->|No| O[Return Service Response]

Production Failure Scenarios

| Failure Scenario | Impact | Mitigation |
| --- | --- | --- |
| Gateway instance crash | All traffic fails | Run multiple gateway instances behind a load balancer; health checks detect failures |
| Backend service timeout | Client hangs indefinitely | Set aggressive timeouts (e.g., 5s); circuit breaker returns an error immediately |
| Auth service unavailable | No requests can be validated | Cache JWT validation results with a short TTL; decide explicitly whether degraded auth fails open or closed |
| Rate limiter memory exhaustion | Rate limiting fails open | Use Redis-backed rate limiting; set hard limits on memory per tenant |
| Gateway misconfiguration | All traffic routed incorrectly | Use version-controlled config; canary deployments for config changes |
| SSL/TLS certificate expiry | HTTPS requests fail | Automate certificate renewal (Let’s Encrypt); alert 30 days before expiry |
| Service discovery returns stale IPs | Requests go to dead instances | Use short TTLs in the service registry; health checks remove unhealthy instances |
| Request payload too large | Memory exhaustion on gateway | Set max request size limits; reject oversized payloads early |

Trade-off Analysis

| Factor | With API Gateway | Without API Gateway |
| --- | --- | --- |
| Latency | +1-3ms per request | Baseline |
| Consistency | Centralized auth/rate limiting | Duplicated per service |
| Cost | Gateway instances + operation | No additional cost |
| Complexity | Centralized logic, single config | Distributed logic, multiple configs |
| Operability | Single point to monitor | Monitor each service separately |
| Client complexity | Low (one endpoint) | High (manage multiple endpoints) |
| Debugging | Single point to trace | Trace across multiple services |
| Single point of failure | Yes, unless highly available | No (but more complex clients) |
| Flexibility | Limited by gateway capabilities | Full flexibility per service |

API Gateway vs Service Mesh

| Aspect | API Gateway | Service Mesh |
| --- | --- | --- |
| Layer | L7 (Application) | L4/L7 (Transport + Application) |
| Scope | North-South traffic (client to service) | East-West traffic (service to service) |
| Typical Users | Platform teams, API product teams | DevOps, SRE teams |
| Features | Auth, routing, aggregation, protocol translation | mTLS, retries, circuit breaking |
| Deployment | Sits at the edge | Sidecar proxies on each service |

For most architectures, you need both. The API gateway handles external client traffic while a service mesh handles internal service-to-service communication. See Service Mesh for a deep dive.


Implementation Example (Node.js)

Here is a minimal gateway implementation in Express. It sketches the core concerns (authentication, rate limiting, circuit breaking) and is a starting point rather than a complete production deployment:

const express = require("express");
const axios = require("axios");
const rateLimit = require("express-rate-limit");
const jwt = require("jsonwebtoken");
const crypto = require("crypto"); // for request IDs

const app = express();

// Configuration
const PORT = process.env.PORT || 3000;
const AUTH_SERVICE_URL =
  process.env.AUTH_SERVICE_URL || "http://auth-service:8080";
const PRODUCT_SERVICE_URL =
  process.env.PRODUCT_SERVICE_URL || "http://product-service:8080";
const ORDER_SERVICE_URL =
  process.env.ORDER_SERVICE_URL || "http://order-service:8080";

// Middleware: Parse JSON with size limit
app.use(express.json({ limit: "1mb" }));

// Middleware: Request ID for tracing
app.use((req, res, next) => {
  req.id = crypto.randomUUID();
  res.setHeader("X-Request-ID", req.id);
  next();
});

// Middleware: Rate limiting (back this with a Redis store in production
// so all gateway instances share one set of counters)
const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100, // 100 requests per minute per IP
  handler: (req, res) =>
    res.status(429).json({ error: "Too many requests", requestId: req.id }),
  standardHeaders: true,
  legacyHeaders: false,
});
app.use("/api/", limiter);

// Middleware: Authentication
async function authenticate(req, res, next) {
  const token = req.headers.authorization?.replace("Bearer ", "");

  if (!token) {
    return res
      .status(401)
      .json({ error: "Missing authorization token", requestId: req.id });
  }

  try {
    // In production, use a distributed cache for validation results
    const decoded = jwt.verify(token, process.env.JWT_SECRET);
    req.user = decoded;
    next();
  } catch (error) {
    return res.status(401).json({ error: "Invalid token", requestId: req.id });
  }
}

// Middleware: Authorization
function authorize(...allowedRoles) {
  return (req, res, next) => {
    if (!req.user || !allowedRoles.includes(req.user.role)) {
      return res
        .status(403)
        .json({ error: "Insufficient permissions", requestId: req.id });
    }
    next();
  };
}

// Health check endpoint
app.get("/health", (req, res) => {
  res.json({ status: "healthy", timestamp: new Date().toISOString() });
});

// Route: Product catalog with circuit breaker
const CircuitBreaker = require("opossum");

const productCircuit = new CircuitBreaker(
  // req is not in scope here, so the request ID is passed in explicitly
  async (productId, requestId) => {
    const response = await axios.get(
      `${PRODUCT_SERVICE_URL}/products/${productId}`,
      {
        timeout: 5000,
        headers: { "X-Request-ID": requestId },
      },
    );
    return response.data;
  },
  {
    timeout: 5000,
    errorThresholdPercentage: 50,
    resetTimeout: 30000,
  },
);

productCircuit.on("fallback", () => ({
  error: "Service temporarily unavailable",
}));
productCircuit.on("timeout", () => ({ error: "Service timeout" }));

app.get("/api/products/:id", authenticate, async (req, res) => {
  try {
    const product = await productCircuit.fire(req.params.id, req.id);
    res.json(product);
  } catch (error) {
    res.status(502).json({ error: "Bad gateway", requestId: req.id });
  }
});

// Route: Create order (aggregates product and order services)
app.post(
  "/api/orders",
  authenticate,
  authorize("user", "admin"),
  async (req, res) => {
    const { productId, quantity } = req.body;

    try {
      // Check product availability
      const productResponse = await axios.get(
        `${PRODUCT_SERVICE_URL}/products/${productId}`,
        { timeout: 3000 },
      );

      if (!productResponse.data.available) {
        return res.status(400).json({ error: "Product not available" });
      }

      // Create order
      const orderResponse = await axios.post(
        `${ORDER_SERVICE_URL}/orders`,
        { productId, quantity, userId: req.user.id },
        { timeout: 5000 },
      );

      res.status(201).json(orderResponse.data);
    } catch (error) {
      if (error.code === "ECONNABORTED") {
        return res.status(504).json({ error: "Gateway timeout" });
      }
      res.status(502).json({ error: "Failed to create order" });
    }
  },
);

// Error handling middleware
app.use((err, req, res, next) => {
  console.error(`[${req.id}] Unhandled error:`, err);
  res.status(500).json({ error: "Internal server error", requestId: req.id });
});

app.listen(PORT, () => {
  console.log(`API Gateway listening on port ${PORT}`);
});

Docker Compose for Local Development

version: "3.8"

services:
  api-gateway:
    build: ./api-gateway
    ports:
      - "3000:3000"
    environment:
      - JWT_SECRET=your-secret-key
      - AUTH_SERVICE_URL=http://auth-service:8080
      - PRODUCT_SERVICE_URL=http://product-service:8080
      - ORDER_SERVICE_URL=http://order-service:8080
      - REDIS_URL=redis://redis:6379
    depends_on:
      - redis

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

  auth-service:
    image: your-auth-service-image
    ports:
      - "8080:8080"

  product-service:
    image: your-product-service-image
    ports:
      - "8081:8080"

  order-service:
    image: your-order-service-image
    ports:
      - "8082:8080"

Capacity Estimation

Assumptions

  • Average request size: 2 KB
  • Average response size: 16 KB
  • Peak QPS: 10,000 requests/second
  • Average response time target: 50ms (gateway overhead: 3ms)

Gateway Instance Calculation

Required instances = (Peak QPS × Safety Factor) / (Max Throughput per Instance)

Where:
- Peak QPS = 10,000
- Safety Factor = 2x
- Max Throughput per Instance = 2,000 QPS (typical for a 2 vCPU instance)

Required = (10,000 × 2) / 2,000 = 10 instances (and never fewer than 2, for HA)
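One way to turn these sizing assumptions into code is a throughput-based helper (the parameter names are illustrative): divide peak QPS, scaled by the safety factor, by what one instance can sustain, and apply a floor of two instances for HA:

```javascript
// Throughput-based instance sizing: peak load, scaled by a safety factor,
// divided by what one instance can sustain, with a floor for HA.
function requiredInstances({ peakQps, safetyFactor, maxQpsPerInstance, haMinimum = 2 }) {
  const raw = Math.ceil((peakQps * safetyFactor) / maxQpsPerInstance);
  return Math.max(raw, haMinimum); // never fewer than the HA floor
}
```

With 10,000 QPS peak, a 2x safety factor, and 2,000 QPS per instance, the helper returns 10; a much smaller workload still returns 2 because of the HA floor.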

Network Bandwidth

Inbound:  10,000 QPS × 2 KB = 20 MB/s = 160 Mbps
Outbound: 10,000 QPS × 16 KB = 160 MB/s = 1.28 Gbps

Total network required: ~1.5 Gbps

Memory (per instance with 2 vCPU)

Connection buffers: 256 MB
Rate limiting state (Redis): Shared across instances
Application heap: 512 MB
Operating system: 256 MB

Total per instance: ~1 GB RAM

Observability Checklist

Metrics to Capture

  • gateway_requests_total (counter) - Total requests by route, status code
  • gateway_request_duration_seconds (histogram) - Latency by route, percentile bands
  • gateway_active_connections (gauge) - Current concurrent connections
  • gateway_rate_limit_exceeded_total (counter) - Rate limit violations by client
  • gateway_backend_errors_total (counter) - Backend service errors by service
  • gateway_circuit_breaker_state (gauge) - Circuit breaker state by backend

Logs to Emit

Each request should emit structured JSON logs:

{
  "timestamp": "2026-03-23T10:15:30.123Z",
  "requestId": "550e8400-e29b-41d4-a716-446655440000",
  "method": "GET",
  "path": "/api/products/123",
  "statusCode": 200,
  "latencyMs": 12,
  "clientIp": "203.0.113.42",
  "userAgent": "MobileApp/2.1",
  "userId": "usr_abc123",
  "rateLimitRemaining": 87,
  "backendService": "product-service",
  "backendLatencyMs": 8
}
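A minimal sketch of producing that record: the function is framework-agnostic, and which fields exist on `req` (the request ID, the decoded user) is an assumption about the gateway's own middleware having run first.

```javascript
// Build one structured access-log record per request. Field names mirror
// the sample log above; `backend` carries timing for the upstream call.
function buildAccessLog({ req, statusCode, startedAt, backend }) {
  return {
    timestamp: new Date().toISOString(),
    requestId: req.id,
    method: req.method,
    path: req.path,
    statusCode,
    latencyMs: Date.now() - startedAt,
    clientIp: req.ip,
    userId: req.user ? req.user.id : null,
    backendService: backend ? backend.name : null,
    backendLatencyMs: backend ? backend.latencyMs : null,
  };
}
```

In Express this would typically be called from a `res.on("finish")` handler and written out with `console.log(JSON.stringify(record))` so the log shipper sees one JSON object per line.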

Alerts to Configure

| Alert | Threshold | Severity |
| --- | --- | --- |
| P99 latency elevated | > 100ms for 5 minutes | Warning |
| P99 latency high | > 500ms for 1 minute | Critical |
| Error rate elevated | > 1% for 5 minutes | Warning |
| Error rate high | > 5% for 1 minute | Critical |
| Rate limit violations spike | > 1,000/min from a single IP | Warning |
| Backend service unavailable | Any backend down > 30s | Critical |
| Certificate expiry approaching | Any cert expiring within 30 days | Warning |

Distributed Tracing

Ensure the gateway propagates trace context to backend services:

// Propagate trace headers to backend services, dropping any the client
// did not send so we don't forward undefined values upstream
const traceHeaders = Object.fromEntries(
  Object.entries({
    "X-Request-ID": req.id,
    "X-B3-TraceId": req.headers["x-b3-traceid"],
    "X-B3-SpanId": req.headers["x-b3-spanid"],
    "X-B3-Sampled": req.headers["x-b3-sampled"],
  }).filter(([, value]) => value !== undefined),
);

await axios.get(`${SERVICE_URL}/products/${id}`, {
  headers: { ...traceHeaders, Authorization: req.headers.authorization },
});

Security Checklist

  • TLS 1.2+ termination with modern cipher suites
  • JWT validation with proper signature verification
  • Rate limiting configured per-client (IP, API key, user ID)
  • Request size limits to prevent payload amplification
  • Input validation on all request parameters
  • Output encoding to prevent XSS in responses
  • CORS policy properly configured
  • Security headers (HSTS, CSP, X-Frame-Options)
  • Audit logging for all authentication/authorization failures
  • API key rotation mechanism
  • Deprecation notices for older API versions
  • Penetration testing performed annually
  • DDoS protection at edge (Cloudflare, AWS Shield)
  • Backend services unreachable directly (only via gateway)
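Several of the header items on this checklist (HSTS, CSP, X-Frame-Options) can be applied with a small helper. The values below are illustrative defaults for an API-only gateway, not a universal policy, and the helper works with any Node framework that exposes `res.setHeader`:

```javascript
// Security headers from the checklist, as a plain map so the policy is
// easy to review and test in one place. Tighten or loosen per deployment.
function securityHeaders() {
  return {
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    "Content-Security-Policy": "default-src 'none'; frame-ancestors 'none'",
    "X-Frame-Options": "DENY",
    "X-Content-Type-Options": "nosniff",
    "Referrer-Policy": "no-referrer",
  };
}

// Apply to a response object (Express or a plain http.ServerResponse)
function applySecurityHeaders(res) {
  for (const [name, value] of Object.entries(securityHeaders())) {
    res.setHeader(name, value);
  }
}
```

In Express this becomes one middleware line: `app.use((req, res, next) => { applySecurityHeaders(res); next(); });` (libraries like helmet cover the same ground with more options).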

Common Pitfalls / Anti-Patterns

Pitfall 1: Gateway as a Monolith Proxy

Problem: Teams sometimes build the gateway to contain significant business logic, transforming it into another monolith that mirrors the old system.

Solution: Keep the gateway thin. It should handle cross-cutting concerns (auth, routing, rate limiting) but delegate business logic to the appropriate backend services. If you find yourself writing if (user.plan === 'enterprise') { ... } in the gateway, that is a sign business logic is leaking into the gateway.

Pitfall 2: No Circuit Breaker on Backend Calls

Problem: A slow or failing backend service causes requests to pile up at the gateway, eventually exhausting gateway resources and taking down the entire system.

Solution: Always wrap backend service calls with circuit breakers. When a backend error rate exceeds a threshold, the circuit opens and immediately returns an error rather than waiting for timeouts.

// Never do this - no timeout, no circuit breaker
const response = await axios.get(`${BACKEND_URL}/data`);

// Do this instead
const CircuitBreaker = require("opossum");
const circuit = new CircuitBreaker((url) => axios.get(url), {
  timeout: 3000,
  errorThresholdPercentage: 50,
});
const response = await circuit.fire(`${BACKEND_URL}/data`);

Pitfall 3: Stale Service Discovery

Problem: The gateway caches service endpoints that have changed (scale-down, failures), causing requests to go to dead instances.

Solution: Use short TTLs in service discovery (30 seconds or less), implement health checks that remove unhealthy instances immediately, and have backend services register/deregister dynamically with the service registry.
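A minimal sketch of such a TTL-bounded cache, assuming a `resolve` function that stands in for the real registry lookup (Consul, Eureka, or DNS); entries older than the TTL are re-resolved, so a scaled-down or dead instance is forgotten within one TTL at most:

```javascript
// TTL-bounded endpoint cache for service discovery. `now` is injectable
// so the expiry behavior can be tested without waiting on a real clock.
function createEndpointCache(resolve, { ttlMs = 30000, now = Date.now } = {}) {
  const entries = new Map(); // serviceName -> { endpoints, fetchedAt }
  return {
    async get(serviceName) {
      const cached = entries.get(serviceName);
      if (cached && now() - cached.fetchedAt < ttlMs) {
        return cached.endpoints; // still fresh
      }
      const endpoints = await resolve(serviceName); // registry lookup
      entries.set(serviceName, { endpoints, fetchedAt: now() });
      return endpoints;
    },
  };
}
```

This caches only for freshness; it does not remove instances on failed health checks, which a production resolver would layer on top.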

Pitfall 4: Authentication Bypass via Direct Service Access

Problem: Backend services are accessible directly without going through the gateway, bypassing all authentication and rate limiting.

Solution: Use network-level isolation (private VPC/subnet) so backend services are only reachable via the gateway. Never expose backend service ports to the public internet.

Pitfall 5: Rate Limiting Without Global State

Problem: Running multiple gateway instances with local (in-memory) rate limiting allows clients to get max_requests × instance_count by spreading requests across instances.

Solution: Use a shared rate limiting store (Redis) that all gateway instances consult. This ensures consistent enforcement regardless of which instance handles the request.
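A fixed-window sketch of the idea: the in-memory `Map` below stands in for the shared store so the logic is self-contained, but in production the same increment-plus-expiry would be a Redis `INCR` and `EXPIRE` (or a small Lua script) so every gateway instance consults the same counters:

```javascript
// Fixed-window rate limiter over a pluggable store. `now` is injectable
// so window rollover can be tested deterministically.
function createRateLimiter({ limit, windowMs, now = Date.now }) {
  const counters = new Map(); // key -> { count, windowStart }
  return function allow(clientKey) {
    const t = now();
    const entry = counters.get(clientKey);
    if (!entry || t - entry.windowStart >= windowMs) {
      // New window: reset the counter (Redis equivalent: INCR + EXPIRE)
      counters.set(clientKey, { count: 1, windowStart: t });
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}
```

Fixed windows allow short bursts at window boundaries; a sliding-window or token-bucket variant smooths that out at the cost of slightly more state per client.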


Quick Recap

  • An API gateway provides a single entry point for all client requests, handling auth, routing, rate limiting, and protocol translation.
  • Use API gateways when you have multiple services, diverse clients, or need centralized security policy enforcement.
  • Avoid gateways when latency is critical, for simple single-service applications, or when the overhead outweighs benefits.
  • Always implement circuit breakers, proper timeouts, and health checks when calling backend services.
  • Run multiple gateway instances behind a load balancer to avoid single points of failure.
  • Log structured data (request ID, latency, status) for debugging; emit metrics for alerting.

Copy/Paste Checklist

- [ ] Gateway deployed with TLS termination
- [ ] Multiple instances behind load balancer (minimum 2 for HA)
- [ ] JWT validation implemented
- [ ] Rate limiting configured with Redis backend
- [ ] Circuit breakers on all backend service calls
- [ ] Request/response logging with correlation IDs
- [ ] Metrics exported (latency, errors, rate limits)
- [ ] Alerts configured for latency and error thresholds
- [ ] Backend services only reachable via gateway (network isolation)
- [ ] Security headers configured (HSTS, CSP, etc.)
- [ ] Health check endpoint at /health
- [ ] Graceful shutdown configured
- [ ] DDoS protection at edge
- [ ] Certificate renewal automated

See Also

Related Posts

Microservices vs Monolith: Choosing the Right Architecture

Understand the fundamental differences between monolithic and microservices architectures, their trade-offs, and how to decide which approach fits your project.

#microservices #monolith #architecture

Server-Side Discovery: Load Balancer-Based Service Routing

Learn how server-side discovery uses load balancers and reverse proxies to route service requests in microservices architectures.

#microservices #server-side-discovery #load-balancing

Load Balancing Algorithms: Round Robin, Least Connections, and Beyond

Explore load balancing algorithms used in microservices including round robin, least connections, weighted, IP hash, and adaptive algorithms.

#microservices #load-balancing #algorithms