API Gateway: The Single Entry Point for Microservices Architecture
Learn how API gateways work, when to use them, architecture patterns, failure scenarios, and implementation strategies for production microservices.
An API gateway sits at the entrance of your backend services and handles everything that would otherwise clutter individual services or repeat across them: request routing, authentication, rate limiting, protocol translation. When a mobile app asks your product catalog for data, the gateway receives that request, checks the user’s session token, applies rate limiting rules, and forwards it to the product service—all in a single hop.
This centralized layer makes client code simpler since applications talk to one endpoint instead of keeping track of multiple service addresses. It also gives you a natural chokepoint for enforcing security policies, merging responses from different services, and gathering metrics about how your API gets used. For teams building microservices at scale, an API gateway is not optional—it is foundational infrastructure.
When to Use / When Not to Use
| Scenario | Recommendation |
|---|---|
| Multiple backend services need unified access control | Use API Gateway |
| Mobile, web, and third-party clients consume the same APIs | Use API Gateway |
| You need centralized rate limiting and throttling | Use API Gateway |
| Service aggregation is required for client convenience | Use API Gateway |
| Single monolithic application with no external clients | Do NOT use API Gateway |
| Services are tightly coupled and share a deployment unit | Do NOT use API Gateway |
| Ultra-low latency is critical (gateway adds ~1-3ms) | Consider alternatives |
| Simple CRUD application with one or two services | Consider direct service calls |
When TO Use an API Gateway
- Unified client access: Your mobile app, web app, and third-party integrations all hit different services. Without a gateway, clients need to know about every service endpoint, certificate, and authentication mechanism.
- Shared authentication and authorization: You want a single place to validate JWTs, check permissions, and reject unauthorized requests before they reach your services.
- Rate limiting at the edge: You need to protect your services from traffic spikes, abusive clients, or accidental misconfiguration without adding this logic to every service.
- Protocol translation: Your mobile clients use REST, but your internal services might use gRPC or WebSocket. The gateway translates between them.
- Request aggregation: A mobile screen needs data from three different services. Without aggregation in the gateway, the client makes three separate calls with associated latency and complexity.
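The aggregation case above can be sketched as a gateway handler that fans out to the backends in parallel and merges the responses; the fetcher names and payload shapes here are illustrative, not part of any real API:

```javascript
// Sketch of gateway-side aggregation: one client call fans out to three
// backend lookups in parallel and merges the results into a single payload.
// The fetchers are injected so any HTTP client (axios, fetch) can back them.
async function aggregateProductScreen(fetchers, productId) {
  const [product, reviews, inventory] = await Promise.all([
    fetchers.product(productId),
    fetchers.reviews(productId),
    fetchers.inventory(productId),
  ]);
  // One merged response instead of three client round-trips
  return { product, reviews, inventory };
}
```

In a real gateway each fetcher would carry its own timeout and circuit breaker, so one slow backend cannot stall the whole aggregate.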
When NOT to Use an API Gateway
- Adding unnecessary hops: If your system is a simple monolith or a handful of tightly coordinated services, the gateway introduces latency without meaningful benefit.
- Internal service-to-service traffic: in some architectures, services behind the gateway are made to call each other through it as well. Routing east-west traffic through the gateway turns it into a bottleneck; let internal services call each other directly, or use a service mesh.
- Single-purpose applications: A data processing pipeline with no external clients does not need a gateway.
- Latency-sensitive paths: Every request going through the gateway adds 1-3ms. For extremely latency-sensitive use cases, this matters.
Architecture Diagram
Request Flow
sequenceDiagram
participant Client
participant Gateway as API Gateway
participant Auth as Auth Service
participant Catalog as Product Service
participant Order as Order Service
participant Cache as Redis Cache
Client->>Gateway: GET /api/products/123
Gateway->>Gateway: Extract JWT, Rate Limit Check
Gateway->>Auth: Validate Token
Auth-->>Gateway: Token Valid
Gateway->>Cache: Check Cache
Cache-->>Gateway: Cache Hit
Gateway-->>Client: Product JSON
Client->>Gateway: POST /api/orders
Gateway->>Gateway: Extract JWT, Rate Limit Check
Gateway->>Auth: Validate Token
Auth-->>Gateway: Token Valid
Gateway->>Catalog: Check Product Availability
Catalog-->>Gateway: Available
Gateway->>Order: Create Order
Order-->>Gateway: Order Created
Gateway-->>Client: Order Confirmation
Gateway Internal Components
graph TD
A[Client Request] --> B[TLS Termination]
B --> C[Authentication]
C --> D[Authorization]
D --> E[Rate Limiting]
E --> F[Request Routing]
F --> G[Service Discovery]
G --> H[Backend Service]
H --> F
F --> I[Response Aggregation]
I --> J[Metrics Collection]
J --> K[Client Response]
subgraph Security Layer
C
D
E
end
subgraph Routing Layer
F
G
end
Failure Flow
graph TD
A[Client Request] --> B{Gateway Available?}
B -->|No| C[Return 503 Service Unavailable]
B -->|Yes| D{Auth Passed?}
D -->|No| E[Return 401 Unauthorized]
D -->|Yes| F{Rate Limit OK?}
F -->|No| G[Return 429 Too Many Requests]
F -->|Yes| H{Backend Service Available?}
H -->|No| I[Return 502 Bad Gateway]
H -->|Yes| J{Request Valid?}
J -->|No| K[Return 400 Bad Request]
J -->|Yes| L[Forward to Service]
L --> M{Service Timeout?}
M -->|Yes| N[Return 504 Gateway Timeout]
M -->|No| O[Return Service Response]
Production Failure Scenarios
| Failure Scenario | Impact | Mitigation |
|---|---|---|
| Gateway instance crash | All traffic fails | Run multiple gateway instances behind load balancer; health checks detect failures |
| Backend service timeout | Client hangs indefinitely | Set aggressive timeouts (e.g., 5s); circuit breaker returns error immediately |
| Auth service unavailable | No requests can be validated | Cache JWT validation results with a short TTL; choose an explicit fail-open or fail-closed policy per route |
| Rate limiter memory exhaustion | Rate limiting fails open | Use Redis-backed rate limiting; set hard limits on memory per tenant |
| Gateway misconfiguration | All traffic routing incorrectly | Use version-controlled config; canary deployments for config changes |
| SSL/TLS certificate expiry | HTTPS requests fail | Automate certificate renewal (Let’s Encrypt); alert 30 days before expiry |
| Service discovery returns stale IPs | Requests go to dead instances | Use short TTL in service registry; health checks remove unhealthy instances |
| Request payload too large | Memory exhaustion on gateway | Set max request size limits; reject oversized payloads early |
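The auth-service mitigation in the table above (caching token validation results with a short TTL) might look like this sketch; the TTL and the in-memory store are assumptions, and a multi-instance deployment would use a shared cache such as Redis instead:

```javascript
// Sketch: remember recent token verdicts for a short TTL so a slow or
// briefly unavailable auth service does not stall every request.
// In-memory for illustration only; use a shared cache across instances.
class TokenVerdictCache {
  constructor(ttlMs = 30_000) {
    this.ttlMs = ttlMs;
    this.entries = new Map(); // token -> { verdict, expiresAt }
  }

  get(token, now = Date.now()) {
    const entry = this.entries.get(token);
    if (!entry || entry.expiresAt <= now) {
      this.entries.delete(token);
      return undefined; // cache miss: caller must ask the auth service
    }
    return entry.verdict;
  }

  set(token, verdict, now = Date.now()) {
    this.entries.set(token, { verdict, expiresAt: now + this.ttlMs });
  }
}
```

Keep the TTL short (tens of seconds): it bounds how long a revoked token keeps working after revocation.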
Trade-off Analysis
| Factor | With API Gateway | Without API Gateway |
|---|---|---|
| Latency | +1-3ms per request | Baseline |
| Consistency | Centralized auth/rate limiting | Duplicated per service |
| Cost | Gateway instances + operation | No additional cost |
| Complexity | Centralized logic, single config | Distributed logic, multiple configs |
| Operability | Single point to monitor | Monitor each service separately |
| Client complexity | Low (one endpoint) | High (manage multiple endpoints) |
| Debugging | Single point to trace | Trace across multiple services |
| Single point of failure | Yes, unless highly available | No (but more complex clients) |
| Flexibility | Limited by gateway capabilities | Full flexibility per service |
API Gateway vs Service Mesh
| Aspect | API Gateway | Service Mesh |
|---|---|---|
| Layer | L7 (Application) | L4/L7 (Transport + Application) |
| Scope | North-South traffic (client to service) | East-West traffic (service to service) |
| Typical Users | Platform teams, API product teams | DevOps, SRE teams |
| Features | Auth, routing, aggregation, protocol translation | mTLS, retries, circuit breaking |
| Deployment | Sits at edge | Sidecar proxies on each service |
For most architectures, you need both. The API gateway handles external client traffic while a service mesh handles internal service-to-service communication. See Service Mesh for a deep dive.
Implementation Example (Node.js)
Here is a minimal API gateway built with Express that covers the core production concerns (authentication, rate limiting, circuit breaking, request IDs):
const express = require("express");
const axios = require("axios");
const crypto = require("crypto"); // for randomUUID()
const rateLimit = require("express-rate-limit");
const jwt = require("jsonwebtoken");
const app = express();
// Configuration
const PORT = process.env.PORT || 3000;
const AUTH_SERVICE_URL =
process.env.AUTH_SERVICE_URL || "http://auth-service:8080";
const PRODUCT_SERVICE_URL =
process.env.PRODUCT_SERVICE_URL || "http://product-service:8080";
const ORDER_SERVICE_URL =
process.env.ORDER_SERVICE_URL || "http://order-service:8080";
// Middleware: Parse JSON with size limit
app.use(express.json({ limit: "1mb" }));
// Middleware: Request ID for tracing
app.use((req, res, next) => {
req.id = crypto.randomUUID();
res.setHeader("X-Request-ID", req.id);
next();
});
// Middleware: Rate limiting (Redis-backed in production)
const limiter = rateLimit({
windowMs: 60 * 1000, // 1 minute
max: 100, // 100 requests per minute per IP
// message may be a value or a function of (req, res)
message: (req, res) => ({ error: "Too many requests", requestId: req.id }),
standardHeaders: true,
legacyHeaders: false,
});
app.use("/api/", limiter);
// Middleware: Authentication
async function authenticate(req, res, next) {
const token = req.headers.authorization?.replace("Bearer ", "");
if (!token) {
return res
.status(401)
.json({ error: "Missing authorization token", requestId: req.id });
}
try {
// In production, use a distributed cache for validation results
const decoded = jwt.verify(token, process.env.JWT_SECRET);
req.user = decoded;
next();
} catch (error) {
return res.status(401).json({ error: "Invalid token", requestId: req.id });
}
}
// Middleware: Authorization
function authorize(...allowedRoles) {
return (req, res, next) => {
if (!req.user || !allowedRoles.includes(req.user.role)) {
return res
.status(403)
.json({ error: "Insufficient permissions", requestId: req.id });
}
next();
};
}
// Health check endpoint
app.get("/health", (req, res) => {
res.json({ status: "healthy", timestamp: new Date().toISOString() });
});
// Route: Product catalog with circuit breaker
const CircuitBreaker = require("opossum");
const productCircuit = new CircuitBreaker(
async (productId, requestId) => {
const response = await axios.get(
`${PRODUCT_SERVICE_URL}/products/${productId}`,
{
timeout: 5000,
headers: { "X-Request-ID": requestId },
},
);
return response.data;
},
{
timeout: 5000,
errorThresholdPercentage: 50,
resetTimeout: 30000,
},
);
// Log state changes; while the circuit is open, fire() rejects immediately,
// so the route handler below answers 502 without waiting on the backend
productCircuit.on("open", () => console.warn("product circuit opened"));
productCircuit.on("timeout", () => console.warn("product service timed out"));
app.get("/api/products/:id", authenticate, async (req, res) => {
try {
const product = await productCircuit.fire(req.params.id, req.id);
res.json(product);
} catch (error) {
res.status(502).json({ error: "Bad gateway", requestId: req.id });
}
});
// Route: Create order (aggregates product and order services)
app.post(
"/api/orders",
authenticate,
authorize("user", "admin"),
async (req, res) => {
const { productId, quantity } = req.body;
try {
// Check product availability
const productResponse = await axios.get(
`${PRODUCT_SERVICE_URL}/products/${productId}`,
{ timeout: 3000 },
);
if (!productResponse.data.available) {
return res.status(400).json({ error: "Product not available" });
}
// Create order
const orderResponse = await axios.post(
`${ORDER_SERVICE_URL}/orders`,
{ productId, quantity, userId: req.user.id },
{ timeout: 5000 },
);
res.status(201).json(orderResponse.data);
} catch (error) {
if (error.code === "ECONNABORTED") {
return res.status(504).json({ error: "Gateway timeout" });
}
res.status(502).json({ error: "Failed to create order" });
}
},
);
// Error handling middleware
app.use((err, req, res, next) => {
console.error(`[${req.id}] Unhandled error:`, err);
res.status(500).json({ error: "Internal server error", requestId: req.id });
});
app.listen(PORT, () => {
console.log(`API Gateway listening on port ${PORT}`);
});
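The copy/paste checklist at the end of this post calls for graceful shutdown. One way to add it to the listing above is to drain in-flight requests on SIGTERM; the 10-second grace period is an assumption, and the process handle is injectable purely to keep the sketch testable:

```javascript
// Sketch: stop accepting new connections on SIGTERM, finish in-flight
// requests, then exit, so a rolling deploy does not drop traffic.
function installGracefulShutdown(server, { proc = process, graceMs = 10_000 } = {}) {
  proc.on("SIGTERM", () => {
    console.log("SIGTERM received, draining connections");
    server.close(() => proc.exit(0)); // exit cleanly once drained
    const timer = setTimeout(() => proc.exit(1), graceMs); // hard stop fallback
    if (timer.unref) timer.unref(); // don't keep the event loop alive
  });
}

// Usage with the listing above:
// const server = app.listen(PORT, () => { ... });
// installGracefulShutdown(server);
```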
Docker Compose for Local Development
version: "3.8"
services:
api-gateway:
build: ./api-gateway
ports:
- "3000:3000"
environment:
- JWT_SECRET=your-secret-key
- AUTH_SERVICE_URL=http://auth-service:8080
- PRODUCT_SERVICE_URL=http://product-service:8080
- ORDER_SERVICE_URL=http://order-service:8080
- REDIS_URL=redis://redis:6379
depends_on:
- redis
redis:
image: redis:7-alpine
ports:
- "6379:6379"
auth-service:
image: your-auth-service-image
ports:
- "8080:8080"
product-service:
image: your-product-service-image
ports:
- "8081:8080"
order-service:
image: your-order-service-image
ports:
- "8082:8080"
Capacity Estimation
Assumptions
- Average request size: 2 KB
- Average response size: 16 KB
- Peak QPS: 10,000 requests/second
- Average response time target: 50ms (gateway overhead: 3ms)
Gateway Instance Calculation
Required instances = (Peak QPS × Safety Factor) / (Max Throughput per Instance)
Where:
- Peak QPS = 10,000
- Safety Factor = 2x
- Max Throughput per Instance = 2,000 QPS (typical for a 2 vCPU instance)
Required = (10,000 × 2) / 2,000 = 10 instances (comfortably above the 2-instance minimum for HA)
Concurrency check: 10,000 QPS × 0.05s average latency ≈ 500 in-flight requests, or about 50 per instance.
Network Bandwidth
Inbound: 10,000 QPS × 2 KB = 20 MB/s = 160 Mbps
Outbound: 10,000 QPS × 16 KB = 160 MB/s = 1.28 Gbps
Total network required: ~1.5 Gbps
Memory (per instance with 2 vCPU)
Connection buffers: 256 MB
Rate limiting state (Redis): Shared across instances
Application heap: 512 MB
Operating system: 256 MB
Total per instance: ~1 GB RAM
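These estimates can be re-run for other workloads with a small helper; it applies the throughput-based sizing rule with a high-availability floor, and reports bandwidth in decimal MB/s as above:

```javascript
// Sketch of the capacity math in this section as a reusable helper.
// Inputs mirror the stated assumptions; adjust for your own workload.
function estimateGateway({ peakQps, reqKb, respKb, perInstanceQps, safetyFactor = 2, haMinimum = 2 }) {
  const instances = Math.max(
    Math.ceil((peakQps * safetyFactor) / perInstanceQps),
    haMinimum, // never fewer than the HA minimum
  );
  return {
    instances,
    inboundMBps: (peakQps * reqKb) / 1000,  // decimal MB/s
    outboundMBps: (peakQps * respKb) / 1000,
  };
}
```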
Observability Checklist
Metrics to Capture
- gateway_requests_total (counter) - Total requests by route and status code
- gateway_request_duration_seconds (histogram) - Latency by route, percentile bands
- gateway_active_connections (gauge) - Current concurrent connections
- gateway_rate_limit_exceeded_total (counter) - Rate limit violations by client
- gateway_backend_errors_total (counter) - Backend service errors by service
- gateway_circuit_breaker_state (gauge) - Circuit breaker state by backend
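Any metrics client can track these (prom-client is the usual choice in Node). As a dependency-free sketch, a labeled counter like gateway_requests_total reduces to:

```javascript
// Minimal sketch of a labeled counter; a real gateway would use a metrics
// library (e.g. prom-client) that also handles exposition and histograms.
class LabeledCounter {
  constructor(name) {
    this.name = name;
    this.series = new Map(); // normalized label set -> count
  }

  // Sort label entries so {route, status} and {status, route} hit one series
  static key(labels) {
    return JSON.stringify(Object.entries(labels).sort());
  }

  inc(labels = {}) {
    const k = LabeledCounter.key(labels);
    this.series.set(k, (this.series.get(k) || 0) + 1);
  }

  value(labels = {}) {
    return this.series.get(LabeledCounter.key(labels)) || 0;
  }
}

// e.g. requests.inc({ route: "/api/products/:id", status: 200 })
```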
Logs to Emit
Each request should emit structured JSON logs:
{
"timestamp": "2026-03-23T10:15:30.123Z",
"requestId": "550e8400-e29b-41d4-a716-446655440000",
"method": "GET",
"path": "/api/products/123",
"statusCode": 200,
"latencyMs": 12,
"clientIp": "203.0.113.42",
"userAgent": "MobileApp/2.1",
"userId": "usr_abc123",
"rateLimitRemaining": 87,
"backendService": "product-service",
"backendLatencyMs": 8
}
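A small helper can assemble that entry from request context. Field names match the example above; the inputs (status code, timings, backend info) are assumed to be collected by surrounding middleware:

```javascript
// Sketch: build one structured log entry per request. Emit it as a single
// JSON line, e.g. console.log(JSON.stringify(buildLogEntry(...))).
function buildLogEntry({ req, statusCode, latencyMs, backend }) {
  return {
    timestamp: new Date().toISOString(),
    requestId: req.id,
    method: req.method,
    path: req.path,
    statusCode,
    latencyMs,
    clientIp: req.ip,
    userAgent: req.headers["user-agent"],
    userId: req.user ? req.user.id : null, // null for unauthenticated routes
    backendService: backend ? backend.service : null,
    backendLatencyMs: backend ? backend.latencyMs : null,
  };
}
```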
Alerts to Configure
| Alert | Threshold | Severity |
|---|---|---|
| P99 latency > 100ms | 100ms for 5 minutes | Warning |
| P99 latency > 500ms | 500ms for 1 minute | Critical |
| Error rate > 1% | 1% for 5 minutes | Warning |
| Error rate > 5% | 5% for 1 minute | Critical |
| Rate limit violations spike | > 1000/min from single IP | Warning |
| Backend service unavailable | Any backend down > 30s | Critical |
| Certificate expiry < 30 days | Any cert expiring soon | Warning |
Distributed Tracing
Ensure the gateway propagates trace context to backend services:
// Propagate trace headers to backend services
const traceHeaders = {
"X-Request-ID": req.id,
"X-B3-TraceId": req.headers["x-b3-traceid"],
"X-B3-SpanId": req.headers["x-b3-spanid"],
"X-B3-Sampled": req.headers["x-b3-sampled"],
};
await axios.get(`${SERVICE_URL}/products/${id}`, {
headers: { ...traceHeaders, Authorization: req.headers.authorization },
});
Security Checklist
- TLS 1.2+ termination with modern cipher suites
- JWT validation with proper signature verification
- Rate limiting configured per-client (IP, API key, user ID)
- Request size limits to prevent payload amplification
- Input validation on all request parameters
- Output encoding to prevent XSS in responses
- CORS policy properly configured
- Security headers (HSTS, CSP, X-Frame-Options)
- Audit logging for all authentication/authorization failures
- API key rotation mechanism
- Deprecation notices for older API versions
- Penetration testing performed annually
- DDoS protection at edge (Cloudflare, AWS Shield)
- Backend services unreachable directly (only via gateway)
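Several checklist items (HSTS, CSP, X-Frame-Options) can be set from one piece of Express-style middleware. The header values below are illustrative defaults, not a recommended policy; the helmet package is the more common production choice:

```javascript
// Sketch: security headers as gateway middleware. Tune the CSP to what
// your clients actually load; 'self'-only is just a starting point.
function securityHeaders(req, res, next) {
  res.setHeader("Strict-Transport-Security", "max-age=31536000; includeSubDomains");
  res.setHeader("Content-Security-Policy", "default-src 'self'");
  res.setHeader("X-Frame-Options", "DENY");
  res.setHeader("X-Content-Type-Options", "nosniff");
  next();
}

// app.use(securityHeaders);
```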
Common Pitfalls / Anti-Patterns
Pitfall 1: Gateway as a Monolith Proxy
Problem: Teams sometimes build the gateway to contain significant business logic, transforming it into another monolith that mirrors the old system.
Solution: Keep the gateway thin. It should handle cross-cutting concerns (auth, routing, rate limiting) but delegate business logic to the appropriate backend services. If you find yourself writing if (user.plan === 'enterprise') { ... } in the gateway, that is a sign business logic is leaking into the gateway.
Pitfall 2: No Circuit Breaker on Backend Calls
Problem: A slow or failing backend service causes requests to pile up at the gateway, eventually exhausting gateway resources and taking down the entire system.
Solution: Always wrap backend service calls with circuit breakers. When a backend error rate exceeds a threshold, the circuit opens and immediately returns an error rather than waiting for timeouts.
// Never do this - no timeout, no circuit breaker
const response = await axios.get(`${BACKEND_URL}/data`);
// Always do this - timeout on the call, circuit breaker around it
const CircuitBreaker = require("opossum");
const circuit = new CircuitBreaker((url) => axios.get(url, { timeout: 3000 }), {
timeout: 3000,
errorThresholdPercentage: 50,
});
const response = await circuit.fire(`${BACKEND_URL}/data`);
Pitfall 3: Stale Service Discovery
Problem: The gateway caches service endpoints that have changed (scale-down, failures), causing requests to go to dead instances.
Solution: Use short TTLs in service discovery (30 seconds or less), implement health checks that remove unhealthy instances immediately, and have backend services register/deregister dynamically with the service registry.
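A minimal version of that registry behavior, with heartbeats and TTL-based expiry (the 30-second TTL matches the guidance above; the data structure is illustrative):

```javascript
// Sketch: instances register with a heartbeat timestamp, and lookups drop
// entries older than the TTL so traffic stops flowing to dead instances.
class ServiceRegistry {
  constructor(ttlMs = 30_000) {
    this.ttlMs = ttlMs;
    this.instances = new Map(); // service -> Map(address -> lastSeen)
  }

  heartbeat(service, address, now = Date.now()) {
    if (!this.instances.has(service)) this.instances.set(service, new Map());
    this.instances.get(service).set(address, now);
  }

  lookup(service, now = Date.now()) {
    const seen = this.instances.get(service) || new Map();
    return [...seen.entries()]
      .filter(([, lastSeen]) => now - lastSeen < this.ttlMs) // drop stale
      .map(([address]) => address);
  }
}
```

In production this role is usually played by Consul, etcd, or the platform's built-in discovery (e.g. Kubernetes endpoints); the TTL principle is the same.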
Pitfall 4: Authentication Bypass via Direct Service Access
Problem: Backend services are accessible directly without going through the gateway, bypassing all authentication and rate limiting.
Solution: Use network-level isolation (private VPC/subnet) so backend services are only reachable via the gateway. Never expose backend service ports to the public internet.
Pitfall 5: Rate Limiting Without Global State
Problem: Running multiple gateway instances with local (in-memory) rate limiting allows clients to get max_requests × instance_count by spreading requests across instances.
Solution: Use a shared rate limiting store (Redis) that all gateway instances consult. This ensures consistent enforcement regardless of which instance handles the request.
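A fixed-window version of that shared limiter needs only INCR and EXPIRE from the store, so any Redis client can back it; the store interface and key scheme here are assumptions for the sketch:

```javascript
// Sketch: fixed-window rate limiting against a shared store. Because every
// gateway instance increments the same key, the limit holds globally no
// matter which instance serves the request.
async function checkRateLimit(store, clientKey, limit, windowSec, now = Date.now()) {
  const windowId = Math.floor(now / (windowSec * 1000));
  const key = `rl:${clientKey}:${windowId}`;
  const count = await store.incr(key); // atomic across all instances
  if (count === 1) await store.expire(key, windowSec); // first hit sets TTL
  return { allowed: count <= limit, remaining: Math.max(0, limit - count) };
}
```

Fixed windows allow brief bursts at window boundaries; sliding-window or token-bucket variants smooth that out at the cost of more store operations.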
Quick Recap
- An API gateway provides a single entry point for all client requests, handling auth, routing, rate limiting, and protocol translation.
- Use API gateways when you have multiple services, diverse clients, or need centralized security policy enforcement.
- Avoid gateways when latency is critical, for simple single-service applications, or when the overhead outweighs benefits.
- Always implement circuit breakers, proper timeouts, and health checks when calling backend services.
- Run multiple gateway instances behind a load balancer to avoid single points of failure.
- Log structured data (request ID, latency, status) for debugging; emit metrics for alerting.
Copy/Paste Checklist
- [ ] Gateway deployed with TLS termination
- [ ] Multiple instances behind load balancer (minimum 2 for HA)
- [ ] JWT validation implemented
- [ ] Rate limiting configured with Redis backend
- [ ] Circuit breakers on all backend service calls
- [ ] Request/response logging with correlation IDs
- [ ] Metrics exported (latency, errors, rate limits)
- [ ] Alerts configured for latency and error thresholds
- [ ] Backend services only reachable via gateway (network isolation)
- [ ] Security headers configured (HSTS, CSP, etc.)
- [ ] Health check endpoint at /health
- [ ] Graceful shutdown configured
- [ ] DDoS protection at edge
- [ ] Certificate renewal automated
See Also
- Service Mesh — Managing internal service-to-service communication
- Load Balancing — Distributing traffic across multiple gateway instances
- RESTful API Design — Best practices for API contract design
- Circuit Breaker Pattern — Preventing cascade failures
- System Design Roadmap — Complete learning path for system design