Synchronous Communication: REST, gRPC, and When to Use Each
Explore synchronous communication patterns in microservices including REST APIs, gRPC, when to use each protocol, and their trade-offs.
Microservices do not automatically solve your problems. They just move the difficulty around. One of the first design decisions you face is how services communicate. Synchronous communication is the most straightforward approach: a client sends a request, waits for the reply, then continues. Here I will explore the two main protocols for synchronous communication, where each one makes sense, and how to avoid turning a simple request into a cascade failure.
What is Synchronous Communication in Microservices?
The idea is straightforward. Service A calls Service B. Service A stops and does nothing until Service B responds. Only then does Service A resume.
This is the request-response model that software has used for decades before anyone used the word “microservice.” The appeal is predictability. You know immediately whether an operation succeeded or failed. Your business logic stays linear and easy to follow.
The problem is coupling. Service A depends on Service B being available and responsive. If Service B slows down, Service A waits. If Service B fails, Service A fails. This tight coupling is why synchronous systems fail so spectacularly when things go wrong.
Synchronous communication works well when operations are fast, services are reliable, and you need immediate consistency. The moment any of those assumptions break, you start dealing with timeouts, retries, and cascading errors.
REST over HTTP: When to Use It
REST is nearly everywhere. HTTP is not going anywhere, and every tool, language, and framework speaks it. You can test a REST endpoint with curl in your terminal. The format is human-readable JSON. No code generation required.
This ubiquity is REST’s main advantage. For public APIs consumed by third-party developers, it is the obvious choice. An external team can integrate without installing special tooling or learning a new protocol. The documentation writes itself because REST endpoints map naturally to resource-based URL structures.
Browser-based clients work naturally with REST. Browsers understand HTTP methods and status codes without help. You do not need a proxy layer or special configuration. gRPC, by contrast, needs extra infrastructure before browser JavaScript can call it directly.
REST is also the better choice when schema flexibility matters. JSON accepts extra fields without breaking. You can evolve your API gradually without forcing all clients to update simultaneously. This matters in organizations where coordinating contract changes across teams takes time.
The tradeoff is that REST offers no compile-time checking. Rename a JSON field and you will not know something broke until runtime—probably in production, when a client sends the old field name and your code ignores it silently.
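A tiny sketch of that failure mode (the field names here are hypothetical): the client keeps sending `user_name` after the server renamed the field to `username`, and nothing complains until a lookup quietly returns nothing.

```python
import json

# A client still sending the old field name "user_name" after the
# server renamed it to "username" (hypothetical field names).
payload = json.loads('{"user_name": "ada", "email": "ada@example.com"}')

# The mismatch fails silently at runtime instead of at compile time:
username = payload.get("username")  # None -- no error raised anywhere
print(username)
```

A Protocol Buffer schema would have turned this into a compile-time error in every generated client.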
gRPC: When to Use It
Google built gRPC because REST left certain problems unsolved. It uses HTTP/2 for transport and Protocol Buffers for serialization. The combination handles higher throughput than JSON over HTTP/1.1 and enables patterns REST cannot support.
For internal services you control end-to-end, gRPC shines. You define your API contract once in a Protocol Buffer file. Code generation produces client libraries for every language your services use. When you change a field name, every consumer fails at compile time rather than runtime. Your CI pipeline catches breaking changes before they reach production.
Protocol Buffers produce smaller messages than JSON. HTTP/2 multiplexes multiple requests over a single connection, eliminating the head-of-line blocking that HTTP/1.1 suffers. For high-throughput services handling millions of requests per second, these optimizations add up.
Bi-directional streaming is gRPC’s most powerful feature. A client can send a stream of requests while receiving a stream of responses. This works for real-time data pipelines, collaborative editing, monitoring systems that push continuous updates. REST has no equivalent without workarounds like WebSockets or server-sent events.
The catch is browser support. You cannot call gRPC from browser JavaScript directly. You need a proxy like grpc-web that translates between the browser’s limited HTTP semantics and gRPC’s full capabilities. This is a real constraint if browser clients are part of your architecture.
Comparing REST and gRPC
The two approaches make different trade-offs. Here is how they compare on the factors that matter:
| Aspect | REST | gRPC |
|---|---|---|
| Serialization | JSON (human-readable) | Protocol Buffers (binary, compact) |
| Transport | HTTP/1.1 or HTTP/2 | HTTP/2 only |
| Schema Enforcement | None | Strong (generated code) |
| Browser Support | Native | Limited (needs grpc-web proxy) |
| Streaming | Server-sent events, polling | Native bi-directional streaming |
| Tooling | Universal | Requires code generation |
| Debugging | Plain text in logs | Binary needs decoders |
| Contract Evolution | Flexible but risky | Versioned schemas |
REST wins on debugging. You can paste a request into your terminal and read the response directly. gRPC payloads are binary. You need tooling to decode them. Early in development when you are iterating quickly, this matters.
gRPC wins on safety. Schema enforcement catches entire classes of bugs that JSON allows. When your CI pipeline fails because someone renamed a Protocol Buffer field, that is a feature. Runtime surprises are harder to debug than compile-time failures.
Browser clients tip the balance toward REST. If you need to call services from web browsers, gRPC requires additional infrastructure. Plan for this constraint early.
When to Use / When Not to Use Synchronous Communication
Trade-off Table
| Scenario | Use Synchronous | Use Asynchronous Instead |
|---|---|---|
| Need immediate consistency | REST or gRPC | Message queues, events |
| Operations complete in < 100ms | REST or gRPC | Consider async overhead |
| Long-running operations (seconds+) | Avoid sync | Webhooks, callbacks, polling |
| Multiple services in a chain | Add timeouts, circuit breakers | Decompose or use async |
| Fault isolation required | Avoid deep chains | Fire-and-forget events |
| High availability requirement | Add resilience patterns | Inherently more available |
| Cross-service transactions | Avoid, use sagas | Use saga pattern |
When to Use REST
Use REST when:
- Building public APIs consumed by external developers
- Browser-based clients need direct service access
- JSON schema flexibility is needed for API evolution
- Human readability matters for debugging
- Rapid prototyping and iteration are priorities
- Team lacks experience with code generation tools
Avoid REST when:
- You need bi-directional streaming
- Compile-time type safety is critical
- Message size and performance are paramount
- Internal services with shared contracts benefit from schema enforcement
When to Use gRPC
Use gRPC when:
- Internal service-to-service communication you control
- High throughput and low latency are requirements
- Bi-directional streaming is needed (real-time pipelines, collaborative editing)
- You want compile-time contract enforcement
- Multiple languages need consistent client libraries
Avoid gRPC when:
- Browser clients need to call services directly
- Human debugging in transit is important
- JSON-based legacy integration is required
- Team lacks familiarity with Protocol Buffers
Request-Response Patterns
Synchronous calls follow patterns that determine how your services interact.
Point-to-point is the simplest case. One service calls another, waits, and continues. Fast operations that do not involve multiple services work well with this pattern.
Chained requests span multiple services in sequence. Service A calls Service B, which calls Service C. Latency accumulates across each hop. If any service slows down, the entire chain slows. If any service fails, the failure propagates back up. Deep call chains are fragile.
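A back-of-the-envelope sketch of why depth hurts. The per-hop numbers (50ms latency, 99.9% availability) are illustrative assumptions, not measurements:

```python
# Each extra hop adds latency and multiplies failure probability,
# which is why deep synchronous chains are fragile.
per_hop_latency_ms = 50
per_hop_availability = 0.999  # 99.9% per service (illustrative)

def chain_stats(depth: int) -> tuple:
    """Total latency and end-to-end availability of a chain of `depth` hops."""
    total_latency_ms = per_hop_latency_ms * depth
    availability = per_hop_availability ** depth
    return total_latency_ms, availability

for depth in (1, 3, 5):
    latency, availability = chain_stats(depth)
    print(f"depth={depth}: {latency}ms, availability={availability:.4%}")
```

Five hops at 99.9% each already drops end-to-end availability below 99.6%, before accounting for timeouts and retries.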
Scatter-gather fans out to multiple services simultaneously. A request goes to Service B, C, and D at the same time. The caller waits for all responses. This reduces total latency compared to chaining, but requires more infrastructure to manage the fan-out and handle partial failures.
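A minimal scatter-gather sketch using `asyncio.gather`, with stand-in coroutines instead of real HTTP calls. `return_exceptions=True` is one way to surface partial failures without aborting the whole batch:

```python
import asyncio

async def fetch(service: str, delay: float, fail: bool = False) -> str:
    # Stand-in for a real HTTP call to one downstream service.
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{service} unavailable")
    return f"{service}: ok"

async def scatter_gather() -> list:
    # Fan out to B, C, and D concurrently. return_exceptions=True turns
    # per-service failures into values the caller can inspect.
    return await asyncio.gather(
        fetch("service-b", 0.01),
        fetch("service-c", 0.02),
        fetch("service-d", 0.01, fail=True),
        return_exceptions=True,
    )

results = asyncio.run(scatter_gather())
for r in results:
    print("failed" if isinstance(r, Exception) else r)
```

Total wall time is roughly the slowest call, not the sum, and the caller decides whether a partial result set is usable.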
Understanding these patterns helps you design APIs that match your reliability requirements. Not every operation belongs in a long chain.
Synchronous Failure Flow
Synchronous systems fail in predictable but dangerous ways. Here is what happens when Service C experiences latency.
sequenceDiagram
participant C as Client
participant SA as Service A
participant SB as Service B
participant SC as Service C
participant DB as Database
C->>SA: GET /order/123
SA->>SB: Verify customer
SB->>SC: Check credit limit
Note over SC: Responding slowly (5s+)
SB-->>SA: Timeout after 5s
SA-->>C: 504 Gateway Timeout
Note over SC: Service C recovers
SC-->>SB: Credit OK
SB-->>SA: Customer OK
C->>SA: (retry) GET /order/123
SA->>DB: Fetch order
DB-->>SA: Order data
SA-->>C: 200 OK
In this cascade failure, Service C slows down and causes timeouts all the way back to the client. The client eventually retries and succeeds, but only after experiencing a failure.
Circuit Breaker Failure Flow
Circuit breakers prevent cascade failures by failing fast when a downstream service is unhealthy.
stateDiagram-v2
[*] --> Closed: Normal operation
Closed --> Open: Failure threshold exceeded
Open --> HalfOpen: Timeout expired
HalfOpen --> Closed: Probe succeeds
HalfOpen --> Open: Probe fails
state Closed {
[*] --> Normal
Normal --> HighLatency: Slow responses
HighLatency --> Normal: Latency recovers
HighLatency --> Failing: Failure threshold
Failing --> Normal: Recovery succeeds
}
Circuit breakers wrap synchronous calls and monitor failure rates. When failures exceed a threshold, the circuit opens and calls fail immediately without hitting the unhealthy service. After a cooldown period, a probe call tests whether the service has recovered.
Timeouts and Retry Considerations
Networks fail. Services crash. Load spikes cause timeouts. Your synchronous code must handle these cases explicitly.
Every synchronous call needs a timeout. Without one, a slow service blocks your service indefinitely. Setting timeouts requires knowing your SLAs and typical response times. Too short and you fail requests that would have succeeded. Too long and you defeat the purpose of failing fast.
Start conservative and adjust based on production data. Monitor your p99 response times. If p99 is 200ms, a 500ms timeout gives room for spikes without waiting forever on genuine failures.
Retries recover from transient failures, but they amplify problems if not handled carefully. Exponential backoff prevents overwhelming a struggling service. Circuit breakers stop retry storms when a service is genuinely down.
Retries are not free. They consume resources on both sides. They can turn one service’s problem into a system-wide outage. When you retry, the same request may execute multiple times. Idempotency is essential—the operation must produce the same result regardless of how many times it runs.
# Timeout and retry example
import httpx
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=1, max=10))
async def call_service_with_retry(url: str) -> dict:
    # The 5-second timeout bounds each attempt; tenacity retries up to
    # 3 times with exponential backoff between attempts.
    async with httpx.AsyncClient(timeout=5.0) as client:
        response = await client.get(url)
        response.raise_for_status()
        return response.json()
Circuit Breaker Implementation
import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    async def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.recovery_timeout:
                # Cooldown expired: let one probe call through.
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN")
        try:
            result = await func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def _on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        # A failed probe in HALF_OPEN reopens immediately; otherwise
        # open once the failure threshold is reached.
        if self.state == CircuitState.HALF_OPEN or self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN
Point-to-Point Client Implementation
import asyncio
import httpx
from typing import Optional

class ServiceClient:
    def __init__(self, base_url: str, timeout: float = 5.0, max_retries: int = 3):
        self.base_url = base_url.rstrip("/")
        self.timeout = timeout
        self.max_retries = max_retries

    async def get(self, path: str, params: Optional[dict] = None) -> dict:
        url = f"{self.base_url}/{path.lstrip('/')}"
        async with httpx.AsyncClient(timeout=self.timeout) as client:
            for attempt in range(self.max_retries):
                try:
                    response = await client.get(url, params=params)
                    response.raise_for_status()
                    return response.json()
                except httpx.TimeoutException:
                    if attempt == self.max_retries - 1:
                        raise
                except httpx.HTTPStatusError as e:
                    # Retry only server errors; client errors (4xx) are final.
                    if e.response.status_code >= 500:
                        if attempt == self.max_retries - 1:
                            raise
                    else:
                        raise
                # Exponential backoff between attempts.
                await asyncio.sleep(0.1 * (2 ** attempt))

    async def post(self, path: str, json: Optional[dict] = None) -> dict:
        url = f"{self.base_url}/{path.lstrip('/')}"
        async with httpx.AsyncClient(timeout=self.timeout) as client:
            response = await client.post(url, json=json)
            response.raise_for_status()
            return response.json()
When Synchronous is the Wrong Choice
Synchronous communication couples services by availability and latency. That coupling has costs.
High latency operations do not fit synchronous patterns. If an operation takes seconds to complete, blocking the caller is impractical. A user staring at a spinner for thirty seconds is a bad experience. Asynchronous patterns—initiate the operation, poll for completion, or receive a callback—work better for long-running operations.
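A minimal submit-then-poll sketch of that asynchronous alternative. The function names, endpoint shapes, and job statuses are illustrative, not a real API:

```python
import uuid

# The initial call returns a job ID immediately instead of blocking;
# the client polls for completion while a worker finishes the job.
jobs: dict[str, dict] = {}

def submit_job(payload: dict) -> str:
    # Corresponds to e.g. POST /jobs -> 202 Accepted with a job ID.
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "pending", "result": None}
    return job_id

def complete_job(job_id: str, result: dict) -> None:
    # A background worker calls this when the work is done.
    jobs[job_id] = {"status": "done", "result": result}

def poll_job(job_id: str) -> dict:
    # Corresponds to e.g. GET /jobs/{id}.
    return jobs[job_id]

job_id = submit_job({"report": "monthly"})
print(poll_job(job_id)["status"])   # pending
complete_job(job_id, {"rows": 1200})
print(poll_job(job_id)["status"])   # done
```

A webhook callback replaces the polling loop when the client can expose an endpoint of its own.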
Cascade failures spread when one service failure propagates to others. Service A calls Service B, which calls Service C. If Service C slows down, Service B waits. Service A times out. Users see errors across the system even though only one service has a problem. Without circuit breakers and bulkheads, synchronous systems amplify failures.
Distributed transactions across multiple services are notoriously difficult with synchronous calls. When an operation spans services and must be atomic, synchronous rollback is messy. Asynchronous saga patterns handle this better, though they introduce their own complexity.
Loose coupling sometimes matters more than simplicity. If services need to evolve independently, adding a message broker decouples release cycles. Service A does not need to know when Service B deploys a new version. Asynchronous events let services communicate without direct knowledge of each other.
Evaluate these factors before defaulting to synchronous communication. The simplicity of request-response has hidden costs in the right scenarios.
Synchronous Request Flow
Here is what a synchronous request looks like in practice.
sequenceDiagram
participant C as Client
participant G as API Gateway
participant S1 as Service A
participant S2 as Service B
participant DB as Database
C->>G: HTTP Request
G->>S1: Forward Request
S1->>DB: Query Data
DB-->>S1: Return Results
S1->>S2: Call Service B
S2-->>S1: Return Response
S1-->>G: HTTP Response
G-->>C: Return to Client
Each arrow in this diagram is a potential failure point and a source of latency. Monitoring helps identify where bottlenecks occur.
Conclusion
Synchronous communication is not obsolete. It is the right tool for problems where simplicity and immediate consistency matter more than loose coupling. REST remains the standard for external APIs and browser-facing services. gRPC delivers performance and type safety for internal service communication where you control both ends.
Most organizations end up using both: REST for external-facing APIs, gRPC for internal service-to-service calls. The key is matching the protocol to the constraints of each interaction.
Build resilience into synchronous systems from the start. Timeouts, retries, and circuit breakers are not optional add-ons. Without them, small failures become large outages.
Observability Hooks
Synchronous systems require explicit observability instrumentation. Unlike async systems where failures are queued, sync failures are immediate and visible.
Request Correlation
Every synchronous request should carry a correlation ID through the call chain.
import httpx
import uuid
from contextvars import ContextVar
from typing import Optional

correlation_id: ContextVar[str] = ContextVar("correlation_id", default="")

async def correlated_get(url: str, headers: Optional[dict] = None) -> httpx.Response:
    # Reuse the ID already set for this request context, or mint one.
    cid = correlation_id.get()
    if not cid:
        cid = str(uuid.uuid4())
        correlation_id.set(cid)
    request_headers = {**(headers or {}), "X-Correlation-ID": cid}
    async with httpx.AsyncClient() as client:
        return await client.get(url, headers=request_headers)
Key Metrics to Track
| Metric | Purpose | Alert Threshold |
|---|---|---|
| Request latency p50/p95/p99 | Baseline performance | p99 > SLA |
| Error rate by endpoint | Service health | > 1% for 5min |
| Timeout rate | Downstream health | > 10% |
| Circuit breaker state | Resilience activation | OPEN state |
| Retry rate | Transient failures | > 20% |
Logging Structured Data
import structlog
import time

logger = structlog.get_logger()

async def logged_service_call(service: str, operation: str, func, *args, **kwargs):
    start = time.time()
    # Read the correlation_id ContextVar defined in the example above;
    # binding it to a local name avoids shadowing the ContextVar itself.
    cid = correlation_id.get()
    logger.info(
        "service_call_started",
        service=service,
        operation=operation,
        correlation_id=cid,
    )
    try:
        result = await func(*args, **kwargs)
        duration = time.time() - start
        logger.info(
            "service_call_completed",
            service=service,
            operation=operation,
            duration_ms=int(duration * 1000),
            correlation_id=cid,
        )
        return result
    except Exception as e:
        duration = time.time() - start
        logger.error(
            "service_call_failed",
            service=service,
            operation=operation,
            duration_ms=int(duration * 1000),
            error=str(e),
            correlation_id=cid,
        )
        raise
Health Check Pattern
from fastapi import FastAPI
import httpx

app = FastAPI()

# Downstream services to probe; populate with your real endpoints.
downstream_services = {
    "orders": "http://orders:8000",
    "billing": "http://billing:8000",
}

@app.get("/health")
async def health_check():
    checks = {}
    healthy = True
    # Probe each downstream service with a short timeout.
    for service_name, service_url in downstream_services.items():
        try:
            async with httpx.AsyncClient(timeout=2.0) as client:
                response = await client.get(f"{service_url}/health")
                checks[service_name] = {
                    "status": "up",
                    "latency_ms": response.elapsed.total_seconds() * 1000,
                }
        except Exception as e:
            checks[service_name] = {"status": "down", "error": str(e)}
            healthy = False
    return {"status": "healthy" if healthy else "unhealthy", "checks": checks}
Quick Recap
graph LR
Client -->|HTTP Request| Gateway
Gateway -->|REST/gRPC| ServiceA
ServiceA -->|Sync Call| ServiceB
ServiceB -->|Response| ServiceA
ServiceA -->|Response| Gateway
Gateway -->|HTTP Response| Client
Key Points
- Synchronous communication provides immediate consistency but creates tight coupling
- REST offers human-readable debugging and universal browser support
- gRPC provides type safety, bi-directional streaming, and compile-time contract enforcement
- Always configure timeouts to prevent blocking on slow or failed services
- Circuit breakers prevent cascade failures from spreading through the call chain
- Retries amplify problems if not combined with idempotency and backoff
- Correlation IDs enable tracing requests through the entire call chain
When to Choose Synchronous
- Operations complete in under 100ms with predictable latency
- You need immediate consistency between services
- The call chain is shallow (2-3 services maximum)
- Your team can manage resilience patterns consistently
- Debugging simplicity outweighs loose coupling benefits
Production Checklist
# Synchronous Communication Production Readiness
- [ ] Timeouts configured for all outbound calls
- [ ] Retry logic with exponential backoff implemented
- [ ] Circuit breakers protecting downstream calls
- [ ] Correlation IDs propagated through call chains
- [ ] Health check endpoints on all services
- [ ] Structured logging with latency metrics
- [ ] Alerting configured for timeout and error rates
- [ ] Graceful degradation patterns in place
- [ ] Load shedding when downstream services are slow
- [ ] Request budgets limiting retry amplification
Related Posts
- RESTful API Design — Principles for designing intuitive REST APIs
- API Gateway Patterns — How API gateways handle routing, authentication, and rate limiting
- Circuit Breaker Pattern — Preventing cascade failures in distributed systems
- Resilience Patterns — Comprehensive guide to building fault-tolerant systems