Clock Skew in Distributed Systems: Problems and Solutions
Explore how clock skew affects distributed systems, causes silent data corruption, breaks conflict resolution, and what you can do to mitigate these issues.
Clock Skew in Distributed Systems
Clock skew is the difference between two clocks at a given moment. Clock drift is the rate at which a clock runs faster or slower than true time. In distributed systems, these imperfections create subtle bugs that are hard to reproduce and harder to debug.
Most engineers know about clock skew in theory. Few have encountered its real-world manifestations. This post explores what happens when clocks lie, and how to build systems that survive their lies.
Why Clocks Lie
Hardware Reality
Hardware clocks are imperfect. A typical crystal oscillator might drift 20 parts per million (ppm). This sounds tiny:
20 ppm drift means:
- 20 microseconds per second
- 1.7 seconds per day
- About 10 minutes per year
But “typical” hides variation. Some crystals drift 50 ppm. Temperature changes alter drift rates. Under load, a virtual machine’s clock can drift significantly from its host.
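The ppm arithmetic above can be sketched directly (a tiny helper for illustration, not from any library):

```javascript
// Convert a drift rate in parts-per-million into accumulated skew
function driftMs(ppm, elapsedMs) {
  return (ppm / 1e6) * elapsedMs;
}

const perDay = driftMs(20, 24 * 60 * 60 * 1000);
console.log(`20 ppm over one day: ${(perDay / 1000).toFixed(2)} s`); // ≈ 1.73 s
```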
NTP Synchronization Helps but Does Not Solve
NTP reduces skew but cannot eliminate it. Several reasons:
- Round-trip latency uncertainty: NTP estimates network delay but cannot measure it exactly
- Clock stepping: When NTP corrects large errors, it might step the clock forward or backward
- Leap seconds: These cause unusual clock behavior
- Virtual machine effects: VMs running on shared infrastructure see erratic clock behavior
graph TD
    NTP[NTP Server] -->|syncs| A["Server A clock: 10:00:00.050"]
    NTP -->|syncs| B["Server B clock: 10:00:00.100"]
    A -.-|50ms skew persists even after NTP sync| B
The typical NTP synchronization accuracy is within 10-100 milliseconds on the public internet. On local networks, you might get 1-10 milliseconds. For many applications, this is fine. For distributed transaction ordering, it is not.
How Clock Skew Breaks Systems
Problem 1: Incorrect Event Ordering
The most common issue is events appearing to happen in the wrong order. With skewed clocks, a write on Server A can receive a higher timestamp than a causally-dependent write made later on Server B, so the dependent write appears to have happened first.
// Example: two-server timestamp ordering failure
// Server A's clock runs 50ms fast; Server B's clock is accurate
// True time 10:00:00.000 - Server A writes x=1, stamps it 10:00:00.050
// True time 10:00:00.010 - Server B receives A's message and
//   writes y=x, stamping it 10:00:00.010
// On disk: B's timestamp (.010) is LOWER than A's (.050),
// so the causally-dependent write appears to have happened FIRST
// Causality is violated!
This breaks database replication, distributed logging, and any system that relies on timestamps to determine what happened first.
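The same failure can be sketched in a few runnable lines (the 50ms skew is made up for illustration):

```javascript
// Each server reads "now" through its own skewed clock
function skewedNow(skewMs) {
  return Date.now() + skewMs;
}

// Server A runs 50ms fast; Server B is accurate
const writeA = { key: "x", ts: skewedNow(50) }; // happens first
const writeB = { key: "y", ts: skewedNow(0) };  // causally after A's write

// B's timestamp is lower even though B's write came second
console.log(writeB.ts < writeA.ts); // true whenever skew exceeds message delay
```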
Problem 2: Cache Invalidation Failures
Cache invalidation often uses timestamps. If your freshness check compares last-modified timestamps produced by different machines, clock skew can cause stale data to be served:
// Cache invalidation with timestamps
async function getUser(userId) {
  const cached = await cache.get(`user:${userId}`);
  const lastModified = await db.getLastModified(userId);
  if (cached && cached.modified >= lastModified) {
    // Problem: cached.modified and lastModified come from different clocks.
    // If they disagree, a stale entry can compare as fresh (wrong data)
    // or a fresh entry as stale (needless cache misses)
    return cached.data;
  }
  const fresh = await db.getUser(userId);
  await cache.set(`user:${userId}`, { data: fresh, modified: lastModified });
  return fresh;
}
Problem 3: Lease and Expiration Bugs
Distributed systems often use leases with timeouts. If clock skew causes a lease to appear valid when it should have expired, you get split-brain scenarios:
// Lease-based lock
async function acquireLock(resourceId, ttlSeconds) {
const leaseExpiry = Date.now() + ttlSeconds * 1000;
const acquired = await redis.set(
`lock:${resourceId}`,
myInstanceId,
"NX", // Only if not exists
"PX", // Set expiry in milliseconds
ttlSeconds * 1000,
);
if (acquired) {
return { success: true, expiresAt: leaseExpiry };
}
// Problem: another instance might see this lease as expired
// while the lock holder still thinks it holds the lock
return { success: false };
}
Problem 4: JWT and Security Token Issues
JSON Web Tokens include expiration times. If server clocks disagree about the current time, tokens might be accepted or rejected incorrectly:
// JWT verification
const token = jwt.sign({ userId: 123 }, secret, { expiresIn: "1h" });
// Server A (clock 30s ahead): issues the token with iat/exp from its own clock
// Server B (clock 30s behind): sees the token's issued-at 60s in the future
// and may reject it as not yet valid; skew the other way extends token life
// This can block legitimate users or let expired tokens be accepted
Silent Data Corruption
The most dangerous clock skew issues are silent. The system appears to work, but data is subtly wrong.
Example: Last-Write-Wins Databases
Consider a last-write-wins (LWW) database using timestamps for conflict resolution. Server A and Server B both write to the same key:
sequenceDiagram
participant A as Server A
participant B as Server B
participant DB as Database
Note over A: Clock drifts -100ms
Note over B: Clock drifts +50ms
A->>DB: Write K=V1 at T=1000
B->>DB: Write K=V2 at T=1050
Note over DB: DB clock = T=1025
Note over DB: LWW keeps V2 (higher timestamp)<br/>But V1 was actually written 100ms AFTER V2!
Note over DB: Data corruption!
The database made the wrong decision. The user wanted V1 but got V2. No error occurred, no exception was thrown. The data is simply wrong.
The Heisenbug Problem
Clock skew bugs are heisenbugs: they disappear when you look at them. Adding logging changes timing. Debugging affects the system enough that the bug no longer manifests.
// You add this log to debug:
console.log(`Writing at ${Date.now()}`);
// The I/O for logging slightly changes timing
// The race condition that depended on precise timing no longer triggers
// The bug appears to fix itself when you try to observe it
Detecting Clock Skew
Active Monitoring
Monitor clock offset between your servers and a reference time source:
#!/bin/bash
# Check clock offset reported by chrony
# "System time : 0.000123456 seconds fast of NTP time"
OFFSET=$(chronyc tracking | awk '/System time/ {print $4}')
echo "Estimated clock offset: ${OFFSET}s"
# Alert if offset exceeds threshold (100ms)
if [ "$(echo "${OFFSET:-0} > 0.1" | bc -l)" -eq 1 ]; then
  echo "CRITICAL: Clock skew exceeds 100ms"
fi
Passive Detection
Detect skew from application behavior, not just clock comparisons:
// Detect skew from message timestamps
function detectClockSkew(messages) {
// Sort by send timestamp, then by receive timestamp
// If receive timestamps violate send ordering, skew is likely
const sorted = [...messages].sort((a, b) => {
const sendDiff = a.sendTime - b.sendTime;
if (sendDiff !== 0) return sendDiff;
return a.recvTime - b.recvTime;
});
let violations = 0;
for (let i = 1; i < sorted.length; i++) {
if (sorted[i].recvTime < sorted[i - 1].recvTime) {
violations++;
}
}
return { violations, skewLikely: violations > 0 };
}
Distributed Tracing for Skew Detection
Modern tracing systems can detect clock skew from trace data:
// Jaeger-style span timing
// A child span normally starts before its parent ends; what it cannot do
// with synchronized clocks is start before its parent STARTS,
// or report an impossible duration
const MAX_SPAN_DURATION = 60 * 60 * 1000; // sanity bound: 1 hour

function validateSpanTiming(span) {
  if (span.parent && span.startTime < span.parent.startTime) {
    return { valid: false, issue: "child-before-parent" };
  }
  if (span.endTime - span.startTime > MAX_SPAN_DURATION) {
    return { valid: false, issue: "duration-anomaly" };
  }
  return { valid: true };
}
Mitigating Clock Skew
Use Logical Clocks Instead
For event ordering, use logical clocks rather than physical timestamps:
// Instead of:
const timestamp = Date.now();
// Use:
const logicalTime = lamportClock.tick();
The Logical Clocks and Vector Clocks posts cover this in depth.
Fencing Tokens
For distributed locks, use fencing tokens that increment with each write:
class FencedLock {
  async acquire(resourceId) {
    // Tokens must come from a central, monotonically increasing counter
    // (a per-instance variable would let two holders mint the same token)
    const token = await redis.incr(`fence:${resourceId}`);
    const lock = await redis.set(
      `lock:${resourceId}`,
      token,
      "NX", // Only if not exists
      "EX", // Expiry in seconds
      30,
    );
    if (lock) {
      return { acquired: true, token };
    }
    return { acquired: false };
  }
  async write(resourceId, data, token) {
    // Check token on write - reject if not newer than the last accepted one
    const lastToken = Number(await redis.get(`lastToken:${resourceId}`)) || 0;
    if (token <= lastToken) {
      throw new Error("Stale write rejected");
    }
    await redis.set(`lastToken:${resourceId}`, token);
    await redis.set(`data:${resourceId}`, data);
  }
}
Hybrid Logical Clocks
Hybrid Logical Clocks (HLCs) combine physical and logical time. They provide both a meaningful physical timestamp and guaranteed ordering:
class HybridLogicalClock {
constructor() {
this.pt = 0; // Physical time (milliseconds)
this.lc = 0; // Logical time
this.nodeId = process.pid;
}
now() {
const wallClock = Date.now();
if (wallClock > this.pt) {
this.pt = wallClock;
this.lc = 0;
} else {
this.lc++;
}
return {
physical: this.pt,
logical: this.lc,
nodeId: this.nodeId,
encoded: `${this.pt}-${this.lc}-${this.nodeId}`,
};
}
receive(other) {
  // other is an HLC timestamp from a received message
  const wallClock = Date.now();
  const prevPt = this.pt; // compare against the PREVIOUS physical time
  this.pt = Math.max(prevPt, other.physical, wallClock);
  if (this.pt === prevPt && this.pt === other.physical) {
    this.lc = Math.max(this.lc, other.logical) + 1;
  } else if (this.pt === prevPt) {
    this.lc++;
  } else if (this.pt === other.physical) {
    this.lc = other.logical + 1;
  } else {
    this.lc = 0; // wall clock moved past both
  }
  // Return directly: calling now() here would tick the clock a second time
  return {
    physical: this.pt,
    logical: this.lc,
    nodeId: this.nodeId,
    encoded: `${this.pt}-${this.lc}-${this.nodeId}`,
  };
}
}
UTC vs TAI: Understanding Time Scales
The distinction between time scales matters for distributed systems. UTC (Coordinated Universal Time) is not continuous: leap seconds cause it to repeat or skip a second. TAI (International Atomic Time) is continuous but rarely used directly.
// The problem with UTC leap seconds
// TAI is continuous: each second is followed directly by the next
// UTC is not: 2016-12-31 23:59:59 UTC, then 23:59:60 UTC (leap second),
// then 2017-01-01 00:00:00 UTC
// Most systems use UTC but handle leap seconds poorly
// Some systems use TAI internally for continuous time
// Google Spanner uses TrueTime, which exposes explicit uncertainty bounds
// CockroachDB uses Hybrid Logical Clocks (HLC)
UTC characteristics:
- Based on atomic time but corrected with leap seconds
- Not continuous—can repeat or skip seconds
- Used by NTP and most operating systems
- Offset from TAI is 37 seconds (unchanged since the end-of-2016 leap second)
TAI characteristics:
- Pure atomic time, no leap seconds
- Continuous and monotonically increasing
- Used in scientific contexts and some databases
- Not directly available on most systems
For continuous time systems:
- Use TAI internally if your application cannot tolerate leap seconds
- Most Linux systems can use CLOCK_TAI (kernel 3.10+)
- GPS time is TAI-based (GPS = TAI minus 19 seconds)
# Linux 3.10+ exposes CLOCK_TAI; from C:
#   struct timespec ts;
#   clock_gettime(CLOCK_TAI, &ts);
# Note: the kernel's TAI-UTC offset must be set by your NTP daemon,
# e.g. chrony with leap-second data configured (leapsectz right/UTC)
NTP Clock Slewing vs Stepping
NTP has two modes for correcting clock drift: slewing and stepping. Understanding when each is used matters for your application.
Clock Slewing (preferred for most applications):
- Adjusts the clock frequency slightly to gradually correct drift
- Clock continues forward without jumps
- No sudden jumps visible to gettimeofday() or clock_gettime()
- Safe for applications that measure elapsed time between events
- Enabled with the -x flag in ntpd; chrony slews small offsets by default
Clock Stepping (can cause problems):
- Directly changes the clock value
- Can make time go backward or forward suddenly
- Breaks applications that calculate elapsed time
- Can cause issues with databases, caches, and security tokens
# Configure chrony for slewing (default behavior)
# /etc/chrony/chrony.conf
# By default, chrony slews the clock for small offsets
# To allow a step for large offsets (use sparingly):
# makestep 1.0 3
# Steps only if the offset exceeds 1 second, and only during
# the first 3 clock updates after startup
# For ntpd, use -x to slew only:
# ntpd -x -g
# -x: slew instead of step
# -g: allow one large correction at startup
# Check the current offset being corrected
chronyc tracking | grep "System time"
# Shows how far the system clock is from NTP time; chrony corrects this
# gradually (slewing) unless a makestep condition triggers a step,
# which chrony records in the system log
When to use each:
| Scenario | Recommended Mode | Reason |
|---|---|---|
| Normal operation | Slewing | No time jumps, safe for all apps |
| Large initial offset | Stepping once at startup | Get into sync quickly |
| Real-time trading | Slewing only | Cannot have time going backwards |
| Scientific computing | Slewing only | Elapsed time must be accurate |
| Virtual machines | Slewing (with hypervisor sync) | VM clock can drift erratically |
Vendor-Specific Time Synchronization
Major cloud providers offer specialized time sync services that differ from public NTP:
AWS Time Sync Service
# AWS EC2: Use the link-local time sync service
# Available at 169.254.169.123 (no network hops needed)
# For Amazon Linux / RHEL / CentOS:
# Already configured by default with chrony
cat /etc/chrony.conf
# Should show: server 169.254.169.123 prefer iburst
# For Ubuntu/Debian with systemd-timesyncd:
cat /etc/systemd/timesyncd.conf
[Time]
NTP=169.254.169.123
FallbackNTP=
# Verify it's working
timedatectl status
# Should show "System clock synchronized: yes"
# AWS provides ~1-5ms accuracy via their fleet-wide sync
Google Cloud Platform Time Sync
# GCP: Use Google's internal time service
# The metadata server (metadata.google.internal) also serves NTP to VMs
# For Debian/Ubuntu:
cat /etc/systemd/timesyncd.conf
[Time]
NTP=metadata.google.internal
FallbackNTP=time.google.com
# For CentOS/RHEL with chrony:
echo "server metadata.google.internal iburst" >> /etc/chrony.conf
systemctl restart chronyd
# Verify
chronyc tracking
# Should show Reference ID of metadata.google.internal
# GCP also offers Google Public NTP at time.google.com
# for non-GCP infrastructure
Microsoft Azure Time Sync
# Azure VMs get time from the host server by default
# For explicit NTP configuration:
# Ubuntu/Debian:
cat /etc/systemd/timesyncd.conf
[Time]
NTP=time.windows.com
# CentOS/RHEL with chrony:
echo "server time.windows.com iburst" >> /etc/chrony.conf
systemctl restart chronyd
# Verify Azure VM time sync
systemctl status systemd-timesyncd
timedatectl status
# Azure provides time through the host, accuracy varies by region
Verification Commands for All Providers
# Common verification commands across all platforms
# Check current time sync status
timedatectl status
# Look for:
# - System clock synchronized: yes/no
# - NTP service: active/inactive
# - Time zone and offset
# Check detailed NTP statistics with chrony
chronyc sources -v
# Columns: M=mode, S=state, * = current sync, # = selected
# ^ indicates good sources
chronyc sourcestats
# Look for: sd (standard deviation) - lower is better
# Check for clock offset
chronyc tracking | grep "System time"
# Shows offset and whether it's being slewed or stepped
# Test NTP query directly
ntpdate -q pool.ntp.org
# Shows offset from multiple NTP servers
# For Prometheus node_exporter metrics (if installed)
curl localhost:9100/metrics | grep node_timex
# node_timex_offset_seconds - current offset
# node_timex_maxerror_seconds - maximum estimated error
# node_timex_sync_status - 1 if synced
# node_timex_loop_time_constant - PLL time constant
Application-Level Skew Detection from Database Replication
Beyond monitoring clocks directly, you can detect clock skew from database behavior:
// Detect clock skew from replication timestamps (PostgreSQL)
// Run against a REPLICA: pg_last_xact_replay_timestamp() carries the
// primary's commit timestamp, so apparent lag = real lag + clock skew
async function detectSkewFromReplication(replicaDb) {
  const result = await replicaDb.query(`
    SELECT
      now() AS replica_now,
      pg_last_xact_replay_timestamp() AS replay_time,
      EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) * 1000
        AS apparent_lag_ms
  `);
  const { apparent_lag_ms } = result.rows[0];
  // A negative apparent lag is impossible with synchronized clocks:
  // the replica's clock is behind the primary's
  if (apparent_lag_ms < 0) {
    console.warn("Clock skew detected: replica clock behind primary", {
      apparentLagMs: apparent_lag_ms,
    });
  }
}
// MySQL replication skew detection
async function detectMySQLSkew(connection) {
  // SHOW REPLICA STATUS is a statement of its own, not a table you can
  // SELECT FROM (use SHOW SLAVE STATUS on versions before 8.0.22)
  const [rows] = await connection.query("SHOW REPLICA STATUS");
  for (const replica of rows) {
    // Seconds_Behind_Source is derived from binlog timestamps, so clock
    // skew between source and replica distorts it; sudden jumps or
    // negative values suggest skewed clocks rather than real lag
    const lag = replica.Seconds_Behind_Source;
    if (lag !== null && lag > 0) {
      console.warn("Replication lag detected", { lag });
    }
  }
}
Prometheus node_timex Metrics Reference
For comprehensive clock monitoring in Prometheus:
# prometheus.yml
scrape_configs:
- job_name: "node"
static_configs:
- targets: ["localhost:9100"]
# Key metrics to monitor:
# node_timex_offset_seconds - Current clock offset from NTP server
# Alert threshold: > 0.1 seconds (warning), > 0.5 seconds (critical)
# node_timex_maxerror_seconds - Maximum estimated error
# Should be relatively stable, spikes indicate issues
# node_timex_sync_status - Whether clock is synchronized
# 0 = unsynchronized, 1 = synchronized
# node_timex_loop_time_constant - Phase-locked loop time constant
# Affects how quickly clock adjustments are made
# node_timex_frequency_adjustment_ratio - Clock frequency adjustment (ratio)
# Large deviations from 1 indicate an unstable clock
# Alerting rules for clock skew
groups:
- name: clock_alerts
rules:
- alert: ClockOffsetHigh
expr: abs(node_timex_offset_seconds) > 0.1
for: 5m
labels:
severity: warning
annotations:
summary: "Clock offset exceeds 100ms on {{ $labels.instance }}"
description: "Offset: {{ $value }} seconds"
- alert: ClockOffsetCritical
expr: abs(node_timex_offset_seconds) > 0.5
for: 1m
labels:
severity: critical
annotations:
summary: "Clock offset exceeds 500ms on {{ $labels.instance }}"
description: "Critical clock skew: {{ $value }} seconds"
- alert: ClockSyncLost
expr: node_timex_sync_status == 0
for: 1m
labels:
severity: warning
annotations:
summary: "NTP synchronization lost on {{ $labels.instance }}"
Tight NTP Synchronization
For applications that must use physical time, use tighter NTP synchronization:
# Use multiple NTP servers for better accuracy
# /etc/chrony/chrony.conf
# minpoll/maxpoll are per-server options: poll every 16-64 seconds
server time1.google.com iburst minpoll 4 maxpoll 6
server time2.google.com iburst minpoll 4 maxpoll 6
server time3.google.com iburst minpoll 4 maxpoll 6
# Enable hardware timestamping if available
hwtimestamp *
# Reject sources whose root distance exceeds 1 second
maxdistance 1.0
Reference Clocks for Critical Systems
For financial or trading systems, use dedicated reference clocks:
# GPS-based NTP server for critical infrastructure
# Stratum 1 server with a GPS receiver (PPS signal plus NMEA via gpsd)
# /etc/chrony/chrony.conf
refclock PPS /dev/pps0 lock NMEA poll 3 precision 1e-9
refclock SHM 0 offset 0.5 delay 0.2 refid NMEA
Real-World Incidents
The 2012 Leap Second Incident
On June 30, 2012, a leap second was added to UTC. Several services had issues:
- Reddit went down for about 30 minutes (a Linux kernel timing bug crashed its Cassandra cluster)
- Mozilla, LinkedIn, and other Java-heavy services saw crashes and runaway CPU usage
- Some Linux kernels mishandled the 23:59:60 second, causing livelocks
The leap second at the end of 2016 later caused elevated error rates in Cloudflare's DNS service, when time appearing to go backward broke a latency calculation.
The root cause in each case: systems did not handle the unusual clock behavior correctly.
The AWS 2012 Clock Drift Issue
In late 2012, some AWS instances experienced clock drift of several minutes. This caused issues with SSL certificate validation, session token expiration, and database replication.
AWS subsequently improved clock synchronization in its hypervisors and later introduced a dedicated time sync service. Managed cloud hypervisors that sync guest clocks generally deliver better accuracy than self-managed virtualization.
Best Practices
Do
// DO: Use logical clocks for ordering
const order = lamportClock.tick();
// DO: Use fencing tokens for distributed writes
await writeWithFencing(key, value, fencingToken++);
// DO: Monitor clock offsets
metrics.gauge("clock_offset_ms").set(measureOffset());
// DO: Use hybrid approaches for user-facing timestamps
// Show wall-clock time, use logical time internally
Do Not
// DO NOT: Use wall-clock for conflict resolution
if (remote.timestamp > local.timestamp) return remote;
// DO NOT: Assume clocks are synchronized across regions
const t1 = await fetchFromUS(key);
const t2 = await fetchFromEU(key); // t2 might have different clock source
// DO NOT: Use timestamps as unique identifiers
const id = `${Date.now()}-${random()}`; // Collisions possible
Production Checklist
Monitoring
- Clock offset from reference (alert at 100ms, critical at 500ms)
- Clock drift rate per server
- NTP synchronization failures
- Leap second announcement alerts
Testing
- Inject clock skew in testing environments
- Test with NTP stopped for extended periods
- Verify behavior during leap seconds
- Test failover with clocks at different offsets
Design Review
- Identify all uses of wall-clock time
- Replace timestamp-based ordering with logical clocks
- Add fencing tokens to any distributed write path
- Review session/token expiration for clock skew tolerance
Conclusion
Clock skew is a fundamental challenge in distributed systems. Physical clocks cannot be perfectly synchronized, and relying on them for ordering or conflict resolution leads to subtle bugs.
The solutions are well-understood: use logical clocks for ordering, fencing tokens for writes, and hybrid approaches when you need both physical meaning and logical correctness.
Most applications do not need perfect time. They need consistent ordering, which logical clocks provide. Reserve physical clocks for human-readable timestamps and logging where precision matters less.
For more on clock solutions, see Physical Clocks, Logical Clocks, and Vector Clocks.
Quick Recap
- Clock skew causes event ordering failures, cache bugs, and security issues
- Silent data corruption can occur with timestamp-based conflict resolution
- Detect skew via monitoring, tracing, and anomaly detection
- Mitigate with logical clocks, fencing tokens, and HLCs
- Test your system under clock skew conditions
Copy/Paste Checklist
- [ ] Audit all uses of wall-clock time in distributed paths
- [ ] Replace timestamp-based ordering with logical clocks
- [ ] Add fencing tokens to distributed write operations
- [ ] Set up clock skew monitoring with alerts
- [ ] Test with injected clock skew
- [ ] Use NTP with tight synchronization for physical timestamps
- [ ] Handle leap seconds explicitly
Related Posts
Logical Clocks: Lamport Timestamps and Event Ordering
Understand Lamport timestamps and logical clocks for ordering distributed events without synchronized physical clocks. Learn how to determine what happened before what.
Physical Clocks in Distributed Systems: NTP and Synchronization
Learn how physical clocks work in distributed systems, including NTP synchronization, clock sources, and the limitations of wall-clock time for ordering events.
TrueTime: Google's Globally Synchronized Clock Infrastructure
Learn how Google uses TrueTime for globally distributed transactions with external consistency. Covers the Spanner system, time bounded uncertainty, and HW-assisted synchronization.