Clock Skew in Distributed Systems: Problems and Solutions

Explore how clock skew affects distributed systems, causes silent data corruption, breaks conflict resolution, and what you can do to mitigate these issues.

Author: GeekWorkBench · Reading time: 29 min

Introduction

Clock skew is the difference between two clocks at a given moment. Clock drift is the rate at which a clock runs faster or slower than true time. In distributed systems, these imperfections create subtle bugs that are hard to reproduce and harder to debug.

Most engineers know about clock skew in theory. Few have encountered its real-world manifestations. This post explores what happens when clocks lie, and how to build systems that survive their lies.


Why Clocks Lie

Hardware Reality

Hardware clocks are imperfect. A typical crystal oscillator might drift 20 parts per million (ppm). This sounds tiny:

20 ppm drift means:
- 20 microseconds per second
- 1.7 seconds per day
- About 10 minutes per year

But “typical” hides variation. Some crystals drift 50 ppm. Temperature changes alter drift rates. Under load, a virtual machine’s clock can drift significantly from its host.
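The 20 ppm figures above are simple arithmetic, and it can help to rerun them for the drift rate of your own hardware. A minimal sketch; `driftSeconds` is a hypothetical helper name used only for illustration:

```javascript
// Accumulated drift for an oscillator with a given ppm error.
// driftSeconds is an illustrative helper, not part of any library.
function driftSeconds(ppm, elapsedSeconds) {
  return (ppm / 1e6) * elapsedSeconds;
}

const DAY = 86400;
console.log(driftSeconds(20, 1));              // seconds of drift per second
console.log(driftSeconds(20, DAY));            // seconds of drift per day
console.log(driftSeconds(20, DAY * 365) / 60); // minutes of drift per year
console.log(driftSeconds(50, DAY));            // a 50 ppm crystal, per day
```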

NTP Synchronization Helps but Does Not Solve

NTP reduces skew but cannot eliminate it. Several reasons:

  1. Round-trip latency uncertainty: NTP estimates network delay but cannot measure it exactly
  2. Clock stepping: When NTP corrects large errors, it might step the clock forward or backward
  3. Leap seconds: These cause unusual clock behavior
  4. Virtual machine effects: VMs running on shared infrastructure see erratic clock behavior

graph TD
    A[Server A Clock] -->|drifts| B[A: 10:00:00.050]
    A2[Server B Clock] -->|drifts| C[B: 10:00:00.100]

    D[NTP Server] -->|syncs| B
    D -->|syncs| C

    B --- E[50ms skew persists even after NTP sync]
    C --- E

Typical NTP accuracy is 10-100 milliseconds over the public internet and 1-10 milliseconds on a local network. For many applications, this is fine. For distributed transaction ordering, it is not.
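The round-trip uncertainty is easier to see with numbers. The sketch below applies the standard NTP offset and delay formulas to four timestamps; the function name `ntpEstimate` and the example values are assumptions chosen for illustration:

```javascript
// NTP on-wire calculation from four timestamps (all in ms):
//   t0: client send (client clock)    t1: server receive (server clock)
//   t2: server send (server clock)    t3: client receive (client clock)
function ntpEstimate(t0, t1, t2, t3) {
  const delay = (t3 - t0) - (t2 - t1);        // round-trip network delay
  const offset = ((t1 - t0) + (t2 - t3)) / 2; // assumes both directions take equally long
  return { offset, delay, maxError: delay / 2 };
}

// The server clock is truly 50 ms ahead of the client.
// The request takes 30 ms on the wire, the reply only 10 ms.
const t0 = 1000;             // client clock at send
const t1 = t0 + 30 + 50;     // server clock at receive
const t2 = t1 + 1;           // server takes 1 ms to respond
const t3 = t0 + 30 + 1 + 10; // client clock at receive

console.log(ntpEstimate(t0, t1, t2, t3));
// { offset: 60, delay: 40, maxError: 20 }
```

The estimate is 60 ms against a true offset of 50 ms. The 10 ms error is half the path asymmetry, and the client has no way to observe it; it only knows the error is bounded by half the round-trip delay.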


How Clock Skew Breaks Systems

Problem 1: Incorrect Event Ordering

The most common issue is events appearing to happen in the wrong order. A write on Server A can receive a higher timestamp than a causally dependent write made moments later on Server B, even though B’s write depended on A’s.

// Example: two-server timestamp ordering failure
// Server A: writes x=1 at true time 10:00:00.000
// (A's clock is 50ms fast, so it stamps the write 10:00:00.050)
// Server B: receives A's message, writes y=x at true time 10:00:00.001
// (B's clock is perfectly synchronized, so it stamps the write 10:00:00.001)

// On disk: timestamps suggest B's write happened BEFORE A's
// But B's write read the value that A wrote,
// so A's write must actually have happened first
// Causality is violated!

This breaks database replication, distributed logging, and any system that relies on timestamps to determine what happened first.

Problem 2: Cache Invalidation Failures

Cache invalidation often relies on timestamps. If freshness is decided by comparing last-modified times, clock skew causes stale entries to be served:

// Cache invalidation with timestamps
async function getUser(userId) {
  const cached = await cache.get(`user:${userId}`);
  const lastModified = await db.getLastModified(userId);

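  // Problem: lastModified is stamped by whichever server handled the write.
  // If a later update was stamped by a server whose clock runs behind,
  // its lastModified can be <= cached.modified and the stale entry is served.
  if (cached && cached.modified >= lastModified) {
    return cached.data;
  }

  const fresh = await db.getUser(userId);
  // Store the data together with the timestamp the check above compares against
  await cache.set(`user:${userId}`, { data: fresh, modified: lastModified });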
  return fresh;
}
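One way to sidestep the comparison is to decide freshness from a version counter that the database increments on every update, so no clock is involved. A minimal sketch under that assumption; the `version` field and the in-memory `Map`s standing in for the real cache and database are hypothetical:

```javascript
// Version-based freshness check: no clocks involved.
// `version` is an assumed per-row counter incremented on every update.
const db = new Map([["42", { version: 7, data: { name: "Ada" } }]]);
const cache = new Map();

function getUserVersioned(userId) {
  const row = db.get(userId);
  const cached = cache.get(userId);

  // Equal versions mean the cached copy reflects the latest write
  if (cached && cached.version === row.version) {
    return { data: cached.data, fromCache: true };
  }

  cache.set(userId, { version: row.version, data: row.data });
  return { data: row.data, fromCache: false };
}

console.log(getUserVersioned("42").fromCache); // false: first read fills the cache
console.log(getUserVersioned("42").fromCache); // true: versions match
```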

Problem 3: Lease and Expiration Bugs

Distributed systems often use leases with timeouts. If clock skew causes a lease to appear valid when it should have expired, you get split-brain scenarios:

// Lease-based lock
async function acquireLock(resourceId, ttlSeconds) {
  const leaseExpiry = Date.now() + ttlSeconds * 1000;

  const acquired = await redis.set(
    `lock:${resourceId}`,
    myInstanceId,
    "NX", // Only if not exists
    "PX", // Set expiry in milliseconds
    ttlSeconds * 1000,
  );

  if (acquired) {
    // Problem: expiresAt is computed from this instance's clock, but Redis
    // expires the key by its own clock. The holder may still believe it owns
    // the lock after Redis has expired it and granted it to another instance.
    return { success: true, expiresAt: leaseExpiry };
  }

  return { success: false };
}
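A common defensive pattern is for the holder to track the lease with a monotonic clock and give it up early by a safety margin. The sketch below assumes a hypothetical `createLease` helper; it reduces the risk from local clock steps but does not replace fencing tokens:

```javascript
// Track lease validity with a monotonic clock plus a safety margin.
// performance.now() is monotonic and is not affected by wall-clock steps.
// Call createLease() just BEFORE sending the acquire request, so the local
// view of the lease always ends earlier than the server's.
function createLease(ttlMs, safetyMarginMs, nowMono = () => performance.now()) {
  const acquiredAt = nowMono();
  return {
    // Treat the lease as lost some time before the server-side expiry
    isSafeToUse() {
      return nowMono() - acquiredAt < ttlMs - safetyMarginMs;
    },
  };
}

const lease = createLease(30000, 5000);
if (lease.isSafeToUse()) {
  console.log("still inside the safe window, proceed with the write");
}
```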

Problem 4: JWT and Security Token Issues

JSON Web Tokens include expiration times. If server clocks disagree about the current time, tokens might be accepted or rejected incorrectly:

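// JWT verification (jsonwebtoken library; secret is the shared signing key)
const jwt = require("jsonwebtoken");
const token = jwt.sign({ userId: 123 }, secret, { expiresIn: "1h" });

// Server A (30s ahead) issues the token at 10:30 by its own clock: exp = 11:30
// Server B (30s behind) receives it while its own clock still reads 10:29,
// so a verifier that checks iat/nbf sees a token issued in the future
// At the other end, B keeps accepting the token for up to 60s after A treats it as expired

// This can block legitimate users or widen the window for replay attacks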
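Verifiers usually absorb small skew with a tolerance window (the jsonwebtoken library exposes this as the `clockTolerance` option of `jwt.verify`). The sketch below shows the underlying check as a standalone function; `checkTokenTimes` is a hypothetical name, and the 30 second default is an assumption to tune for your environment:

```javascript
// Validate exp/nbf claims (seconds since epoch) with a clock tolerance.
// checkTokenTimes is an illustrative helper, not a library function.
function checkTokenTimes(claims, nowSec, toleranceSec = 30) {
  if (claims.nbf !== undefined && nowSec + toleranceSec < claims.nbf) {
    return { valid: false, reason: "not-yet-valid" };
  }
  if (claims.exp !== undefined && nowSec - toleranceSec >= claims.exp) {
    return { valid: false, reason: "expired" };
  }
  return { valid: true };
}

const claims = { nbf: 1000, exp: 4600 };
console.log(checkTokenTimes(claims, 980));  // { valid: true }, 20s early is tolerated
console.log(checkTokenTimes(claims, 960));  // { valid: false, reason: 'not-yet-valid' }
console.log(checkTokenTimes(claims, 4630)); // { valid: false, reason: 'expired' }
```

The tolerance is a trade-off: a wider window blocks fewer legitimate users but keeps expired tokens usable for longer.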

Silent Data Corruption

The most dangerous clock skew issues are silent. The system appears to work, but data is subtly wrong.

Example: Last-Write-Wins Databases

Consider a last-write-wins (LWW) database using timestamps for conflict resolution. Server A and Server B both write to the same key:

sequenceDiagram
    participant A as Server A
    participant B as Server B
    participant DB as Database

    Note over A: Clock is 100ms behind
    Note over B: Clock is 50ms ahead

    B->>DB: Write K=V2 stamped T=1050 (true time 1000)
    A->>DB: Write K=V1 stamped T=1000 (true time 1100)

    Note over DB: LWW keeps V2 (higher timestamp)<br/>But V1 was actually written 100ms AFTER V2!
    Note over DB: Data corruption!

The database made the wrong decision. The later write, V1, was silently discarded in favor of V2. No error occurred and no exception was thrown. The data is simply wrong.
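The same failure can be reproduced in a few lines. This is a sketch of a generic last-write-wins merge, not any particular database's implementation; `lwwMerge` and `stamp` are hypothetical helpers:

```javascript
// Last-write-wins merge: keep the version with the higher timestamp.
function lwwMerge(a, b) {
  return b.timestamp > a.timestamp ? b : a;
}

// Each server stamps its write with its own, skewed clock.
function stamp(value, trueTimeMs, clockOffsetMs) {
  return { value, trueTimeMs, timestamp: trueTimeMs + clockOffsetMs };
}

const v2 = stamp("V2", 1000, +50);  // Server B, clock 50 ms ahead: stamped 1050
const v1 = stamp("V1", 1100, -100); // Server A, clock 100 ms behind: stamped 1000

const winner = lwwMerge(v1, v2);
console.log(winner.value);                       // "V2"
console.log(v1.trueTimeMs > winner.trueTimeMs);  // true: the later write lost
```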

The Heisenbug Problem

Clock skew bugs are heisenbugs: they disappear when you look at them. Adding logging changes timing. Debugging affects the system enough that the bug no longer manifests.

// You add this log to debug:
console.log(`Writing at ${Date.now()}`);

// The I/O for logging slightly changes timing
// The race condition that depended on precise timing no longer triggers
// The bug appears to fix itself when you try to observe it

Detecting Clock Skew

Active Monitoring

Monitor clock offset between your servers and a reference time source:

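#!/bin/bash
# Check clock offset from the NTP reference using chrony

# The "System time" line of chronyc tracking looks like:
#   System time     : 0.000012345 seconds fast of NTP time
OFFSET_S=$(chronyc tracking 2>/dev/null | awk '/^System time/ {print $4}')
OFFSET_S=${OFFSET_S:-0}
OFFSET_MS=$(echo "$OFFSET_S * 1000" | bc -l)

echo "Estimated clock offset: ${OFFSET_MS}ms"

# Alert if offset exceeds threshold
# (chrony reports the magnitude; the direction is the word "fast" or "slow")
if [ "$(echo "$OFFSET_MS > 100" | bc -l)" -eq 1 ]; then
  echo "CRITICAL: Clock skew exceeds 100ms"
fi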

Passive Detection

Detect skew from application behavior, not just clock comparisons:

// Detect skew from message timestamps
function detectClockSkew(messages) {
  // Sort by send timestamp, then by receive timestamp
  // If receive timestamps violate send ordering, skew is likely

  const sorted = [...messages].sort((a, b) => {
    const sendDiff = a.sendTime - b.sendTime;
    if (sendDiff !== 0) return sendDiff;
    return a.recvTime - b.recvTime;
  });

  let violations = 0;
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i].recvTime < sorted[i - 1].recvTime) {
      violations++;
    }
  }

  return { violations, skewLikely: violations > 0 };
}

Distributed Tracing for Skew Detection

Modern tracing systems can detect clock skew from trace data:

// Jaeger-style span timing
// A child span cannot start before its parent starts. If the timestamps
// say it did, the clocks on the two hosts disagree.
// MAX_SPAN_DURATION is an application-defined constant in the same time unit.
function validateSpanTiming(span) {
  if (span.parent && span.startTime < span.parent.startTime) {
    return { valid: false, issue: "child-before-parent" };
  }

  if (span.endTime > span.startTime + MAX_SPAN_DURATION) {
    return { valid: false, issue: "duration-anomaly" };
  }

  return { valid: true };
}

Mitigating Clock Skew

Use Logical Clocks Instead

For event ordering, use logical clocks rather than physical timestamps:

// Instead of:
const timestamp = Date.now();

// Use:
const logicalTime = lamportClock.tick();

The Logical Clocks and Vector Clocks posts cover this in depth.
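For reference, a Lamport clock fits in a handful of lines. This is a minimal sketch; the class name and the `receive` method are assumptions for illustration, with `tick()` matching the call used above:

```javascript
class LamportClock {
  constructor() {
    this.time = 0;
  }

  // Local event or message send: advance the counter
  tick() {
    this.time += 1;
    return this.time;
  }

  // Message receive: jump past the sender's timestamp
  receive(remoteTime) {
    this.time = Math.max(this.time, remoteTime) + 1;
    return this.time;
  }
}

const a = new LamportClock();
const b = new LamportClock();

const sent = a.tick();            // A sends a message stamped 1
const received = b.receive(sent); // B receives it at logical time 2
const reply = b.tick();           // B sends a reply stamped 3
console.log(sent, received, reply, a.receive(reply)); // 1 2 3 4
```

Every receive is stamped later than the matching send, regardless of what the wall clocks on the two machines say.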

Fencing Tokens

For distributed locks, use fencing tokens that increase with each lock acquisition and are checked on every write:

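class FencedLock {
  // redis is assumed to be an ioredis-style client
  async acquire(resourceId) {
    // Take the token from a shared counter so it is monotonic across instances
    const token = await redis.incr(`fence:${resourceId}`);
    const lock = await redis.set(`lock:${resourceId}`, token, "EX", 30, "NX");

    if (lock) {
      return { acquired: true, token };
    }
    return { acquired: false };
  }

  async write(resourceId, data, token) {
    // Check token on write - reject if lower than the highest token seen so far
    const lastToken = Number(await redis.get(`lastToken:${resourceId}`)) || 0;

    if (token < lastToken) {
      throw new Error("Stale write rejected");
    }

    // Note: the check and the two writes below must be atomic in production
    // (for example a Lua script), or be enforced by the storage layer itself
    await redis.set(`lastToken:${resourceId}`, token);
    await redis.set(`data:${resourceId}`, data);
  }
}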
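The important part of fencing is that the storage side enforces the token, not the lock holder. The sketch below uses an in-memory class as a stand-in for that storage service; `FencedStore` and the token values are hypothetical, and a single instance guards a single resource:

```javascript
// In-memory stand-in for a storage service that enforces fencing tokens.
class FencedStore {
  constructor() {
    this.highestToken = 0;
    this.data = new Map();
  }

  write(key, value, token) {
    // A token lower than one already seen belongs to an expired lock holder
    if (token < this.highestToken) {
      throw new Error(`Stale token ${token}, highest seen is ${this.highestToken}`);
    }
    this.highestToken = token;
    this.data.set(key, value);
  }
}

const store = new FencedStore();
store.write("k", "from holder 1", 33);
store.write("k", "from holder 2", 34); // holder 1's lease expired, holder 2 took over

try {
  store.write("k", "late write from holder 1", 33); // holder 1 wakes up after a pause
} catch (err) {
  console.log(err.message); // Stale token 33, highest seen is 34
}
console.log(store.data.get("k")); // from holder 2
```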

Hybrid Logical Clocks

Hybrid Logical Clocks (HLCs) combine the best of physical and logical clocks. They preserve meaningful physical timestamps while guaranteeing causal ordering across distributed events, making them ideal for systems that need both human-readable time and strict event ordering.

class HybridLogicalClock {
  constructor() {
    this.pt = 0; // Highest physical time seen so far (milliseconds)
    this.lc = 0; // Logical counter
    this.nodeId = process.pid; // Stand-in for a unique node identifier
  }

  // Local event or message send
  now() {
    const wallClock = Date.now();

    if (wallClock > this.pt) {
      this.pt = wallClock;
      this.lc = 0;
    } else {
      this.lc++;
    }

    return this.timestamp();
  }

  // Merge an HLC timestamp carried by a received message
  receive(other) {
    const wallClock = Date.now();
    const prevPt = this.pt;

    this.pt = Math.max(prevPt, other.physical, wallClock);

    if (this.pt === prevPt && this.pt === other.physical) {
      this.lc = Math.max(this.lc, other.logical) + 1;
    } else if (this.pt === prevPt) {
      this.lc++;
    } else if (this.pt === other.physical) {
      this.lc = other.logical + 1;
    } else {
      this.lc = 0; // Local wall clock is ahead of both
    }

    return this.timestamp();
  }

  timestamp() {
    return {
      physical: this.pt,
      logical: this.lc,
      nodeId: this.nodeId,
      encoded: `${this.pt}-${this.lc}-${this.nodeId}`,
    };
  }
}
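To order events, HLC timestamps are compared field by field. The sketch below assumes timestamps shaped like the objects returned above (`physical`, `logical`, `nodeId`); `compareHlc` is a hypothetical helper name:

```javascript
// Total order over HLC timestamps:
// physical time first, then the logical counter, then nodeId as a tie-breaker.
function compareHlc(a, b) {
  if (a.physical !== b.physical) return a.physical < b.physical ? -1 : 1;
  if (a.logical !== b.logical) return a.logical < b.logical ? -1 : 1;
  if (a.nodeId === b.nodeId) return 0;
  return a.nodeId < b.nodeId ? -1 : 1;
}

const events = [
  { name: "c", physical: 1700000000500, logical: 0, nodeId: 2 },
  { name: "b", physical: 1700000000100, logical: 1, nodeId: 1 },
  { name: "a", physical: 1700000000100, logical: 0, nodeId: 2 },
];

events.sort(compareHlc);
console.log(events.map((e) => e.name).join(",")); // a,b,c
```

Compare the numeric fields rather than the `encoded` string: the string form does not sort correctly once the logical counter or node id reaches a different number of digits.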

UTC vs TAI: Understanding Time Scales

There is confusion about time scales that matters for distributed systems. UTC (Coordinated Universal Time) is not continuous—leap seconds cause it to repeat or skip. TAI (International Atomic Time) is continuous but not commonly used.

// The problem with UTC leap seconds
// TAI is continuous: ... TAI-32: 1000000000, TAI-32: 1000000001, ...
// UTC is not: 2016-12-31 23:59:59 UTC, then 2016-12-31 23:59:60 UTC (leap second), then 2017-01-01 00:00:00 UTC

// Most systems use UTC but handle leap seconds poorly
// Some systems use TAI internally for continuous time

// Google Spanner uses TrueTime which accounts for uncertainty bounds
// CockroachDB uses Hybrid Logical Clocks (HLC)

UTC characteristics:

  • Based on atomic time but corrected with leap seconds
  • Not continuous—can repeat or skip seconds
  • Used by NTP and most operating systems
  • Lags TAI by 37 seconds (unchanged since January 2017)

TAI characteristics:

  • Pure atomic time, no leap seconds
  • Continuous and monotonically increasing
  • Used in scientific contexts and some databases
  • Not directly available on most systems

For continuous time systems:

  • Use TAI internally if your application cannot tolerate leap seconds
  • Most Linux systems can use CLOCK_TAI (kernel 3.10+)
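  • GPS time is TAI-based (a fixed 19 seconds behind TAI)

# Check that the kernel is new enough for CLOCK_TAI (Linux 3.10+)
uname -r

# From C, read the clock with: clock_gettime(CLOCK_TAI, &ts);

# Print the kernel's TAI-UTC offset (requires Python 3.9+ on Linux)
# 0 means the offset was never set, so CLOCK_TAI currently equals CLOCK_REALTIME
python3 -c "import time; print(round(time.clock_gettime(time.CLOCK_TAI) - time.time()))"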

NTP Clock Slewing vs Stepping

NTP daemons have two ways of correcting a clock offset: slewing and stepping. Understanding when each is used matters for your application.

Clock Slewing (preferred for most applications):

  • Adjusts the clock frequency slightly to gradually correct drift
  • Clock continues forward without jumps
  • No jumps visible through gettimeofday() or clock_gettime(); the clock only runs slightly faster or slower
  • Safe for applications that measure elapsed time between events
  • Forced with the -x flag in ntpd; the default in chrony

Clock Stepping (can cause problems):

  • Directly changes the clock value
  • Can make time go backward or forward suddenly
  • Breaks applications that calculate elapsed time
  • Can cause issues with databases, caches, and security tokens

# Configure chrony for slewing (default behavior)
# /etc/chrony/chrony.conf
# By default, chrony corrects offsets by slewing

# To allow stepping for large offsets, typically only at startup:
# makestep 1.0 3
# Steps the clock if the offset exceeds 1 second, but only during the first 3 updates

# For ntpd, -x raises the step threshold so that normal corrections are slewed:
# ntpd -x -g -q
# -x: slew instead of step (for offsets up to 600 seconds)
# -g: allow the first correction to exceed the panic threshold
# -q: set the clock once and exit

# Check the current offset
chronyc tracking | grep "System time"
# Output shows how far the system clock is "fast" or "slow" of NTP time,
# which is the remaining offset that chrony is slewing out
# Steps are recorded in chronyd's log, not in this output

When to use each:

| Scenario | Recommended Mode | Reason |
| --- | --- | --- |
| Normal operation | Slewing | No time jumps, safe for all apps |
| Large initial offset | Stepping once at startup | Get into sync quickly |
| Real-time trading | Slewing only | Cannot have time going backwards |
| Scientific computing | Slewing only | Elapsed time must be accurate |
| Virtual machines | Slewing (with hypervisor sync) | VM clock can drift erratically |
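
Whichever mode the daemon uses, application code that measures elapsed time is safer on a monotonic clock, which a step cannot move backward. A minimal Node.js sketch; `timeIt` is a hypothetical helper name:

```javascript
// Measure elapsed time with the monotonic clock instead of Date.now().
// process.hrtime.bigint() returns nanoseconds and never goes backward,
// even if the wall clock is stepped while fn() runs.
function timeIt(fn) {
  const start = process.hrtime.bigint();
  const result = fn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { result, elapsedMs };
}

const { result, elapsedMs } = timeIt(() => {
  let sum = 0;
  for (let i = 0; i < 1e6; i++) sum += i;
  return sum;
});
console.log(result, elapsedMs >= 0); // 499999500000 true
```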

Vendor-Specific Time Synchronization

Major cloud providers offer specialized time sync services that differ from public NTP:

Cloud Provider Time Sync

AWS Time Sync Service
# AWS EC2: Use the link-local time sync service
# Available at 169.254.169.123 (no network hops needed)

# For Amazon Linux / RHEL / CentOS:
# Amazon Linux ships with chrony already configured
cat /etc/chrony.conf
# Should show: server 169.254.169.123 prefer iburst

# For Ubuntu/Debian with systemd-timesyncd:
cat /etc/systemd/timesyncd.conf
[Time]
NTP=169.254.169.123
FallbackNTP=

# Verify it's working
timedatectl status
# Should show "System clock synchronized: yes"

# AWS provides ~1-5ms accuracy via their fleet-wide sync

Google Cloud Platform Time Sync

# GCP: Use the metadata server for time
# metadata.google.internal serves NTP to instances

# For Debian/Ubuntu:
cat /etc/systemd/timesyncd.conf
[Time]
NTP=metadata.google.internal
FallbackNTP=time.google.com

# For CentOS/RHEL with chrony:
echo "server metadata.google.internal iburst" >> /etc/chrony.conf
systemctl restart chronyd

# Verify
chronyc tracking
# Should show Reference ID of metadata.google.internal

# GCP also offers Google Public NTP at time.google.com
# for non-GCP infrastructure

Microsoft Azure Time Sync

# Azure VMs get time from the host server by default
# For explicit NTP configuration:

# Ubuntu/Debian:
cat /etc/systemd/timesyncd.conf
[Time]
NTP=time.windows.com

# CentOS/RHEL with chrony:
echo "server time.windows.com iburst" >> /etc/chrony.conf
systemctl restart chronyd

# Verify Azure VM time sync
systemctl status systemd-timesyncd
timedatectl status

# Azure provides time through the host, accuracy varies by region

Verification Commands for All Providers

# Common verification commands across all platforms

# Check current time sync status
timedatectl status
# Look for:
# - System clock synchronized: yes/no
# - NTP service: active/inactive
# - Time zone and offset

# Check detailed NTP statistics with chrony
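chronyc sources -v
# M column (mode): ^ = server, = = peer, # = local reference clock
# S column (state): * = current sync source, + = combined, - = not combined,
#   x = falseticker, ? = unusable, ~ = too variable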

chronyc sourcestats
# Look for: sd (standard deviation) - lower is better

# Check for clock offset
chronyc tracking | grep "System time"
# Shows the offset and whether the clock is fast or slow of NTP time

# Test NTP query directly
ntpdate -q pool.ntp.org
# Shows offset from multiple NTP servers

# For Prometheus node_exporter metrics (if installed)
curl localhost:9100/metrics | grep node_timex
# node_timex_offset_seconds - current offset
# node_timex_maxerror_seconds - maximum estimated error
# node_timex_sync_status - 1 if synced
# node_timex_loop_time_constant - PLL time constant

Application-Level Skew Detection from Database Replication

Beyond monitoring clocks directly, you can detect clock skew from database behavior:

// Detect clock skew from replication timestamps (run against the primary)
async function detectSkewFromReplication(db) {
  // reply_time (PostgreSQL 12+) is stamped by the standby's clock,
  // now() is the primary's clock, and replay_lag is measured on the primary alone
  const replicationStatus = await db.query(`
    SELECT
      application_name,
      EXTRACT(EPOCH FROM replay_lag) * 1000 AS lag_ms,
      EXTRACT(EPOCH FROM (now() - reply_time)) * 1000 AS reply_age_ms
    FROM pg_stat_replication
  `);

  for (const replica of replicationStatus.rows) {
    const lagMs = Number(replica.lag_ms);
    const replyAgeMs = Number(replica.reply_age_ms);

    // A reply stamped in the primary's future means the standby clock runs ahead.
    // A standby clock that runs behind is harder to spot this way, because
    // replies are only sent periodically and a positive age is normal.
    if (lagMs < 1000 && replyAgeMs < -100) {
      console.warn(`Clock skew detected on ${replica.application_name}`, {
        lagMs,
        clockOffsetMs: -replyAgeMs,
      });
    }
  }
}

// MySQL replication skew detection (run against the replica)
async function detectMySQLSkew(connection) {
  // SHOW SLAVE STATUS is a standalone statement and cannot be used in a FROM clause
  // (MySQL 8.0.22+ also accepts SHOW REPLICA STATUS with Source-based column names)
  const [rows] = await connection.query("SHOW SLAVE STATUS");

  for (const replica of rows) {
    // Seconds_Behind_Master > 0 indicates lag
    // But sudden jumps might indicate clock skew
    if (replica.Seconds_Behind_Master > 0) {
      console.warn("Replication lag detected", {
        relayLogFile: replica.Relay_Log_File,
        relayLogPos: replica.Relay_Log_Pos,
        secondsBehind: replica.Seconds_Behind_Master,
      });
    }
  }
}

Prometheus node_timex Metrics Reference

For comprehensive clock monitoring in Prometheus:

# prometheus.yml
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]

# Key metrics to monitor:
# node_timex_offset_seconds - Current clock offset from NTP server
#   Alert threshold: > 0.1 seconds (warning), > 0.5 seconds (critical)

# node_timex_maxerror_seconds - Maximum estimated error
#   Should be relatively stable, spikes indicate issues

# node_timex_sync_status - Whether clock is synchronized
#   0 = unsynchronized, 1 = synchronized

# node_timex_loop_time_constant - Phase-locked loop time constant
#   Affects how quickly clock adjustments are made

# node_timex_frequency_adjustment_ratio - Current frequency adjustment (1.0 = no adjustment)
#   Values far from 1.0 indicate an unstable clock

# Alerting rules for clock skew (separate rules file, loaded via rule_files)
groups:
  - name: clock_alerts
    rules:
      - alert: ClockOffsetHigh
        expr: abs(node_timex_offset_seconds) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Clock offset exceeds 100ms on {{ $labels.instance }}"
          description: "Offset: {{ $value }} seconds"

      - alert: ClockOffsetCritical
        expr: abs(node_timex_offset_seconds) > 0.5
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Clock offset exceeds 500ms on {{ $labels.instance }}"
          description: "Critical clock skew: {{ $value }} seconds"

      - alert: ClockSyncLost
        expr: node_timex_sync_status == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "NTP synchronization lost on {{ $labels.instance }}"

Tight NTP Synchronization

For applications that must use physical time, use tighter NTP synchronization:

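# Use multiple NTP servers for better accuracy
# /etc/chrony/chrony.conf

# Poll more often for tighter sync: minpoll/maxpoll are per-server options
# (2^4 = 16 seconds to 2^6 = 64 seconds between polls)
server time1.google.com iburst minpoll 4 maxpoll 6
server time2.google.com iburst minpoll 4 maxpoll 6
server time3.google.com iburst minpoll 4 maxpoll 6

# Enable hardware timestamping if available
hwtimestamp *

# Ignore sources whose root distance (accumulated error bound) exceeds 1 second
maxdistance 1.0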

Reference Clocks for Critical Systems

For financial or trading systems, use dedicated reference clocks:

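# GPS-based NTP server for critical infrastructure
# Stratum 1 server with GPS receiver

# /etc/chrony/chrony.conf
# chrony has no "GPS" driver: the pulse-per-second signal uses the PPS driver,
# locked to the NMEA time-of-day messages delivered through shared memory (gpsd)
refclock PPS /dev/pps0 lock NMEA refid GPS poll 3 precision 1e-9
refclock SHM 0 offset 0.5 delay 0.001 refid NMEA noselect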

Production Failure Scenarios

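Leap Second Incidents

Leap seconds have repeatedly caused production problems:

  • June 30, 2012: Reddit went down, and other Linux-based services saw CPU spikes, after the inserted second triggered a kernel bug
  • June 30, 2015: another leap second was added to UTC, and some services again reported brief disruptions
  • December 31, 2016: Cloudflare’s DNS service had elevated error rates because code assumed time could not go backward

The root cause: many systems did not handle the unusual clock behavior correctly.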

The AWS 2012 Clock Drift Issue

In late 2012, some AWS instances experienced clock drift of several minutes. This caused issues with SSL certificate validation, session token expiration, and database replication.

AWS added clock synchronization improvements to their hypervisor. This is why running in cloud environments is generally better than self-managed VMs for clock accuracy.


Common Pitfalls / Anti-Patterns

Do

// DO: Use logical clocks for ordering
const order = lamportClock.tick();

// DO: Use fencing tokens for distributed writes
await writeWithFencing(key, value, lock.token); // token issued when the lock was acquired

// DO: Monitor clock offsets
metrics.gauge("clock_offset_ms").set(measureOffset());

// DO: Use hybrid approaches for user-facing timestamps
// Show wall-clock time, use logical time internally

Do Not

// DO NOT: Use wall-clock for conflict resolution
if (remote.timestamp > local.timestamp) return remote;

// DO NOT: Assume clocks are synchronized across regions
const t1 = await fetchFromUS(key);
const t2 = await fetchFromEU(key); // t2 might have different clock source

// DO NOT: Use timestamps as unique identifiers
const id = `${Date.now()}-${random()}`; // Collisions possible

Quick Recap Checklist

Monitoring

  • Clock offset from reference (alert at 100ms, critical at 500ms)
  • Clock drift rate per server
  • NTP synchronization failures
  • Leap second announcement alerts

Testing

  • Inject clock skew in testing environments
  • Test with NTP stopped for extended periods
  • Verify behavior during leap seconds
  • Test failover with clocks at different offsets
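
Injecting skew is much easier when application code reads time through an injectable clock instead of calling `Date.now()` directly. A minimal sketch; `makeSkewedClock` is a hypothetical test helper:

```javascript
// A clock wrapper that adds a configurable offset, for skew-injection tests.
// Pass clock.now wherever production code would otherwise call Date.now().
function makeSkewedClock(offsetMs, base = () => Date.now()) {
  let offset = offsetMs;
  return {
    now: () => base() + offset,
    // Simulate an NTP step, forward (positive) or backward (negative)
    step(deltaMs) {
      offset += deltaMs;
    },
  };
}

// Two "servers" sharing one true time source, but 150 ms apart
const trueTime = () => 1700000000000;
const serverA = makeSkewedClock(-100, trueTime);
const serverB = makeSkewedClock(+50, trueTime);
console.log(serverB.now() - serverA.now()); // 150

serverA.step(-500); // simulate a backward step on A
console.log(serverB.now() - serverA.now()); // 650
```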

Design Review

  • Identify all uses of wall-clock time
  • Replace timestamp-based ordering with logical clocks
  • Add fencing tokens to any distributed write path
  • Review session/token expiration for clock skew tolerance

Trade-off Analysis

When designing systems that handle time, various approaches trade accuracy for complexity, latency, or operational overhead. Here are the key trade-offs:

| Approach | Accuracy | Complexity | Latency Impact | Best For |
| --- | --- | --- | --- | --- |
| Wall-clock only | 1-100ms (NTP dependent) | Low | None | Human-readable logs, non-critical timestamps |
| NTP with public servers | 10-100ms | Low | Minimal | General purpose, non-transactional systems |
| Cloud provider time sync | 1-5ms | Low | None | Cloud VMs, containerized workloads |
| GPS-based Stratum 1 | < 1ms | High (hardware required) | None | HFT, financial trading, audit-critical systems |
| Lamport Logical Clocks | Causally accurate | Medium | None | Event ordering without physical timestamps |
| Vector Clocks | Causally accurate | High | None | Multi-master conflict resolution, wide-area |
| Hybrid Logical Clocks (HLC) | Physical + causal | Medium | None | Systems needing both readable time and ordering |
| TrueTime (Spanner) | Bounded uncertainty (~6ms) | Very high | 6ms commit wait | Globally distributed transactions with strong consistency |

Key insights:

  • You cannot have both perfect physical timestamps and guaranteed causal ordering—not without either waiting (TrueTime-style) or abandoning physical time entirely for logical approaches
  • HLCs hit the sweet spot for most systems: meaningful physical timestamps for debugging and logs, plus guaranteed causal ordering for correctness
  • Stratum 1 GPS infrastructure only makes sense when the cost of clock skew exceeds the hardware and operational cost of running it
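
The "waiting" option can be sketched in a few lines. This is an illustration of the commit-wait idea only, not Spanner's implementation; `makeIntervalClock`, `commitWait`, and the 3 ms uncertainty bound are assumptions:

```javascript
// TrueTime-style interval clock: now() returns [earliest, latest].
// epsilonMs is an assumed uncertainty bound on the local clock.
function makeIntervalClock(epsilonMs, wallClock = () => Date.now()) {
  return {
    now() {
      const t = wallClock();
      return { earliest: t - epsilonMs, latest: t + epsilonMs };
    },
  };
}

// Pick a commit timestamp, then wait until it is guaranteed to be in the past
// on every node whose clock error is within epsilon.
async function commitWait(
  clock,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
) {
  const commitTs = clock.now().latest;
  while (clock.now().earliest <= commitTs) {
    await sleep(1);
  }
  return commitTs;
}

commitWait(makeIntervalClock(3)).then((ts) => {
  console.log("commit timestamp", ts, "is now safely in the past");
});
```

The wait is roughly twice the uncertainty bound, which is why tighter clock synchronization translates directly into lower commit latency.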

Interview Questions

1. You notice that Server A's clock is 50ms ahead of Server B's. Both servers are running NTP and appear synchronized. Why might this be a problem for a distributed database using timestamp-based conflict resolution?

Expected answer points:

  • Last-write-wins databases may choose the wrong version—writes that happened first at the source could be discarded if their timestamps appear later
  • The 50ms skew means causally-dependent writes can appear in reversed order, violating application-level invariants
  • NTP showing "synchronized" does not mean clocks are within the precision needed for conflict resolution—typical NTP accuracy is 10-100ms
2. A service uses wall-clock timestamps to invalidate cached responses based on a `max-age` value fetched from another service. What clock skew scenario causes stale cache to be served, and how does it manifest?

Expected answer points:

  • If the server that stamps cache entries has a clock ahead of the origin server, cached entries carry timestamps later than the origin's lastModified, so they keep passing the freshness check after the origin has updated
  • Conversely, if that clock is behind the origin, fresh entries look older than they are and are refetched unnecessarily
  • The issue is asymmetric: whichever server has the higher clock wins the comparison regardless of actual freshness
3. Explain the difference between clock skew and clock drift. Why do engineers sometimes call clock skew bugs "heisenbugs"?

Expected answer points:

  • Clock skew is the difference between two clocks at a single point in time; clock drift is the rate at which a clock runs fast or slow relative to true time
  • Heisenbugs disappear when you observe them—adding logging changes I/O timing, which changes the precise timing conditions the bug depends on
  • The act of measurement changes the system enough that the race condition no longer triggers
4. Your team wants to use wall-clock timestamps for distributed event ordering. What is the fundamental limitation that makes this approach unreliable across data centers?

Expected answer points:

  • Even with NTP, clocks can differ by 1-100ms depending on network conditions and geographic distance between data centers
  • A write at 10:00:00.000 in us-east-1 might receive a higher timestamp than a causally-dependent write made 1ms later in eu-west-1, even though the first write must have happened first
  • Physical time cannot guarantee causal ordering—only logical time can
5. Describe how a fencing token solves the problem of stale lock holders in a distributed lock implementation.

Expected answer points:

  • When acquiring a lock, the holder receives a monotonically increasing token
  • On every write operation, the token must be presented—writes with tokens lower than the last successful token are rejected
  • Even if clock skew causes a lock to appear valid to an expired holder, the rejected write still prevents corruption
  • The token increments on each successful lock acquisition, so no two holders can have the same token
6. Why does a leap second cause problems for systems that rely on wall-clock time, and which real-world incidents demonstrated this?

Expected answer points:

  • UTC inserts leap seconds as extra seconds (23:59:60), which many systems represent by pausing, repeating, or stepping the clock
  • Applications that assume time always moves forward may fail when time goes backward or stands still
  • At the June 2012 leap second, Reddit went down after a Linux kernel bug was triggered; at the December 2016 leap second, Cloudflare's DNS had elevated errors
  • Many systems did not handle the unusual clock behavior correctly
7. Your application uses JWT tokens with `exp` claims for authentication. Server A's clock is 30 seconds ahead of Server B's. What security or availability issues can occur?

Expected answer points:

  • Server A issues a token whose iat/nbf is 30 seconds in Server B's future; a strict verifier on Server B rejects it as not yet valid, blocking legitimate users
  • Server B keeps accepting a token for 30 seconds after Server A considers it expired, which widens the window for replaying a captured token
  • Tokens may be accepted or rejected inconsistently depending on which server handles the request
8. What is the difference between NTP clock slewing and clock stepping? When would you prefer one over the other?

Expected answer points:

  • Slewing adjusts clock frequency gradually—time continues forward without jumps; preferred for real-time apps, trading systems, and anywhere elapsed time calculations matter
  • Stepping directly changes the clock value—time can jump forward or backward; breaks any code that calculates elapsed time between events
  • For financial trading or scientific computing, slewing is mandatory—you cannot have time going backwards
9. Explain Hybrid Logical Clocks (HLCs). How do they combine physical and logical time, and why would you choose HLCs over pure logical clocks?

Expected answer points:

  • HLCs store both physical time (wall clock) and logical counter; they provide meaningful physical timestamps while guaranteeing causal ordering
  • When receiving an HLC timestamp, the local clock takes the max of local physical, received physical, and wall clock—ensuring monotonic increase
  • Useful when you need both human-readable timestamps (for logging, debugging, display) and guaranteed ordering for correctness
  • CockroachDB uses HLCs; Google Spanner takes a different route with TrueTime, which exposes explicit uncertainty bounds and waits them out
10. You need to monitor clock skew across 100 servers in multiple regions. What metrics would you collect, and what alert thresholds would you set?

Expected answer points:

  • Use `node_timex_offset_seconds` from Prometheus—alert at > 100ms (warning), > 500ms (critical)
  • Track `node_timex_maxerror_seconds` for maximum estimated error; spikes indicate problems
  • Monitor `node_timex_sync_status` (1 = synced, 0 = unsynchronized) for loss of NTP sync
  • Check `node_timex_frequency_adjustment_ratio` for unstable clocks (values far from 1.0 indicate oscillator problems)
11. A distributed trace shows a child span starting before its parent span started. What does this indicate, and how would you investigate it?

Expected answer points:

  • Clock skew exists between the machines recording the parent and child spans; the child's machine clock is behind the parent's
  • Could be VM clock drift, network asymmetry, or NTP not yet synced on one machine
  • Investigate by checking clock offsets against a reference time source on all hosts involved in the trace
12. Your PostgreSQL replica shows replication lag under 1 second, but the clock offset between primary and replica exceeds 100ms. What does this indicate, and what is the risk?

Expected answer points:

  • Replication lag measures how far behind the replica is in applying WAL; small lag with large clock offset means the clocks themselves are skewed, not just replication slow
  • Timestamp-based queries or time-dependent constraints may behave incorrectly across primary and replica
  • The replica may serve stale data for read-after-write patterns if the application checks timestamps
13. What is CLOCK_TAI and why might a distributed system prefer it over CLOCK_REALTIME (UTC-based)?

Expected answer points:

  • CLOCK_TAI (International Atomic Time) is continuous—no leap seconds, no repeats or jumps
  • UTC is corrected with leap seconds, making it non-continuous; the 23:59:60 second causes problems
  • TAI is ahead of UTC by 37 seconds (unchanged since January 2017) and always monotonic
  • Linux 3.10+ supports CLOCK_TAI; GPS time is TAI-based (a fixed 19 seconds behind TAI)
14. Explain how Google Spanner's TrueTime API handles clock uncertainty. Why is this important for distributed transaction ordering?

Expected answer points:

  • TrueTime returns a time interval `[earliest, latest]` instead of a point estimate, explicitly representing uncertainty bounds
  • A transaction's commit becomes visible only after the earliest bound of TrueTime's current interval has passed its commit timestamp (commit wait), so the timestamp is guaranteed to be in the past on every node
  • This guarantees that if Transaction A commits before Transaction B starts in real time, B sees A's commit as in the past
  • The cost is commit wait time—Spanner typically waits ~6ms to bound uncertainty
15. An AWS EC2 instance in us-east-1 has a clock that drifts 5 minutes over 24 hours. What specific AWS feature addresses this, and how do you verify it's working?

Expected answer points:

  • AWS provides a link-local time sync service at 169.254.169.123—no network hops needed for extremely low latency sync
  • Amazon Linux and RHEL have this configured by default via chrony; Ubuntu/Debian use systemd-timesyncd
  • Verify with `timedatectl status` showing "System clock synchronized: yes"; on chrony hosts, also confirm that `chronyc sources` lists 169.254.169.123 as the selected source
  • AWS claims ~1-5ms accuracy via their fleet-wide sync mechanism
16. Your team considers using wall-clock timestamps as unique identifiers for requests. What is the fundamental problem with this approach in a distributed system?

Expected answer points:

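  • Two servers can generate the same timestamp for different requests; this happens whenever two requests land in the same clock tick, even when the clocks are well synchronized
  • Skew and backward clock steps make it worse, because a stepped clock can reissue timestamps it has already used; the collision rate depends on clock resolution and skew, not just on chance
  • Even with additional randomness appended, the approach is fragile; use a distributed ID generator (Snowflake, UUIDv7) instead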
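
To show what a distributed ID generator adds on top of a raw timestamp, here is a minimal Snowflake-style sketch. It follows the commonly described layout (millisecond timestamp, 10-bit node ID, 12-bit sequence) but simplifies by using the Unix epoch rather than a custom one; the class name and the backward-clock policy are assumptions for illustration, not the behavior of any particular library.

```python
import threading
import time


class SnowflakeSketch:
    """Illustrative ID layout: ms timestamp | 10-bit node ID | 12-bit sequence."""

    def __init__(self, node_id: int, clock=lambda: int(time.time() * 1000)) -> None:
        if not 0 <= node_id < 1024:
            raise ValueError("node_id must fit in 10 bits")
        self.node_id = node_id
        self._clock = clock
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Clock stepped backward: keep using the last timestamp rather
                # than risk a duplicate. Real generators may wait or refuse here.
                now = self._last_ms
            if now == self._last_ms:
                self._seq = (self._seq + 1) & 0xFFF
                if self._seq == 0:
                    # Sequence exhausted for this millisecond: borrow the next one.
                    now += 1
            else:
                self._seq = 0
            self._last_ms = now
            return (now << 22) | (self.node_id << 12) | self._seq


gen = SnowflakeSketch(node_id=7)
ids = [gen.next_id() for _ in range(5)]
print(ids == sorted(ids), len(set(ids)) == len(ids))
```

Uniqueness comes from the node ID and sequence, not from the clock; the timestamp only provides rough ordering, so skew between nodes affects sort order but never causes collisions.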
17. A microservices system uses wall-clock time to determine which of two conflicting updates to accept. What is a safer alternative that does not require perfectly synchronized clocks?

Expected answer points:

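  • Use vector clocks to track causal ordering: accept the update that causally follows the other, and treat updates with no happens-before relationship as concurrent conflicts to merge or surface (Lamport clocks give a consistent total order but cannot detect concurrency)
  • If using wall-clock for conflict resolution in a LWW (last-write-wins) database, switch to version vectors, where each replica keeps its own monotonically increasing counter and every write carries the counters it has seen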
  • Fencing tokens prevent stale writes from being accepted even if they arrive out of order
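
The happens-before check in the first point can be sketched as a comparison of two vector clocks. This is a minimal illustration, assuming each clock is a dictionary from node name to counter; the `compare` function and the node names are made up for the example.

```python
from typing import Dict

VectorClock = Dict[str, int]


def compare(a: VectorClock, b: VectorClock) -> str:
    """Return 'before', 'after', 'equal', or 'concurrent' for a relative to b."""
    nodes = set(a) | set(b)
    # Missing entries count as 0: that node has not been observed yet.
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


print(compare({"A": 2, "B": 1}, {"A": 2, "B": 2}))  # before
print(compare({"A": 3, "B": 1}, {"A": 2, "B": 2}))  # concurrent
```

Only the "concurrent" case is a real conflict. The other outcomes can be resolved automatically without consulting any wall clock.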
18. You discover that your Cassandra cluster uses LOCAL_QUORUM consistency with timestamp-based conflict resolution. The cluster spans two data centers with 30ms latency between them. What clock skew issues can occur?

Expected answer points:

  • Writes at the same physical moment from different data centers will have different timestamps based on each DC's local clock
  • The DC with the ahead clock will win writes, regardless of which write actually happened first in real time
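  • Skew of even 10ms can reverse the real-time order of writes issued within 10ms of each other; once skew exceeds the time a message takes to travel between data centers, it can also invert writes that are causally related across DCs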
  • Solution: use lightweight transactions (LWT) with Paxos, or switch to a consistency model that does not rely on timestamps
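
The "ahead clock wins" effect is easy to reproduce in a toy model. The sketch below is not Cassandra code; the `Write` class and its fields are hypothetical, and the true time of each write is included only to make the inversion visible.

```python
from dataclasses import dataclass


@dataclass
class Write:
    value: str
    true_time_ms: int     # when the write really happened (unknowable in practice)
    clock_offset_ms: int  # how far the coordinator's clock is from true time

    @property
    def timestamp_ms(self) -> int:
        """Timestamp the coordinator attaches to the write."""
        return self.true_time_ms + self.clock_offset_ms


def last_write_wins(a: Write, b: Write) -> Write:
    """Keep the write with the larger timestamp (ties go to a in this sketch)."""
    return a if a.timestamp_ms >= b.timestamp_ms else b


first = Write("from-dc1", true_time_ms=1_000, clock_offset_ms=10)
second = Write("from-dc2", true_time_ms=1_005, clock_offset_ms=0)
print(last_write_wins(first, second).value)  # from-dc1
```

The write from DC2 happened 5ms later in real time, yet the older DC1 write survives because its coordinator's clock runs 10ms ahead. The newer value is silently discarded.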
19. During a chaos engineering test, you inject 200ms of clock skew between two servers. What specific behaviors would you expect to observe in a system that relies on wall-clock for event ordering but does not use fencing tokens?

Expected answer points:

  • Events that were causally ordered in real time appear reversed in the application's event log
  • Cache invalidation fails: stale data is served because an entry's timestamp, written by one server, is compared against the skewed clock of another
  • Database writes may be lost or cause phantom conflicts if using LWW conflict resolution
  • JWT or session tokens may be rejected or accepted inconsistently
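
The last point is the easiest to demonstrate. Below is a minimal sketch of an expiry check evaluated on two servers whose clocks differ by the injected 200ms; `is_token_valid` and the `leeway` parameter are illustrative names, not a specific JWT library's API.

```python
def is_token_valid(exp: float, local_now: float, leeway: float = 0.0) -> bool:
    """Accept a token until its expiry, tolerating `leeway` seconds of clock error."""
    return local_now < exp + leeway


true_now = 1_000.0
exp = true_now + 0.1            # token expires 100 ms from now
server_a_now = true_now         # accurate clock
server_b_now = true_now + 0.2   # clock running 200 ms ahead

print(is_token_valid(exp, server_a_now))              # True
print(is_token_valid(exp, server_b_now))              # False
print(is_token_valid(exp, server_b_now, leeway=0.5))  # True
```

The same token is accepted by one server and rejected by the other at the same instant. A leeway larger than the expected skew restores consistency at the cost of honoring tokens slightly past expiry.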
20. Your CTO asks you to recommend a time synchronization architecture for a high-frequency trading system where 1ms of clock skew can affect P&L reporting. What do you recommend and why?

Expected answer points:

  • Use GPS-based Stratum 1 time servers (dedicated hardware clock) for sub-millisecond accuracy
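  • Have applications read CLOCK_TAI, with chrony keeping the kernel's TAI offset up to date (for example via its `leapsectz` directive), for continuous time without leap second complications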
  • Enable hardware timestamping (hwtimestamp) to reduce network jitter in time measurements
  • Use slewing exclusively (never stepping)—time reversals in a trading system cause catastrophic calculation errors
  • Set up redundant GPS receivers and reference clocks for failover
  • Monitor clock offset with alerting at 0.5ms warning threshold, 1ms critical threshold
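
The alerting rule in the last point could look like the following. This is only a sketch of the threshold logic, assuming the measured offset is already available in seconds from whatever monitoring agent is in use; the function name and return labels are hypothetical.

```python
def classify_offset(offset_seconds: float,
                    warn: float = 0.0005,
                    crit: float = 0.001) -> str:
    """Map a measured clock offset to an alert level.

    The sign is ignored: a clock that is ahead is as harmful as one behind.
    """
    magnitude = abs(offset_seconds)
    if magnitude >= crit:
        return "critical"
    if magnitude >= warn:
        return "warning"
    return "ok"


print(classify_offset(0.0002))   # ok
print(classify_offset(0.0007))   # warning
print(classify_offset(-0.0015))  # critical
```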

Further Reading

Core Concepts

  • Physical Clocks — Deep dive into hardware clocks, oscillator drift, and time scale foundations
  • Logical Clocks — Lamport clocks and why physical time fails for ordering
  • Vector Clocks — Distributed conflict resolution without centralized coordination

Conclusion

Clock skew is a fundamental challenge in distributed systems. Physical clocks cannot be perfectly synchronized, and relying on them for ordering or conflict resolution leads to subtle bugs.

The solutions are well-understood: use logical clocks for ordering, fencing tokens for writes, and hybrid approaches when you need both physical meaning and logical correctness.

Most applications do not need perfect time. They need consistent ordering, which logical clocks provide. Reserve physical clocks for human-readable timestamps and logging where precision matters less.

For more on clock solutions, see Physical Clocks, Logical Clocks, and Vector Clocks.
