Physical Clocks in Distributed Systems: NTP and Synchronization

Learn how physical clocks work in distributed systems, including NTP synchronization, clock sources, and the limitations of wall-clock time for ordering events.


Physical Clocks in Distributed Systems

Every computer has a clock. Most people never think about it. In distributed systems, clock behavior becomes critical. When two different machines process events, knowing which happened first is not trivial. Physical clocks—your system’s wall-clock time—are the starting point for tackling this problem.

Physical clocks come in two flavors. Hardware clocks run independently on each machine. Software clocks sync to external time sources like NTP. Neither is perfect, and understanding their limitations matters more than most engineers realize.

This post covers how physical clocks work, why synchronization is hard, and where they break down. Later posts in this series explore logical clocks and vector clocks as alternatives.


How Computers Track Time

Hardware Clocks

Modern CPUs have a crystal oscillator that drives a counter. This counter increments at a fixed rate, typically millions of times per second. The operating system reads this counter to track elapsed time since some epoch.

On Linux, two hardware clocks exist:

# System clock: maintained by the kernel, used for gettimeofday()
# Hardware clock: battery-backed CMOS clock, persists across reboots

# Check current time sources
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
# Output might be: tsc, hpet, or acpi_pm

# View clock precision
adjtimex --print | grep resolution

The hardware clock drifts. Crystal oscillators are accurate to maybe 20-50 parts per million. That sounds tiny, but at 20 ppm your clock can be off by more than a second and a half after a day. Over a month, it can drift by close to a minute.
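
To make the ppm numbers concrete, here is a back-of-envelope sketch (plain Python; the values are illustrative):

```python
# Worst-case accumulated drift for a clock with a given ppm error.
# A 20 ppm oscillator gains or loses up to 20 microseconds per second.

def drift_seconds(ppm: float, elapsed_seconds: float) -> float:
    return ppm * 1e-6 * elapsed_seconds

DAY = 86_400  # seconds

print(f"20 ppm over a day:   {drift_seconds(20, DAY):.2f} s")       # ~1.73 s
print(f"50 ppm over a month: {drift_seconds(50, 30 * DAY):.0f} s")  # ~130 s
```

The arithmetic is trivial, but it explains why an unsynchronized fleet of machines disagrees by seconds within days.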

The System Clock

The kernel maintains a software clock derived from the hardware clock. This software clock is what your applications actually use:

// JavaScript: uses the system clock
const now = Date.now();  // Milliseconds since epoch

// Go: uses the system clock
import "time"
t := time.Now().UnixNano()  // Nanoseconds since epoch

// C: uses gettimeofday
struct timeval tv;
gettimeofday(&tv, NULL);
int64_t microseconds = (int64_t)tv.tv_sec * 1000000 + tv.tv_usec;

The epoch is typically January 1, 1970 UTC (the Unix epoch), though some systems use different starting points.


NTP Synchronization

The Network Time Protocol synchronizes system clocks to external time sources. It is the backbone of timekeeping on the internet.

How NTP Works

NTP uses a hierarchical system of time sources:

graph TD
    A[Stratum 0: Atomic Clocks] --> B[Stratum 1: Time Servers]
    B --> C[Stratum 2: University Servers]
    C --> D[Stratum 3: Public Servers]
    D --> E[Your Servers]

Stratum 0 devices are atomic clocks and GPS receivers. Stratum 1 servers sync directly to these. Each layer down adds some uncertainty.

NTP Algorithm

NTP works by exchanging UDP packets:

// Simplified NTP exchange
// 1. Client sends timestamp T1
// 2. Server receives at T2
// 3. Server sends response at T3
// 4. Client receives at T4

// Round-trip delay:
delay = (T4 - T1) - (T3 - T2);

// Clock offset (positive means the client is behind the server):
offset = ((T2 - T1) + (T3 - T4)) / 2;

NTP takes multiple samples and uses selection algorithms to filter out bad measurements. The ntpd daemon runs continuously, adjusting the clock in small increments.
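
The four-timestamp arithmetic can be sketched directly (Python; the numbers are illustrative):

```python
# NTP's delay/offset estimate from one request/response exchange.
# T1: client send, T2: server receive, T3: server send, T4: client receive.

def ntp_sample(t1: float, t2: float, t3: float, t4: float):
    delay = (t4 - t1) - (t3 - t2)          # round trip minus server processing
    offset = ((t2 - t1) + (t3 - t4)) / 2   # estimated client clock error
    return delay, offset

# Client clock 50 ms behind the server, 10 ms network each way,
# 2 ms server processing time:
delay, offset = ntp_sample(t1=100.000, t2=100.060, t3=100.062, t4=100.022)
print(f"delay={delay:.3f}s offset={offset:+.3f}s")  # delay=0.020s offset=+0.050s
```

Note the built-in assumption: the offset formula is only exact when the network delay is symmetric, which is why asymmetric paths degrade NTP accuracy.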

Configuring NTP

Most Linux systems use chrony now, which is more accurate and handles network fluctuations better:

# Install and start chrony
apt install chrony

# Check synchronization status
chronyc tracking

# Sample output:
# Reference ID    : A1B2C3D4 (time.example.com)
# Stratum         : 3
# Ref time (UTC)  : Mon Mar 24 10:15:30 2026
# System time     : 0.000012345 seconds slow of NTP time
# Last offset     : -0.000023456 seconds
# RMS offset      : 0.000034567 seconds

Clock Accuracy and Drift

Measuring Clock Drift

Real clocks do not keep perfect time. Drift rates vary based on temperature, age, and hardware quality:

# Measure drift rate over 24 hours
# Before: sync your clock to NTP
ntpdate -b pool.ntp.org

# After 24 hours:
# Check how far off you are
timedatectl status

Drift rates typically range from 1 to 50 parts per million (ppm). A 1 ppm drift means your clock drifts about 0.086 seconds per day.

Temperature Effects

Crystal oscillators are temperature-sensitive. Large swings in temperature cause measurable drift. Data centers with stable environments have an advantage here.

Temperature vs Drift (typical values):
- 20°C: 0 ppm (baseline)
- 25°C: +2 ppm
- 30°C: +5 ppm
- 35°C: +10 ppm

Rapid temperature changes cause immediate frequency shifts.
This is why outdoor servers or edge devices struggle with clock accuracy.

Hardware Clock Types

Different clock sources have different characteristics:

Clock Source             | Accuracy  | Stability       | Cost
-------------------------|-----------|-----------------|------------
TSC (Time Stamp Counter) | High      | Poor under load | Free
HPET                     | Medium    | Good            | Motherboard
ACPI PM Timer            | Low       | Good            | Free
GPS Receiver             | Very High | Excellent       | $50-200
Atomic (Cesium)          | Perfect   | Perfect         | $10k+

The TSC clock is fastest but varies with CPU frequency scaling. On modern systems with constant TSC frequency, it is usually reliable for short intervals.


Limitations of Physical Clocks

Physical clocks have fundamental problems for distributed systems.

Clock Skew vs Clock Drift

Drift is the rate at which a clock runs fast or slow. Skew is the difference between two clocks at a point in time. Because real machines drift at different rates, skew keeps accumulating between synchronizations, no matter how recently the clocks agreed.

sequenceDiagram
    participant A as Server A
    participant B as Server B
    participant NTP as NTP Server

    Note over A: Clock: 10:00:00.000
    Note over B: Clock: 10:00:00.150

    A->>NTP: Sync request
    B->>NTP: Sync request

    Note over A: After sync: 10:00:00.050
    Note over B: After sync: 10:00:00.100

    Note over A,B: Still 50ms apart due to previous drift

Non-Monotonic Clocks

System clocks can jump forward or backward. NTP slews the clock gradually for small corrections, but steps it outright when the offset exceeds a threshold (128 ms by default for ntpd). Either way, your application might see time go backward:

// Time going backward is possible with NTP sync
const t1 = Date.now(); // 1000
await doSomething();
const t2 = Date.now(); // 998 - time went backward!

Monotonic clocks solve this for relative time measurements, but they are not globally synchronized.
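
In Python (and most runtimes) the distinction shows up as two separate clock APIs; a small illustration:

```python
import time

# time.time() follows the NTP-adjusted system clock and may step backward;
# time.monotonic() never goes backward, so use it for measuring intervals.

start = time.monotonic()
time.sleep(0.01)  # stand-in for real work
elapsed = time.monotonic() - start

assert elapsed >= 0  # guaranteed; the same check on time.time() is not safe
print(f"elapsed: {elapsed:.4f} s")

# The runtime reports each clock's properties:
print(time.get_clock_info("monotonic"))  # monotonic=True
print(time.get_clock_info("time"))       # monotonic=False
```

The monotonic clock's zero point is arbitrary (often boot time), which is exactly why it cannot be compared across machines.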

Leap Seconds

Leap seconds are inserted into UTC at irregular intervals (at most twice per year, and in practice every few years). An inserted leap second causes clocks to repeat a second or pause. Most systems handle this poorly:

# Leap second announced: June 30, 2015 23:59:60 UTC
# Many servers had kernel panics or went into infinite loops
# Some databases had corruption issues

Linux now handles leap seconds better, but they remain a source of unpredictability.
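
One mitigation is "leap smearing," popularized by Google's public NTP servers: spread the extra second over a long window so the clock never repeats or pauses. A minimal sketch of a linear 24-hour smear (the window length and linear shape are illustrative assumptions, not any provider's exact implementation):

```python
SMEAR_WINDOW = 24 * 3600  # seconds; assume a 24-hour window for the smear

def smear_fraction(seconds_into_window: float) -> float:
    """Fraction of the leap second already applied at this point in the window."""
    return min(max(seconds_into_window / SMEAR_WINDOW, 0.0), 1.0)

print(smear_fraction(0))          # 0.0 - smear not started
print(smear_fraction(12 * 3600))  # 0.5 - half the second absorbed
print(smear_fraction(24 * 3600))  # 1.0 - full second applied
```

The trade-off: during the smear every second is slightly longer than an SI second, so smeared and non-smeared clocks disagree by up to half a second mid-window.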


Using Physical Clocks in Distributed Systems

Despite their limitations, physical clocks are widely used. Knowing when to trust them matters.

When Physical Clocks Work

For non-critical timestamps and logging, wall-clock time is fine:

// Logging: wall-clock time is appropriate
logger.info({
  event: "user_login",
  timestamp: new Date().toISOString(), // Wall clock OK for logs
  userId: user.id,
});

// Audit trails benefit from wall clock
// Even if slightly off, audit timestamps are for human readability

When Physical Clocks Fail

For ordering events or conflict resolution, physical clocks are dangerous:

// BAD: Using wall clock for conflict resolution
function resolveConflict(local, remote) {
  // Assumes remote timestamp > local timestamp means remote is newer
  // FAILS when clocks are skewed
  if (remote.updatedAt > local.updatedAt) {
    return remote;
  }
  return local;
}
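
A small simulation makes the failure concrete. Assume node B's wall clock runs 200 ms behind node A's (Python; the skew and timestamps are illustrative):

```python
def resolve_conflict(local: dict, remote: dict) -> dict:
    # Same naive rule as above: trust the larger wall-clock timestamp.
    return remote if remote["updated_at"] > local["updated_at"] else local

clock_a = 1_000.000          # node A's wall clock (seconds)
clock_b = clock_a - 0.200    # node B's clock is skewed 200 ms behind

write_a = {"value": "old", "updated_at": clock_a}          # happens first
write_b = {"value": "new", "updated_at": clock_b + 0.100}  # happens 100 ms later

winner = resolve_conflict(write_a, write_b)
print(winner["value"])  # "old" - the genuinely newer write is silently discarded
```

Because B's later write carries an earlier timestamp, last-write-wins throws away the newest data without any error being raised.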

The PACELC theorem post discusses why timestamp-based conflict resolution breaks down in distributed systems.


Millisecond vs Microsecond vs Nanosecond

Precision vs Accuracy

Precision is how finely you can measure time. Accuracy is how close that measurement is to truth. You can be precise but inaccurate.

Clock Source          | Precision | Accuracy
----------------------|-----------|-----------------
System clock (NTP)    | 1 ms      | 10-100 ms
System clock (local)  | 1 us      | 1-50 ms (drift)
RDTSC (modern CPUs)   | 1 ns      | 1-50 ms (drift)
PTP (IEEE 1588)       | 1 ns      | 100 ns
GPS                   | 1 ns      | 50 ns
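
Precision is a local property the runtime can report; accuracy can only be measured against an external reference like NTP. Python makes the precision side easy to inspect:

```python
import time

# Resolution (precision) is advertised by the platform for each clock.
for name in ("time", "monotonic", "perf_counter"):
    info = time.get_clock_info(name)
    print(f"{name:14s} resolution={info.resolution} adjustable={info.adjustable}")

# time_ns() offers nanosecond *precision*, but its *accuracy* is still
# bounded by NTP sync quality, not by the number of digits returned.
t_ns = time.time_ns()
print(t_ns)
```

A 19-digit nanosecond reading that is 30 ms from true time is precise but inaccurate, which is exactly the trap the table above describes.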

When You Need Better

High-frequency trading systems need nanosecond precision. Telecom systems need microsecond synchronization. Most applications do not, but understanding the options matters:

  • PTP (Precision Time Protocol): Hardware-level synchronization, achievable with specialized network equipment
  • GPS: Extremely accurate time source, requires hardware and clear sky view for GPS
  • Atomic clocks: Only for the most demanding applications (telecom, scientific)

IEEE 1588 PTP Deep Dive

IEEE 1588, also known as Precision Time Protocol (PTP), achieves nanosecond-level synchronization by using hardware timestamps and a master-slave architecture. It is the gold standard for time synchronization in financial trading, telecom, and industrial control systems.

How PTP Works

PTP synchronizes clocks in a network by exchanging precision timestamps. Unlike NTP which uses UDP and relies on software timestamps, PTP uses hardware timestamps when available and synchronizes at the network interface card level.

The synchronization process works as follows:

  1. Best Master Clock Algorithm (BMCA): All clocks run BMCA to elect the grandmaster clock—the most accurate time source. The grandmaster is typically a GPS-disciplined oscillator or atomic clock.

  2. Sync messages: The master sends Sync messages with the send timestamp (hardware-generated). The slave records the receive timestamp (hardware-generated).

  3. Follow-up messages: Because the send timestamp for Sync may not be known precisely at transmission time, the master sends a Follow-Up message with the corrected timestamp.

  4. Delay Request and Response: The slave sends a Delay Request to the master, which responds with a Delay Response. This measures the path delay.

Master                                          Slave
  |                                               |
  |-------------- Sync ------------------------->| t2 = slave receive timestamp
  |  t1 = master send timestamp (hardware)        |
  |                                               |
  |-------------- Follow-Up (t1) --------------->|
  |  carries the precise hardware value of t1     |
  |                                               |
  |<------------- Delay Request -----------------| t3 = slave send timestamp
  |  t4 = master receive timestamp (hardware)     |
  |                                               |
  |-------------- Delay Response (t4) ---------->|
  |                                               |
  Calculated by the slave:
  - Offset from master: ((t2 - t1) - (t4 - t3)) / 2
  - Path delay:         ((t2 - t1) + (t4 - t3)) / 2

The critical insight is that PTP uses hardware timestamps at both ends, eliminating software delays that plague NTP.
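
The offset and delay arithmetic can be sketched with the conventional timestamp labels (t1 = master Sync send, t2 = slave Sync receive, t3 = slave Delay Request send, t4 = master Delay Request receive; the numbers below are illustrative, in nanoseconds):

```python
def ptp_offset_delay(t1: float, t2: float, t3: float, t4: float):
    offset = ((t2 - t1) - (t4 - t3)) / 2  # slave clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2   # estimated one-way path delay
    return offset, delay

# Slave runs 500 ns ahead of the master; 1000 ns path delay each way.
offset, delay = ptp_offset_delay(t1=0, t2=1500, t3=2000, t4=2500)
print(f"offset={offset:.0f} ns, delay={delay:.0f} ns")  # offset=500 ns, delay=1000 ns
```

As with NTP, the formulas assume symmetric path delay; hardware timestamps remove software jitter from the measurement but not that assumption.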

PTP vs NTP: The Key Differences

Feature              | PTP (IEEE 1588)                     | NTP
---------------------|-------------------------------------|---------------------------------
Timestamp level      | Hardware (NIC-level)                | Software (OS kernel)
Best accuracy        | Sub-microsecond (100 ns achievable) | Milliseconds (10-100 ms typical)
Network requirements | Dedicated or VLAN with QoS          | Any IP network
Architecture         | Master-slave with BMCA              | Hierarchical with NTP pools
Hardware dependency  | Requires PTP-capable NIC/switch     | Works on any hardware
Cost                 | Expensive (switches, NICs)          | Free (software)
Convergence time     | Seconds to minutes                  | Minutes
Typical use cases    | HFT, telecom, power grids           | General servers, applications

Hardware Requirements for PTP

PTP requires specialized network infrastructure:

PTP-Capable NICs: Network interface cards with hardware support for IEEE 1588. These NICs capture timestamps at the hardware level, bypassing OS latency.

PTP-Aware Switches: Standard switches introduce variable latency as packets queue. PTP-aware switches use boundary clocks or transparent clocks to compensate for switch delay. They measure and correct for the time packets spend traversing the switch.

GPS Grandmaster Clocks: For the most accurate time source, GPS-disciplined oscillators provide nanosecond accuracy to UTC. They serve as the authoritative time source for PTP domains.

# Check if your NIC supports PTP
ethtool -T eth0 | grep -i timestamp
# Output might show:
# PTP Hardware Clock: 0
# Hardware Timestamping: supported
# PTPv2 Event Port Transmit: supported
# PTPv2 Event Port Receive: supported

# Check PTP capabilities with ptp4l
ptp4l -i eth0 -l 6 -m
# -l 6 = log level debug
# -m = print messages to stdout

PTP Profiles

IEEE 1588 allows customization through profiles. Three are particularly important:

Default PTP Profile: General-purpose profile for enterprise networks. Allows up to 10ms path delay.

Power Profile (IEEE C37.238): Used in electrical substations for protection and control. Requires sub-microsecond accuracy for synchronizing phasor measurement units (PMUs).

Telecom Profile (G.8265.1): For telecom applications requiring frequency synchronization (not phase). Uses only Announce and Sync messages.

# Example: ptp4l configuration for the default profile
# /etc/ptp4l.conf (linuxptp syntax; option names vary by version)
[global]
domainNumber        0
priority1           128
priority2           128
clock_type          OC     # ordinary clock
step_threshold      1.0    # step (rather than slew) for offsets over 1 s

[eth0]
delay_mechanism     E2E    # end-to-end delay measurement

Deployment Considerations

PTP requires careful network design:

  1. Network latency asymmetry: PTP assumes symmetric path delay. If forward and reverse paths have different latency (common in wireless or routed networks), synchronization degrades. Use symmetric network paths or boundary clocks.

  2. VLAN and QoS configuration: PTP messages must get priority queuing. Configure switches to give PTP traffic highest priority:

# Cisco switch example for PTP VLAN
vlan 100
 name ptp-vlan
!
interface GigabitEthernet1/0/1
 switchport mode trunk
 switchport trunk allowed vlan 100
 priority-mode dscp
!
# QoS configuration for PTP
mls qos map cos-dscp 46 34 26 18 0 0 0 0
# 46 = EF (Expedited Forwarding) for PTP
  3. Boundary Clock vs Transparent Clock: Boundary clocks terminate PTP at each switch and act as masters for downstream devices. Transparent clocks pass PTP through while correcting for switch delay. Boundary clocks are easier to deploy; transparent clocks provide better accuracy.

  4. Multi-domain PTP: Different PTP domains can run independently. Useful when you need separate time references for different subsystems.

When PTP Is Worth the Cost

PTP adds significant complexity and expense. Only deploy it when your requirements demand it:

Requirement                   | NTP Sufficient   | Consider PTP
------------------------------|------------------|----------------------------
Clock accuracy needed         | > 1 ms is enough | < 1 ms required
Financial trading (HFT)       | No               | Yes (nanoseconds matter)
Telecom (4G/5G base stations) | No               | Yes (phase synchronization)
Power grid synchronization    | No               | Yes (PMU timing)
Industrial automation         | Sometimes        | Yes (motion control)
Video/audio sync (broadcast)  | Sometimes        | Yes (lip sync)
General server timekeeping    | Yes              | No
Distributed databases         | Usually          | Rarely

Most distributed databases do not need PTP. CockroachDB uses hybrid logical clocks, and Spanner uses TrueTime (GPS receivers plus atomic clocks) for distributed transaction ordering. For everything else, NTP with proper monitoring is sufficient.


Conclusion

Physical clocks are imperfect but practical. NTP synchronization keeps them reasonable for most uses. Wall-clock time works fine for logging, user-facing timestamps, and non-critical scheduling.

The problems arise when you need to order events across machines. Clock skew makes “what happened first” non-trivial to answer with physical clocks alone. This is why distributed systems turn to logical clocks and vector clocks, which the next posts in this series cover.

Key takeaways:

  1. Hardware clocks drift, and drift rates vary between machines
  2. NTP synchronization reduces skew but cannot eliminate it
  3. Clock skew makes wall-clock timestamps unreliable for event ordering
  4. Use physical clocks for human-readable timestamps and logging
  5. Use logical clocks for distributed event ordering

The Logical Clocks post covers Lamport timestamps, which provide a way to order events without synchronized physical clocks.


When to Use / When Not to Use Physical Clocks

Scenario                       | Recommendation
-------------------------------|---------------------------------------------
User-facing timestamps         | Use wall-clock with timezone handling
Application logging            | Use wall-clock, useful for human review
Debugging and tracing          | Use wall-clock, correlates with real events
Event ordering across machines | Do not use wall-clock
Conflict resolution            | Do not use wall-clock
Distributed consensus          | Do not use wall-clock alone
High-frequency trading         | Use PTP or GPS, not NTP

When TO Use Physical Clocks

  • Displaying times to users
  • Logging events for human review
  • Auditing and compliance (with caution)
  • Non-critical scheduling where small errors are tolerable

When NOT to Use Physical Clocks

  • Determining causal ordering of distributed events
  • Conflict resolution in distributed databases
  • Implementing distributed protocols (consensus, coordination)
  • Any situation where “which happened first” matters

Production Considerations

Monitoring Clock Synchronization

# Check for clock synchronization issues
# Watch for large offset values

chronyc sources -v
# Should show multiple sources with ^*, indicating successful sync

chronyc sourcestats
# High "sd" (standard deviation) indicates unstable sync
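
A monitoring check can parse the current offset straight out of `chronyc tracking` output; a minimal sketch (the sample text and the 100 ms threshold are illustrative):

```python
import re

TRACKING_SAMPLE = """\
Reference ID    : A1B2C3D4 (time.example.com)
Stratum         : 3
System time     : 0.000012345 seconds slow of NTP time
Last offset     : -0.000023456 seconds
"""

def offset_seconds(tracking_output: str) -> float:
    """Extract the current offset; chrony reports magnitude plus direction."""
    m = re.search(r"System time\s*:\s*([\d.]+) seconds (slow|fast)", tracking_output)
    if m is None:
        raise ValueError("no 'System time' line found")
    value = float(m.group(1))
    return -value if m.group(2) == "slow" else value  # slow = behind NTP time

off = offset_seconds(TRACKING_SAMPLE)
print(f"offset: {off:+.9f} s, alert: {abs(off) > 0.100}")  # alert: False
```

In production, prefer structured sources (node_exporter's timex metrics or `chronyc -c`) over scraping human-readable text, which can change between chrony versions.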

Alerts to Set

Alert                      | Threshold | Severity
---------------------------|-----------|----------
Clock offset exceeds 100ms | 100 ms    | Warning
Clock offset exceeds 500ms | 500 ms    | Critical
NTP sync lost              | Any       | Warning
Leap second event          | Any       | Warning

Common Issues

  • NTP daemon not running: clock drifts freely until the next manual sync
  • Firewall blocking NTP: cannot sync, clock drifts
  • Virtual machines: clocks run slower or faster than physical time because of CPU time slicing
  • Cloud instances: shared resources cause clock instability

For virtualized and cloud environments, use hypervisor-level synchronization when possible. Most cloud providers offer time sync services that are more reliable than public NTP.

Cloud Provider Time Sync Services

Major cloud providers provide specialized time synchronization services optimized for their infrastructure:

AWS Time Sync Service

Amazon Web Services provides a time sync service accessible via NTP at 169.254.169.123:

# AWS EC2 instance time sync
# The AWS Time Sync Service is available at this link-local address
# No NTP package installation needed on Amazon Linux 2023
cat /etc/chrony/chrony.conf
# server 169.254.169.123 prefer iburst

# For Ubuntu/Debian with systemd-timesyncd
# Edit /etc/systemd/timesyncd.conf
[Time]
NTP=169.254.169.123
FallbackNTP=169.254.169.123

# Verify sync status
timedatectl

AWS uses the Amazon Time Sync Service which is synchronized to GPS and atomic clocks in each region. The service runs on fleet instances with hardware clocks and is monitored by AWS.

Google Cloud NTP

Google Cloud Platform provides NTP through their metadata server and a dedicated time service:

# GCP NTP configuration
# For Debian/Ubuntu with systemd-timesyncd
cat /etc/systemd/timesyncd.conf
[Time]
NTP=metadata.google.internal
FallbackNTP=time.google.com

# For CentOS/RHEL with chrony
echo "server metadata.google.internal iburst" >> /etc/chrony/chrony.conf
systemctl restart chronyd

# Verify
chronyc tracking

GCP also offers the Google Public NTP service at time.google.com for non-GCP infrastructure.

Microsoft Azure Time Sync

Azure provides time sync through the Azure VMs themselves and an NTP service:

# Azure time sync status
# Azure VMs automatically sync to the host server time
# For explicit NTP configuration:

# Ubuntu/Debian
cat /etc/systemd/timesyncd.conf
[Time]
NTP=time.windows.com

# CentOS/RHEL with chrony
echo "server time.windows.com iburst" >> /etc/chrony/chrony.conf

# Verify Azure VM time sync
systemctl status systemd-timesyncd

Cloud Provider Comparison

Provider     | NTP Endpoint             | Source                 | Accuracy | Special Notes
-------------|--------------------------|------------------------|----------|---------------------------------------
AWS          | 169.254.169.123          | GPS + Atomic           | ~1-5 ms  | Link-local, no network hops
GCP          | metadata.google.internal | Google atomic clocks   | ~1-5 ms  | Via metadata server
Azure        | time.windows.com         | Microsoft time servers | ~5-20 ms | May require firewall rules
Oracle Cloud | 169.254.0.2              | Oracle stratum 1       | ~5-10 ms | Built into Oracle Cloud infrastructure

Virtualization Clock Issues

Virtual machines face unique clock challenges that physical machines do not:

CPU Time Slicing: VMs share physical CPU cores. When a VM is scheduled off the CPU, its clock stops advancing. When rescheduled, it may appear to jump forward.

Live Migration: When VMs migrate between hosts, they may pause briefly. This causes clock discontinuities that NTP must compensate for.

Resource Contention: Under heavy load, VMs may not receive full CPU time, causing clock drift even with NTP running.

Hypervisor Solutions:

# VMware: Enable time synchronization with host
# On VMware, ensure these settings are configured:
# VM Tools > Options > Time synchronization > Synchronize time with host

# Verify VMware tools status
vmware-toolbox-cmd timesync status

# Hyper-V: Enable time synchronization
# On the Hyper-V host (PowerShell):
# Enable-VMIntegrationService -VMName "YourVM" -Name "Time Synchronization"

# Linux guest with Hyper-V integration services:
# The hv_utils driver provides time sync
lsmod | grep hv_utils

Best Practices for VMs:

  1. Use the cloud provider’s time sync service (169.254.169.123 for AWS)
  2. Enable hypervisor-level time sync where available
  3. Avoid using bare metal time sources inside VMs
  4. Monitor clock drift and set alerts for large offsets
  5. Consider using a monotonic clock for short-interval timing

Production Monitoring Implementation

Monitoring clock synchronization is critical for systems that depend on event ordering:

Prometheus Metrics for Clock Sync

# prometheus.yml - scrape configuration for node exporter
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]

# Key node_timex metrics to monitor:
# node_timex_offset_seconds - current clock offset from NTP
# node_timex_maxerror_seconds - maximum estimated error
# node_timex_loop_time_constant - phase-locked loop time constant
# node_timex_sync_status - whether sync is active (1 = synced)

Alerting Rules

# alerts/clock-sync.yml
groups:
  - name: clock_alerts
    interval: 30s
    rules:
      - alert: ClockOffsetHigh
        expr: abs(node_timex_offset_seconds) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Clock offset exceeds 100ms"
          description: "Clock on {{ $labels.instance }} is {{ $value }} seconds offset"

      - alert: ClockOffsetCritical
        expr: abs(node_timex_offset_seconds) > 0.5
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Clock offset exceeds 500ms"
          description: "Clock on {{ $labels.instance }} is critically offset"

      - alert: ClockSyncLost
        expr: node_timex_sync_status == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "NTP synchronization lost"
          description: "Clock on {{ $labels.instance }} is not synchronized"

Clock Selection Decision Matrix

Use this matrix to select the appropriate time source for your use case:

Use Case                   | Time Source    | Accuracy Needed | Recommendation
---------------------------|----------------|-----------------|------------------------------
User-facing timestamps     | Wall clock     | 1 second        | System clock, human-readable
Application logging        | Wall clock     | 1 millisecond   | System clock with NTP
Session management         | Monotonic      | 1 millisecond   | CLOCK_MONOTONIC
Distributed event ordering | Logical clocks | Causality       | Lamport timestamps
Conflict resolution        | Logical/Vector | Causality       | Vector clocks
Database replication       | Logical clocks | Causality       | Hybrid Logical Clocks
Financial transactions     | PTP/GPS        | Nanoseconds     | IEEE 1588 or GPS
Telecom synchronization    | PTP/GPS        | Microseconds    | IEEE 1588 with hardware
Scientific experiments     | Atomic         | Nanoseconds     | Dedicated time server
CDN edge nodes             | NTP            | 1 millisecond   | Local stratum 1

Quick Recap

  • Physical clocks are imperfect timekeepers that drift and skew
  • NTP synchronizes to external sources but cannot eliminate uncertainty
  • Clock skew makes wall-clock time unsuitable for distributed ordering
  • Physical clocks work for logging, user display, and non-critical uses
  • For event ordering, use logical clocks or vector clocks

For more on distributed systems fundamentals, see CAP Theorem, Consistency Models, and Geo-Distribution.

Related Posts

Clock Skew in Distributed Systems: Problems and Solutions

Explore how clock skew affects distributed systems, causes silent data corruption, breaks conflict resolution, and what you can do to mitigate these issues.

#distributed-systems #distributed-computing #clock-skew

Logical Clocks: Lamport Timestamps and Event Ordering

Understand Lamport timestamps and logical clocks for ordering distributed events without synchronized physical clocks. Learn how to determine what happened before what.

#distributed-systems #distributed-computing #logical-clocks

TrueTime: Google's Globally Synchronized Clock Infrastructure

Learn how Google uses TrueTime for globally distributed transactions with external consistency. Covers the Spanner system, time bounded uncertainty, and HW-assisted synchronization.

#distributed-systems #distributed-computing #true-time