Apr 16, 2025 - 00:49
It's about time

Today I remembered an exam question from around 2005 or 2006 for a course taught by Professor Pascal Bouvry at the University of Luxembourg. We had to explain an algorithm to synchronize time between two computers.

Obviously, explaining how the Network Time Protocol (NTP) works would have been the perfect answer. Back then, I wasn't sure exactly how NTP operated, so I devised my own algorithm. Professor Bouvry deemed it sufficient – good for my grade, but that's not the point of this post.

Ever since that exam, I've been intrigued by the "right" answer and the actual mechanics of NTP, but I never took the time to properly research it. Until today, nearly 20 years later!

Fundamentals of NTP Time Synchronization

When a computer A (the client) needs to synchronize its clock with another computer B (the server), whose time is considered accurate, NTP uses a clever exchange of timestamps:

  • The NTP client sends a request packet to an NTP server. This packet includes the client's current time, known as the originate timestamp (T1).
  • The NTP server receives the packet at its time, the receive timestamp (T2).
  • The server processes the request and sends a reply packet back. This packet includes T1, T2, and the server's time when the reply departs, the transmit timestamp (T3).
  • The client receives the reply packet at its time, the destination timestamp (T4).

Crucially, T1 and T4 are measured using the client's clock, while T2 and T3 are measured using the server's clock.

Calculating the round-trip delay

Using these four timestamps, NTP calculates the round-trip delay: the total time taken for the request to reach the server and the reply to return to the client.

Round-trip delay = (T4-T3)+(T2-T1), which may be rewritten to the standard formula:

Round-trip delay = (T4-T1)-(T3-T2).

Subtracting the server processing time from the total client-side elapsed time gives the time actually spent in network transit (both ways). This calculation accounts for network latency.

Note that T1 and T4 are measured on the client's (not yet corrected) clock, while T2 and T3 are measured on the server's clock. This explains why the round-trip delay is usually expressed using the second formula above rather than the first: each interval it computes uses timestamps from a single clock, (T4-T1) on the client and (T3-T2) on the server, so it cleanly separates the client-side total duration from the server-side processing duration.

This matters because the clocks are not synchronized: if the client's clock is ahead of the server's, T2-T1 may be zero or negative, which is mathematically valid but makes little physical sense as a one-way delay. The second expression avoids this problem: since each clock is monotonic, T4 > T1 and T3 > T2 always hold (assuming clock resolutions finer than, respectively, the round-trip delay and the server's processing time), so both intervals are positive.
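As a toy illustration of the round-trip delay formula, here is a small Java sketch. The timestamp values are hypothetical (milliseconds; T1 and T4 read on the client's clock, T2 and T3 on the server's):

```java
// Toy illustration of the round-trip delay formula with made-up timestamps.
public class RoundTripDelay {
    public static void main(String[] args) {
        long t1 = 1_000; // client sends request (client clock)
        long t2 = 1_260; // server receives request (server clock)
        long t3 = 1_262; // server sends reply (server clock)
        long t4 = 1_020; // client receives reply (client clock)

        // (T4 - T1) is the total elapsed time on the client;
        // (T3 - T2) is the server's processing time; subtracting the
        // latter leaves the time spent in network transit, both ways.
        long roundTripDelay = (t4 - t1) - (t3 - t2);
        System.out.println("Round-trip delay: " + roundTripDelay + " ms"); // 18 ms
    }
}
```

Notice that T2 and T3 look "in the future" relative to T1 and T4: that is the (as yet unknown) clock offset at work, and it cancels out of this formula because each subtraction involves only one clock.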

Calculating the time offset

The time offset is the difference between the client's clock and the server's clock. This is the correction needed for the client's time.

To calculate this, NTP makes a key assumption: the network path delay is symmetrical (i.e., the time taken for the request to travel client-to-server is the same as the time taken for the reply server-to-client). Let's call this one-way delay Delay.

We can express the timestamps relative to each other, incorporating the Offset and Delay:

  • Server receives request: T2 ≈ T1 + Offset + Delay
  • Client receives reply: T4 ≈ T3 - Offset + Delay (the Offset is subtracted because T4 is read on the client's clock, which lags the server's clock by Offset)

Rearranging these gives:

Offset + Delay ≈ T2-T1

Offset - Delay ≈ T3-T4

Which, by adding these equations, leads to:

(Offset + Delay) + (Offset - Delay) ≈ (T2-T1) + (T3-T4), i.e. 2 × Offset ≈ (T2-T1) + (T3-T4), and finally:

Offset ≈ [(T2-T1)+(T3-T4)]/2.

The client's clock needs to be adjusted by adding this calculated Offset.
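Here is a hedged Java sketch of the offset computation, using hypothetical millisecond timestamps consistent with a server clock about 251 ms ahead of the client's and a symmetric one-way delay of 9 ms:

```java
// Toy illustration of the offset formula under the symmetric-delay assumption.
public class ClockOffset {
    public static void main(String[] args) {
        long t1 = 1_000; // client clock: request sent
        long t2 = 1_260; // server clock: request received
        long t3 = 1_262; // server clock: reply sent
        long t4 = 1_020; // client clock: reply received

        // Offset ≈ [(T2 - T1) + (T3 - T4)] / 2
        long offset = ((t2 - t1) + (t3 - t4)) / 2;
        System.out.println("Offset: " + offset + " ms"); // 251 ms

        // The client corrects its clock by adding the offset.
        long correctedT4 = t4 + offset;
        System.out.println("Corrected client time: " + correctedT4 + " ms"); // 1271 ms
    }
}
```

As a sanity check: at the moment the reply arrives, the server's clock reads T3 plus the one-way delay, i.e. 1262 + 9 = 1271 ms, which matches the corrected client time.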

How synchronization and adjustment actually happen

Calculating the offset is just one part. Real-world NTP clients typically:

  1. Query multiple NTP servers to get several time references.
  2. Use statistical algorithms to filter out unreliable servers (outliers) and weigh the remaining sources to determine the most accurate time.
  3. Apply the correction gradually using slewing (adjusting clock speed) for small differences, rather than making abrupt stepping changes (instantaneous jumps). This ensures smoother timekeeping.
  4. Repeat this process periodically to maintain synchronization.
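The filtering step above can be sketched very roughly. Real NTP daemons use far more elaborate filter, selection, and clustering algorithms; taking the median of the measured offsets is just a simple, robust stand-in that discards an obvious outlier:

```java
import java.util.Arrays;

// Rough sketch: combining offsets measured against several servers.
// The median is a simplistic stand-in for NTP's selection algorithms.
public class MedianOffset {
    static long medianOffset(long[] offsets) {
        long[] sorted = offsets.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return (n % 2 == 1) ? sorted[n / 2]
                            : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
    }

    public static void main(String[] args) {
        // Hypothetical offsets (ms) measured against four servers;
        // one server is clearly misbehaving.
        long[] offsets = {251, 248, 253, 9_870};
        System.out.println("Chosen offset: " + medianOffset(offsets) + " ms"); // 252 ms
    }
}
```

The median ignores the 9 870 ms outlier entirely, which is exactly the property we want from a source-selection step: one bad server should not drag the result around.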

About slewing and stepping

These are the two main ways an NTP client applies corrections:

Slewing

This involves gradually adjusting the system clock's effective speed to match the correct time.

  • If the client clock is behind, the NTP daemon makes the software clock run slightly faster until it catches up.
  • If the client clock is ahead, it makes the software clock run slightly slower.
  • Key Advantage: Avoids sudden time jumps, which is crucial for applications relying on smoothly progressing time. This is the preferred method for small, ongoing corrections.

Stepping

This involves an instantaneous jump in the system clock to the correct time.

  • Used for large initial corrections or when the time difference is too significant to slew quickly.
  • Potential Disadvantage: Can cause time to appear to stand still or even go backward, potentially disrupting applications.

How Slewing Works: It's usually not the physical hardware clock frequency that changes. Instead, the operating system kernel adjusts how it interprets ticks from the hardware timer to maintain the software clock. The kernel applies a small correction factor, effectively making the software clock run slightly faster or slower. This is often managed via system calls like adjtimex (on Linux), controlled by the NTP daemon.
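A toy model may help to see how a small rate correction closes the gap over time. All the numbers below (a 500 ms error, a +0.05% rate adjustment, 1-second ticks) are illustrative, not what any particular kernel or daemon uses:

```java
// Toy model of slewing: the hardware timer is untouched; instead, each
// tick advances the software clock by slightly more than real time.
public class SlewDemo {
    public static void main(String[] args) {
        double errorMs = 500.0;         // software clock is 500 ms behind
        double rateAdjustment = 0.0005; // clock runs 0.05% fast while slewing
        double tickMs = 1000.0;         // one tick = 1 s of real time

        int ticks = 0;
        // Each tick, the software clock gains tickMs * rateAdjustment on real time.
        while (errorMs > 1.0) {
            errorMs -= tickMs * rateAdjustment;
            ticks++;
        }
        System.out.println("Within 1 ms after " + ticks + " ticks (~" + ticks + " s)");
    }
}
```

At 0.5 ms gained per second, correcting half a second takes roughly a thousand seconds, which illustrates why large errors are stepped rather than slewed.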

Consequences of slewing and stepping

  • Slewing: Adjusts the rate of the clock. Time continues to move forward monotonically. Slewing itself does not cause time to go backward, which is a vital guarantee for many applications. Intervals measured during a slew will still have positive durations (though slightly inaccurate relative to true elapsed time).
  • Stepping: Involves an instant jump. If the clock is stepped backward, applications measuring time across that jump could read a negative or zero time interval if they rely on the system's wall-clock time.

Practical tip

If you configure only one NTP server on your client, it will only attempt to synchronize with that single server. While NTP can work this way, it's not ideal:

  • Single point of failure: If that server is down or inaccurate, your synchronization fails.
  • No cross-checking: The client cannot perform the sophisticated filtering and selection algorithms that compare multiple sources to improve accuracy and reliability.

Best practice: Configure at least 3-4 diverse and reliable NTP servers.

Consequence for Application Developers: Wall Clocks vs. Monotonic Clocks

While NTP slewing avoids making time go backward, developers must understand the type of clock their application uses, especially for measuring durations:

  • Wall-Clock Time (or Real-Time Clock): This is the actual time of day (e.g., 14:30:15 Apr 15, 2025). It's subject to adjustments by NTP (slewing and stepping), time zone changes, and Daylight Saving Time. Using wall-clock time to measure intervals is unreliable because it can jump forward or backward unexpectedly.
  • Monotonic Clock: This clock measures time from an arbitrary fixed point (like system boot). It is guaranteed to only move forward (monotonically increasing) and is not affected by wall-clock adjustments (NTP, DST, etc.).

Guideline: Applications should always use a monotonic clock whenever they need to reliably measure time durations, intervals, or timeouts.

  • Well-behaved applications use monotonic clock APIs provided by the OS or platform (e.g., CLOCK_MONOTONIC in POSIX/Linux, System.Diagnostics.Stopwatch or Environment.TickCount64 in .NET, time.monotonic() in Python). This avoids issues with wall-clock adjustments entirely.
  • Less robust applications might incorrectly use the wall-clock time for intervals. These are vulnerable to errors (like negative durations) if a clock step occurs. Slewing reduces the risk compared to stepping, but using a monotonic clock is the correct fix.

Measuring time intervals on the JVM

Java provides access to a monotonic clock source via System.nanoTime().

System.nanoTime()

  • Returns the current value of the JVM's high-resolution time source, in nanoseconds.
  • Provides nanosecond precision (accuracy depends on the underlying OS/hardware).
  • Crucially, it is monotonic. Values only increase, based on an arbitrary origin.
  • Not affected by adjustments to the system's wall-clock time.
  • Purpose-built for measuring elapsed time intervals reliably.

Comparison with System.currentTimeMillis()

  • Returns wall-clock time (milliseconds since the Unix epoch).
  • This method is not monotonic. Its value can decrease if the system clock is adjusted backward.
  • Should not be used for measuring time intervals where monotonicity is required.

Java code example for measuring a time interval

final long startTime = System.nanoTime();

// ... perform the operation you want to time ...

final long endTime = System.nanoTime();

final long durationNanos = endTime - startTime;
final double durationSeconds = durationNanos / 1_000_000_000.0;

System.out.println("Operation took: " + durationNanos + " nanoseconds (" + durationSeconds + " seconds)");

Note: the absolute values returned by System.nanoTime() are meaningless on their own, since the origin is arbitrary; the value lies entirely in the difference between two readings.

So, after nearly two decades, it's about time I dug into time synchronization. It's a fascinating blend of timestamping, network assumptions, and careful clock management, with important implications for how we write reliable software.
