Case Study · Exchange Infrastructure · Time Synchronization

Precision Time Protocol at exchange scale:
Gemini, $2B+/day, 99.99% uptime

Gemini · Nov 2020–May 2021 · PTP / IEEE 1588 · Solarflare hardware timestamping · $2B+ daily volume · 99.99% uptime

NTP gets you to ±1ms on a well-maintained server. For most systems that's fine. For an exchange's matching, surveillance, and audit infrastructure, it is not. Regulators require verifiable timestamps. Surveillance systems reconstruct market events from logs. Audit trails chain events across multiple systems. When clocks disagree by more than the interval between two events, those chains break.

Context: Gemini's exchange platform

Gemini is a US-regulated crypto exchange. I worked there as a Senior Site Reliability Engineer from November 2020 to May 2021, on the exchange platform team. The exchange at the time handled over $2B in daily trading volume and operated at 99.99% uptime. The platform included a matching engine, market data distribution, surveillance infrastructure, and audit logging, all of which depend on accurate, consistent time.

The PTP implementation was a project to replace NTP-based time synchronization across exchange infrastructure with IEEE 1588 Precision Time Protocol, using Solarflare NICs with hardware timestamping. The goal was sub-microsecond accuracy across the matching, surveillance, and audit layers.

Why NTP wasn't sufficient

NTP gives ±1ms accuracy on a server with a good local stratum-1 source. That sounds small. It isn't, for three specific reasons at exchange scale.

First, regulatory requirements. MiFID II RTS-25 requires business clocks to stay within 100µs of UTC for high-frequency algorithmic trading, and within 1ms for most other trading activity. For US crypto exchanges, the equivalent pressure comes from FINRA and SEC surveillance obligations: your timestamps need to be accurate enough that a regulator can reconstruct order precedence from logs. If two orders arrived 100µs apart on different hosts and those hosts' clocks disagree by more than that, you cannot prove which came first.
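The order-precedence argument can be made concrete with error intervals. A minimal sketch, assuming each host's clock error is bounded by a known value: an event's true time then lies somewhere in its timestamp plus or minus that bound, and order is provable only when the two intervals do not overlap. The `Event` type, the `provable_order` function, and the ±1µs "PTP-class" bound are illustrative assumptions, not part of any real audit system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """An event stamped by one host's clock, plus that clock's error bound."""
    name: str
    timestamp_ns: int  # timestamp as reported by the host, in nanoseconds
    max_error_ns: int  # bound on |reported - true| for that host's clock


def provable_order(a: Event, b: Event) -> Optional[str]:
    """Name of the event that provably happened first, or None if ambiguous.

    Each event's true time lies in [timestamp - max_error, timestamp + max_error].
    Order is provable only when the two intervals do not overlap.
    """
    if a.timestamp_ns + a.max_error_ns < b.timestamp_ns - b.max_error_ns:
        return a.name
    if b.timestamp_ns + b.max_error_ns < a.timestamp_ns - a.max_error_ns:
        return b.name
    return None


# Two orders stamped 100µs apart on different hosts.
ntp_class = provable_order(Event("order_a", 0, 1_000_000), Event("order_b", 100_000, 1_000_000))
ptp_class = provable_order(Event("order_a", 0, 1_000), Event("order_b", 100_000, 1_000))
print(ntp_class)  # None: the ±1ms error intervals overlap
print(ptp_class)  # order_a
```

With ±1ms bounds the 100µs gap disappears inside the uncertainty; with ±1µs bounds the order is unambiguous.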

Second, surveillance integrity. The surveillance system reconstructs market events from feeds across matching, order routing, and market data. If those feeds have clock skew of more than a few hundred microseconds, reconstructed event sequences become unreliable. This affects both internal audit and any external review.

Third, audit trail linkage. An audit log entry from the matching engine and an audit log entry from the order router need to chain correctly to prove what happened in what order. Millisecond-level skew between the two systems makes that chain ambiguous.

The implementation

PTP (IEEE 1588) works by designating a grandmaster clock and synchronizing all other clocks on the network to it, using hardware timestamps at the NIC layer to eliminate the software jitter that makes NTP imprecise. Solarflare NICs support hardware PTP timestamping. The timestamp is applied at the wire, not at the point where the kernel processes the packet, so the interrupt latency variability that limits NTP accuracy doesn't affect PTP.
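The core of the protocol is a four-timestamp exchange. The sketch below shows the standard IEEE 1588 end-to-end delay calculation under the usual assumption of a symmetric network path; ptp4l additionally filters these samples and feeds them to a clock servo, which this sketch omits. The names `SyncExchange` and `offset_and_delay` are illustrative.

```python
from typing import NamedTuple, Tuple


class SyncExchange(NamedTuple):
    """Timestamps (ns) from one Sync / Delay_Req round, end-to-end delay mechanism."""
    t1: int  # master clock: Sync leaves the master
    t2: int  # slave clock:  Sync arrives at the slave
    t3: int  # slave clock:  Delay_Req leaves the slave
    t4: int  # master clock: Delay_Req arrives at the master


def offset_and_delay(x: SyncExchange) -> Tuple[float, float]:
    """Return (offset_from_master_ns, mean_path_delay_ns).

    Assumes a symmetric path: master->slave delay equals slave->master delay.
    Any asymmetry appears as offset error that this calculation cannot see.
    """
    master_to_slave = x.t2 - x.t1  # path delay + offset
    slave_to_master = x.t4 - x.t3  # path delay - offset
    offset = (master_to_slave - slave_to_master) / 2
    mean_path_delay = (master_to_slave + slave_to_master) / 2
    return offset, mean_path_delay


# Slave clock 500 ns ahead of the master, 2,000 ns one-way path delay.
print(offset_and_delay(SyncExchange(t1=0, t2=2_500, t3=10_000, t4=11_500)))
# (500.0, 2000.0)
```

Any timing noise in t2 or t3, such as interrupt latency when timestamps are taken in software, feeds directly into the offset estimate at half its magnitude. That is why where the timestamp is taken matters more than the arithmetic.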

The implementation involved: grandmaster clock selection and topology, NIC configuration for hardware timestamping on Solarflare SFN8522 cards, PTP daemon configuration (linuxptp's ptp4l), monitoring for drift and clock offset, and integration with the surveillance and audit systems to ensure they were reading from hardware-timestamped sources.

Key trade-offs and decisions

NTP vs PTP: the accuracy gap at exchange scale

NTP at ±1ms is 1,000× less accurate than PTP at ±1µs. For most services, NTP is the right answer: PTP adds operational complexity and requires specific hardware support. At an exchange where surveillance and audit correctness depends on timestamp ordering, the trade-off inverts. The operational overhead of PTP is justified when the alternative is ambiguous event ordering in regulatory audit trails.

Hardware timestamping as a requirement, not an optimization

Software PTP (timestamping in the kernel or userspace) achieves ±10–50µs under light load and degrades significantly under heavy load or when CPU scheduling interferes. For exchange infrastructure running at capacity during peak volume, software timestamping is unreliable. Hardware timestamping on Solarflare NICs applies the timestamp before any software is involved, so CPU load never enters the measurement; accuracy is ±30–100ns regardless of host load.

Grandmaster topology

The grandmaster clock was selected for stability and GPS-disciplined accuracy. The network topology placed a boundary clock at each segment and kept the hop count from grandmaster to leaf nodes low. For redundancy, a secondary grandmaster configuration prevented hosts from falling back to a degraded time source if the primary GPS source failed.
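In a standard IEEE 1588 domain, failover between grandmasters is decided by the Best Master Clock Algorithm. A simplified sketch of its dataset comparison, assuming all candidates are reachable: attributes are compared in a fixed order and the lower value wins. Real BMCA also considers steps removed and port topology, which this omits. The candidate names, identities, and attribute values are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Iterable, Tuple


@dataclass(frozen=True)
class GrandmasterCandidate:
    """Grandmaster attributes a clock advertises in its Announce messages."""
    name: str
    priority1: int
    clock_class: int  # 6 = locked to a primary reference such as GPS; 7 = holdover
    clock_accuracy: int  # enumerated code, e.g. 0x21 = within 100 ns
    offset_scaled_log_variance: int
    priority2: int
    clock_identity: str  # fixed-length hex string; final tie-breaker

    def rank(self) -> Tuple:
        # Compared in this order; the lower value wins at the first difference.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.offset_scaled_log_variance, self.priority2,
                self.clock_identity)


def select_grandmaster(candidates: Iterable[GrandmasterCandidate]) -> GrandmasterCandidate:
    return min(candidates, key=GrandmasterCandidate.rank)


# Hypothetical pair: same quality, primary preferred via priority2.
primary = GrandmasterCandidate("gm-primary", 128, 6, 0x21, 0x4E5D, 128, "001122fffe000001")
secondary = GrandmasterCandidate("gm-secondary", 128, 6, 0x21, 0x4E5D, 129, "001122fffe000002")

print(select_grandmaster([primary, secondary]).name)  # gm-primary
# Primary loses GPS and announces holdover (clockClass 7): secondary wins.
print(select_grandmaster([replace(primary, clock_class=7), secondary]).name)  # gm-secondary
```

Because clockClass is compared before priority2, a GPS-locked secondary outranks a primary in holdover without any manual reconfiguration.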

Monitoring time-sync health

A PTP deployment that's "running" is not the same as a PTP deployment that's "working correctly." Continuous monitoring of clock offset (ptp4l's offset from master metric), path delay, and BMCA (Best Master Clock Algorithm) state was integrated into the existing Prometheus/Grafana stack. Alerting was set to trigger before clock offset exceeded the accuracy threshold, not after timestamp errors appeared in audit logs.
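Alerting "before the threshold, not after" can be sketched as a two-level check on ptp4l's offset. The sketch below parses ptp4l's periodic summary line; the regex is an assumption about that line's format, which can vary between linuxptp versions. The 1µs budget and half-budget warning level are hypothetical. In a Prometheus setup the offset would be exported as a gauge and the thresholds expressed as alert rules; this shows only the classification logic.

```python
import re
from typing import Optional

# ptp4l's periodic summary line looks like:
#   ptp4l[1234.567]: master offset        -23 s2 freq   +1520 path delay       612
# Exact formatting can vary between linuxptp versions; this pattern is an assumption.
SUMMARY = re.compile(
    r"master offset\s+(?P<offset>-?\d+)\s+s(?P<state>\d)\s+"
    r"freq\s+(?P<freq>[+-]?\d+)\s+path delay\s+(?P<delay>-?\d+)"
)

# Hypothetical thresholds: warn at half the accuracy budget, page at the budget.
ACCURACY_BUDGET_NS = 1_000
WARN_AT_NS = ACCURACY_BUDGET_NS // 2


def classify(line: str) -> Optional[str]:
    """Return 'ok', 'warn' or 'critical' for a summary line; None for other lines."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    offset_ns = abs(int(m.group("offset")))
    # Servo states: s0 unlocked, s1 clock stepped, s2 locked (some versions add s3).
    locked = int(m.group("state")) >= 2
    if not locked or offset_ns >= ACCURACY_BUDGET_NS:
        return "critical"
    if offset_ns >= WARN_AT_NS:
        return "warn"
    return "ok"


print(classify("ptp4l[1234.567]: master offset        -23 s2 freq   +1520 path delay       612"))  # ok
print(classify("ptp4l[1234.568]: master offset        640 s2 freq   +1498 path delay       610"))  # warn
```

Treating an unlocked servo as critical, even at a small offset, catches the "running but not working" case the paragraph above describes.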

What "exchange scale" demanded

The demands of exchange-scale PTP differ from a smaller deployment in three ways: the number of leaf nodes requiring synchronized clocks (matching, order routing, market data, surveillance, logging: dozens of hosts); the operational SLA (any degradation in time-sync quality had to be detected and corrected before it affected regulatory audit quality); and zero tolerance for grandmaster unavailability (exchange operations could not pause while PTP was reconfigured).
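With dozens of hosts, the number that matters for audit chains is the worst disagreement between any two of them, not any single host's offset. If every host stays within ±30ns of the grandmaster, two hosts can still be up to 60ns apart. A minimal sketch, assuming each offset is already expressed relative to the grandmaster; with boundary clocks in the path, ptp4l on a leaf reports its offset from the nearest boundary clock, so per-hop errors would need to be added. Host names and values are hypothetical.

```python
from typing import Dict, Tuple


def worst_pairwise_skew(offsets_ns: Dict[str, int]) -> Tuple[int, str, str]:
    """Largest disagreement between any two hosts' clocks.

    Each value is (host clock - grandmaster clock) in ns. For hosts A and B,
    A - B = offset_A - offset_B, so the worst pair is simply the host with the
    largest offset against the host with the smallest.
    """
    if len(offsets_ns) < 2:
        raise ValueError("need offsets for at least two hosts")
    ahead = max(offsets_ns, key=offsets_ns.__getitem__)
    behind = min(offsets_ns, key=offsets_ns.__getitem__)
    return offsets_ns[ahead] - offsets_ns[behind], ahead, behind


# Hypothetical snapshot from three hosts.
snapshot = {"matching-1": 12, "router-1": -18, "surveil-1": 40}
print(worst_pairwise_skew(snapshot))  # (58, 'surveil-1', 'router-1')
```

Computing max minus min is linear in the number of hosts, so the check stays cheap even as the fleet grows.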

Outcomes

±30ns

Clock accuracy achieved vs grandmaster (per ptp4l offset-from-master)

$2B+

Daily trading volume on the exchange platform during the engagement

99.99%

Platform uptime over the engagement period

<50µs

Clock sync accuracy, inside MiFID II RTS-25's 100µs limit for high-frequency algorithmic trading (1ms for most other trading)

What this means for a client

PTP is one of those infrastructure problems where the gap between "we have it running" and "it's working correctly" is invisible until it isn't. The ±30ns accuracy doesn't show up in your application logs. What shows up, eventually, is an anomaly in a surveillance reconstruction or a timestamp discrepancy in a regulatory audit. By then the operational window to correct it cleanly is gone.

I've implemented this at exchange scale under a real SLA. If you're operating a trading venue or a surveillance-regulated system and your time synchronization hasn't been audited recently, the HFT infrastructure audit covers the full latency path including clock sync.

Need an SRE who's shipped PTP at exchange scale under a regulatory SLA?

