⚡ Frank Villaro-Dixon's website

My (slightly) overkill NTP infrastructure

Context

When I was working at the amazing Swiss 🇨🇭 hoster Infomaniak, I had the chance to set up their NTP infrastructure.

Previously, they may have been using an old, forgotten server, a single point of failure, as the main time provider. When you have thousands of clients, a time-sensitive infrastructure, and multiple DCs, this is not really acceptable.

So the goal was to create something better, and we ended up using some PCI-e cards with an OCXO and a GPS + PPS connection.

These servers are now part of the NTP pool and serve thousands of requests per second.

Can I haz too?

Well, this is great, but I no longer work with them and I kinda want the same stuff to play with.

Stratum 1

My first goal was to have a stratum 1 server. Of course, I could buy some Symmetricom appliance on eBay, but then it’s not the same.

I ended up choosing an old Raspberry Pi (RPi 3 model B 1.2). Fitted with a u-blox MAX-M8Q GPS HAT, it gets an accurate 1 Hz “PPS” signal from the GPS.

To get good reception, the GPS antenna was mounted on the roof, a bit far from my 21 km P2P WiFi link in order to avoid interference.

NTP Network infrastructure

This is nice, but it needs an enclosure. I salvaged an old Cisco router I had lying around and made this:

NTP server display

The LCD is connected to the Pi through a PCF8574 I2C I/O expander and driven by a small Rust program.

Redundancy & BGP

What happens if the Raspberry Pi dies? The NTP pool is redundant, but my quality of service would suffer.

I decided to add two other hosts. As I have two different sites, they would be geographically redundant.

The two stratum 2 servers are simple VMs (1 CPU, 512 MB of RAM; turns out you don’t need much to tell the time). They are created with Terraform on my Proxmox hypervisors.

VM on proxmox

Instead of adding their internal (though public) IPv6 addresses to the NTP pool, I went the BGP route.

Each server has a dedicated IP that it announces via BGP. For my servers, the IPs are:

  • 2a01:e0a:431:b527::a123 (chronos.ntp.k3s.fr)
  • 2a01:e0a:431:b527::b123 (ntp-s2-cra.ntp.k3s.fr)
  • 2a0e:e701:122c:fff0::a123 (ntp-s2-ces.ntp.k3s.fr)

The servers directly peer with my routers.

To make it redundant, each server also announces the other servers’ IPs. However, if we simply announced all the IPs on each server, it would be hard to keep the traffic balanced and dedicated. Indeed, if the routes are not identical, the traffic would always go to the server with the shortest path, possibly overloading it.

To fix that, the redundant IPs are announced with some AS path prepending. We artificially lengthen the route for these “backup” announcements so the traffic naturally goes to the intended server.

The way it’s done is quite simple: each server has 2 dummy network interfaces: bgp and bgp-backup. For example, on chronos, we have:

4: bgp: <BROADCAST,NOARP,UP> mtu 1500 qdisc noqueue state UNKNOWN …
    inet6 2a01:e0a:431:b527::a123/128 scope global
       valid_lft forever preferred_lft forever
5: bgp-backup: <BROADCAST,NOARP,UP> mtu 1500 qdisc noqueue state UNKNOWN …
    inet6 2a01:e0a:431:b527::b123/128 scope global
       valid_lft forever preferred_lft forever
    inet6 2a0e:e701:122c:fff0::a123/128 scope global
       valid_lft forever preferred_lft forever

We then configure our BGP daemon (in my case FRR) to announce the IPs found on these interfaces:

router bgp 64600
 bgp router-id 192.168.10.155
 bgp bestpath as-path multipath-relax
 bgp bestpath compare-routerid
 neighbor pg-leaf peer-group
 neighbor pg-leaf remote-as external
 neighbor fc00::1 peer-group pg-leaf

 address-family ipv6 unicast
  redistribute connected route-map map-bgp
  neighbor pg-leaf activate
  neighbor pg-leaf soft-reconfiguration inbound
  neighbor pg-leaf route-map map-bgp out

route-map map-bgp permit 10
 match interface bgp

route-map map-bgp permit 20
 match interface bgp-backup
 set as-path prepend 64600 64600 64600

This way, the traffic for each IP naturally goes to its dedicated host:

$ show ipv6 route
…
B      2a01:e0a:431:b527::a123/128 [20/0] via fe80::ba27:ebff:fe72:7731, eth0, 4d8h15m
B      2a01:e0a:431:b527::b123/128 [20/0] via fe80::be24:11ff:fe68:534e, eth0.21, 16d23h42m
…

But when a server dies, its BGP session drops and the traffic is redirected to one of the backup hosts:

$ show ipv6 route
…
B      2a01:e0a:431:b527::a123/128 [20/0] via fe80::be24:11ff:fe68:534e, eth0.21, 1m20s
B      2a01:e0a:431:b527::b123/128 [20/0] via fe80::be24:11ff:fe68:534e, eth0.21, 16d23h42m
…
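
The effect of the prepending in the two routing tables above can be modeled with a few lines of Python. This is only a toy sketch of the "shortest AS path wins" rule, not how FRR or the routers actually implement best-path selection. The `Announcement` class and `best_route` function are hypothetical names, the tie-break on server name is a simplification, and it assumes every server announces from AS 64600 as in the chronos configuration.

```python
from dataclasses import dataclass

LOCAL_AS = 64600  # AS number taken from the FRR configuration above
PREPEND = 3       # extra copies added by "set as-path prepend" on backups


@dataclass(frozen=True)
class Announcement:
    server: str   # host announcing the prefix
    prefix: str   # announced /128
    backup: bool  # True when announced from the bgp-backup interface

    @property
    def as_path(self):
        # An eBGP announcement carries the origin AS once; backup
        # announcements carry PREPEND additional copies of it.
        return (LOCAL_AS,) * (1 + (PREPEND if self.backup else 0))


def best_route(announcements, prefix, alive):
    """Pick the announcement with the shortest AS path among live servers."""
    candidates = [
        a for a in announcements
        if a.prefix == prefix and a.server in alive
    ]
    if not candidates:
        return None
    # Shortest AS path first; server name is an arbitrary tie-break here.
    return min(candidates, key=lambda a: (len(a.as_path), a.server))
```

With chronos announcing `2a01:e0a:431:b527::a123/128` as primary and ntp-s2-cra announcing it as backup, `best_route` returns the chronos announcement while both are alive, and falls back to ntp-s2-cra once chronos is removed from `alive`.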

The actual network layout looks like this:

XXX

As you may notice, the remaining SPOF is the internet uplink. Sadly, I have two different ISPs, so I can’t really announce ISP A’s range through ISP B. It should be fixed once I have my own ASN :-)

Choosing NTP peers

Choosing NTP peers is tricky. You may be tempted to use the NTP pool for that, but it is not great if the server is part of the pool itself, and it could cause the overall time to drift if everybody did that.

The challenge is then to find stratum 1 NTP peers. Of course, I added my Pi to the list, but one server is not enough, as it may drift if its time source becomes unavailable. You can see an example of a drifting peer in the following picture:

Drifting NTP peer

In order to find the peers, I extracted all the NTP pool participant servers and directly queried each of them. As an NTP reply contains the stratum of the server, I could cherry-pick the stratum 1 servers close to me.
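
For illustration, here is a minimal sketch of what such a stratum query could look like in Python. It is not the script that was actually used. The function names are hypothetical, the host list is whatever you feed it, and it does not measure how "close" a server is (that would need a round-trip time measurement on top). It relies only on the NTP packet layout: a 48-byte packet whose first byte holds LI/VN/Mode and whose second byte holds the stratum.

```python
import socket

NTP_PORT = 123
# Client request: first byte 0x23 = LI 0, version 4, mode 3 (client),
# followed by 47 zero bytes to reach the 48-byte NTP header size.
CLIENT_REQUEST = bytes([0x23]) + bytes(47)


def parse_stratum(packet: bytes) -> int:
    """Return the stratum field (second byte) of an NTP server reply."""
    if len(packet) < 48:
        raise ValueError("NTP packet too short")
    mode = packet[0] & 0x07
    if mode != 4:  # 4 = server mode
        raise ValueError(f"unexpected NTP mode {mode}")
    return packet[1]


def query_stratum(host: str, timeout: float = 2.0) -> int:
    """Send one client request to host and return the advertised stratum."""
    # getaddrinfo lets the same code work over IPv4 and IPv6.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, NTP_PORT, type=socket.SOCK_DGRAM
    )[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.sendto(CLIENT_REQUEST, addr)
        packet, _ = sock.recvfrom(512)
    return parse_stratum(packet)


def stratum_one_servers(hosts):
    """Keep only the hosts that answer and advertise stratum 1."""
    found = []
    for host in hosts:
        try:
            if query_stratum(host) == 1:
                found.append(host)
        except (OSError, ValueError):
            # Unreachable, timed out, or malformed reply: skip it.
            continue
    return found
```

Calling `stratum_one_servers` on a list of candidate hostnames returns the subset that replied with stratum 1. Keep in mind that the stratum is self-reported, so it is worth watching a candidate for a while before trusting it.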

Future upgrades

RPi

The Raspberry Pi’s Ethernet connection goes through an internal USB connection. This is not that great because the USB protocol adds latency. I’d like to try a BeagleBone because its Ethernet controller talks directly to the CPU.

Also, I’d like to add an OCXO to the setup in case the GPS signal is lost due to interference, jamming, or a bug. However, this is more complicated because it’d need a full PLL + VCO stack in the middle of the GPS PPS->Pi route. Maybe someday...

And of course, get an ASN, announce the “bgp HA” prefix on both upstreams, and be fully HA :-)

IPv4

I originally made this infrastructure IPv6 only, because IPv4 is dying, but mostly because I don’t have many IPv4 addresses, and that would involve NAT, which is suboptimal. Maybe one day :-)

Use the pool

Your best bet would be to use the NTP pool, but if you want to use the stratum 1 server (hosted near Geneva) as an uplink, you can use chronos.ntp.k3s.fr (warning: IPv6 only for now!)

If you’re curious, you can access the Grafana dashboard here or the NTP pool stats page.

Source code & Dashboard

You can find the source code of the project here