Crazy clock drift on ancient VM

Question

I am stuck with a CentOS 5.3 VM (running on Proxmox) which exhibits huge clock drift. It was configured to run ntpdate every 5 minutes, but the clock was still getting out of sync by up to 20 seconds between executions. I've tried running ntpd (and stopping the cron job), but it's not reporting any errors, I can't see an ntp.drift file getting created anywhere, and the clock is continuing to drift.

I'm running approximately 30 VMs and the same number of containers on the cluster, and nothing else exhibits the same issue. Apart from the server address there is no other configuration in /etc/ntp.conf.

Chris Davies · Accepted Answer · 2021-10-14T16:12:01.973

The kernel's idea of the time adjustment needs fixing. The ntpd process usually disciplines that, so as the time gets closer to reality the rate of change is reduced. The bad adjustment may have been the result of an interaction arising from your attempt to fix the time by stepping it with ntpdate.

What I'd suggest is that you make sure you know whether you're using systemd's time synchronisation, ntpd, or a bodge built around something like ntpdate.

Turn them all off
Move /etc/adjtime out of the way (it would be interesting to see its contents in your question)
Immediately reboot
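
The three steps above could be scripted roughly as follows. This is a minimal sketch, assuming a SysV-init system such as CentOS 5 where ntpd is managed with service and chkconfig. The run helper, the DRY_RUN switch, the reset_timekeeping name and the /etc/adjtime.bak destination are illustrative choices, not part of the advice above. Any ntpdate cron entry still has to be removed by hand, since the question doesn't say where it lives.

```shell
#!/bin/sh
# DRY_RUN defaults to 1 so that nothing is changed unless explicitly requested.
DRY_RUN="${DRY_RUN:-1}"

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# reset_timekeeping: stop time sync, move /etc/adjtime aside, reboot.
reset_timekeeping() {
    run service ntpd stop                  # stop ntpd now
    run chkconfig ntpd off                 # keep it from starting at boot
    run mv /etc/adjtime /etc/adjtime.bak   # move the stale drift data aside
    run reboot                             # reboot straight away
}
```

Calling reset_timekeeping as-is only prints the four commands. Setting DRY_RUN=0 and running as root would execute them, ending in a reboot.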

On a well synchronised machine here I have the following values in /etc/adjtime, and the file itself was last modified back in February. DO NOT COPY THESE VALUES

0.001341 1613401384 0.000000
1613401384
UTC

Looking at man 5 adjtime you can see that these values show a systematic drift of 0.001341 seconds/day (about two years for a one second drift), and that the clock was last adjusted in February. Pretty stable. Oh, and the hardware clock is correctly kept in UTC.
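
The "two years" figure is just the reciprocal of the drift rate. A small sketch of that arithmetic, assuming the drift rate is the first field on the first line of the file (as in the sample above); the days_per_second name is made up for illustration:

```shell
#!/bin/sh
# days_per_second FILE: read the drift rate (first field of line 1, in
# seconds/day) from an adjtime-style file and print how many days it takes
# to accumulate one second of drift.
days_per_second() {
    awk 'NR == 1 {
        rate = ($1 < 0) ? -$1 : $1      # the sign only gives the direction
        if (rate == 0) { print "no drift recorded"; exit }
        printf "%.1f\n", 1 / rate
    }' "$1"
}
```

For the sample values, days_per_second /etc/adjtime would print 745.7, i.e. a little over two years.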

Other things to consider in your situation

Is the kernel trying to get its date/time from the VM Hypervisor?
If so, any use of a time synchronisation tool on the VM will upset the time keeping
Is the VM being put to sleep or slowed down during its periods of inactivity? This can upset the wallclock if it's not being synchronised from the VM Hypervisor

There are a couple of useful references which explain (among other things) that CentOS 5.3 didn't have the kvm-clock kernel module.

Without this module, when the Hypervisor temporarily (and correctly) starves the VM of CPU resource, the VM's clock can slow down or even stop. The module keeps the VM kernel's time accurate.

You can check your own situation with this command, which should report kvm-clock:

cat /sys/devices/system/clocksource/clocksource0/current_clocksource
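
If the result isn't kvm-clock, it helps to see what the kernel offers instead. A hedged sketch: the report_clocksource function and its optional directory argument are hypothetical, while available_clocksource is the standard sysfs file that sits next to current_clocksource.

```shell
#!/bin/sh
# report_clocksource [DIR]: show the current clocksource and, if it is not
# kvm-clock, list the alternatives the kernel offers.
# Returns 0 for kvm-clock, 1 for anything else, 2 if sysfs can't be read.
report_clocksource() {
    dir="${1:-/sys/devices/system/clocksource/clocksource0}"
    if [ ! -r "$dir/current_clocksource" ]; then
        echo "cannot read $dir/current_clocksource"
        return 2
    fi
    current=$(cat "$dir/current_clocksource")
    if [ "$current" = "kvm-clock" ]; then
        echo "current: kvm-clock"
        return 0
    fi
    echo "current: $current (not kvm-clock)"
    echo "available: $(cat "$dir/available_clocksource" 2>/dev/null)"
    return 1
}
```

Run it with no arguments on the VM. If kvm-clock doesn't appear in the available list either, the running kernel doesn't offer it.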

The first of those two references also suggested updating to the latest CentOS 5.3 (it was relevant at the time) as this fixed their time problems. Probably not realistic for the year 2021 though.

Another source suggests adding divider=10 clocksource=acpi_pm to the kernel boot line, but that's for VMware and may not be applicable to Proxmox with kvm.

CentOS 5.3 was released c. 2008. Not systemd :(. /etc/adjtime hasn't been touched since July and contains "0.003548 1625675037 0.000000 1625675037 UTC " The actual drift appears to be in the region of 1750 seconds/day — symcbean, Oct 14 '21 at 15:34
Deleting /etc/adjtime + reboot seems to have cleared the issue. Ta — symcbean, Oct 15 '21 at 13:44
