
While looking up how to use gettimeofday() for the umpteenth time, I decided to take a quick dive into the vDSO, since I only had a vague awareness of it and wondered whether there were any usage gotchas I should look out for.

According to https://stackoverflow.com/questions/42622427/gettimeofday-not-using-vdso, if the vDSO is in use, strace should never show gettimeofday or clock_gettime.

Well, looks like my ThinkPad T400 has been broken for some time: I've always seen *tons* of these calls in strace for as long as I can remember. (Particularly from QEMU.)

If I try testgtod.c (which runs gettimeofday() 1000 times) from the above question:

$ strace ./testgtod 2>&1 | grep clock_gettime | wc -l
1000

Currently, the only difference I can find between my ThinkPad and my i3 desktop is that the i3 is using the TSC, while the ThinkPad is using the HPET because of this kernel message: "tsc: Marking TSC unstable due to TSC halts in idle". (I wondered if this might be a suspend/resume thing, but then noticed the timestamp: this is 1.53 seconds into bootup.) The T400 is (currently...) running Arch, while the i3 box is running Debian 9.

The above question also made reference to dump-vdso.c. The vDSO on the T400 looks pretty good to me:

$ objdump -T vdso.so

vdso.so:     file format elf64-x86-64

DYNAMIC SYMBOL TABLE:
0000000000000740  w   DF .text  000000000000005e  LINUX_2.6   clock_gettime
00000000000007a0 g    DF .text  0000000000000067  LINUX_2.6   __vdso_gettimeofday
00000000000007a0  w   DF .text  0000000000000067  LINUX_2.6   gettimeofday
0000000000000810 g    DF .text  0000000000000010  LINUX_2.6   __vdso_time
0000000000000810  w   DF .text  0000000000000010  LINUX_2.6   time
0000000000000740 g    DF .text  000000000000005e  LINUX_2.6   __vdso_clock_gettime
0000000000000000 g    DO *ABS*  0000000000000000  LINUX_2.6   LINUX_2.6
0000000000000820 g    DF .text  0000000000000025  LINUX_2.6   __vdso_getcpu
0000000000000820  w   DF .text  0000000000000025  LINUX_2.6   getcpu

One other link I found, https://bert-hubert.blogspot.com/2017/03/on-linux-vdso-and-clockgettime.html, says that the vDSO code lacks support for certain timers and will fall back to a syscall if you use one of those. That article is from 2017, and the details in https://lore.kernel.org/linux-arm-kernel/20190621095252.32307-1-vincenzo.frascino@arm.com/ (June 2019) suggest that almost all (if not all?) timers have vDSO support now. In any case, the testgtod program noted above used CLOCK_REALTIME, which the 2017 article says was vDSO-supported back then.

So: I'm officially confused :)

Reading through http://btorpey.github.io/blog/2014/02/18/clock-sources-in-linux/, I see a lot of references to the TSC. The article doesn't really mention it, but I'm starting to think that maybe RDTSC{,P} is an unprivileged instruction that can be called from userspace, while reading from the HPET requires kernel-level access (to hardware or timer values). Which would totally explain the syscall fallback.

Incidentally the Core2 P8600 in my T400 does support tsc and constant_tsc, but not nonstop_tsc.

None of the tags I wanted to use existed, if anyone with more reputation wants to add one or more of them.

i336_

1 Answer


"The article doesn't really mention it, but I'm starting to think that maybe RDTSC{,P} is an unprivileged instruction that can be called from userspace, while reading from the HPET requires kernel-level access (to hardware or timer values). Which would totally explain the syscall fallback."

This is the reason.

You can verify the change in vDSO behaviour between the two clocks on any system which supports both tsc and hpet as a time source:

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc hpet acpi_pm

$ echo tsc | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource > /dev/null

$ strace -e clock_gettime date
Sun 24 Nov 10:49:49 CET 2019
+++ exited with 0 +++

$ echo hpet | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource > /dev/null

$ strace -e clock_gettime date
clock_gettime(CLOCK_REALTIME, {tv_sec=1574589034, tv_nsec=589851883}) = 0
Sun 24 Nov 10:50:34 CET 2019
+++ exited with 0 +++

(Remember to restore the original clock source.)

RDTSC is an unprivileged instruction, and you can see an example of its use in the GCC manual: search for rdtsc there, compile the example code, and you’ll see you can run it in userspace. (Strictly speaking, RDTSC and RDTSCP can be made privileged: they aren’t by default under Linux, but a process can have them disabled using prctl.)

In the vDSO, clock_gettime, gettimeofday and related functions are reliant on specific clock modes; see __arch_get_hw_counter. If the clock mode is VCLOCK_TSC, the time is read without a syscall, using RDTSC; if it’s VCLOCK_PVCLOCK or VCLOCK_HVCLOCK, the time is read from a specific page through which the hypervisor provides the information. HPET doesn’t declare a clock mode, so it ends up with the default VCLOCK_NONE, and the vDSO issues a system call to retrieve the time.

The patchset you linked to wasn’t about unifying clock handling across clocks, but about unifying it across architectures. There are still a few clock sources with no vDSO support, HPET and ACPI (acpi_pm) among them.

Stephen Kitt