2

We have some embedded devices that use ntpd (4.2.8p10) to sync the time. One of our customers runs their own NTP server inside an internal network. In ntpd -dgq debug mode, we can see that the server is reachable and that we get offset, delay and jitter values. However, ntpd just exits with "ntpd: no servers found" and never selects the server or sets the local time.


2 Nov 11:57:05 ntpd[20218]: ntpd 4.2.8p10@1.3728-o Thu Jul 26 19:52:20 UTC 2018 (2): Starting
2 Nov 11:57:05 ntpd[20218]: Command line: ntpd -dgq
2 Nov 11:57:05 ntpd[20218]: proto: precision = 2.000 usec (-19)
Finished Parsing!!
restrict: op 1 addr 0.0.0.0 mask 0.0.0.0 mflags 00000000 flags 000005f0
restrict: op 1 addr 127.0.0.1 mask 255.255.255.255 mflags 00000000 flags 00000000
restrict source template mflags 4000 flags 1c0
restrict: op 1 addr (null) mask (null) mflags 00004000 flags 000001c0
move_fd: estimated max descriptors: 1024, initial socket boundary: 16
2 Nov 11:57:05 ntpd[20218]: Listen and drop on 0 v4wildcard 0.0.0.0:123
2 Nov 11:57:05 ntpd[20218]: Listen normally on 1 lo 127.0.0.1:123
restrict: op 1 addr 127.0.0.1 mask 255.255.255.255 mflags 00003000 flags 00000001
2 Nov 11:57:05 ntpd[20218]: Listen normally on 2 eth1 192.168.168.109:123
restrict: op 1 addr 192.168.168.109 mask 255.255.255.255 mflags 00003000 flags 00000001
2 Nov 11:57:05 ntpd[20218]: Listen normally on 3 wlan0 192.168.100.1:123
restrict: op 1 addr 192.168.100.1 mask 255.255.255.255 mflags 00003000 flags 00000001
2 Nov 11:57:05 ntpd[20218]: Listening on routing socket on fd #27 for interface updates
key_expire: at 0 associd 60163
peer_clear: at 0 next 1 associd 60163 refid INIT
restrict: op 1 addr 10.160.129.161 mask 255.255.255.255 mflags 00004000 flags 000001c0
restrict_source: 10.160.129.161 host restriction added
event at 0 10.160.129.161 8011 81 mobilize assoc 60163
newpeer: 192.168.168.109->10.160.129.161 mode 3 vers 4 poll 6 10 flags 0x101 0x1 ttl 0 key 00000000
event at 0 0.0.0.0 c016 06 restart
peer_xmit: at 1 192.168.168.109->10.160.129.161 mode 3 len 48 xmt 0xe52bde52.ddf3c87c
auth_agekeys: at 1 keys 0 expired 0
event at 1 10.160.129.161 8014 84 reachable
clock_filter: n 1 off 30.082946 del 0.048598 dsp 7.945314 jit 0.000002
peer_xmit: at 3 192.168.168.109->10.160.129.161 mode 3 len 48 xmt 0xe52bde54.ddf0a416
clock_filter: n 2 off 30.083616 del 0.047583 dsp 3.949228 jit 0.000670
peer_xmit: at 5 192.168.168.109->10.160.129.161 mode 3 len 48 xmt 0xe52bde56.dde968ab
clock_filter: n 3 off 30.078398 del 0.054469 dsp 1.951189 jit 0.004895
peer_xmit: at 7 192.168.168.109->10.160.129.161 mode 3 len 48 xmt 0xe52bde58.dde80026
clock_filter: n 4 off 30.079499 del 0.074539 dsp 0.952172 jit 0.003164
peer_xmit: at 9 192.168.168.109->10.160.129.161 mode 3 len 48 xmt 0xe52bde5a.ddea03c8
clock_filter: n 5 off 30.083616 del 0.044472 dsp 0.452664 jit 0.003340
2 Nov 11:57:16 ntpd[20218]: ntpd: no servers found
END OF FILE

Also, when we run ntpd in the background and query its status with ntpq -p, we get the following result. The st, delay, offset and reach values seem fine.

root@S8P20092901:~# ntpq -c as
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 59609  9014   yes   yes  none    reject   reachable  1
root@S8P20092901:~# ntpq -np
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 10.160.129.161  162.159.200.123  4 u   24   64  377   40.404 -180.122  20.122


However, ntpd never selects the NTP server as the time source (it never shows "*" or "+" before the remote address) and never sets the local time, even after a long wait.
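
The "condition" column can also be read directly from the hexadecimal status word. The following is a minimal sketch, assuming the peer status word layout described in the NTP documentation (the selection code sits in bits 8 to 10, counted from the least significant bit). The function name decode_sel is made up for illustration, and the exact spelling of some condition names differs between ntpd versions.

```shell
#!/bin/sh
# Decode the selection field of a peer status word as printed in the
# "status" column of `ntpq -c as`. decode_sel is a hypothetical helper.
decode_sel() {
    status_hex=$1
    # Selection code: 3 bits directly above the low byte (count + event code).
    sel=$(( (0x$status_hex >> 8) & 7 ))
    case $sel in
        0) echo "reject" ;;      # tally ' ' in ntpq -p
        1) echo "falsetick" ;;   # tally 'x'
        2) echo "excess" ;;      # tally '.'
        3) echo "outlier" ;;     # tally '-' (spelled "outlyer" in some versions)
        4) echo "candidate" ;;   # tally '+'
        5) echo "backup" ;;      # tally '#'
        6) echo "sys.peer" ;;    # tally '*'
        7) echo "pps.peer" ;;    # tally 'o'
    esac
}

decode_sel 9014   # status shown above; prints "reject"
```

A status of 9014 therefore means the association is configured and reachable, but its selection code is 0, which is why no "*" or "+" ever appears in front of the address.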

I looked into the source code. In ntpdate (-q) mode, ntpd exits once every server has completed its burst and no clock has been selected or set:

    } else {
        peer->burst--;
        if (peer->burst == 0) {
            /*
             * If ntpdate mode and the clock has not been
             * set and all peers have completed the burst,
             * we declare a successful failure.
             */
            if (mode_ntpdate) {
                peer_ntpdate--;
                if (peer_ntpdate == 0) {
                    msyslog(LOG_NOTICE,
                        "ntpd: no servers found");
                    if (!msyslog_term)
                        printf(
                            "ntpd: no servers found\n");
                    exit (0);
                }
            }
        }
    }


However, I still don't understand why ntpd didn't select the server and set the time from it. Thanks in advance for your help.
tj2298

2 Answers

2

This looks like it might be a root dispersion issue (the cumulative error from the time source to your server).

You've provided ntpq -nc associations already:

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 59609  9014   yes   yes  none    reject   reachable  1

so now what's required is to show the detail for this problematic association:

ntpq -nc 'readvar 59609'

You should get something like this (taken from my own NTP server):

associd=33428 status=142a reach, sel_candidate, 2 events, sys_peer,
srcadr=90.255.244.219, srcport=123, dstadr=192.168.1.18, dstport=123,
leap=00, stratum=1, precision=-20, rootdelay=0.000, rootdisp=1.511,
refid=PPS, reftime=e53ca0fb.4d946a30  Mon, Nov 15 2021  9:03:55.303,
rec=e53ca11e.bf1413cd  Mon, Nov 15 2021  9:04:30.746, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=10, ppoll=10, headway=0, flash=00 ok,
keyid=0, offset=-0.249, delay=22.177, dispersion=55.975, jitter=56.489,
xleave=0.088,
filtdelay=   157.46  161.45  169.05   22.18   21.68   21.76  186.40   22.04,
filtoffset=   70.21   70.72   74.51   -0.25   -0.03   -0.26   81.90   -0.34,
filtdisp=      0.00   15.39   31.02   47.04   63.23   79.22   86.97   94.76

Look for the rootdisp value. I expect you'll find that yours is high, indicating that too much error has accumulated on the path from the time source to your server. There's not a lot you can do about that except use a different upstream server. (You could raise the tos maxdist threshold, but if you have to do this you should ask how reliable your upstream server really is.)
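
To pull a single value such as rootdisp out of the readvar output, something like the following sketch can help. It assumes the comma-separated name=value layout shown above; rv_field is a made-up helper name, not part of ntpq.

```shell
#!/bin/sh
# Print the value of one variable from `ntpq -c readvar` output on stdin.
# rv_field is a hypothetical helper; it relies on the "name=value," layout.
rv_field() {
    # Split on commas, trim leading blanks, then match the name exactly
    # (so "delay" does not match "rootdelay").
    tr ',' '\n' | sed 's/^ *//' | awk -F= -v key="$1" '$1 == key { print $2; exit }'
}

# Demo on one line taken from the sample output above:
printf '%s\n' 'leap=00, stratum=1, precision=-20, rootdelay=0.000, rootdisp=1.511,' \
    | rv_field rootdisp          # prints 1.511

# Against a live daemon (association ID from `ntpq -c as`):
#   ntpq -nc 'readvar 59609' | rv_field rootdisp
```

The value is reported in milliseconds, so 1.511 here is about 1.5 ms of accumulated root dispersion.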


Chris Davies
  • Thanks for the information. However, even after waiting a couple of minutes, ntpd is still not syncing with the server (no * or + is shown in front of it), and reach has already gone to 377. – tj2298 Nov 14 '21 at 21:56
  • Thanks for your suggestion. You are right: the rootdisp is really high for our customer's NTP server. We have to add "tos maxdist 30". – tj2298 Nov 16 '21 at 04:38
2

I had the same problem which started with this post.

The solution was indeed to add the tos maxdist 30 to /etc/ntp.conf and below I list all the steps for checking and solving it. Do note that this should only be performed in case there is no option for another time server: as stated by others, it also means that the upstream NTP server is not really reliable.

Here are the steps:

If you use ntpd -dgq, you might get an "unable to bind to wildcard address" error. Hence, before running it, you need to stop the NTP service (service ntp stop) or kill the processes holding the NTP port:

lsof -i | grep ntp
kill <pid>

After that, run the ntpd -dgq command. If the log ends like this, the NTP server answered but was not accepted as a time source:

...
...
...
receive: MATCH_ASSOC dispatch: mode 4/server:AM_PROCPKT
filegen  2 3854076120
clock_filter: n 5 off 3.839496 del 0.000455 dsp 0.437525 jit 0.000248
17 Feb 09:42:02 ntpd[1040]: ntpd: no servers found

Also, after restarting the NTP service (service ntp start), the same thing can be seen with the commands below: the server is reachable, but the time is not synchronized:

root@akulab1:~# ntpq -c as
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 34463  9014   yes   yes  none    reject   reachable  1
root@akulab1:~# ntpq -np
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 172.16.0.25     .LOCL.           1 u   36   64    7    0.579  3917.57   5.842

As stated, the reason is the large rootdisp value in the output below (pass the assid from ntpq -c as to readvar):

root@akulab1:~# ntpq -nc 'readvar 34463'
associd=34463 status=9014 conf, reach, sel_reject, 1 event, reachable,
srcadr=172.16.0.25, srcport=123, dstadr=172.16.0.133, dstport=123,
leap=00, stratum=1, precision=-23, rootdelay=0.000, rootdisp=10684.280,
refid=LOCL, reftime=e5b7a483.b4d87c2d  Wed, Feb 16 2022 17:27:47.706,
rec=e5b88b72.a5c64a66  Thu, Feb 17 2022  9:53:06.647, reach=377,
unreach=0, hmode=3, pmode=4, hpoll=6, ppoll=6, headway=290,
flash=400 peer_dist, keyid=0, offset=3934.131, delay=0.516,
dispersion=0.987, jitter=14.653, xleave=0.037,
filtdelay=     0.52    0.52    0.56    0.54    0.54    0.58    0.45    0.49,
filtoffset= 3934.13 3930.84 3927.44 3924.11 3920.90 3917.58 3914.35 3911.63,
filtdisp=      0.00    1.02    2.06    3.08    4.10    5.12    6.11    6.95
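
To see why this peer is flagged with flash=400 peer_dist, the root distance can be estimated by hand from the values above. This is a simplified sketch, not the exact ntpd computation: it adds half the round-trip and root delay to rootdisp and jitter, and leaves out the small precision and time-since-last-update terms. The default threshold of 1.5 s is the documented tos maxdist default as far as I know, and the function names are made up.

```shell
#!/bin/sh
# Approximate root distance in ms from readvar values (all in ms).
# Simplified: (delay + rootdelay)/2 + rootdisp + jitter.
root_distance_ms() {
    awk -v d="$1" -v rd="$2" -v e="$3" -v j="$4" \
        'BEGIN { printf "%.3f\n", (d + rd) / 2 + e + j }'
}

# Succeeds (exit 0) if the distance in ms is below the threshold in seconds.
within_maxdist() {
    awk -v dist="$1" -v limit="$2" 'BEGIN { exit !(dist / 1000 < limit) }'
}

# delay=0.516 rootdelay=0.000 rootdisp=10684.280 jitter=14.653
dist=$(root_distance_ms 0.516 0.000 10684.280 14.653)
echo "$dist ms"                      # 10699.191 ms

for limit in 1.5 30; do              # assumed default, then the value used here
    if within_maxdist "$dist" "$limit"; then
        echo "maxdist $limit: acceptable"
    else
        echo "maxdist $limit: rejected"
    fi
done
```

With roughly 10.7 s of root distance, the peer fails the 1.5 s threshold but passes once the threshold is raised to 30 s.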

Finally, these are the commands to add tos maxdist 30 to /etc/ntp.conf and restart the NTP service:

echo 'tos maxdist 30' >> /etc/ntp.conf
service ntp restart
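
Running the echo command more than once appends the line again each time. A small sketch that only appends when an identical line is not already present (ensure_line is a made-up helper name, and the demo uses a temporary file rather than the real configuration):

```shell
#!/bin/sh
# Append a line to a file only if no identical line exists yet.
# ensure_line is a hypothetical helper.
ensure_line() {
    file=$1
    line=$2
    # -x: whole-line match, -F: fixed string, -q: quiet
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo on a scratch file; calling it twice still leaves one line.
conf=$(mktemp)
ensure_line "$conf" 'tos maxdist 30'
ensure_line "$conf" 'tos maxdist 30'
grep -c 'tos maxdist' "$conf"        # prints 1
rm -f "$conf"

# Real use (as root):
#   ensure_line /etc/ntp.conf 'tos maxdist 30' && service ntp restart
```

Note that this only guards against exact duplicates. If the file already contains a tos maxdist line with a different value, that line should be edited by hand instead.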

And, voilà - time is successfully synchronized with your NTP server:

root@akulab1:~# ntpq -c as
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 60446  961a   yes   yes  none  sys.peer    sys_peer  1
root@akulab1:~# ntpq -np
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*172.16.0.25     .LOCL.           1 u   15   64    1    0.432    0.314   0.171