Reboot causes device to shut down, but it doesn't start again until many hours later

Question

I am using a BeagleBone running Debian. I have a cron entry in /etc/crontab which tells the device to reboot at 03:01 every night, as follows:

00 03    * * *   root    /sbin/shutdown -r +1 >> /home/my.log 2>&1

This has worked perfectly for many months. The device shut down and rebooted at 03:01 as expected.

Now, however, the device shuts down at 03:01 but the reboot doesn't happen until many hours later. Today, for example, it shut down at 03:01 but didn't actually come back up until 12:35:28, over 9.5 hours later.

Here is the output of last reboot:

reboot   system boot  4.14.71-ti-r80   Sat Aug 17 12:35   still running
reboot   system boot  4.14.71-ti-r80   Fri Aug 16 08:58 - 03:01  (18:02)
reboot   system boot  4.14.71-ti-r80   Thu Aug 15 07:29 - 03:01  (19:31)
reboot   system boot  4.14.71-ti-r80   Wed Aug 14 04:41 - 03:01  (22:19)
reboot   system boot  4.14.71-ti-r80   Tue Aug 13 08:27 - 03:01  (18:33)
reboot   system boot  4.14.71-ti-r80   Mon Aug 12 09:37 - 03:01  (17:23)

Here is the output of journalctl -b:

Aug 17 12:35:28 beaglebone kernel: Booting Linux on physical CPU 0x0
Aug 17 12:35:28 beaglebone kernel: Linux version 4.14.71-ti-r80 (root@b2-am57xx-beagle-x15-2gb) (gcc version 6.3.0 20170516 (Debian 6.3.
Aug 17 12:35:28 beaglebone kernel: CPU: ARMv7 Processor [413fc082] revision 2 (ARMv7), cr=10c5387d
Aug 17 12:35:28 beaglebone kernel: CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
Aug 17 12:35:28 beaglebone kernel: OF: fdt: Machine model: TI AM335x BeagleBone Black

What could be causing this issue? I am finding it very hard to diagnose. Thanks.

This could be caused by a hanging service of some sort. Could you attach a UART console and see what it says when it's trying to reboot? — minhng99, Aug 17 '19 at 13:49
@SandPox Thanks for your reply. I don't actually have physical access to the device. It is deployed somewhere connected to a cellular modem for communication, so I can only reverse SSH into it. When you say a hanging service, do you mean when it is shutting down? — Engineer999, Aug 17 '19 at 13:54
Yes. When you reboot manually, does the same thing happen (the reboot taking a long time)? — minhng99, Aug 17 '19 at 13:57
@SandPox I'm afraid to try it now in case it goes dark again for another 9 hours or so :) I'm thinking of changing the command in /etc/crontab to "00 03 * * * root sync && /sbin/reboot >> /home/my.log 2>&1". Or is there a way I can forcefully stop all services? — Engineer999, Aug 17 '19 at 14:00
The sync command won't help, because shutdown already does a sync, and I don't think a write cache takes 9 hours to flush. This may be caused by a misconfigured systemd service with an infinite stop timeout, or a really high one (like 9.5 hours). Have a look here: https://unix.stackexchange.com/questions/227017/how-to-change-systemd-service-timeout-value Also, if you desperately need to reboot, you could try echo 1 > /proc/sys/kernel/sysrq followed by echo b > /proc/sysrq-trigger. Be careful with that, because it won't sync anything and you'll lose unsaved data. — minhng99, Aug 17 '19 at 14:04
@SandPox I've tried adding timeouts to some services. I manually did a reboot also now, and the device didn't come back alive until over 2 hours later. Could this be a hardware issue of some sort? — Engineer999, Aug 17 '19 at 16:26
In my experience, systemd is terrible at rebooting or shutting down a system. almost anything that goes wrong can cause it to wait and wait and wait and retry and retry and retry for anywhere from minutes to hours...or sometimes forever. IMO a stupid design decision because sometimes the reason you want/need to reboot is that something (e.g. an NFS mount) has hung and can't be killed (so waiting for it to die before rebooting is cretinous). yay systemd. — cas, Aug 17 '19 at 16:47
when nothing goes wrong, it works as expected and rebooting/shutting down is fast. it's just terrible at dealing with problems. and the last thing you need when trying to diagnose or fix a problem is for the shutdown process to take ages to get around to rebooting. it couldn't be worse if they decided to go out of their way to maximise downtime. — cas, Aug 17 '19 at 16:50
@cas Is there a way to view the systemd logs to see if it is hanging and why? I'm checking all sorts of logs here, and there is just a "blackout" between the time the device should shut-down and when it boots again. Thanks — Engineer999, Aug 17 '19 at 16:53
maybe. depends on how much (and what) stuff there is to be done after journald is stopped. try journalctl -b -1 to get the logs from the previous boot session. it should end with $date $time $hostname systemd-journald[$pid] Journal stopped. that will at least tell you when journald stopped. and give you a history of what was stopped and when before that. obviously nothing after that will or can be logged. — cas, Aug 17 '19 at 17:03
Now I'm looking at the logs in /var/log/syslog from when it is trying to shut down, and I see this: "beaglebone systemd[1]: apt-daily.timer: Adding 5h 48min 22.614564s random time." — Engineer999, Aug 17 '19 at 17:07
BTW, for this to be of any use, journald has to be configured for persistent storage of logs. see https://unix.stackexchange.com/questions/513212/journald-storage-persistent-just-disk-or-ram-disk — cas, Aug 17 '19 at 17:08
sorry, i have to go now - it's 3am here and i need to sleep. — cas, Aug 17 '19 at 17:09
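
The read-only checks suggested in the comments above (previous-boot journal, persistent journal storage, service stop timeouts) could be collected into a few shell functions. This is a minimal sketch, not a confirmed fix for the problem: the function names are made up for illustration, and the path argument to journal_is_persistent is an assumption added only so the check can be pointed at another directory.

```shell
#!/bin/sh
# Read-only diagnostic helpers; nothing here changes the system.
# All function names are hypothetical, chosen for this sketch.

# Succeeds if a persistent journal directory exists. With journald's
# default Storage=auto, logs survive a reboot only when this directory
# is present. The optional argument overrides the path (an assumption
# for illustration and testing).
journal_is_persistent() {
    [ -d "${1:-/var/log/journal}" ]
}

# Prints the last N lines (default 50) of the previous boot's journal,
# which should show what was being stopped before journald itself stopped.
previous_shutdown_tail() {
    journalctl -b -1 --no-pager -n "${1:-50}"
}

# Prints the manager-wide default stop timeout, then the stop timeout
# of every loaded service, one "unit TimeoutStopUSec=..." line each.
service_stop_timeouts() {
    systemctl show -p DefaultTimeoutStopUSec
    systemctl list-units --type=service --no-legend --plain |
        awk '{print $1}' |
        while read -r unit; do
            printf '%s ' "$unit"
            systemctl show -p TimeoutStopUSec "$unit"
        done
}
```

After sourcing the file over SSH, one might run journal_is_persistent && previous_shutdown_tail 100 to see the tail of the last shutdown, and service_stop_timeouts to look for a service whose stop timeout is infinite or unusually long. If journal_is_persistent fails, journalctl -b -1 will most likely have nothing to show until persistent storage is enabled and another reboot has happened.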

