
I have a Unix server that has started rebooting every few minutes. I tried to trace the source of the problem by logging the process tree at the moment reboot is called, as described in this question's answer.

However, I don't understand where to look next.

The log contains these lines (among many others):

root         1     0  0 16:49 ?        00:00:00 /sbin/init
root      2894     1  0 16:53 ?        00:00:00 /bin/bash /sbin/shutdown -r now Control-Alt-Delete pressed
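The parent-PID column in a snapshot like this can be followed mechanically. Below is a minimal Python sketch, assuming the snapshot was captured with ps -ef (columns UID, PID, PPID, C, STIME, TTY, TIME, CMD); the function names parse_ps and ancestry are made up for illustration. Fed the two lines above, it prints the shutdown command and then /sbin/init, i.e. shutdown's direct parent is PID 1. Keep in mind that a PPID of 1 means either init started the process or its original parent had already exited and it was reparented to init.

```python
# Sketch: follow the PPID chain in a `ps -ef` snapshot to see who started a process.
# Assumes the standard `ps -ef` column order: UID PID PPID C STIME TTY TIME CMD.


def parse_ps(lines):
    """Map PID -> (PPID, command) for every process row in a `ps -ef` dump."""
    procs = {}
    for line in lines:
        # Split into at most 8 fields so the command keeps its embedded spaces.
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header row, blank line or truncated entry
        procs[int(fields[1])] = (int(fields[2]), fields[7])
    return procs


def ancestry(procs, pid):
    """Return [(pid, command), ...] from `pid` up to the oldest ancestor in the dump."""
    chain, seen = [], set()
    while pid in procs and pid not in seen:  # `seen` guards against a corrupt dump looping
        seen.add(pid)
        ppid, cmd = procs[pid]
        chain.append((pid, cmd))
        pid = ppid
    return chain


# The two lines from the log above, plus the usual `ps -ef` header.
snapshot = [
    "UID        PID  PPID  C STIME TTY          TIME CMD",
    "root         1     0  0 16:49 ?        00:00:00 /sbin/init",
    "root      2894     1  0 16:53 ?        00:00:00 /bin/bash /sbin/shutdown -r now Control-Alt-Delete pressed",
]

for pid, cmd in ancestry(parse_ps(snapshot), 2894):
    print(pid, cmd)
```

On the server itself you would feed it the lines written by your logging hook instead of the hard-coded sample.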

To me, it looks like the server's startup process, init (PID 1, the parent of the shutdown process), is calling a reboot with shutdown -r. In the system log, all I see is this line:

sshd[2433]: Received signal 15; terminating.

Also, this is an Amazon Web Services (AWS) Unix instance that only allows connections from my IP address, and it's protected by a private key.

What are the next steps I can take to find the source of the problem?

Anton
  • (Not familiar enough with AWS to know, so just guessing.) The shutdown reason claims Control-Alt-Delete was pressed; it's possible that's how EC2 implements a graceful reboot (or similar) from the management console, by emulating Control-Alt-Delete. Are you sure that's not happening, possibly automatically via some monitoring system? (This is just a generic troubleshooting step: if the system says X is happening, first check whether X is actually happening.) – derobert Mar 30 '17 at 17:29
  • Good thought; I've posted to the AWS forums to learn more. I haven't restarted the instance through the management console in a while, and I don't think I opted into any special monitoring software, unless there's something they force on you even when you launch a bare-bones instance. – Anton Mar 30 '17 at 17:36
  • Have a look at my question about disabling Ctrl-Alt-Del: http://unix.stackexchange.com/questions/153902/disabling-ctrl-alt-del-and-etc-init (just in case something is sending CAD continuously) – Archemar Mar 30 '17 at 18:41
  • This is such awful behaviour that I'd lean towards sabotage as an explanation; there's no legitimate reason to reboot like this. Also, if this were something AWS-specific, I'd say this question really belongs on Server Fault, but unless you're hooked up to some daffy AWS CloudWatch ecology I don't think that's the case here. More likely it's a problem that could happen to any *nix system, hence my OS-oriented answer here. – Nadreck May 18 '17 at 18:49

1 Answer


Well, whatever is doing this is doing it as root, so have a look in /var/log/auth.log (Red Hat-derived distributions write this to /var/log/secure instead) to see whether anyone is signing on as root or admin around this time, or using sudo to get root privileges. You might have to raise the logging level (the LogLevel directive) in /etc/ssh/sshd_config to get the relevant details.
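A rough way to do that scan, as a Python sketch. It assumes traditional syslog-format lines (as written to /var/log/auth.log or /var/log/secure), the stock OpenSSH "Accepted ... for USER from HOST" and sudo "USER=root ; COMMAND=..." message formats, and a time-of-day window you pick around the reboot. The sample lines, the user alice and the WATCHED_USERS set are all hypothetical.

```python
import re

# Sketch: pull root/admin SSH logins and sudo-to-root commands out of an auth log.
# Assumes traditional syslog lines ("Mar 30 16:50:12 host prog[pid]: message") and
# the stock OpenSSH / sudo message formats; adjust the patterns if yours differ.
TIMESTAMP = re.compile(r"^\w{3}\s+\d{1,2}\s+(\d\d:\d\d:\d\d)\s")
SSH_LOGIN = re.compile(r"sshd\[\d+\]: Accepted (\S+) for (\S+) from (\S+)")
SUDO_ROOT = re.compile(r"sudo:\s+(\S+) : .*USER=root ; COMMAND=(.*)$")
WATCHED_USERS = {"root", "admin"}  # hypothetical; add the accounts you care about


def suspicious_entries(lines, start="00:00:00", end="23:59:59"):
    """Yield (time, description) for matching entries whose time of day is in [start, end]."""
    for line in lines:
        stamp = TIMESTAMP.match(line)
        # "HH:MM:SS" strings sort in time order, so plain string comparison works.
        if not stamp or not start <= stamp.group(1) <= end:
            continue
        when = stamp.group(1)
        login = SSH_LOGIN.search(line)
        if login and login.group(2) in WATCHED_USERS:
            method, user, host = login.groups()
            yield when, f"ssh login as {user} ({method}) from {host}"
        sudo = SUDO_ROOT.search(line)
        if sudo:
            yield when, f"sudo by {sudo.group(1)}: {sudo.group(2)}"


# Made-up sample lines; on the server, pass open("/var/log/auth.log") (or
# /var/log/secure) instead, reading it as root.
sample = [
    "Mar 30 15:10:00 ip-10-0-0-5 sshd[1800]: Accepted publickey for root from 198.51.100.2 port 40100 ssh2",
    "Mar 30 16:50:12 ip-10-0-0-5 sshd[2101]: Accepted publickey for root from 203.0.113.7 port 50522 ssh2",
    "Mar 30 16:52:40 ip-10-0-0-5 sudo:    alice : TTY=pts/0 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/id",
    "Mar 30 16:53:05 ip-10-0-0-5 sshd[2433]: Received signal 15; terminating.",
]

for when, what in suspicious_entries(sample, "16:45:00", "16:54:00"):
    print(when, what)
```

The window only looks at the time of day, not the date, so trim the log to the day of the reboot first.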

Other things would be:

  1. Look in /etc/passwd to see which accounts are root or admin and have bash shells defined. If they have home directories, see whether there's anything odd in their .bashrc files.
  2. Check which daemons start at boot on your box (see this post for details). Is there any reason any of them would reboot the machine?
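  3. Sign on as root and check its batch jobs with crontab -l (crontab -e opens the same file for editing). In your example the reboot comes a few minutes after boot (init started at 16:49, shutdown at 16:53). Is anything in this crontab scheduled at about that interval? Everything in root's crontab runs as root, regardless of where the script it calls came from. The system-wide /etc/crontab and /etc/cron.d/ are worth checking too.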
  4. To rule out hardware errors, you might try launching an instance of this server in another Amazon region to see whether it also happens on different hardware.
  5. Was this instance always like this, or did it start happening after a while? Do you have any backups (Amazon AMIs of the whole thing) so that you can go back to previous versions and see whether they still exhibit this behaviour? NB: you can never have enough AMI backups!
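Step 1 can be partly scripted. A minimal Python sketch, assuming the standard seven-field /etc/passwd layout (name:password:UID:GID:GECOS:home:shell): it lists every UID 0 account (these all have root privileges, whatever their name) plus every account whose shell is bash, together with the home directory whose .bashrc is worth reading. The function name risky_accounts and the sample entries, including the backup2 account, are hypothetical.

```python
# Sketch: list accounts worth a closer look in /etc/passwd (item 1 above).
# Assumes the standard 7-field layout: name:password:UID:GID:GECOS:home:shell.


def risky_accounts(passwd_lines, shells=("/bin/bash",)):
    """Return (name, uid, home) for accounts with UID 0 or a login shell in `shells`."""
    found = []
    for raw in passwd_lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7 or not fields[2].isdigit():
            continue  # malformed entry: worth inspecting by hand
        name, _password, uid, _gid, _gecos, home, shell = fields
        # Any UID 0 account has full root privileges, whatever it is called.
        if int(uid) == 0 or shell in shells:
            found.append((name, int(uid), home))
    return found


# Hypothetical sample; on the server, pass open("/etc/passwd") instead.
sample_passwd = [
    "root:x:0:0:root:/root:/bin/bash",
    "daemon:x:2:2:daemon:/sbin:/sbin/nologin",
    "ec2-user:x:1000:1000:EC2 Default User:/home/ec2-user:/bin/bash",
    "backup2:x:0:0::/var/tmp:/bin/sh",
]

for name, uid, home in risky_accounts(sample_passwd):
    # Per the answer, look for anything odd in each account's shell start-up files.
    print(f"{name} (uid {uid}): check {home}/.bashrc")
```

Pass a wider shells tuple (for example adding /bin/sh) if you want to cover other interactive shells as well.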
Nadreck