I have an HP server running SLES 11 SP3 that intermittently shows a strange problem.

Whenever I run a command, no matter which one (e.g. ps -ef or rcapache2 restart), it starts and prints its output, but when it finishes it never returns to the prompt; the session just hangs. If I then press Ctrl+C to try to kill it, the character is echoed in the PuTTY (SSH) session, but the command still doesn't exit.

I've looked at the HP iLO management interface, but it's not reporting any faults.

Thanks for any help you can provide.

strace output of the shell:

rt_sigprocmask(SIG_BLOCK, [INT], [], 8) = 0
ioctl(0, SNDCTL_TMR_STOP or TCSETSW, {B38400 opost isig icanon echo ...}) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigaction(SIGINT, {0x808c610, [], SA_RESTART}, {0xb778e8b0, [], 0}, 8) = 0
rt_sigaction(SIGTERM, {0x1, [], SA_RESTART}, {0x1, [], SA_RESTART}, 8) = 0
rt_sigaction(SIGQUIT, {0x1, [], SA_RESTART}, {0x1, [], SA_RESTART}, 8) = 0
rt_sigaction(SIGALRM, {0x808c360, [HUP INT ILL TRAP ABRT BUS FPE USR1 SEGV USR2 PIPE ALRM TERM XCPU XFSZ VTALRM SYS], 0}, {0xb778e8b0, [], 0}, 8) = 0
rt_sigaction(SIGTSTP, {0x1, [], SA_RESTART|SA_NODEFER}, {0x1, [], SA_RESTART|SA_NODEFER}, 8) = 0
rt_sigaction(SIGTTOU, {0x1, [], SA_RESTART|SA_NODEFER}, {0x1, [], SA_RESTART|SA_NODEFER}, 8) = 0
rt_sigaction(SIGTTIN, {0x1, [], SA_RESTART|SA_NODEFER}, {0x1, [], SA_RESTART|SA_NODEFER}, 8) = 0
rt_sigaction(SIGWINCH, {0x808c120, [], SA_RESTART}, {0xb778e860, [], SA_RESTART}, 8) = 0
rt_sigaction(SIGINT, {0x808c610, [], SA_RESTART}, {0x808c610, [], SA_RESTART}, 8) = 0
time(NULL)                              = 1433749648
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
pipe([3, 4])                            = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [CHLD], 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [INT CHLD], 8) = 0
rt_sigprocmask(SIG_SETMASK, [INT CHLD], NULL, 8) = 0
pipe([5, 6])                            = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb75bc728) = 4879
setpgid(4879, 4879)                     = 0
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
close(4)                                = 0
close(4)                                = -1 EBADF (Bad file descriptor)
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [CHLD], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb75bc728) = 4880
setpgid(4880, 4879)                     = 0
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
close(3)                                = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0
close(5)                                = 0
close(6)                                = 0
rt_sigprocmask(SIG_SETMASK, [CHLD], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD], 8) = 0

I don't know much about strace, but the bad file descriptor error stands out to me; I'm not sure what would be causing it.
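
For context on that EBADF line: close() returns EBADF when the descriptor is not open, so the second close(4) in the trace is most likely the shell closing a pipe end it had already closed, which on its own should not block anything. The sketch below only illustrates that behaviour and is not the shell's actual code; the function name closed_fd_demo and the choice of descriptor 9 are made up for the example.

```shell
#!/bin/bash
# Hypothetical demo: using an already-closed descriptor fails with EBADF,
# but the failure is reported and the shell simply carries on.
# The body runs in a subshell (note the parentheses) so the caller's
# descriptors are left untouched.
closed_fd_demo() (
    exec 9>/dev/null      # open descriptor 9
    exec 9>&-             # close it once
    # Any further use of descriptor 9 now fails with "Bad file descriptor".
    if { true >&9; } 2>/dev/null; then
        echo "fd 9 still usable"
    else
        echo "fd 9 is closed (EBADF); the shell itself keeps running"
    fi
)

closed_fd_demo
```

If that reading is right, the EBADF is a side effect of redundant cleanup and the hang has to be looked for elsewhere in the trace.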

Boardy
  • What's the history of this system? Has it always had this problem, or did it suddenly start? Also, does it happen if run at the console (not over ssh) as well? – derobert Jun 05 '15 at 19:39
  • If you run it in the background with &, does it run OK? What shell? Do the shell prompt, MAIL variable, or history file use any networked filesystems? – Mark Plotnick Jun 05 '15 at 19:45
  • @derobert, it had this problem once before, a few weeks ago. We rebooted the machine, which resolved the issue, and it has now come back. It happens locally as well as over SSH – Boardy Jun 05 '15 at 20:41
  • @MarkPlotnick Oddly, yes, if you run it with & it works fine. It's just the default bash shell that comes with SLES. None of those variables or history files use any network shares; everything I was running would be accessing the local machine's disks only – Boardy Jun 05 '15 at 20:45
  • Can you try (in another shell) running strace -p pidofbuggyshell, then run a command and see what system call is running at the moment the command exits? – Mark Plotnick Jun 05 '15 at 20:58
  • Also check the kernel logs (e.g., by running dmesg) for any errors. – derobert Jun 05 '15 at 21:53
  • How do I get the pid of the shell? I know I can get the pid of a process, e.g. ps -ef | grep -i app, but I'm not sure how to get the shell's pid – Boardy Jun 05 '15 at 22:20
  • Run echo $$ in the shell to get its pid. – Mark Plotnick Jun 05 '15 at 23:45
  • I've updated my question to include the strace output – Boardy Jun 08 '15 at 07:50
  • I've also found uhci_hcd 0000:01:00.4: Controller not stopped yet! in the dmesg log. This doesn't seem to be related though; it's something to do with the virtual mouse and keyboard for the HP iLO interface – Boardy Jun 08 '15 at 07:59
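
The diagnostic steps suggested in the comments above (get the shell's pid with echo $$, attach strace from a second shell, look at the kernel log) can be collected into a rough sketch. The helper names proc_state and strace_cmd and the trace file path under /tmp are assumptions for illustration only; the strace options used (-f, -tt, -o, -p) are standard ones. The strace command is only printed, not run, so it can be copied into a second session.

```shell
#!/bin/bash
# Hypothetical helper: show the kernel's view of a process state.
# A state of "D" (uninterruptible sleep) would be one possible reason
# for a process that does not react to Ctrl+C.
proc_state() {
    local pid=$1
    if [ -r "/proc/$pid/status" ]; then
        grep '^State:' "/proc/$pid/status"
    else
        echo "State: unknown (no /proc entry for $pid)"
    fi
}

# Hypothetical helper: build (but do not run) an strace command that
# follows child processes (-f), adds timestamps (-tt) and writes to a file (-o).
strace_cmd() {
    local pid=$1
    printf 'strace -f -tt -o /tmp/shell-%s.trace -p %s\n' "$pid" "$pid"
}

# Run this part in the misbehaving shell before starting a command.
echo "shell pid: $$"
proc_state "$$"

# Paste the printed command into a second shell, then run a command in the first.
strace_cmd "$$"
```

With the trace written to a file, the last system call recorded when the command exits shows where the shell is waiting; dmesg | tail in the second shell covers the kernel log check.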
