3

I have this busybox/linux box that I need to maintain periodically. To do that, I issue a ps command programmatically and check the running processes.

But as shown in the screenshot, the ps command sometimes does not terminate and return to the prompt. It just stays there, so my application cannot proceed.

It's also clear that previous ps commands are still running while the current one (i.e. the one whose output is shown in the screenshot) is also hung.

Does anyone know what could be the problem and how to fix it?

[Screenshot: ps command hung]

Muhammad Gelbana
  • 1
    Have you checked log files? Could you try e.g.: strace -vf ps aux 2>&1 | tee /tmp/ps.strace. And take a look at that file. – Runium Jun 20 '13 at 11:25
  • Thank you. But that box has a custom-built Linux and the commands (strace & aux) don't exist, while the tee command causes the same behavior. I tried copying the ps.strace file locally to check it on a Windows machine but it only contained the following line: -sh: strace: command not found – Muhammad Gelbana Jun 20 '13 at 11:42
  • Too bad. (aux wasn't important, only a reflex on my part.). Is the system built on some distro as base (the Linux)? What is the version of ps? ps --version or the like. Does lsof give you anything to work with? E.g. lsof -p <PID_OF_HUNG_PS> | grep proc. And again; anything in log files? Looks like you have root access, could you install strace? – Runium Jun 20 '13 at 12:20
  • 1
    … and are those two attempts at reboot also hanging there (PID 2398 and 2471)? If so, I guess there is more than ps that has trouble. Log files should hopefully give some indication of what is going on. – Runium Jun 20 '13 at 12:30
  • I don't know the distro but uname -a outputs: Linux HST-R1 2.6.25.4 #1 Mon Oct 26 15:28:50 EDT 2009 ppc unknown I also remember we are using busybox. lsof isn't installed and ps doesn't support the --version option. I'm sorry I'm not very good with linux. What other logs can I inspect ? Thanks a lot for your help. – Muhammad Gelbana Jun 20 '13 at 12:52
  • I do not know HST, but that is a fairly old kernel version. You should also be able to list the open files of a process with ls -la /proc/<PID_OF_PS>/fd/ – (Only wondering if it is stuck on a specific file). For other information in /proc/ look at PROC(5). As for log files, look in /var/log/. As it is a custom build you have to poke around, but you should have something like messages, dmesg, kern.log etc. Look e.g. here: Some of the logfiles. Do you have sysctl? sysctl fs.file-nr. – Runium Jun 20 '13 at 13:14

3 Answers

6

D in the fourth column means that the process is in uninterruptible sleep: it is blocked inside a system call, usually waiting for I/O, and cannot be interrupted by signals. This state normally lasts for a very short time, so it's unusual to observe it. Observing D tends to indicate either slow I/O (e.g. on a network filesystem), which ps isn't doing, or something wrong in the kernel or with the hardware.
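To check which processes are in the D state without going through ps, one option is to read the State field from /proc/<pid>/status directly. The following is only a minimal sketch: it assumes /proc is mounted and that the status files contain the usual Name: and State: lines described in PROC(5). The names list_dstate and proc_root are made up for this example, and on a badly wedged system even these reads might block.

```shell
# Print "PID NAME" for every process in state D (uninterruptible sleep).
# proc_root is a hypothetical parameter (default /proc) so the function
# can also be pointed at a test directory.
list_dstate() {
    proc_root="${1:-/proc}"
    for status in "$proc_root"/[0-9]*/status; do
        [ -r "$status" ] || continue      # process may have exited already
        name=""
        state=""
        # Each line looks like "State:<TAB>D (disk sleep)".
        while read -r key value rest; do
            case "$key" in
                Name:)  name="$value" ;;
                State:) state="$value" ;;
            esac
        done 2>/dev/null < "$status"
        if [ "$state" = "D" ]; then
            pid="${status%/status}"       # strip the trailing /status
            pid="${pid##*/}"              # keep only the PID directory name
            echo "$pid $name"
        fi
    done
}
```

Calling list_dstate with no argument scans the live /proc tree. If the same PIDs keep showing up across several runs, those processes are stuck rather than briefly waiting.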

Either you're observing a kernel bug, or some part of your hardware is failing. To know more, the first step is to find the log files. Your system is running BusyBox and your screenshot shows that it's running syslogd with no parameters, so all the system logs are in /var/log/messages. There's a good chance that this file contains some indication of what's going wrong. If you need help interpreting the logs, edit them into your question (or put them up online somewhere if they're too large).
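As a starting point for reading that file, a small filter can pull out lines that often accompany kernel or hardware trouble. This is a rough sketch, not a diagnosis tool: the function name scan_log and the pattern list are assumptions chosen for illustration, and it assumes the grep on the box was built with -E support.

```shell
# Print log lines (with line numbers) that often accompany kernel or
# hardware trouble. The pattern list is an illustrative guess, not a
# complete catalogue of kernel messages.
scan_log() {
    log_file="${1:-/var/log/messages}"
    grep -n -i -E 'oops|bug:|i/o error|call trace|blocked for more than' "$log_file"
}
```

Running scan_log with no argument reads /var/log/messages. No output only means that none of these particular patterns matched, so it is still worth reading the log around the time the first ps hung.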

0

I saw the same kind of issue today. In my case the cause was network latency, which I identified with the ping command:

64 bytes from 172.16.60.143: icmp_seq=1 ttl=64 time=300 ms
64 bytes from 172.16.60.143: icmp_seq=2 ttl=64 time=293 ms
64 bytes from 172.16.60.143: icmp_seq=3 ttl=64 time=258 ms
64 bytes from 172.16.60.143: icmp_seq=4 ttl=64 time=268 ms
64 bytes from 172.16.60.143: icmp_seq=5 ttl=64 time=250 ms
64 bytes from 172.16.60.143: icmp_seq=6 ttl=64 time=282 ms
64 bytes from 172.16.60.143: icmp_seq=7 ttl=64 time=233 ms
64 bytes from 172.16.60.143: icmp_seq=8 ttl=64 time=259 ms
64 bytes from 172.16.60.143: icmp_seq=9 ttl=64 time=288 ms
64 bytes from 172.16.60.143: icmp_seq=10 ttl=64 time=240 ms

NOTE: If you are connecting to the servers via a VPN, try connecting through a different ISP; this may solve the issue.

  • 1
    The shown ps output indicates that there are other issues with the system than network latency. There are several ps processes hanging as well as two reboot commands. This likely does not have to do with network issues. – Kusalananda Jul 24 '19 at 07:37
0

I have a couple of suggestions. Let's not worry about why it happens, but rather about how to run ps without the issue occurring.

Here are a few suggestions that would be easy to try given your level of experience with linux/unix:

  1. Try running ps auxwwf > myprocesses && cat myprocesses. This will give you a list of all running processes with detailed information and if the first command is successful you will be able to read the contents of myprocesses.

  2. Don't use ps by itself. There are many different variations but my personal favorite is auxwwf. Another good one is ps lax.

  3. Try sending the process to the background by doing: ps auxwwf & Then you can type jobs and see the PID of the ps process. You can then bring it to the foreground by typing fg. To send your ps command to the background again, press Ctrl+Z to suspend it and then type bg. This way, you control ps. If it hangs you can open another terminal session and kill the PID associated with it.
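Since ps is being issued programmatically, suggestion 3 can also be scripted so that the caller stops waiting after a deadline. The sketch below is one possible way to do that in plain sh; run_with_deadline is a hypothetical helper name, and the exit status 124 is borrowed from the convention used by GNU timeout. Note that a process stuck in the D state does not react to kill -9 until its system call returns, so this only frees the calling script and does not clean up the hung ps.

```shell
# Run a command in the background and give up waiting after a deadline.
# Usage: run_with_deadline SECONDS COMMAND [ARGS...]
# Returns the command's exit status, or 124 if the deadline passed
# (124 is an assumed convention, borrowed from GNU timeout).
run_with_deadline() {
    deadline="$1"
    shift
    "$@" &
    cmd_pid=$!
    elapsed=0
    # kill -0 only tests whether the process still exists.
    while kill -0 "$cmd_pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$deadline" ]; then
            # Best effort only: a D-state process ignores this until
            # its system call returns.
            kill -9 "$cmd_pid" 2>/dev/null
            return 124
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    wait "$cmd_pid"
}
```

For example, run_with_deadline 10 ps > /tmp/ps.out would let the calling script carry on after about ten seconds even if ps never returns; the output path here is just an illustration.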

Let me know if those work for you. They are just a few things you can try as a work around. I'm sure someone will have other ways.

pullsumo
  • Thank you, but the only option ps supports here is w for "wide output". – Muhammad Gelbana Jun 20 '13 at 13:51
  • And you can't sudo? does "cat /etc/*release" give you anything? – pullsumo Jun 20 '13 at 14:22
  • This system is running Busybox, not some desktop/server distribution. There is no ps ax or /etc/*release or sudo on such systems. Not that they would help — what are 1 and 2 supposed to work around? Options to ps wouldn't make any difference. 3 lets you continue using the shell but that won't affect the stuck processes. – Gilles 'SO- stop being evil' Jun 20 '13 at 17:29
  • @Gilles I was just trying to be helpful and offer some advice. I've never heard of busybox. You should be more constructive in your criticism and try to contribute to the conversation in a positive manner. – pullsumo Jun 20 '13 at 21:31
  • 2
    Mister - Nothing in Gilles' answer was a criticism as far as I can tell. – Rory Alsop Jun 21 '13 at 12:48