
Problem

I would like to kill a process called raspivid (a program that records video using a Raspberry Pi Camera), but I cannot.

This is how I call it:

#!/bin/bash

# Start recording
raspivid -w 800 -h 600 -t 15000 -o "$1" -v -n -rot 270 >> /home/pi/log/camera_output.txt 2>&1 &

# Wait for the video to complete
sleep 16

# Kill the child process
sudo kill -9 $!

# Kill the parent process
sudo kill -9 $$

If I search for this process, it is still there:

pi@raspberrypi ~ $ ps -ef | grep raspivid
root      7238     7234  0 21:53 ?        00:00:00 [raspivid]
pi       17096 14925  0 22:05 pts/0    00:00:00 grep --color=auto raspivid

If I try to kill it, it doesn't die. Instead it changes the parent PID to 1:

pi@raspberrypi ~ $ sudo killall raspivid
pi@raspberrypi ~ $ ps -ef | grep raspivid
root      7238     1  0 21:53 ?        00:00:00 [raspivid]
pi       17196 14925  0 22:05 pts/0    00:00:00 grep --color=auto raspivid
pi@raspberrypi ~ $ sudo killall raspivid

Observations:

  1. The call works fine for a while (about two hours), then it starts hanging.
  2. Only a physical power-off solves the issue. I cannot reboot via the terminal (it hangs too).

My questions:

  1. Why does Linux change the parent PID to 1?
  2. Why can't the process be killed? (I also tried sudo kill -9 7238.)

EDIT:

aecolley was right. The S column shows D:

0 D     0 11823 11819  0  80   0 -     0 down   ?        00:00:00 raspivid
  • It's probably a zombie process. Check with top how many zombies you have, or tell us which flags (STAT) this process has (if it has Z, it's a zombie), e.g. with ps wuax PID. – kenorb Feb 04 '15 at 21:34
  • @kenorb, no, zombies usually have a (defunct) suffix. But the square brackets give a clue: it may be a kernel thread. – myaut Feb 04 '15 at 21:38
  • It might still be hanging on to the device. – umeboshi Feb 04 '15 at 21:46
  • @myaut On my machines, the suffix shown by ps for zombies is actually <defunct> (with angle brackets). – vinc17 Feb 04 '15 at 21:48
  • @vinc17, yep, that is Solaris notation; I confused it with Linux. – myaut Feb 04 '15 at 21:49
  • Could you please provide the output of cat /proc/7238/stack? It will show what the process is doing now. – myaut Feb 04 '15 at 21:51
  • What is the purpose of sending the process to the background, sleeping for sixteen seconds, then sending a kill signal? Why aren't you leaving the process in the foreground? Also, if you do need it in the background, why are you sending SIGKILL without attempting a SIGTERM first? – umeboshi Feb 04 '15 at 22:04
  • @umeboshi & sleep 16; kill $! implements a 16-second timeout. What makes no sense, but doesn't hurt here, is calling sudo and using kill $$ rather than exit to terminate the shell script. – Gilles 'SO- stop being evil' Feb 04 '15 at 23:01
  • @Gilles, I figured the -t15000 would execute the command for 15 seconds. I just don't understand why it's placed in the background, then forcibly killed one second after it's supposed to be complete. It just makes me think that there is another problem that's being masked. – umeboshi Feb 05 '15 at 11:24
  • @umeboshi The raspivid command works fine for a while, as stated in the question. For an unknown reason it suddenly stops working and cannot be killed. Therefore I put the raspivid command in the background and waited exactly 16 seconds before trying to kill it. –  Feb 05 '15 at 14:00
  • @Gilles Thanks for the comments. I replaced it with exit (that doesn't solve my problem, though). –  Feb 05 '15 at 14:02
  • @user1688175, that's what I figured. I think the process is still hanging on to the device. Aecolley's answer below is right on track. Be aware that you could have buggy hardware, which is sometimes difficult to determine without extra hardware to test. – umeboshi Feb 05 '15 at 15:53

1 Answer


If you run ps -el instead of ps -ef, you'll get an S column with the process state. My guess is that the process is in state D, which means uninterruptible wait.
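As a rough sketch of that check, assuming a Linux system with /proc mounted, the state letter can also be read directly from /proc/PID/status. The function name proc_state is made up for this illustration; with procps, ps -o pid,ppid,stat,wchan,comm -p PID should show much the same information.

```shell
# proc_state is a hypothetical helper name, not a standard command.
# It prints the one-letter state of a PID (R, S, D, T, Z, ...) by
# reading the "State:" line of /proc/PID/status (Linux only).
proc_state() {
    status_file="/proc/$1/status"
    # No such file means no such process (or no /proc on this system).
    [ -r "$status_file" ] || return 1
    while read -r key value _rest; do
        if [ "$key" = "State:" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done < "$status_file"
    return 1
}
```

For the stuck process in the question, proc_state 7238 would be expected to print D.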

In other words, the process is stuck in the messier parts of a device driver, and the kernel doesn't think it's safe to kill it until the device driver lets go of it. You sometimes see this with processes that talk to sick NFS servers, or devices with errors. In this case, it looks like it's talking to a video-capture device.

Unfortunately, there's no silver-bullet way to unstick a process from D-wait, except for rebooting the system. You could try using strace (the Linux counterpart of the Solaris command truss) to find out what the program did right before it got stuck, but there might not be anything you can do about it. You may just have a buggy device driver.
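A script cannot rescue a process from D-wait, but it can at least notice one instead of assuming that kill -9 worked. Here is a minimal sketch of that idea: send SIGTERM first, escalate to SIGKILL, and report whether the PID is still there. The helper name stop_with_timeout, the 0.1-second polling interval, and the default of 30 polls are all assumptions made for this example, and the caller must be allowed to signal the process.

```shell
# stop_with_timeout is a hypothetical helper, not part of any package.
# Usage: stop_with_timeout PID [POLLS]   (each poll waits 0.1 s)
# Returns 0 if the process is gone, 1 if it survived even SIGKILL.
stop_with_timeout() {
    pid=$1
    tries=${2:-30}

    # Ask politely first; if the signal cannot be delivered, assume
    # the process has already exited.
    kill -TERM "$pid" 2>/dev/null || return 0

    while [ "$tries" -gt 0 ]; do
        # kill -0 sends no signal; it only tests that the PID still
        # exists (it also succeeds for a zombie not yet reaped).
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 0.1
        tries=$((tries - 1))
    done

    # Still alive after the grace period: escalate.
    kill -KILL "$pid" 2>/dev/null
    sleep 0.5

    if kill -0 "$pid" 2>/dev/null; then
        echo "PID $pid is still present after SIGKILL (possibly in state D)" >&2
        return 1
    fi
    return 0
}
```

In the asker's script, a call such as stop_with_timeout "$!" after the sleep would be one way to log the failure rather than silently leaving a stuck raspivid behind.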

Finally, the reason the parent pid changes to 1 is that your killall is successfully killing the parent process. Whenever a process exits, its child processes are all inherited by pid 1. It's a minor mystery why the ps -f line for the parent process isn't matched by the grep.
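The reparenting is easy to observe on a healthy system. The sketch below (Linux with /proc assumed; orphan_demo is a name invented here) starts a short-lived shell that leaves a sleep process behind, then prints the PID of that dead parent followed by the orphan's new parent PID. On many systems the second number is 1, though it can instead be a "subreaper" process such as a per-user service manager.

```shell
# orphan_demo is a hypothetical name used only for this illustration.
# Prints: "<PID of the exited parent> <new PPID of the orphan>"
orphan_demo() {
    # The inner shell backgrounds a sleeper, reports its own PID and
    # the sleeper's PID, then exits. The command substitution returns
    # only after that inner shell has terminated.
    pids=$(sh -c 'sleep 5 >/dev/null 2>&1 & echo "$$ $!"')
    parent=${pids% *}
    child=${pids#* }

    # The sleeper is now an orphan; read its current parent PID.
    new_ppid=$(awk '/^PPid:/ {print $2}' "/proc/$child/status")

    # Clean up the sleeper so the demo leaves nothing running.
    kill "$child" 2>/dev/null

    echo "$parent $new_ppid"
}
```

Running orphan_demo prints two different numbers, which is the same change seen in the PPID column of the ps output in the question.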

aecolley
  • The question is about Linux, so strace is the Linux equivalent of truss on Unix. – kenorb Feb 05 '15 at 11:41
  • @aecolley Thank you. That is exactly my issue. It prevents me from rebooting the machine. Whatever reboot command I try, it simply doesn't work. –  Feb 05 '15 at 14:04
  • @user1688175 If /sbin/reboot -f doesn't reboot, then it's definitely a buggy device driver. – aecolley Feb 06 '15 at 22:03
  • @aecolley, see my changes to the question. –  Feb 07 '15 at 05:32