I'm working on a system running Debian Wheezy with an application that, as part of its normal behavior, spawns a few child processes. Under most circumstances, when the parent application is killed it cleans up all of its children and then exits. However, when I shut down or reboot, the computer frequently hangs for a few minutes before going down. The machine is remote/headless, so what I see from my end is a message that the system is going down, then my ssh session gets terminated, and then I get connection refused errors for the next few minutes when I try to log back in. When I go through /var/log/syslog I see entries like:
[ 3840.402493] INFO: task child_process:4455 blocked for more than 120 seconds.
[ 3840.402579] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 3840.402684] child_process D c109bbf8 0 4455 3702 0x00000000
[ 3840.402691] f79c7b00 00000046 00000001 c109bbf8 f0bed5b0 333f9aaa 00000351 c1483b00
[ 3840.402703] c1483b00 00000292 c109bcf9 00000292 00000000 f2c005a0 00000000 edd1fd34
[ 3840.402714] 00000292 c109bebf c10abe25 00000001 f2bd4740 00000000 00000001 c109bc0a
[ 3840.402726] Call Trace:
[ 3840.402733] [] ? free_pages_prepare+0xc0/0xf1
[ 3840.402741] [] ? free_hot_cold_page+0x3d/0xf7
[ 3840.402749] [] ? __pagevec_free+0x3e/0x55
[ 3840.402756] [] ? page_address+0x1b/0x85
[ 3840.402763] [] ? free_pages_prepare+0xd2/0xf1
[ 3840.402771] [] ? __mutex_lock_common.isra.6+0x11d/0x132
[ 3840.402778] [] ? mutex_lock+0x15/0x21
[ 3840.402785] [] ? tty_release+0x3d/0x400
[ 3840.402794] [] ? ip_mc_del_src+0xf1/0x12e
[ 3840.402801] [] ? ip_mc_leave_src+0x22/0x68
[ 3840.402809] [] ? kmem_cache_free+0x23/0x55
[ 3840.402816] [] ? fput+0xd7/0x161
[ 3840.402822] [] ? filp_close+0x54/0x5a
[ 3840.402828] [] ? put_files_struct+0x4b/0x88
[ 3840.402834] [] ? do_exit+0x237/0x60d
[ 3840.402842] [] ? recalc_sigpending+0xf/0x2f
[ 3840.402849] [] ? dequeue_signal+0xb4/0x126
[ 3840.402855] [] ? do_group_exit+0x56/0x83
[ 3840.402863] [] ? get_signal_to_deliver+0x43b/0x465
[ 3840.402871] [] ? do_signal+0x32/0x52c
[ 3840.402879] [] ? update_rmtp+0x45/0x45
[ 3840.402886] [] ? do_futex+0x99/0x6c6
[ 3840.402893] [] ? read_tsc+0xa/0x28
[ 3840.402901] [] ? timekeeping_get_ns+0x10/0x47
[ 3840.402907] [] ? sys_futex+0xbb/0x10f
[ 3840.402915] [] ? do_notify_resume+0x1e/0x5c
[ 3840.402922] [] ? work_notifysig+0x13/0x1b
[ 3840.402929] [] ? start_cpu_timer+0x39/0x62
[ 3840.402935] [] ? vmstat_cpuup_callback+0x18/0x5a
These entries appear for a number of the child processes, and they are logged after the reboot request was issued but before the system actually goes down.
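To see which tasks these warnings refer to without reading every trace, the log can be filtered down to the "blocked for more than" lines. This is only a minimal sketch, assuming the warnings keep the format shown above; hung_tasks is a name made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print each
# blocked task once, as "name pid".
# Assumes lines of the form:
#   INFO: task NAME:PID blocked for more than N seconds.
hung_tasks() {
    # -n with the trailing p flag prints only lines where the substitution matched.
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than.*/\1 \2/p' |
        sort -u
}
```

For example, hung_tasks < /var/log/syslog would list every task that triggered the warning, which makes it easier to check whether it is always the same children.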
Something that may be related is that occasionally the parent application leaves a child running after it exits. In order to facilitate manually killing these processes I have a script that basically consists of:
#!/bin/bash
killall -q parent_process
killall -q child_1
killall -q child_2
#etc...
When run manually this script dependably kills all of the processes and never hangs. Additionally none of the child processes ever hang/exhibit this behavior if they are started manually.
My gut tells me that the issue is that the parent application isn't cleaning up its children properly, but unfortunately the parent process is a proprietary in-house application whose implementation I'm not able to change, so I'm stuck trying to find a workaround. If I run my killall script manually before shutdown/reboot, the shutdown is never delayed, so I've been trying to find a way to get that script to run at shutdown/reboot.
My first attempt was to just copy my killall script into /etc/init.d then link it into rc0.d and rc6.d as K01killstuff, but that didn't work, so I decided to try to write a proper init script. I started from the skeleton in /etc/init.d and came up with:
#! /bin/sh
### BEGIN INIT INFO
# Provides: killstuff
# Required-Start: $remote_fs $syslog
# Required-Stop: $remote_fs $syslog
# Default-Stop: 0 1 6
# Short-Description: Stop applications
# Description: This file should be used to construct scripts to be
# placed in /etc/init.d.
### END INIT INFO
# Author:
#
PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="Description of the service"
NAME=daemonexecutablename
DAEMON=/usr/sbin/$NAME
DAEMON_ARGS="--options args"
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
# Load the VERBOSE setting and other rcS variables
. /lib/init/vars.sh
# Define LSB log_* functions.
# Depend on lsb-base (>= 3.2-14) to ensure that this file is present
# and status_of_proc is working.
. /lib/lsb/init-functions
#
# Function that stops the daemon/service
#
do_stop()
{
    killall -q parent_process
    killall -q child_1
    killall -q child_2
}
case "$1" in
  start)
    ;;
  stop)
    [ "$VERBOSE" != no ] && log_daemon_msg "Stopping Applications"
    do_stop
    ;;
  status)
    ;;
  restart|force-reload)
    ;;
  *)
    #echo "Usage: $SCRIPTNAME {start|stop|restart|reload|force-reload}" >&2
    echo "Usage: $SCRIPTNAME {start|stop|status|restart|force-reload}" >&2
    exit 3
    ;;
esac
:
I tested the script with /etc/init.d/killstuff stop and confirmed that it did kill all of the processes I expected it to, then ran update-rc.d killstuff defaults and saw that links were created in rc0.d and rc6.d, but the system still hung when rebooting. This leads me to believe that these processes need to be killed earlier in the shutdown sequence.
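To check where the new links actually fall among the other stop scripts, the K links of a runlevel directory can be listed in order. This is a rough sketch, assuming the links follow the usual K??name pattern and are run in lexical order; list_stop_order is a hypothetical name, and with dependency-based boot ordering the sequence numbers may be assigned from the LSB headers rather than chosen by hand.

```shell
#!/bin/sh
# Hypothetical helper: print the stop links (K??name) found in an rc
# directory, in lexical order, each prefixed with its position.
list_stop_order() {
    rcdir=$1
    i=0
    for link in "$rcdir"/K[0-9][0-9]*; do
        # Skip the unexpanded pattern when the directory has no K links;
        # the -L test keeps dangling symlinks in the listing.
        [ -e "$link" ] || [ -L "$link" ] || continue
        i=$((i + 1))
        printf '%d %s\n' "$i" "${link##*/}"
    done
}
```

Running list_stop_order /etc/rc6.d and list_stop_order /etc/rc0.d would then show how many stop scripts come before killstuff at reboot and halt.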
So what are some other places I could try to invoke my kill script? Or are there things I could try other than killing all of our applications before the system shuts down to try to make things close more smoothly?
K01killstuff: you could try manually removing and recreating them with name K00killstuff and the script should be the first to be executed. – meuh Aug 14 '16 at 09:46
/etc/init.d/skeleton is no more. – JdeBP Nov 10 '18 at 04:54