
We have observed in our Solaris environment that, once in a while, a process we run hangs indefinitely and cannot be killed. The only way to get rid of it is to reboot the server. Even kill -9 -1 did not work, nor did sending signals 15 (SIGTERM), 1 (SIGHUP), or 2 (SIGINT).

The process is not consuming any CPU or memory, nor is it disturbing the execution of other processes.

How can such a process be killed?

A few logs that might be useful:

10:03 tsstool@selilsx592[119]/proj/2gsim/usr/qzbiwis/ATE/hanging_process/selilsx592> /usr/ucb/ps -alxwww | grep -i sea
0 50167  2673     1  0  59 201115264361720 6003fa6b7ea S ?         0:44 /proj/lmdoste/tools/steroot/apps/sea/sea_R38A/lib/sea/bin/64bit/SEA -l INIT.sco,AXEMANAGER.sco,SEAGUI.sco,RTS.sco,SCHR.sco -p 5001 -e node-name
0 50167 13107 13068  0  59 20 4288 3408 3000b6f7366 S ?         0:00 tcsh -c cd "/proj/2gsim/Tools/.jenkins/SEA_selilsx592" && /app/jdk/1.7.0_55/bin/java  -jar slave.jar
0 50167 11815 11810  0  59 20 3720 3072 3001021e55c S pts/1     0:00 bash /proj/lmdoste/tools/steroot/apps/toolbox/bscste_toolbox -n /proj/2gsim/usr/ate/hibiscus_tmp//sea_network/1504762574024/test.toolbox -r /proj/lmdoste/tools/steroot -s --tcctrl AUTO -d 2
0 50167 11830 11815  0  59 20 3720 2504 60031e9e138 S pts/1     0:00 bash /proj/lmdoste/tools/steroot/apps/toolbox/bscste_toolbox -n /proj/2gsim/usr/ate/hibiscus_tmp//sea_network/1504762574024/test.toolbox -r /proj/lmdoste/tools/steroot -s --tcctrl AUTO -d 2
0 50167 19256 13040  0  49 20 1768 1408 600278eebbc S pts/2     0:00 grep -i sea

10:08 tsstool@selilsx592[133]/proj/2gsim/usr/qzbiwis/ATE/hanging_process/selilsx592> pflags 2673 
2673:   /proj/lmdoste/tools/steroot/apps/sea/sea_R38A/lib/sea/bin/64bit/SEA -l
        data model = _LP64  flags = ORPHAN|RLC|MSACCT|MSFORK
        sigpend = 0x00004001,0x00000000
/1:    flags = DSTOP
        sigmask = 0x00000004,0x00000000
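
To read the sigpend and sigmask values above, here is a minimal sketch that decodes one 32-bit word of the mask into signal numbers. It assumes the usual Solaris layout, in which bit n-1 of the first word stands for signal n, and a POSIX shell with arithmetic expansion (bash or ksh, not the old /bin/sh). The function name decode_sigword is made up for illustration.

```shell
# Decode one 32-bit word of a signal mask, as printed by pflags,
# into a space-separated list of signal numbers.
# Assumption: bit (n-1) of the first word corresponds to signal n.
decode_sigword() {
    word=$(( $1 ))
    sig=1
    out=""
    while [ "$sig" -le 32 ]; do
        # Test bit (sig-1) of the word.
        if [ $(( (word >> (sig - 1)) & 1 )) -eq 1 ]; then
            out="${out:+$out }$sig"
        fi
        sig=$(( sig + 1 ))
    done
    echo "$out"
}

decode_sigword 0x00004001   # first word of sigpend; prints "1 15"
decode_sigword 0x00000004   # first word of sigmask for LWP 1; prints "3"
```

Read this way, signals 1 (HUP) and 15 (TERM) are pending but undelivered, and signal 3 (QUIT) is blocked, which agrees with the "QUIT blocked,caught" line in the psig output.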

10:08 tsstool@selilsx592[131]/proj/2gsim/usr/qzbiwis/ATE/hanging_process/selilsx592> ps -l 2673 
usage: ps [ -aAdeflcjLPyZ ] [ -o format ] [ -t termlist ]
        [ -u userlist ] [ -U userlist ] [ -G grouplist ]
        [ -p proclist ] [ -g pgrplist ] [ -s sidlist ] [ -z zonelist ]
  'format' is one or more of:
        user ruser group rgroup uid ruid gid rgid pid ppid pgid sid taskid ctid
        pri opri pcpu pmem vsz rss osz nice class time etime stime zone zoneid
        f s c lwp nlwp psr tty addr wchan fname comm args projid project pset

10:04 tsstool@selilsx592[126]/proj/2gsim/usr/qzbiwis/ATE/hanging_process/selilsx592> pstack 2673
pstack: cannot examine 2673: process is traced

last pid: 19355;  load averages:  0.02,  0.02,  0.02                                                                                                                                    10:07:10
71 processes:  70 sleeping, 1 on cpu
CPU states:     % idle,     % user,     % kernel,     % iowait,     % swap
Memory: 16G real, 12G free, 1470M swap in use, 17G swap free

   PID USERNAME THR PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
19353 tsstool    1  38    0 3256K 2368K cpu/3    0:00  0.08% top
13120 tsstool   23  59    0  163M   88M sleep    6:12  0.04% java
   363 root       1  59    0   48M   46M sleep   29:22  0.01% .vasd
   362 root       1  59    0   32M   28M sleep   81:25  0.01% .vasd
   361 root       1  59    0   34M   29M sleep   27:56  0.01% .vasd
13068 tsstool    1  59    0   11M 5712K sleep    0:04  0.01% sshd
   365 root       1  59    0   14M   11M sleep    2:27  0.00% .vasd
   555 root       2  59    0 7600K 4616K sleep   49:50  0.00% automountd
   355 root       1  59    0   15M   12M sleep    5:19  0.00% .vasd
   181 root      31  59    0 9256K 7128K sleep    3:22  0.00% nscd
   452 daemon     1  59    0 8712K 4536K sleep    2:37  0.00% .vasypd
   874 op5nrpe    1  59    0 5528K 2000K sleep    1:16  0.00% nrpe
   515 root       1 100  -20 3232K 2096K sleep    0:52  0.00% xntpd
     1 root       1  59    0 3128K 2272K sleep    0:46  0.00% init
  2673 tsstool    1  59    0 1089M  353M sleep    0:44  0.00% SEA

10:04 tsstool@selilsx592[124]/proj/2gsim/usr/qzbiwis/ATE/hanging_process/selilsx592> psig -n 2673 
2673:   /proj/lmdoste/tools/steroot/apps/sea/sea_R38A/lib/sea/bin/64bit/SEA -l
HUP     caught  0xffffffff4004a500      RESTART
INT     caught  0xffffffff4004a500      RESTART
QUIT    blocked,caught  0xffffffff4004a500      RESTART
ILL     caught  0xffffffff4004a500      RESTART
TRAP    caught  0xffffffff4004a500      RESTART
ABRT    caught  0xffffffff4004a500      RESTART
EMT     default
FPE     caught  0xffffffff4004a500      RESTART
KILL    default
BUS     caught  0xffffffff4004a500      RESTART
SEGV    caught  0xffffffff4004a500      RESTART
SYS     default
PIPE    ignored
ALRM    default
TERM    caught  0xffffffff4004a500      RESTART
USR1    default
USR2    default
CLD     caught  0x100010ae0     RESTART,SIGINFO
PWR     default
WINCH   default
URG     default
POLL    default
STOP    default
TSTP    default
CONT    default
TTIN    default
TTOU    default
VTALRM  default
PROF    default
XCPU    default
XFSZ    default
WAITING default
LWP     default
FREEZE  default
THAW    default
CANCEL  default
LOST    default
XRES    default
JVM1    default
JVM2    default
RTMIN   default
RTMIN+1 default
RTMIN+2 default
RTMIN+3 default
RTMAX-3 default
RTMAX-2 default
RTMAX-1 default
RTMAX   default
10:04 tsstool@selilsx592[125]/proj/2gsim/usr/qzbiwis/ATE/hanging_process/selilsx592> pargs 2673 
2673:   /proj/lmdoste/tools/steroot/apps/sea/sea_R38A/lib/sea/bin/64bit/SEA -l INIT.sco
argv[0]: /proj/lmdoste/tools/steroot/apps/sea/sea_R38A/lib/sea/bin/64bit/SEA
argv[1]: -l
argv[2]: INIT.sco,AXEMANAGER.sco,SEAGUI.sco,RTS.sco,SCHR.sco
argv[3]: -p
argv[4]: 5001
argv[5]: -e
argv[6]: node-name
dr_
Tulasi

1 Answer

/1:    flags = DSTOP
…
pstack: cannot examine 2673: process is traced

Stop debugging or dtracing the process. Then use prun to make sure that it is running again, and retry the kill.
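
As a rough sketch of that sequence, assuming whatever was tracing the process has already been detached: pflags and prun are the Solaris proc tools, while the helper name resume_then_kill and the 2-second wait are arbitrary choices for illustration. This may well not help if the process is stuck inside the kernel for some other reason.

```shell
# Sketch only: resume a stopped process, then signal it.
# resume_then_kill is a hypothetical helper name.
resume_then_kill() {
    pid=$1
    # Show the current flags so a remaining DSTOP is visible.
    pflags "$pid"
    # Set the stopped process running again (the inverse of pstop).
    prun "$pid" || return 1
    # Ask politely first.
    kill -TERM "$pid"
    sleep 2
    # kill -0 sends nothing; it only tests whether the process still exists.
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid"
    fi
}
```

Usage would be resume_then_kill 2673 for the process shown in the question.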

JdeBP
  • Being under a debugger or being traced isn't going to "save" a process from SIGKILL. – Andrew Henle Sep 26 '17 at 10:32
  • @AndrewHenle: The only suspicious thing I found in /var/adm/messages is shown below. Sep 22 22:37:50 selilsx592 afs: [ID 702911 user.error] SENDINFO SunOS5.10-sparc;isit;latest;0/arc.cshrc;SunOS5.10-sparc; Sep 22 23:07:51 selilsx592 last message repeated 1 time Sep 22 23:08:01 selilsx592 automountd[555]: [ID 748625 daemon.error] selina006-v2.lmera.ericsson.se server not responding: RPC: Timed out Sep 23 00:37:51 selilsx592 afs: [ID 702911 user.error] SENDINFO SunOS5.10-sparc;isit;latest;0/arc.cshrc;SunOS5.10-sp – Tulasi Sep 27 '17 at 10:19
  • @Tulasi What are your NFS mount options? If you're mounting hard,nointr and the NFS server stops responding, I suspect you can wind up with an unkillable process hung in an NFS access. – Andrew Henle Sep 27 '17 at 11:10