I have a process which appears to be hung.
When I try to restart the process, the stop command times out:
service logstash_server stop
timeout: run: logstash_server: (pid 11797) 839061s, want down, got TERM
I have tried running a tail -f on the logs, which unfortunately do not show anything.
I've also tried a kill -15 on the process, but it is still hung.
Top does not show this as a zombie process.
I'd like to figure out 'why' this process is in this state since it is the 3rd time this has happened in the past month.
I checked the file descriptors and the syslog, but don't see anything noticeable.
file descriptors => http://pastebin.com/90rDHhT4
syslog output => http://pastebin.com/xBaMaL9Z
output of lsof | grep logstash => http://pastebin.com/gsSdPyg5
I tried running an strace on the process, and it just shows FUTEX_WAIT
strace -p 11797
Process 11797 attached
futex(0x7f6d95d8e9d0, FUTEX_WAIT, 11811, NULL
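Note that strace -p 11797 attaches only to the thread whose ID is 11797, so the single FUTEX_WAIT line says little about the other threads in the process. A minimal sketch for listing the state of every thread, assuming a Linux /proc filesystem; thread_states is a hypothetical helper name, not something shipped with logstash or strace:

```shell
#!/bin/sh
# Hypothetical helper: print "TID STATE NAME" for every thread of a PID.
# Assumes Linux, where /proc/<pid>/task/<tid>/status exists.
thread_states() {
    pid=$1
    for dir in /proc/"$pid"/task/*; do
        # Skip threads that vanished (or an unexpanded glob for a bad PID).
        [ -r "$dir/status" ] || continue
        tid=${dir##*/}
        # "State:" holds a one-letter code such as S (sleeping) or D (disk sleep).
        state=$(awk '/^State:/ {print $2; exit}' "$dir/status")
        # "Name:" holds the thread's command name.
        name=$(awk '/^Name:/ {print $2; exit}' "$dir/status")
        echo "$tid $state $name"
    done
}

# Example (PID taken from the question): thread_states 11797
```

Threads in state D (uninterruptible sleep) would be worth a closer look. Running strace -f -p 11797 attaches to all threads of the process rather than just the first one, at the cost of much noisier output.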
Is there anything else I can do before I issue a kill -9?
Update
Opened ticket with developers. Issue continues about once per week.
strace to see what the process is doing. However, as I remember, this is a Java-embedded Ruby application, which makes it big enough that it is better to leave strace alone. You can check ls -l /proc/11797/fd/ to see what file descriptors it has open. Maybe you'll find something interesting there. – pawel7318 Apr 13 '15 at 17:22
kill -3 <PID> will write out the thread dumps, which can possibly help. – rahul Apr 13 '15 at 17:23
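The kill -3 suggestion can be sketched as a small wrapper. This assumes the process is a HotSpot-style JVM, which responds to SIGQUIT (signal 3) by printing all thread stacks to its own stdout and then carrying on; request_thread_dump is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: ask a JVM for a thread dump without stopping it.
# Assumption: the target is a JVM. A non-JVM process that does not handle
# SIGQUIT will normally terminate (and may dump core) instead.
request_thread_dump() {
    pid=$1
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process or no permission: $pid" >&2
        return 1
    fi
    # SIGQUIT is signal 3, i.e. the same as kill -3.
    kill -QUIT "$pid"
}

# Example (PID taken from the question): request_thread_dump 11797
```

The dump is written to the JVM's stdout, so for a supervised service it would end up wherever the supervisor sends that stream, not necessarily in the application log being tailed. If a JDK is installed, running jstack <PID> as the same user as the JVM is another way to get the stacks.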