I have a shell-wrapper around a large executable. It does something like this:
run/the/real/executable "$@" &
PID=$!
# perform
# a few
# minor things
wait $PID
# perform some
# post-processing
One of the things it does after the wait is check for core dumps and process the crashes. However, by then the process is already dead and some information is no longer available.
Can the fatal signal (SIGSEGV or SIGBUS) be intercepted by the shell script before it is delivered to the child itself?
I'd then be able to, for example, run lsof -p $PID to get the list of files opened by the wrapped process before it dies...
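For context, here is a minimal sketch of how the post-wait part of such a wrapper can tell that the child was killed by a signal. It relies on the shell convention that a signal death is reported as exit status 128 plus the signal number. The helper name signal_name_from_status is made up for illustration, and /bin/sleep merely stands in for the real executable:

```shell
# Hypothetical helper: print the signal name for a wait status above 128,
# or nothing if the status looks like a normal exit.
# Caveat: a program may also exit normally with a code above 128.
signal_name_from_status() {
    local status=$1
    if [ "$status" -gt 128 ]; then
        # The kill builtin accepts an exit status and prints the signal name.
        kill -l "$status" 2>/dev/null
    fi
}

sleep 30 &              # stand-in for run/the/real/executable "$@"
PID=$!
kill -BUS "$PID"        # simulate the crash
status=0
wait "$PID" || status=$?
sig=$(signal_name_from_status "$status")
echo "status=$status signal=${sig:-none}"
```

This only detects the crash after the fact; by the time wait returns, the process and its file descriptors are gone, which is exactly the problem described above.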
Update: I tried using strace to catch the process receiving a signal. Unfortunately, there seems to be a race: when strace reports the child's signal, the child is already on its way out, and there is no telling whether lsof will get the list of its files or not...
Here is the test script, which spawns /bin/sleep and tries to get the files it has opened for writing. Sometimes /tmp/sleep-output.txt is reported, as it should be; other times the list is empty...
ulimit -c 0
/bin/sleep 15 > /tmp/sleep-output.txt &
NPID=$!
echo "Me: $$, sleep: $NPID"
(sleep 3; kill -BUS $NPID) &
ps -ww $NPID
while read line
do
set -x
outputfiles=$(lsof -F an -b -w -p $NPID | sed -n '/^aw$/ {n; s,.,,; p}')
ps -ww $NPID
lsof -F an -b -w -p $NPID
break
done < <(strace -qq -p $NPID -e trace=signal 2>&1)
echo $outputfiles
wait $NPID
The above test requires ksh or bash (for the < <(...) construct to work).
ptrace. – Barmar Jun 25 '18 at 21:03
strace... Or some other equivalent of attaching debugger. But doing that will prevent usage of lsof -- which seems to also use ptrace-interface. – Mikhail T. Jun 25 '18 at 21:56
strace will just report that the process received the signal, it won't stop it so you can examine the process at that moment. – Barmar Jun 25 '18 at 22:06
strace -qq -e trace=signal -p $PID | read line, my script can wait until the process gets a signal. Trouble is, when that happens, it is already too late. With whatever I code using ptrace, it may be the same problem -- can't run lsof while ptrace is in effect, but, as soon as the ptrace is lifted, the child will exit... – Mikhail T. Jun 25 '18 at 22:19
ptrace, the process stops when the tracing process is notified of a signal. The tracing process has to tell it to resume. strace does that after it prints the message, but if you write your own program you can keep it stopped until after you run lsof. – Barmar Jun 25 '18 at 23:33
lsof tries to ptrace the process too, which would fail if my program is ptrace-ing it already... – Mikhail T. Jun 26 '18 at 03:37
lsof would need to use ptrace. I think it just looks in /proc – Barmar Jun 26 '18 at 15:48
lsof does not use ptrace. Unfortunately, it is still not a reliable method -- see my update to the question. How would my hypothetical program use ptrace(2) differently from how strace(1) uses it? – Mikhail T. Jun 27 '18 at 14:38
wait() or waitpid() to wait for this to happen. It can then run lsof, or do the equivalent by listing /proc/<pid>/fd. Once it's done it calls ptrace() with the PTRACE_CONT request to let the process continue. – Barmar Jun 27 '18 at 15:33
strace lets the process continue (to its death), whereas my custom program will do the file-listing first, thus avoiding the race... Ok, I'll try. Do you expect the traced process to run any slower because of the tracing, though? – Mikhail T. Jun 27 '18 at 19:25
strace simply prints the reason the process stopped and then tells it to continue immediately. Your program can do something else. I'm not sure of the exact impact on the process being traced; I'd expect it only to be slowed down when something happens that stops it, since the tracer has to tell it to resume. – Barmar Jun 27 '18 at 19:27
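The "do the equivalent by listing /proc/<pid>/fd" step mentioned in the comments could look roughly like the sketch below. It assumes Linux, where /proc/<pid>/fdinfo/<n> has a "flags:" line holding the open flags in octal. The function name list_written_files is made up for illustration. Note that this does not stop the process or remove the race by itself; it would have to run while a tracer keeps the process stopped.

```shell
# Hypothetical helper: print the targets of the descriptors that PID
# has open for writing (O_WRONLY or O_RDWR), one per line.
list_written_files() {
    local pid=$1 fd flags target
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd") || continue
        flags=$(awk '/^flags:/ {print $2}' "/proc/$pid/fdinfo/${fd##*/}" 2>/dev/null) || continue
        [ -n "$flags" ] || continue
        # Low two bits of the octal flags: 0 read-only, 1 write-only, 2 read-write.
        case $(( 0$flags & 3 )) in
            1|2) printf '%s\n' "$target" ;;
        esac
    done
}

tmpfile=$(mktemp)
sleep 30 > "$tmpfile" &     # stand-in for the wrapped process
PID=$!
sleep 1                     # give the child time to open its output file
written=$(list_written_files "$PID")
kill "$PID"
wait "$PID" 2>/dev/null || true
rm -f "$tmpfile"
printf '%s\n' "$written"
```

The output should include the temporary file, possibly along with whatever stderr happens to be attached to (a terminal or a pipe), since that is normally open for writing too.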