I have a shell wrapper around a large executable. It does something like this:
run/the/real/executable "$@" &
PID=$!
# perform
# a few
# minor things
wait $PID
# perform some
# post-processing
One of the things it does after the wait is check for core dumps and process the crashes. However, by then the process is already dead and some information is no longer available.
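For the post-processing step, the wrapper can at least tell from wait's exit status whether the child was killed by a signal, and by which one. The sketch below relies on bash reporting death by signal N as exit status 128+N. /bin/sleep and the explicit kill -BUS only simulate the real executable crashing. A program that calls exit() with a value above 128 would be indistinguishable here.

```shell
#!/bin/bash
# Sketch: decode how the wrapped process ended, using wait's exit status.
# /bin/sleep plus an explicit kill stand in for the real executable crashing.
ulimit -c 0                  # no core file for this demo
/bin/sleep 30 &
PID=$!
kill -BUS "$PID"             # simulate the fatal signal
wait "$PID"
status=$?
if (( status > 128 )); then
    # bash reports "killed by signal N" as exit status 128+N.
    sig=$(kill -l "$((status - 128))")   # signal number -> name, e.g. BUS
    echo "process $PID was killed by SIG$sig"
else
    echo "process $PID exited with status $status"
fi
```

This only identifies the signal after the fact. Any per-process state, such as open files, is already gone by the time wait returns.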
Can the fatal signal (SIGSEGV or SIGBUS) be intercepted by the shell script before it is delivered to the child itself?
I would then be able to, for example, run lsof -p $PID to get the list of files opened by the wrapped process before it dies.
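On Linux, the same information lsof -p reports can be read directly from /proc/<pid>/fd. The file-access mode is available in the "flags:" line of /proc/<pid>/fdinfo/<fd>. The sketch below assumes Linux's /proc layout and O_ACCMODE value of 3. list_write_fds is a hypothetical helper name, not an existing tool. It prints both write-only and read/write descriptors, which is broader than the test script's lsof filter, since that filter keeps only write-only ("aw") entries.

```shell
#!/bin/bash
# Sketch, assuming Linux /proc: print the targets of the file descriptors
# that process $1 has open for writing (O_WRONLY or O_RDWR).
# list_write_fds is a hypothetical helper, not an existing command.
list_write_fds() {
    local pid=$1 fdpath flags
    for fdpath in /proc/"$pid"/fd/*; do
        [ -L "$fdpath" ] || continue     # fd closed, or process already gone
        # The "flags:" line in fdinfo holds the open(2) flags in octal.
        flags=$(awk '$1 == "flags:" {print $2}' \
                    "/proc/$pid/fdinfo/${fdpath##*/}" 2>/dev/null)
        [ -n "$flags" ] || continue
        # O_ACCMODE is 3 on Linux: 0 = read-only, 1 = write-only, 2 = read/write.
        case $(( 8#$flags & 3 )) in
            1|2) readlink "$fdpath" ;;
        esac
    done
}
```

Calling list_write_fds "$PID" avoids the dependency on lsof. However, it is subject to the same race: once the process has exited, /proc/$PID no longer exists and nothing is printed.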
Update: I tried using strace to catch the process receiving a signal. Unfortunately, there seems to be a race: by the time strace reports the child's signal, the child is already on its way out, and there is no telling whether lsof will get the list of its files or not.
Here is the test script, which spawns off /bin/sleep and tries to get the files it has opened for writing. Sometimes /tmp/sleep-output.txt is reported as it should be; other times the list is empty.
ulimit -c 0
/bin/sleep 15 > /tmp/sleep-output.txt &
NPID=$!
echo "Me: $$, sleep: $NPID"
(sleep 3; kill -BUS $NPID) &
ps -ww $NPID
while read line
do
set -x
outputfiles=$(lsof -F an -b -w -p $NPID | sed -n '/^aw$/ {n; s,.,,; p}')
ps -ww $NPID
lsof -F an -b -w -p $NPID
break
done < <(strace -qq -p $NPID -e trace=signal 2>&1)
echo $outputfiles
wait $NPID
The above test requires ksh or bash (for the < <(...) process-substitution construct to work).
ptrace. – Barmar Jun 25 '18 at 21:03
strace... Or some other equivalent of attaching a debugger. But doing that will prevent usage of lsof -- which seems to also use the ptrace interface. – Mikhail T. Jun 25 '18 at 21:56
strace will just report that the process received the signal; it won't stop it so you can examine the process at that moment. – Barmar Jun 25 '18 at 22:06
strace -qq -e trace=signal -p $PID | read line, my script can wait until the process gets a signal. Trouble is, when that happens, it is already too late. With whatever I code using ptrace, it may be the same problem -- I can't run lsof while ptrace is in effect, but as soon as the ptrace is lifted, the child will exit... – Mikhail T. Jun 25 '18 at 22:19
ptrace, the process stops when the tracing process is notified of a signal. The tracing process has to tell it to resume. strace does that after it prints the message, but if you write your own program you can keep it stopped until after you run lsof. – Barmar Jun 25 '18 at 23:33
lsof tries to ptrace the process too, which would fail if my program is ptrace-ing it already... – Mikhail T. Jun 26 '18 at 03:37
lsof would need to use ptrace. I think it just looks in /proc. – Barmar Jun 26 '18 at 15:48
lsof does not use ptrace. Unfortunately, it is still not a reliable method -- see my update to the question. How would my hypothetical program use ptrace(2) differently from how strace(1) uses it? – Mikhail T. Jun 27 '18 at 14:38
wait() or waitpid() to wait for this to happen. It can then run lsof, or do the equivalent by listing /proc/<pid>/fd. Once it's done, it calls ptrace() with the PTRACE_CONT request to let the process continue. – Barmar Jun 27 '18 at 15:33
strace lets the process continue (to its death), whereas my custom program will do the file listing first, thus avoiding the race... OK, I'll try. Do you expect the traced process to run any slower because of the tracing, though? – Mikhail T. Jun 27 '18 at 19:25
strace simply prints the reason the process stopped and then tells it to continue immediately. Your program can do something else. I'm not sure of the exact impact on the process being traced; I'd expect it only to be slowed down when something happens that stops it, since the tracer has to tell it to resume. – Barmar Jun 27 '18 at 19:27