
I have a shell-wrapper around a large executable. It does something like this:

run/the/real/executable "$@" &
PID=$!
# perform
# a few
# minor things
wait $PID
# perform some
# post-processing

One of the things it does after the wait is check for core dumps and process the crashes. However, by then the process is already dead and some information is no longer available.

Can the fatal signal (SIGSEGV or SIGBUS) be intercepted by the shell script before it is delivered to the child itself?

I'd then be able to, for example, perform lsof -p $PID to get the list of files opened by the wrapped process before it dies...

Update: I tried using strace to catch the process receiving a signal. Unfortunately, there seems to be a race: when strace reports the child's signal, the child is already on its way out, and there is no telling whether lsof will get the list of its files or not...

Here is the test script, which spawns off /bin/sleep and tries to get the files it has opened for writing. Sometimes /tmp/sleep-output.txt is reported, as it should be; other times the list is empty...

ulimit -c 0
/bin/sleep 15 > /tmp/sleep-output.txt &

NPID=$!

echo "Me: $$, sleep: $NPID"

(sleep 3; kill -BUS $NPID) &

ps -ww $NPID
while read line
do
        set -x
        outputfiles=$(lsof -F an -b -w -p $NPID | sed -n '/^aw$/ {n; s,.,,; p}')
        ps -ww $NPID
        lsof -F an -b -w -p $NPID
        break
done < <(strace -qq -p $NPID -e trace=signal 2>&1)
echo $outputfiles

wait $NPID

The above test requires use of ksh or bash (for the < <(...) construct to work).

  • The shell can't do this, you need to write a program that uses ptrace. – Barmar Jun 25 '18 at 21:03
  • Shell can spawn off strace... Or some other equivalent of attaching debugger. But doing that will prevent usage of lsof -- which seems to also use ptrace-interface. – Mikhail T. Jun 25 '18 at 21:56
  • strace will just report that the process received the signal, it won't stop it so you can examine the process at that moment. – Barmar Jun 25 '18 at 22:06
  • If I run strace -qq -e trace=signal -p $PID | read line, my script can wait until the process gets a signal. Trouble is, when that happens, it is already too late. With whatever I code using ptrace, it may be the same problem -- can't run lsof, while ptrace is in effect, but, as soon as the ptrace is lifted, the child will exit... – Mikhail T. Jun 25 '18 at 22:19
  • When you use ptrace, the process stops when the tracing process is notified of a signal. The tracing process has to tell it to resume. strace does that after it prints the message, but if you write your own program you can keep it stopped until after you run lsof. – Barmar Jun 25 '18 at 23:33
  • I had the impression, lsof tries to ptrace the process too, which would fail if my program is ptrace-ing it already... – Mikhail T. Jun 26 '18 at 03:37
  • I can't think of any reason why lsof would need to use ptrace. I think it just looks in /proc – Barmar Jun 26 '18 at 15:48
  • You are right, lsof does not use ptrace. Unfortunately, it is still not a reliable method -- see my update to the question. How would my hypothetical program use ptrace(2) differently from how strace(1) uses it? – Mikhail T. Jun 27 '18 at 14:38
  • When a process being traced receives a signal, it stops. The tracing process uses wait() or waitpid() to wait for this to happen. It can then run lsof, or do the equivalent by listing /proc/<pid>/fd. Once it's done it calls ptrace() with the PTRACE_CONT request to let the process continue. – Barmar Jun 27 '18 at 15:33
  • Ok, so you think, strace lets the process continue (to its death), whereas my custom program will do the file-listing first thus avoiding the race... Ok, I'll try. Do you expect the traced process to run any slower because of the tracing, though? – Mikhail T. Jun 27 '18 at 19:25
  • Yes, strace simply prints the reason the process stopped and then tells it to continue immediately. Your program can do something else. I'm not sure of the exact impact on the process being traced; I'd expect it to be slowed down only when something happens that stops it, since the tracer has to tell it to resume. – Barmar Jun 27 '18 at 19:27
  • Thanks. Why don't you shape your comments into an answer, which I'll be able to "accept"? – Mikhail T. Jun 27 '18 at 19:30

1 Answer


As far as I know, there are no shell methods to do what you're trying; it will have to be done from a custom program.

Use ptrace() to monitor the process, similar to the way a debugger does. When the process receives a signal, it will be stopped, and the monitoring program will be notified (its call to wait() will return, and WIFSTOPPED(status) will be true).

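It can then run lsof -p <pid> to list the open files of the process, and then call ptrace(PTRACE_CONT, pid, NULL, sig), where sig is WSTOPSIG(status), to restart the process. Passing 0 as the last argument would suppress the signal instead of delivering it, so the process would not die from it.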

Barmar