
I maintain a "master" program which manages a set of concurrently running sub-processes ("slaves"). Slaves are launched and killed as needed, and many of them are launched via start-scripts.

Output of pstree looks like this (excerpt; the master is implemented in Java, and two of the slaves were launched via script):

systemd───java─┬─sh───slave
               ├─slave
               └─sh───slave

Previously, the start-scripts redirected the slaves' output to log files. It was then decided that the master should handle the slaves' output as well, so the master's implementation was extended with a buffered reader like this:

Process process = Runtime.getRuntime().exec(cmd);
BufferedReader br = new BufferedReader(new InputStreamReader(process.getInputStream()));
String line;
while (null != (line = br.readLine())) {
    // handle slave output here
}

The system then developed serious issues with slaves that had been killed (sent SIGTERM) by the master but were in fact still running. I noticed this happened only with slaves that met two criteria:

  • they made use of a start-script
  • they rarely wrote to standard output
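
For illustration, killing a slave from the master boils down to something like the following (a simplified sketch assuming Process.destroy() or an equivalent SIGTERM, not the actual production code):

process.destroy();  // on Linux: SIGTERM, delivered only to the immediate child -- here the sh
process.waitFor();  // the shell terminates, while a script-launched slave keeps running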

Since the master had not killed the slave itself, but only its immediate parent (the shell interpreter), the slave was now owned by init; in my case, systemd seems to be the default reaper. pstree then looks like this:

systemd─┬─java───sh───slave
        └─slave

Functionally, I solved this problem by explicitly killing the slave's entire family, roughly as sketched below.
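
One way to do that, assuming Java 9+ and that the Process handle for the shell is still available, is the ProcessHandle API (a sketch, not necessarily the exact code used):

ProcessHandle shell = process.toHandle();
shell.descendants().forEach(ProcessHandle::destroy);  // SIGTERM the slave (and any other offspring) first
shell.destroy();                                      // then the shell running the start-script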

Yet I still wonder:

Why does systemd kill the orphaned child only if it writes to standard output (or standard error), and only if standard output was previously read by another process?

The question is rather lengthy as it is. Upon request, I can supply a minimal code example to reproduce the behaviour described.

Hermann

1 Answer


That's likely not systemd doing it.

Instead, the process is killed by SIGPIPE when it tries to write to a pipe whose read side has been closed -- which fits the description "standard output was previously read by another process."
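
The mechanism is easy to reproduce on its own. The following standalone sketch (commands and names are illustrative, not taken from the question) starts a child that writes one line per second, closes the read side of its stdout pipe without ever reading it, and then observes the child dying from SIGPIPE:

import java.util.concurrent.TimeUnit;

public class SigPipeDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for a rarely-writing slave: one line of output per second.
        Process p = new ProcessBuilder("sh", "-c", "while true; do echo alive; sleep 1; done").start();

        // Close the read side of the child's stdout pipe without reading from it.
        p.getInputStream().close();

        // The child's next write() fails and the kernel delivers SIGPIPE, which
        // terminates it by default. On Linux, Process reports a death by signal
        // as 128 + signal number, so SIGPIPE (13) appears as exit value 141.
        if (p.waitFor(10, TimeUnit.SECONDS)) {
            System.out.println("child exit value: " + p.exitValue());
        }
    }
}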

  • A sensible explanation. I was distracted because readLine() never returns. The input stream is reported to be open even after sh had been killed. I assumed the parent process would inherit the streams, but apparently this is not the case in this scenario. It seems like the stream's state is only updated if the (disconnected) sub-process tries to write to it. Might be a Java thing. – Hermann Nov 30 '21 at 11:06
  • There is no way to update the stream state, because there is no notification mechanism for that. The SIGPIPE is the first time the process learns that no one is interested in the output, and by default, this signal terminates the process. This is meant for shell commands that operate in pipelines -- if further output will be ignored, there is no point in continuing to run the program. – Simon Richter Nov 30 '21 at 11:11
  • There can be another explanation: your start-script does not propagate SIGTERM to the "slave" process it started, so SIGTERM kills only the shell script. The process manager then does its job and re-parents the orphaned "slave" under itself. Try adding a SIGTERM trap to the start-script so that it sends SIGTERM to all its children. – Fiisch Nov 30 '21 at 11:16