5

/tmp/in, /tmp/out and /tmp/err are named pipes, already created and opened by some other process (for reading, writing and writing, respectively).

I would like to create a new process that pipes its stdin into /tmp/in, writes the contents of /tmp/out to its stdout, and writes the contents of /tmp/err to its stderr as they become available. Everything should work in a line-buffered fashion. The process should exit when the other process, the one that created /tmp/in, stops reading and closes /tmp/in. The solution should work on Ubuntu, preferably without installing any extra packages, and I would like to solve it in a bash script.


mikeserv pointed out that without an SSCCE, it is hard to understand what I want. So, below is an SSCCE; keep in mind that it is a minimal example, so it is deliberately silly.

The original setup

A parent process launches a child process and communicates with it through the child's stdin and stdout line-by-line. If I run it, I get:

$ python parent.py 
Parent writes to child:  a
Response from the child: A

Parent writes to child:  b
Response from the child: B

Parent writes to child:  c
Response from the child: C

Parent writes to child:  d
Response from the child: D

Parent writes to child:  e
Response from the child: E

Waiting for the child to terminate...
Done!
$ 

parent.py

from __future__ import print_function
from subprocess import Popen, PIPE
import os

child = Popen('./child.py', stdin=PIPE, stdout=PIPE)
child_stdin  = os.fdopen(os.dup(child.stdin.fileno()), 'w')
child_stdout = os.fdopen(os.dup(child.stdout.fileno()))

for letter in 'abcde':
    print('Parent writes to child: ', letter)
    child_stdin.write(letter+'\n')
    child_stdin.flush()
    response = child_stdout.readline()
    print('Response from the child:', response)
    assert response.rstrip() == letter.upper(), 'Wrong response'

child_stdin.write('quit\n')
child_stdin.flush()
print('Waiting for the child to terminate...')
child.wait()
print('Done!')

child.py, must be executable!

#!/usr/bin/env python
from __future__ import print_function
from sys import stdin, stdout

while True:
    line = stdin.readline()
    if line == 'quit\n':
        quit()
    stdout.write(line.upper())
    stdout.flush()

The desired setup and a hackish solution

Neither the parent's source file nor the child's source file may be edited; that is not allowed.

I rename child.py to child_original.py (and make it executable). Then I put a bash script (a proxy, or middle man, if you wish) in its place, called child.py. I start child_original.py myself before running python parent.py, and have parent.py invoke the fake child.py, which is now my bash script forwarding between parent.py and child_original.py.

The fake child.py

#!/bin/bash
parent=$$
cat std_out &
(head -n 1 shutdown; kill -9 $parent) &
cat >>std_in

The start_child.sh to start child_original.py before executing the parent:

#!/bin/bash
rm -f  std_in std_out shutdown
mkfifo std_in std_out shutdown
./child_original.py <std_in >std_out
echo >shutdown
sleep 1s
rm -f  std_in std_out shutdown

The way of executing them:

$ ./start_child.sh & 
[1] 7503
$ python parent.py 
Parent writes to child:  a
Response from the child: A

Parent writes to child:  b
Response from the child: B

Parent writes to child:  c
Response from the child: C

Parent writes to child:  d
Response from the child: D

Parent writes to child:  e
Response from the child: E

Waiting for the child to terminate...
Done!
$ echo 

[1]+  Done                    ./start_child.sh
$ 

This hackish solution works. However, as far as I know it does not meet the line-buffered requirement, and it needs an extra shutdown fifo to inform start_child.sh that child_original.py has closed the pipes and start_child.sh can safely exit.


The question asks for an improved fake child.py bash script that meets the requirements: line buffered, exits when child_original.py closes any of the pipes, and needs no extra shutdown pipe.



Stuff I wish I had known:

  • If a high-level API is used to open a fifo as a file, it must be opened for both reading and writing, otherwise the call to open already blocks. This is incredibly counter-intuitive. See also: Why does a read-only open of a named pipe block?
  • In reality, my parent process is a Java application. If you work with an external process from Java, read the stdout and stderr of the external process from daemon threads (call setDaemon(true) on those threads before starting them); otherwise, the JVM will hang forever, even after everybody is done. Although unrelated to the question, other pitfalls are covered in Navigate yourself around pitfalls related to the Runtime.exec() method.
  • Apparently, "unbuffered" still means a buffer is used; we just don't wait until it gets full, but flush it as soon as we can.
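The first point above can be demonstrated in a few lines of shell: opening a fifo with the read-write redirection <> returns immediately, even when no other process has it open, whereas a plain read-only open would block. This is a minimal sketch; the throwaway directory and fifo name are arbitrary:

```shell
#!/bin/sh
# Create a fifo in a throwaway directory.
d=$(mktemp -d)
mkfifo "$d/p"

# A plain read-only open (exec 3<"$d/p") would block here until a
# writer shows up. Opening read-write with <> returns immediately.
exec 3<>"$d/p"

echo hello >&3        # we are our own writer...
IFS= read -r line <&3 # ...so the read succeeds at once
echo "$line"

exec 3>&-
rm -r "$d"
```

Running it prints hello and exits without ever blocking.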
Ali
  • 5,341
  • Did you try doing the forwarding in bash while running bash with stdbuf? E.g., forward() { while read line; do echo "$line"; done; } forward </tmp/out & forward </tmp/err >&2 & forward >/tmp/in – Petr Skocik Jul 12 '15 at 13:26
  • 1
    Nice question and research. Looking forward to the answers. – Olivier Dulac Jul 14 '15 at 23:52
  • @PSkocik I have dropped an e-mail to your e-mail address. Please let me know if it hasn't arrived yet. – Ali Oct 12 '15 at 18:13

3 Answers

3

If you get rid of the killing and shutdown stuff (which is unsafe: in an extreme, but not unfathomable, case where child.py dies before the (head -n 1 shutdown; kill -9 $parent) & subshell does, you may end up kill -9ing some innocent process), then child.py won't terminate, because your parent.py isn't behaving like a good UNIX citizen.

The cat std_out & subprocess will have finished by the time you send the quit message, because the writer to std_out is child_original.py, which finishes upon receiving quit, at which moment it closes its stdout, which is the std_out pipe; that close makes the cat subprocess finish.

The cat > std_in isn't finishing because it's reading from a pipe originating in the parent.py process, and parent.py didn't bother to close that pipe. If it did, cat > std_in, and consequently the whole child.py, would finish by itself, and you wouldn't need the shutdown pipe or the killing part (killing a process that isn't your child is always a potential security hole on UNIX, should a race condition due to rapid PID recycling occur).

Processes at the right end of a pipeline generally only finish once they're done reading their stdin, but since you're not closing yours (child.stdin), you're implicitly telling the child process "wait, I have more input for you", and then you go and kill it for waiting for more input from you, as it should.

In short, make parent.py behave reasonably:

from __future__ import print_function
from subprocess import Popen, PIPE
import os

child = Popen('./child.py', stdin=PIPE, stdout=PIPE)

for letter in 'abcde':
    print('Parent writes to child: ', letter)
    child.stdin.write(letter+'\n')
    child.stdin.flush()
    response = child.stdout.readline()
    print('Response from the child:', response)
    assert response.rstrip() == letter.upper(), 'Wrong response'

child.stdin.write('quit\n')
child.stdin.flush()
child.stdin.close()
print('Waiting for the child to terminate...')
child.wait()
print('Done!')

And your child.py can be as simple as

#!/bin/sh
cat std_out &
cat > std_in
wait #basically to assert that cat std_out has finished at this point

(Note that I got rid of the fd dup calls, because otherwise you'd need to close both child.stdin and the child_stdin duplicate.)

Since parent.py operates in a line-oriented fashion, GNU cat is unbuffered (as mikeserv pointed out), and child_original.py also operates in a line-oriented fashion, you've effectively got the whole thing line-buffered.


Note on cat: "Unbuffered" might not be the luckiest term, as GNU cat does use a buffer. What it doesn't do is try to fill the whole buffer before writing things out (unlike stdio). Basically, it makes read requests to the OS for a specific size (its buffer size) and writes whatever it receives without waiting to accumulate a whole line or a full buffer. (read(2) is allowed to be lazy and give you only what it can at the moment, rather than the whole buffer you asked for.)

(You can inspect the source code at http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/cat.c; safe_read (used instead of plain read) lives in the gnulib submodule and is a very simple wrapper around read(2) that abstracts away EINTR (see the man page).)

Petr Skocik
  • 28,816
  • @Ali : the last paragraph means that, as you have 2 line-buffered programs (and especially the last one is!), the resulting output of the 3 will be line-oriented as well. If your last program is block-buffered (add a | grep ^), then the resulting output to your terminal or file would be block buffered. – Olivier Dulac Jul 14 '15 at 23:50
2

With sed, the input will always be read in line-buffered fashion, and the output can be explicitly flushed per line with the w command. For example:

(       cd /tmp; c=
        mkfifo i o
        dd  bs=1    <o&
        sed -n w\ o <i&
        while   sleep 1
        do      [ -z "$c" ] && rm [io]
                [ "$c" = 5 ]   && exit
                date "+%S:%t$((c+=1))"
        done|   tee i
)

44: 1
44: 1
45: 2
45: 2
46: 3
46: 3
47: 4
47: 4
48: 5
48: 5
30+0 records in
30+0 records out
30 bytes (30 B) copied, 6.15077 s, 0.0 kB/s

...where tee (which is spec'd not to block-buffer) writes its output to the terminal and to sed's i pipe simultaneously. sed reads i line-by-line and writes each line it reads to its o pipe as soon as it does. dd reads the o pipe a byte at a time, and it shares a stdout with tee, and so they both write their output to the terminal at the same time. This would not happen if sed didn't explicitly line-buffer. Here's the same run, but without the write command:

(       cd /tmp; c=
        mkfifo i o
        dd  bs=1    <o&
        sed ''  >o  <i&
        while   sleep 1
        do      [ -z "$c" ] && rm [io]
                [ "$c" = 5 ]   && exit
                date "+%S:%t$((c+=1))"
        done|   tee i
)

48: 1
49: 2
50: 3
51: 4
52: 5
48: 1
49: 2
50: 3
51: 4
52: 5
30+0 records in
30+0 records out
30 bytes (30 B) copied, 6.15348 s, 0.0 kB/s

In that case sed block-buffers, and doesn't write a thing to dd until its input closes, at which point it flushes its output and quits. In fact, it exits when its writer does in both cases, as witnessed by dd's diagnostics being written at the pipeline's end.

Still though...

(       cd /tmp; c=
        mkfifo i o
        dd  bs=1    <o&
        cat >o      <i&
        while   sleep 1
        do      [ -z "$c" ] && rm [io]
                [ "$c" = 5 ]   && exit
                date "+%S:%t$((c+=1))"
        done|   tee i
)

40: 1
40: 1
41: 2
41: 2
42: 3
42: 3
43: 4
43: 4
44: 5
44: 5
30+0 records in
30+0 records out
30 bytes (30 B) copied, 6.14734 s, 0.0 kB/s

Now my cat is a GNU version, and GNU's cat (if called without options) never block-buffers. If you are also using a GNU cat, then it seems pretty clear to me that the problem is not with your relay, but with your Java program. However, if you are not using a GNU cat, then there is a chance that it will buffer output. Lucky for you, though, there is exactly one POSIX-spec'd option to cat, and that is -u, for unbuffered output. You might try it.
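For instance, a trivial sketch of the flag (on GNU cat, -u is accepted but makes no difference, since its output is already unbuffered; on other implementations it may be what stands between you and a block buffer):

```shell
# POSIX guarantees exactly one option for cat: -u, unbuffered output.
# The output content is identical either way; only the buffering of
# a non-GNU cat may change.
printf 'x\ny\n' | cat -u
```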

I'm looking at your thing, and after playing around with it for a while I'm pretty sure your problem is a deadlock situation. You've got the one cat hanging on input there at the end, and if the JVM process is also waiting for someone to talk to it, then probably nothing will ever happen. So I wrote this:

#!/bin/sh
die()   for io  in  i o e
        do      rm "$io"
                kill -9 "$(($io))"
        done    2>/dev/null
io()    while   eval "exec $((fd+=1))>&-"
        do      [ "$fd" = 9 ] &&
                { cat; kill -1 0; }
        done
cd /tmp; fd=1
mkfifo   i o e
{   io <o >&4 & o=$!
    io <e >&5 & e=$!
    io >i <&3 & i=$!
}   3<&0  4>&1  5>&2
trap "die; exit 0" 1
echo; wait

It's kind of sloppy about handling return codes, unfortunately; with more work it could be made to do so reliably. Anyway, as you can see, it backgrounds all of the cats, writes an empty line to stdout, then waits until one of the cats quits, which should set off a chain that kills all of them in all cases, I think.

mikeserv
  • 58,310
  • Thanks. It seems rather convoluted. :( Now, if I run the Java application - external process combo without my proxy bash script in the middle, "the buffers are properly flushed on both sides, they never hang" as I write in my question. It is unclear to me why they hang if I put my bash script in the middle. I have GNU cat; I have double-checked (Ubuntu 14.04 LTS). – Ali Jul 12 '15 at 16:06
  • @Ali - Well, w/ ubuntu, you definitely have a GNU cat. And yes, it is rather convoluted. I wanted only to show reproducibly and unmistakably that date | tee >i (cat|sed) >o dd >tty would come out as soon as it went in. There are many portable options for avoiding block buffers in a pipeline, and (tee|dd|cat|sed) are the best of these. I don't know why your issue arises, either. It seems strange to me as well, but you don't make it very clear exactly (where|when|how|why) io comes and goes. Java VM and external process don't help much, I'm afraid. What external process? – mikeserv Jul 12 '15 at 16:12
  • I would like to test your solution but I would like to ask you to please put your answer as a standalone shell script on the top of your answer. Does it satisfy the line buffered requirement of the question? Since dd does not read line-by-line, I am afraid it doesn't. – Ali Jul 12 '15 at 19:09
  • As for my failed attempt with my silly shell script: I could try to make a minimal example and put all the source code, e.g., on GitHub. Are you willing to debug this issue? If you are, then I probably do it, otherwise I won't waste my time chasing this wild goose. In any case, the problem only appears if I run it with that silly shell script so the bug is most likely there. – Ali Jul 12 '15 at 19:13
  • @Ali. I think so. I've been playing with some stuff. Please see the edit I just posted. – mikeserv Jul 12 '15 at 20:38
  • Unfortunately, your most recent edit is way beyond my level of understanding of shell scripts, sorry. :( Please answer my previous two questions that I asked in comments. – Ali Jul 12 '15 at 20:46
  • @Ali - Well, neither cat nor dd would do line-buffered output - they'll both write out the sizes of the blocks they read in. I set dd's blocksize to a single byte - which is slow, but sure. sed definitely will do line-buffered output because it always buffers input by line - if that's what you want. The bit of shell I wrote before was just to demonstrate how such tools might work in a chain - I had to provide my own input source because there's no example i/o. The sh script might work as a stand-in replacement for yours, except that it tries to create/clean-up its own pipe files. – mikeserv Jul 12 '15 at 21:04
  • Great, thanks, I appreciate your efforts. Unfortunately, it is not the answer that I was hoping for... :( Well, I guess my problem at hand is more difficult than it seems at first. I leave the question open for a while. – Ali Jul 12 '15 at 21:58
  • 1
    @Ali - of course you should. But I don't think you're likely to get much of an answer if you don't bother to say what you're doing. You should try to do that - and not for my sake. If you don't tell people what you do to cause a problem, how can you expect them to understand how to fix it? – mikeserv Jul 12 '15 at 22:14
  • Well, I can make the Java code public on GitHub. I let you know when I have those example codes ready. – Ali Jul 12 '15 at 22:19
  • 1
    I have added an SSCCE. It is unclear why my original attempt with that silly bash script failed but we will never know: My /tmp folder lives in the RAM and its contents are lost when I reboot my machine. In any case, please check the SSCCE and update your answer accordingly. I hope that the question is clear now. Thanks! – Ali Jul 13 '15 at 15:51
  • @Ali. Wow. That was a very good edit, I'm impressed. I think it is, though it's still a little hazy about why. Probably that doesn't matter - and that's your business - but occasionally motive can be a telling clue in a solution. Doesn't matter - alphabets are fine - better than fine, it was an excellent edit. – mikeserv Jul 13 '15 at 19:00
  • I am glad you like the updated answer. There are a bunch of use cases, that is why I am vague about the motivation. Probably the most important use case is debugging: I want to run the child process in an IDE. I have no idea how to attach a debugger to a process that was not started by the IDE. The shell script in the question is sufficient for this but I am sure you can do better than that. The next thing on the agenda is to intercept the communication between two C++ applications without hurting the performance too much. – Ali Jul 13 '15 at 19:25
  • @Ali - oh that's good! But you should post it as an answer if it is an answer and accept it. – mikeserv Jul 13 '15 at 19:29
  • Unfortunately, no. Please re-read the updated question: The hackish script in the question provides a temporary workaround but it does not answer the original question and it is a compromise (uses an extra fifo). As for my previous comment: Although it is not a use case but I wanted to learn more about pipes and find a proper solution, not just some hack that works for the time being. Anyway, please update your answer given the new information. – Ali Jul 13 '15 at 19:34
  • @Ali Oh, ok. It's just, in the last comment you called it an answer... Anyway, I'll try it up. No promises - I have developed a rather strong distaste for Python. – mikeserv Jul 13 '15 at 19:39
  • Ooops, sorry, that was a typo... :( I meant to write updated question. Let me know if you have difficulties understanding the python example. I think if you run it first, you will better understand what's happening. – Ali Jul 13 '15 at 19:42
  • FYI: I have started a bounty on this question. – Ali Jul 14 '15 at 12:31
  • @Ali - is there anything useful to you here? – mikeserv Jul 14 '15 at 14:44
  • PSkocik's second answer is almost an answer. However, I did some benchmarking, and interesting things are happening: Line buffering makes matters worse! For example the line buffered sed -n w\ o <i& in your answer performs worse than the corresponding unbuffered cat command. Please check the updated question! – Ali Jul 14 '15 at 23:35
  • 2
    @Ali: line buffering instead of whole block(s) buffering is probably less efficient, this is normal. Just the "look for a newline" is less efficient than just copy n bytes. and the block-buffer sizes are custom-sized to be the most efficient in the most cases. – Olivier Dulac Jul 14 '15 at 23:57
  • @Ali - I know. It's why i didn't stop at sed - cat is better. But if you implemented the solution in kernel space, in a pty, line-buffering would be automatic and efficient. Doing this w/ pipes is possible, but difficult, and prone to error. It's why there is such a thing as a pty at all. It's how screen works. It's how your thing should, too. Please look again at the link in my last comment. – mikeserv Jul 15 '15 at 00:18
  • 2
    @OlivierDulac - amen, but just so we don't confuse the already 17-deep comment thread, cat copies its in-block to an out-block as soon as it is received, whereas standard C-lib block-buffering is to withhold output until it equals at least some size of a block at least - and it's typically 4k or so. – mikeserv Jul 15 '15 at 00:21
  • @mikeserv Sorry, I couldn't get back to this question earlier, I have been busy with work. Yes, I read your other linked answer but I did not understand it, and I did not see how it related to my question. You vastly overestimate my understanding of Linux. :( Anyway, the cat solution is fast enough, I can pump 3GB of data through it in 0.8 seconds. That's more than enough; there is no need for more sophisticated solutions. Anyway, +1, and thanks for your help and patience! – Ali Jul 20 '15 at 13:12
0

In bash, you could try:

forward() { while IFS= read -r line; do printf '%s\n' "$line"; done; }
forward </tmp/out &       # fifo out -> our stdout
forward </tmp/err >&2 &   # fifo err -> our stderr
forward >/tmp/in          # our stdin -> fifo in
wait

and then run the script with stdbuf -i0 -oL.

The forward function is basically the pipe method from your Python code, with src and dest defaulting to stdin and stdout, and without the explicit flushing, hoping that stdbuf might take care of it.
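The loop can be tried standalone by feeding it from a fifo instead of /tmp/out; a self-contained sketch (the fifo path here is made up for the demo):

```shell
#!/bin/sh
# Forward stdin to stdout one line at a time.
# IFS= and -r keep leading whitespace and backslashes intact.
forward() { while IFS= read -r line; do printf '%s\n' "$line"; done; }

# Stand-in for /tmp/out: a fifo fed by a background writer.
d=$(mktemp -d)
mkfifo "$d/out"
{ echo A; echo B; } >"$d/out" &

forward <"$d/out"   # emits each line as soon as it arrives
wait
rm -r "$d"
```

The function exits on its own when the writer closes the fifo, which is exactly the termination behavior the question asks for.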

If you're concerned about performance and want it in C code, put it in C code. I'm not familiar with a stdbuf-friendly cat alternative, but here's a C++ one-liner (almost) for catting stdin to stdout:

#include <iostream>
using namespace std;
int main() { for(string line; getline(cin,line); ){ cout<<line<<'\n'; }; }

Or if you absolutely must have C code:

#include <stdlib.h>
#include <stdio.h>
int main()
{
  const size_t N = 80;
  char *lineptr = (char*)malloc(N); //this will get realloced if a longer line is encountered
  size_t length = N;
  while(getline(&lineptr, &length, stdin) != -1){
    fputs(lineptr, stdout);
  }
  free(lineptr);
  return 0;
}

Neither the C nor the C++ example does explicit flushing after each line, because I think leaving that to stdbuf is the better design decision, but you could always add it by calling fflush(stdout) after each line in the C example, replacing '\n' with endl in the C++ example, or, more efficiently, by pre-setting the buffering to line buffering in both cases so you don't have to make those "expensive" C/C++ function calls.
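That pre-setting is what stdbuf automates on GNU systems: it preloads a shim that calls setvbuf on the program's standard streams before main runs. A minimal sketch of the invocation (only effective for programs that actually use stdio for their output; the choice of tr here is arbitrary, and GNU cat, for one, ignores stdbuf because it does its own I/O):

```shell
# -i0 disables input buffering, -oL makes stdout line-buffered:
# each completed line is flushed immediately instead of sitting
# in a block buffer until it fills.
printf 'a\nb\n' | stdbuf -i0 -oL tr a-z A-Z
```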

Petr Skocik
  • 28,816