
I find myself using basically the same line over and over again:

cat file | command1 | command2 | command3 > file

Is there a way I can put all these pipes into one script, so I can just run

automatic.sh file

and accomplish the same thing?

Lucas Phillips
  • With > file, even before any of cat, command1, command2 or command3 is started, the shell will have truncated file. So cat will see an empty file. – Stéphane Chazelas Dec 05 '13 at 20:30
  • @StephaneChazelas is right, you will destroy your data because file will be emptied! – Totor Dec 06 '13 at 21:25

3 Answers

4

Create a file with this content:

#!/bin/sh
command1 | command2 | command3

Make it executable:

chmod +x that-file

And call it as:

/path/to/that-file < file.in > file.out

Add /path/to to your $PATH variable in order to be able to do:

that-file < file.in > file.out
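
For example, assuming the script lives in /path/to (the placeholder used above), you could append this line to your ~/.profile or ~/.bashrc:

export PATH="$PATH:/path/to"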
3

automatic.sh:

#!/bin/bash
# write to a temp file first so "$1" isn't truncated before it is read
cat "$1" | command1 | command2 | command3 > .automatic.sh.temp
mv .automatic.sh.temp "$1"

Then call it like:

automatic.sh file

To use a real example:

arthur@a:~$ cat automatic.sh 
#!/bin/bash
cat "$1" | grep foo | sed -e 's/foo/bar/g' | sort > .automatic.sh.temp
mv .automatic.sh.temp "$1"

arthur@a:~$ cat <<END > foo
> 1 foo
> 2 bar
> 3 foo
> 4 bat
> 5 foo
> END
arthur@a:~$ chmod +x automatic.sh 
arthur@a:~$ ./automatic.sh foo
arthur@a:~$ cat foo
1 bar
3 bar
5 bar

And to be clear, writing the output to a temporary file and only moving it over the original at the end isn't mere pedantry: redirecting the pipeline straight to "$1" would truncate the file before it had been read, which is exactly the problem raised in the comments on the question.

1

You can make a script or function that contains this command. Use "$1" to refer to the first argument passed to the script or function.

There's a major bug in your code snippet: depending on the timing, > file may truncate the file before the first command in the pipeline starts reading the file, or shortly after it starts reading. Your snippet may occasionally work with small files, but most of the time it won't work.
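
You can see the truncation for yourself with a throwaway file (the name f and the tr command here are arbitrary):

$ echo hello >f
$ cat f | tr a-z A-Z >f
$ cat f
$

On most runs the second cat prints nothing: the shell truncated f when setting up >f, before the first cat got a chance to read it.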

The recommended way to modify a file is to write to a new temporary file, and once this is finished, move it to replace the old version. This way, if something bad happens to interrupt the processing (such as an error, a power failure, etc.), the old file remains in place.

Here is a function that operates on this principle. Thanks to the && after the pipeline, it only moves the output file into place if command3 returns a success status (note that the return status of other commands in the pipeline is ignored). I rely on the common mktemp utility to create the temporary file (it ensures that the name of the temporary file won't collide with any other instance of the script or any other program).

my_pipeline () {
  # Create the temp file in the same directory as the target, so the
  # final mv is a rename within one filesystem (cheap, and atomic on
  # most systems) rather than a copy across filesystems.
  out=$(TMPDIR=$(dirname -- "$1") mktemp)
  <"$1" command1 | command2 | command3 >"$out" &&
  mv -f "$out" "$1"
}
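
If you want the mv to be skipped when any command in the pipeline fails, not just command3, bash (also ksh93 and zsh) offers the pipefail option. Note that set -o pipefail inside the function would change the option for your whole shell session, so it is more at home in a standalone script:

set -o pipefail    # a pipeline now fails if any of its commands fails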

Put this function in your .bashrc; it will be available the next time you start bash. You can also copy-paste the definition on the command line to have it take effect in that shell. To use it, type the name of the function and then the name of the file to act on:

my_pipeline my_file

You can make the function act on all of its arguments in turn by wrapping the same commands in a loop over the positional parameters:

my_pipeline () {
  for file; do
    out=$(TMPDIR=$(dirname -- "$file") mktemp)
    <"$file" command1 | command2 | command3 >"$out" &&
    mv -f "$out" "$file"
  done
}

Usage:

my_pipeline file1 file2 file3

If you want to make a script instead, put the code in a file starting with a shebang line to indicate that it's a shell script.

#!/bin/sh
for file; do
  out=$(TMPDIR=$(dirname -- "$file") mktemp)
  <"$file" command1 | command2 | command3 >"$out" &&
  mv -f "$out" "$file"
done

Put the file in your command search path and make it executable (see How can I make a program executable from everywhere).
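
For example, assuming you saved the script as my_pipeline and ~/bin is in your $PATH:

chmod +x my_pipeline
mv my_pipeline ~/bin/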

Another way to solve the truncate-before-use problem is the sponge utility, but it isn't available everywhere (it's part of the moreutils collection).
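
sponge soaks up all of its standard input before opening the output file, so the pipeline can read file safely. A sketch with the placeholder commands from the question, assuming moreutils is installed:

<file command1 | command2 | command3 | sponge file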