Processing a single file as both input and output throughout pipes

Question

Good evening,

I would like to filter a file's content with some piped commands and then write the result back to the same file. I know, I can't do that the way I wrote it. Hold on …

This is the piece of bash script I have.

grep '^[a-zA-Z.:]' "$filepath" \
    | sed -r '/^(rm|cd)/d' \
    | uniq -u \
    > "$filepath"

So I thought I could succeed by using process substitution instead, and wrote:

grep '^[a-zA-Z.:]' < <(cat "$filepath") | …

This did not solve anything either. I expected process substitution to « save » my input file's content somewhere, like in a temporary file. It seems I haven't understood process substitution either.

I read threads about "in-place" editing, but they highlighted special options of particular programs, like sed -i or sort -o, whereas I need a general solution (one that suits any piped commands).

So first, why can't pipes do this the standard way, and what is happening underneath? And how should I solve my issue? Could someone please explain what this is all about?

Thank you.

The "traditional" way to do this is to write to a temporary file and then mv the tmpfile over the original file. This works with any pipeline of commands rather than just the handful (like GNU sed -i, perl -i, sort -o, etc) that have support for in-place editing. write-to-tmpfile-and-rename is what those commands do internally, anyway. — cas, Feb 17 '16 at 01:14
What @cas said. Here's a great write-up on why trying to avoid temp files is a bad idea. Bottom line is: Use temp files. If you don't want to manage that yourself with mktemp (1), you might want to use sponge (1) from the moreutils package. — kba, Feb 17 '16 at 01:22
sed --in-place might also be something to look into, but look before you leap, if you will. — DopeGhoti, Feb 17 '16 at 01:45
@kba this article is exactly what I was looking for, thank you! I'm now working on using mktemp (I prefer doing things myself and avoiding abstraction as much as possible). It's not working for the moment, but I must be making a mistake with file descriptors. To whom it may interest: I will post my final script once it is finished and (apparently) consistent ^^. – Stphane, Feb 17 '16 at 09:29

Petr Skocik · Accepted Answer · 2016-02-17T17:08:18.290

As has been mentioned, sponge from moreutils is great. I use this script to emulate it and avoid the moreutils dependency:

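#!/bin/sh -e
# Soak up all of stdin, then tee it to the files named as arguments
st=0; tmpf=
# Create the temp file and open it read-write on fd 3
tmpf="$(mktemp)" && exec 3<>"$tmpf" || st="$?"
rm -f "$tmpf" # remove it even if exec failed; no-op if mktemp failed
[ "$st" = 0 ] || exit "$st"
cat >&3 # buffer the whole input in the (already unlinked) temp file
# Reopen the file through /dev/fd/3 to read it from the start; this works on
# Linux, where that open gets a fresh file offset (other systems may differ)
</dev/fd/3 tee "$@" >/dev/null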

You can use it like so:

grep '^[a-zA-Z.:]' "$filepath" \
| sed -r '/^(rm|cd)/d' \
| uniq -u | sponge "$filepath"

You can't do this with simple output redirection because redirections take place before the commands are started and an output redirection truncates the output file.

In other words, by the time grep (the first simple command of the pipeline) starts, the last redirection has already truncated the input/output file.
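A minimal way to see this truncation in isolation, assuming a POSIX shell with mktemp available (the scratch file and its contents are made up for the demonstration). It uses a single command rather than a pipeline, so the outcome does not depend on how the pipeline's processes happen to be scheduled:

```shell
#!/bin/sh
# Scratch file standing in for "$filepath" (hypothetical contents).
demo=$(mktemp) || exit 1
printf 'alpha\nbeta\n' > "$demo"

# The shell sets up both redirections before tr runs:
# '<' opens the file for reading, then '>' truncates the same file.
# tr therefore starts with an already empty input.
tr 'a-z' 'A-Z' < "$demo" > "$demo"

if [ -s "$demo" ]; then result=kept; else result=emptied; fi
echo "$result"   # prints "emptied"
rm -f -- "$demo"
```

The same thing happens to the last command of a pipeline, except that there the truncation runs concurrently with the first command, which makes the failure look less predictable.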

There aren't really any standard UNIX utilities that do true in-place editing, as far as I know. sed -i only emulates it with a temporary file. I guess the reason is that true in-place filtering can easily corrupt the file if a pipeline step fails.

As for what's going on underneath: both | and <() use system pipes, which pass I/O a buffer at a time. The mechanism doesn't create temporary files (not real filesystem files, anyway), and it avoids holding the whole input in memory at once.
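One rough way to check this for yourself is to ask the shell what kind of file its standard input is. This sketch assumes a system that provides /dev/stdin (Linux and most modern Unix systems do); the function and variable names are only for illustration:

```shell
#!/bin/sh
# Report whether standard input is a pipe or something else.
stdin_kind() {
    if [ -p /dev/stdin ]; then echo pipe; else echo other; fi
}

# Scratch file with made-up contents, used for the comparison case.
regular=$(mktemp) || exit 1
printf 'some text\n' > "$regular"

from_pipe=$(printf 'some text\n' | stdin_kind)   # data arrives through |
from_file=$(stdin_kind < "$regular")             # data comes from a real file

echo "via pipeline: $from_pipe"
echo "via redirect: $from_file"
rm -f -- "$regular"
```

On Linux the first case should report a pipe and the second should not, which matches the point above: nothing is saved to disk on the way through a pipeline.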

score 1 · Answer 2 · answered Feb 17 '16 at 15:44

If you want input from and output to the same file, you could try sponge. As its description states:

sponge reads standard input and writes it out to the specified file. 
Unlike a shell redirect, sponge soaks up all its input before writing 
the output file. This allows constructing pipelines that read from and 
write to the same file.

So you can have something like sed '...' file | grep '...' | sponge [-a] file taking input from file and outputting to the same file.

On the other hand, using temporary files is also a great way to work with the same file for input and output. You can initialize your temp files as follows:

tempfile=$(mktemp tempFile.XXXX) # You can replace "tempFile" with any name you want

This creates a temporary file in the directory where the script is run. Its name is "tempFile." followed by a suffix in which mktemp replaces each X of "XXXX" with a random character (for example, tempFile.AVm7).

Now you can modify your pipe (or any piped command) as follows:

grep '^[a-zA-Z.:]' "$filepath" \
    | sed -r '/^(rm|cd)/d' \
    | uniq -u \
    > "$tempfile"

After the filter, you can move your temp file to your original file as follows:

mv "$tempfile" "$filepath"
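The two steps can also be wrapped in a small function. The following is only a sketch under a few assumptions: rewrite_in_place is a hypothetical name, not an existing tool, the temp file is created next to the target so that mv stays on the same filesystem, and the original is replaced only when the command reports success. The scratch file and its contents are made up:

```shell
#!/bin/sh
# Hypothetical helper (not part of any standard tool):
#   rewrite_in_place FILE COMMAND [ARG...]
# Runs COMMAND with FILE on stdin, collects stdout in a temp file created
# next to FILE, and replaces FILE only if COMMAND succeeded.
rewrite_in_place() {
    rip_file=$1
    shift
    rip_tmp=$(mktemp "$rip_file.XXXXXX") || return 1
    if "$@" < "$rip_file" > "$rip_tmp"; then
        mv -- "$rip_tmp" "$rip_file"
    else
        rm -f -- "$rip_tmp"
        return 1
    fi
}

# Example run on a scratch file with made-up contents.
filepath=$(mktemp) || exit 1
printf 'cd /tmp\nls -l\nls -l\nmake all\n' > "$filepath"

# The whole pipeline is passed as one command via sh -c.
rewrite_in_place "$filepath" sh -c \
    "grep '^[a-zA-Z.:]' | sed -r '/^(rm|cd)/d' | uniq -u"

result=$(cat "$filepath")
echo "$result"   # prints "make all"
rm -f -- "$filepath"
```

One thing to keep in mind with this approach: mktemp creates its file readable and writable by the owner only, so after the mv the file may have different permissions than the original had.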

The mv replaces the original file with the filtered output, so the temp file no longer exists under its own name. However, a script may also create temp files that are never moved or deleted, so it is a good idea to clean up the directory by deleting them when your script ends, if you no longer need them. You can write a routine for that as follows:

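remove_temp_files() {
    # Let find run rm itself: safe with unusual file names, and a no-op when nothing matches
    find . -type f -name 'tempFile.????' -exec rm -f -- {} +
}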

Then you can simply call on your routine remove_temp_files at the end of your script, eliminating any and all temporary files that were created in the format described above.
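Another option, sketched here as an assumption rather than a requirement, is to register the cleanup with trap as soon as the temp file exists, so nothing has to be searched for afterwards. The function name and the simulated failure are purely illustrative:

```shell
#!/bin/sh
# Hypothetical example: the body runs in a subshell (note the parentheses),
# so the EXIT trap fires when that subshell ends, not when the caller does.
work_with_tempfile() (
    tmp=$(mktemp) || exit 1
    trap 'rm -f -- "$tmp"' EXIT     # cleanup registered right after creation
    printf '%s\n' "$tmp"            # tell the caller which file was used
    printf 'intermediate data\n' > "$tmp"
    exit 1                          # simulate a step that fails
)

status=0
used=$(work_with_tempfile) || status=$?

# Check whether the temp file survived the failing run.
if [ -e "$used" ]; then leftover=yes; else leftover=no; fi
echo "exit status: $status, temp file left behind: $leftover"
```

The trap runs when the shell exits, including after a failing step. Whether it also runs when the shell is killed by a signal varies between shells, so scripts that need that often trap the relevant signals explicitly.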

score 0 · Answer 3 · edited Apr 13 '17 at 12:36

Using a here-document with command substitution is the standard way to go in this case:

grep '^[a-zA-Z.:]' <<IN \
    | sed -r '/^(rm|cd)/d' \
    | uniq -u \
    > "$filepath"
$(cat -- "$filepath")
IN

Your other questions have been explained in many earlier questions.

cuonglm · answered Feb 17 '16 at 16:06

You actually spotted a typo: I intended to try «process substitution», not «command substitution». Congratulations, you deserve at least +1 for sharing an alternative! – Stphane Feb 17 '16 at 16:32
It's worth noting that when trailing blank lines are semantically meaningful command substitution cannot work, because it strips them. – Barefoot IO Feb 17 '16 at 22:20
The here-doc in this case adds a newline after the command substitution. In the usual case where the file is a valid text file, the trailing newline stripped by command substitution is replaced. However, if the file does not end with a newline, the here-doc cannot faithfully reproduce it. Probably a seldom encountered corner case, but I mention it for completeness' sake. – Barefoot IO Feb 17 '16 at 22:29
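The effect described in these comments can be reproduced with a short sketch. It assumes only a POSIX shell; the sample text is made up, and tr -d ' ' is there because some wc implementations pad their output:

```shell
#!/bin/sh
# Three newline characters: one ending "line", then two blank lines.
original_newlines=$(printf 'line\n\n\n' | wc -l | tr -d ' ')

# Command substitution drops every trailing newline...
captured=$(printf 'line\n\n\n')

# ...and the here-document appends exactly one after the expanded text.
restored_newlines=$(cat <<EOF | wc -l | tr -d ' '
$captured
EOF
)

echo "newlines before: $original_newlines, after: $restored_newlines"
# prints "newlines before: 3, after: 1"
```

So a file that ends in blank lines comes back with those blank lines removed, and a file without a final newline comes back with one added.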
