Is there a standard alternative to sponge to pipe a file into itself?

Question

I frequently want to do something like this:

 cat file | command > file

(which obviously doesn't work). The only solution I've seen for this is sponge, i.e.

 cat file | command | sponge file

Unfortunately, sponge is not available to me (nor can I install it or any other package).

Is there a more standard quick way to do this without having to break it up every time into multiple commands (pipe to temp file, pipe back to original, delete temp file)? I tried tee for example, and it seems to work, but is it a consistent/safe solution?

If there were a more standard way, sponge wouldn't have had to exist. If you can't install a package, can you create the corresponding shell script to use? (x=$(mktemp); cat > "$x" ; mv "$x" "$1") — Michael Homer, Sep 24 '19 at 23:09
sponge is itself just creating a temporary file and moving it onto the original (and breaking hardlinks), so better be honest about it (like in @MichaelHomer's example), rather than hide the ugly truth under the rug, and pretend that it just does something magical ;-) An improvement imho to that would be (x=$(mktemp); cat >"$x" && mv "$x" "$1" || rm "$x"). — , Feb 05 '20 at 08:41

Kusalananda · Accepted Answer · 2020-02-05T10:26:58.227

A shell function replacing sponge:

mysponge () (
    append=false

    while getopts 'a' opt; do
        case $opt in
            a) append=true ;;
            *) echo 'usage: mysponge [-a] [file]' >&2; exit 1 ;;
        esac
    done
    shift "$(( OPTIND - 1 ))"

    outfile=$1

    tmpfile=$(mktemp "$(dirname "$outfile")/tmp-sponge.XXXXXXXX") &&
    cat >"$tmpfile" &&
    if "$append"; then
        cat "$tmpfile" >>"$outfile"
    else
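        if [ -f "$outfile" ]; then
            # Existing regular file: copy its modes (GNU chmod), then rename over it
            chmod --reference="$outfile" "$tmpfile"
            mv "$tmpfile" "$outfile"
        elif [ -n "$outfile" ] && [ ! -e "$outfile" ]; then
            # Named file does not exist yet: create it with default permissions
            cat "$tmpfile" >"$outfile"
        else
            # No file name given, or the name is not a regular file: write to standard output
            cat "$tmpfile"
        fi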
    fi &&
    rm -f "$tmpfile"
)

This mysponge shell function passes all data available on standard input on to a temporary file.

When all data has been collected in the temporary file, it is transferred to the file named by the function's argument. If data is not to be appended (i.e. -a is not used) and the given output filename refers to an existing regular file, this is done with mv, after first attempting to transfer the file's modes to the temporary file using GNU chmod. With -a, the data is instead appended to the file with cat. If the output is to something that is not a regular file (a named pipe, standard output etc.), the data is output with cat.

If no file was given on the command line, the collected data is sent to standard output.

At the end, the temporary file is removed.

Each step in the function relies on the successful completion of the previous step. No attempt is made to remove the temporary file if one command fails (it may contain important data).

If the named file does not exist, then it will be created with the user's default permissions etc., and the data arriving from standard input will be written to it.
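
One visible difference between the two ways of writing the result: replacing the file with mv gives the name a new inode, so any other hard links keep the old content, while overwriting with cat keeps the inode. A minimal sketch to observe this, assuming mktemp -d is available; the file names are made up for illustration:

```shell
# Hypothetical demo in a scratch directory; none of these names come from the answer.
d=$(mktemp -d)

printf 'old\n' >"$d/data"
ln "$d/data" "$d/link"          # second name for the same inode

# Overwriting in place (cat >file) keeps the inode, so the link sees the change.
printf 'new1\n' >"$d/tmp1" && cat "$d/tmp1" >"$d/data"
cat "$d/link"                   # prints "new1"

# Replacing with mv gives "data" a new inode; "link" still names the old one.
printf 'new2\n' >"$d/tmp2" && mv "$d/tmp2" "$d/data"
cat "$d/data"                   # prints "new2"
cat "$d/link"                   # still prints "new1"
```

Which behaviour is preferable depends on whether the atomic replacement or the preserved inode matters more.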

The mktemp utility is not standard, but it is commonly available.
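
Where mktemp really is missing, one possible fallback is the shell's noclobber option, which makes ">" fail if the file already exists. The following is only a sketch under that assumption: make_tmp and the tmp-sponge.PID.N naming scheme are hypothetical, and the names are predictable, unlike those from mktemp.

```shell
# make_tmp is a hypothetical helper, not part of the answer's function.
# It creates an empty file next to the given target and prints its name.
make_tmp () (
    dir=$(dirname "$1")
    n=0
    umask 077                   # owner-only permissions, similar to mktemp
    set -C                      # noclobber: ">" fails if the file already exists
    while [ "$n" -lt 100 ]; do
        tmp=$dir/tmp-sponge.$$.$n
        # The subshell keeps a failed redirection from terminating the loop
        if ( : >"$tmp" ) 2>/dev/null; then
            printf '%s\n' "$tmp"
            exit 0
        fi
        n=$(( n + 1 ))
    done
    exit 1                      # gave up after 100 attempts
)
```

It could then stand in for the mktemp call, for example as tmpfile=$(make_tmp "$outfile").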

The above function mimics the behaviour described in the manual for sponge from the moreutils package on Debian.

Using tee in place of sponge would not be a viable option. You say that you've tried it and it seemed to work for you. It may work and it may not. It relies on the timing of when the commands in the pipeline are started (they are started pretty much concurrently), and the size of the input data file.

The following is an example showing a situation where using tee would not work.

The original file is 200000 bytes, but after the pipeline, it's truncated to 32 KiB (which could well correspond to some buffer size on my system).

$ yes | head -n 100000 >hello
$ ls -l hello
-rw-r--r--  1 kk  wheel  200000 Jan 10 09:45 hello

$ cat hello | tee hello >/dev/null
$ ls -l hello
-rw-r--r--  1 kk  wheel  32768 Jan 10 09:46 hello
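
The underlying cause is that opening a file for output with ">" (or with tee) truncates it immediately, whether or not anyone has finished reading it. In a pipeline this is a race, but with a single command the effect is deterministic, because the shell performs both redirections before the command starts. A small sketch, assuming mktemp -d is available; demo.txt is a made-up name:

```shell
d=$(mktemp -d)
printf 'hello\nworld\n' >"$d/demo.txt"

# Redirections are set up left to right before tr runs:
# "<" opens the file for reading, then ">" truncates the same file to zero length.
tr a-z A-Z <"$d/demo.txt" >"$d/demo.txt"

wc -c <"$d/demo.txt"            # the file is now empty
```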

https://manpages.debian.org/unstable/moreutils/sponge.1.en.html might help. — JdeBP, Jan 10 '20 at 09:49
Unlike the real sponge, your function will leave the original file in a messed up state if the cat fails midway. That's a total FAIL, sorry. If atomicity && data integrity don't matter, but preserving the original inode is essential, there are smarter and faster ways to do it than copying the whole file once (like sponge) or twice (like your function). — mosvy, Feb 05 '20 at 09:05
@mosvy Are you talking about the first cat, to $tmpfile? If that cat fails, the original file will not be messed up. Could you possibly explain what types of failure conditions you're seeing and how I may improve the code to cope with these? — Kusalananda, Feb 05 '20 at 09:25
No, I'm talking about the second: cat "$tmpfile" >"$outfile". What's the user supposed to do if that fails midway? Go hunt through the temporary files (wherever her OS may hide them) until she finds her precious data? — mosvy, Feb 05 '20 at 09:28
@mosvy What I could do is to do as sponge actually does, and use mv to move the file in place if it's a regular file (the sponge utility uses rename(2)). The sponge utility falls back to using a read+write loop if the output is not a regular file (see copy_tmpfile() in the sponge.c source). I'll amend the answer to incorporate this, which by necessity would have to also include creating the temporary file on the same filesystem as the original (sponge.c attempts a rename(2) and then does a slow copy if that fails due to $TMPDIR being on another fs). — Kusalananda, Feb 05 '20 at 09:59

score 1 · Answer 2 · answered Jan 10 '20 at 06:58

There's this short bash script, which requires Perl:
https://github.com/ildar-shaimordanov/perl-utils#sponge

The second script there should be a drop-in replacement for the version in moreutils.

There's also a version that is a stand-alone Perl script.

marinara

Note that the in-line Perl script employed in the linked-to code will buffer all data in memory. In situations where the data is huge, this may not be a viable option. – Kusalananda Jan 10 '20 at 09:36

score 0 · Answer 3 · answered Jan 10 '20 at 09:42

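function wf() {
    # Name a temporary file next to the target, so that mv is a rename on the same filesystem
    local tmpf="${1}_$(< /dev/urandom tr -dc A-Za-z0-9 | head -c16)"
    # Collect all of standard input, and replace the original only if that succeeded
    cat > "$tmpf" &&
    mv -f -- "$tmpf" "$1"
}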

Next, we use the function:

grep "error" messages.log | wf messages.log

Mario Palumbo · Answer 4 · score -1 · 2023-10-20T13:31:44.663

Why take cannons to shoot flies? A possible solution is this:

stdin-to-file () {
local function_name="${FUNCNAME[0]}"
local tmp_file
local append=false
local exit_code=0
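local i arg
for (( i=1; i<=$#; i++ )); do
    arg=${!i}
    # "--" ends option processing: drop it and stop scanning
    if [[ $arg = -- ]]; then
        set -- "${@:1:i-1}" "${@:i+1}"
        break
    fi
    # Accept -a, or any prefix of --append that is at least 3 characters long
    if [[ $arg = -a || ( --append = "$arg"* && ${#arg} -ge 3 ) ]]; then
        append=true
        set -- "${@:1:i-1}" "${@:i+1}"
        ((i--))
        continue
    fi
done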
if [[ $# -ne 1 || -t 0 ]]; then
    echo "$function_name: Wrong number of arguments or missing stdin." >&2
    return 1
fi
tmp_file="$(mktemp "/tmp/$(basename -- "$1")-XXXXXXXXXXXX")" &&
cat > "$tmp_file" &&
if $append; then
    cat "$tmp_file" >> "$1"
else
    cat "$tmp_file" > "$1"
fi ||
exit_code=$?
rm -f -- "$tmp_file"
if [[ $exit_code != 0 ]]; then
    echo "$function_name: An error has occurred." >&2
fi
return $exit_code
}

Then:

cat file | command | stdin-to-file file

For append:

cat file | command | stdin-to-file -a file

Or:

cat file | command | stdin-to-file --append file

edited Oct 20 '23 at 13:31

answered Apr 26 '23 at 13:11

This will break with larger files. Consider a) seq 100000 | awk '{printf "Line %s\n", $1}' >file then b) cat file | awk '{$1=$1 "_a"}1' | tee file > /dev/null On my sys, tee gives up after 14440 lines... – drewk Apr 26 '23 at 13:25
I confirm this phenomenon. Thanks for pointing it out – Mario Palumbo Apr 26 '23 at 14:25
Would be nice to update the answer to say it only works for small chunks, or even better, research and put real numbers there with code references... or just delete the answer. – Robert Cutajar Sep 07 '23 at 18:50
@RobertCutajar I'm not very deep into the site's culture myself, but it is my understanding that deleting answers is discouraged. Sometimes even bad (ie downvoted) answers can be useful to others if they show how not to solve a problem. Editing it might be worthwhile. Anyone (above a certain threshold of reputation points??) can edit an answer so feel free to do it yourself if the author doesn't. Edits are reviewed so there is little danger of messing something up. – ibonyun Oct 05 '23 at 16:44
Ok, I have edited the answer – Mario Palumbo Oct 20 '23 at 13:38
