27

One might think that

echo foo >a
cat a | rev >a

would leave a containing oof; but instead it is left empty.

  1. Why?
  2. How would one otherwise apply rev to a?
Toothrot

7 Answers

35

There's an app for that! The sponge command from moreutils is designed for precisely this. If you are running Linux, it may already be installed; if not, search your operating system's repositories for sponge or moreutils. Then, you can do:

echo foo >a
cat a | rev | sponge a

Or, avoiding the useless use of cat (UUoC):

rev a | sponge a

The reason for this behavior is down to the order in which your commands are run. The > a is actually the very first thing executed and > file empties the file. For example:

$ echo "foo" > file
$ cat file
foo
$ > file
$ cat file
$

So, when you run cat a | rev >a what actually happens is that the > a is run first, emptying the file, so when the cat a is executed the file is already empty. This is precisely why sponge was written (from man sponge, emphasis mine):

sponge reads standard input and writes it out to the specified file. Unlike a shell redirect, sponge soaks up all its input before writing the output file. This allows constructing pipelines that read from and write to the same file.
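
The idea in that description can be approximated in plain shell. The following is only a rough sketch, not sponge's actual implementation: soak is a hypothetical helper name, and the sketch assumes mktemp and rev are available.

```shell
# Hypothetical stand-in for sponge (an illustration, not its real source).
soak() {
    soak_tmp=$(mktemp "${TMPDIR:-/tmp}/soak.XXXXXX") || return 1
    # Collect everything from standard input before touching the target.
    cat > "$soak_tmp" || { rm -f "$soak_tmp"; return 1; }
    # Only now is the target opened (and truncated) for writing.
    cat "$soak_tmp" > "$1"
    soak_status=$?
    rm -f "$soak_tmp"
    return "$soak_status"
}

printf 'foo\n' > a
rev a | soak a
cat a    # prints: oof
```

Because the target is opened only after standard input reaches end-of-file, rev has finished reading a by the time a is truncated.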

terdon
  • "Unlike a shell redirect, sponge soaks up all its input before writing the output file" -- does this mean that I'll run out of memory if the output is large enough? – ivan_pozdeev Dec 26 '19 at 14:59
  • @ivan_pozdeev I doubt it. I don't know the details, but presumably it will use some sort of buffering to avoid that. – terdon Dec 26 '19 at 15:24
  • @ivan_pozdeev Looking at the source code, sponge writes the output to a temporary file and then moves it into the file that you specified, so it doesn't actually store any more data in memory than the size of the buffer it uses for writing. – Nonny Moose Dec 26 '19 at 20:26
  • @NonnyMoose So then, how does sponge differ from the standard way of doing this, namely rev < a > $$; mv $$ a ? – Jim L. Dec 27 '19 at 18:22
  • @JimL. That is a good question in itself. It differs in multiple ways: You can interrupt that command between the two parts, and end up with two files. Also, it can run out of disk space, because the temporary file is on the same file system, instead of in TMPDIR. And it can overwrite a file, because it does not use a name as generated by tempfile. And it contains two references by name to the same file, which leaves room for mismatches caused by typos. (It requires the file to be accessible by name, but that is the case here.) And, most importantly, it represents an abstraction. – Volker Siegel Dec 27 '19 at 19:39
  • @NonnyMoose Yes, I mean the alternative command does not keep them. (I refined the comment) – Volker Siegel Dec 27 '19 at 19:53
  • @JimL. And the alternative command loses the file permissions. – Volker Siegel Dec 27 '19 at 19:54
11
  1. The output truncation is done very early, so cat sees an empty file.
  2. Either the first file is constructed as a temporary, or the output of rev is directed to a temporary which you then rename.
9

Another way to fix this is to use a writing method that does not truncate:

  rev a | dd conv=notrunc of=a

This only works because:

  1. rev reads content before producing output and the output is never longer than the amount already read

  2. the new file content is the same size as or larger than the original (in this case the same size)

  3. dd opens the file to write without truncating it.

This approach may be useful for in-place modification of files too large to keep temporary copies of.
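
The second condition matters: if the output is shorter than the original, the old tail of the file survives. A small sketch of that failure mode, using cut purely as an illustrative filter that shortens its input (demo.txt is a made-up file name):

```shell
printf 'hello world\n' > demo.txt    # 12 bytes
# cut emits only "hello" plus a newline (6 bytes), and dd overwrites
# just those first 6 bytes without truncating the rest.
cut -c1-5 demo.txt | dd conv=notrunc of=demo.txt 2>/dev/null
cat demo.txt
# prints:
# hello
# world
```

The leftover "world" line is the untouched remainder of the original content, which is why this trick is only safe when the output cannot shrink.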

Jasen
4
cat a | rev > a

Why [is a left empty]?

In the pipeline above, the shell forks two subprocesses, one for each of the two parts of the pipeline. Those subprocesses then run the commands in question, first processing any redirections, then calling one of the exec*() functions to start the external utility. The subprocesses run in parallel, and there are no timing guarantees between them.

Exec'ing a process isn't very fast, so usually what happens is that the shell on the right-hand side manages to set up the redirection before cat has a chance to read the file. The output redirection > a truncates the file, so cat has nothing to read, and rev receives no data and produces no data. Even if you used a redirection on the left-hand side too (cat < a | rev > a), so that a might get opened for reading before it was truncated, cat still probably wouldn't have time to actually read it before that.

On the other hand, this quite consistently prints a contains: foo on my system:

echo foo > a; cat < a | tee a > /dev/null ; echo "a contains: $(cat a)"

Here, it's tee that truncates the file, so this happens after the exec() and cat has a better chance of having time to read the file. But if the file was large enough, it might get truncated in the middle of reading it.

I said might and probably there, because indeed the exact opposite can happen, if the OS decides to schedule the processes in another fashion.
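
That redirections are processed before the utility is exec'ed can be checked without any race at all. In this sketch the command name is deliberately made up (an assumption for illustration), so nothing is ever executed, yet the file is still truncated:

```shell
printf 'foo\n' > a
# no_such_command_xyz is a made-up name; the shell sets up "> a" first
# and only then discovers that there is nothing to run.
{ no_such_command_xyz > a; } 2>/dev/null || true
if [ -s a ]; then echo "a still has data"; else echo "a is empty"; fi
# prints: a is empty
```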

How would one otherwise apply rev to a?

The usual solution is to use a temporary file:

cat a | rev > b && mv b a

Though there's the usual problem of possibly overwriting an existing file, unless you can be sure that the temporary file name is available. You should probably use mktemp:

f=$(mktemp ./tmp.XXXXXX)
cat a | rev > "$f" && mv "$f" a || rm "$f"

Alternatively, you can use the sponge tool, which makes sure to read all of the input it gets before opening the output file (otherwise it's like cat):

cat a | rev | sponge a

or just

rev < a | sponge a

sponge > a would be a mistake for the same reason the original command doesn't work.


Sponge is from moreutils, and not a standard tool. Some alternatives for it are listed in Completely buffer command output before piping to another command?

Some utilities may implement a similar feature themselves, e.g. sort -o outputfile only opens the output file after finishing, see Does sort support sorting a file in-place, like `sed --in-place`?
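
As a brief illustration of the sort case (fruits.txt is a made-up file name), the same file can safely be given as both input and output:

```shell
printf 'pear\napple\nfig\n' > fruits.txt
# -o names the output file; sort reads all input before writing to it.
sort -o fruits.txt fruits.txt
cat fruits.txt
# prints:
# apple
# fig
# pear
```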

ilkkachu
1

>file creates a new empty file or truncates an existing file. As such, there's nothing left in the file for rev to read.


As other answers have mentioned, you could use sponge for this. But sponge is not available to everyone.

The following is a generic solution that involves just the shell:

exec 3<file; rm file; rev <&3 >file; exec 3<&-

This opens the file (as fd 3) and deletes it. More precisely, this only deletes the directory entry for the file, not the file itself. The file won't be deleted until all hardlinks to it are deleted and all handles to it are closed.

Next, rev is run reading from the "deleted file". Its output is sent to a new file. While this new file has the same name as the original file, it's a different file. As such, there's no conflict.

Finally, we close the descriptor to the original file, allowing it to be freed.
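
The same steps, spread over separate lines with comments, can be tried on a throwaway file. This is just an annotated walk-through of the one-liner, assuming rev is installed:

```shell
printf 'foo\n' > file
exec 3< file      # open the original file for reading on descriptor 3
rm file           # remove the name; the data stays reachable through fd 3
rev <&3 > file    # ">" creates a brand-new file that merely reuses the name
exec 3<&-         # close fd 3 so the original data can be freed
cat file          # prints: oof
```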


The problem with the above approach is that the data is lost if a problem occurs. That is why one might prefer the following (which uses no more disk space than the above):

( rev file >file.new && mv file.new file ) || rm file.new
ikegami
1

The reasons have been explained by others, but it is basically the same reason why you can't do:

rev a >a

But as to how one might accomplish this:

echo foo >a
echo `cat a | rev` >a
# or
echo `rev a` >a
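
One caveat with this approach: the unquoted command substitution undergoes word splitting, so echo joins the pieces with single spaces and a multi-line file collapses into one line. A hedged variant that quotes the substitution and uses printf (a swapped-in form, not the commands above) keeps spacing and line breaks. The whole output is still held in memory, and trailing blank lines are dropped:

```shell
printf 'ab  cd\nef\n' > a
# The command substitution is expanded before the redirection is set up,
# so rev reads a while it still has its content.
printf '%s\n' "$(rev a)" > a
cat a
# prints:
# dc  ba
# fe
```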
tinnick
0

You can use Vim in Ex mode:

ex -s -c '%!rev' -c x a.txt
  • % select all lines
  • ! run command
  • x save and close
Zombo