15

I was trying to quickly edit an .hgignore file from the Cygwin bash shell today, and I added a line that was a mistake. I'm not sure if this was the best way to do it, but I quickly thought of using head -1 .hgignore to remove the offending line (I had previously only had one line in the file). Sure enough, when executed it gives the first line as the only output.

But when I tried to redirect the output and rewrite the file using head -1 .hgignore > .hgignore, the file was empty. Why does this happen? If I try appending instead, head -1 .hgignore >> .hgignore, it appends correctly but this is obviously not the desired result. Why does a truncating redirect not work in this case?

voithos

6 Answers

12

I think Bruce answers what's going on here with the shell pipeline.

One of my favorite little utilities is the sponge command from moreutils. It solves exactly this problem by "soaking up" all available input before it opens the target output file and writes the data. It lets you write pipelines exactly the way you expected to:

$ head -1 .hgignore | sponge .hgignore

The poor man's solution is to redirect the output to a temporary file and then, once the pipeline has finished (for example, in the next command you run), move the temp file over the original file.

$ head -1 .hgignore > .hgignore.tmp
$ mv .hgignore{.tmp,}
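A slightly more defensive variant of the same temp-file idea might look like the sketch below. It assumes a mktemp utility is available (it is not part of POSIX, though GNU and BSD systems ship one), and the scratch directory and sample file contents are made up purely for the demonstration.

```shell
#!/bin/sh
# Hypothetical demo setup: work in a scratch directory so no real file is touched.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
printf '%s\n' 'syntax: glob' 'mistake' > .hgignore

# Create the temp file next to the original so the final mv is a rename
# on the same filesystem, and only replace the original if head succeeded.
tmp=$(mktemp .hgignore.XXXXXX) &&
head -n 1 .hgignore > "$tmp" &&
mv "$tmp" .hgignore

result=$(cat .hgignore)
echo "$result"
```

Chaining with && means a failed head leaves the original untouched. Note that mktemp creates the file with restrictive permissions, so the replaced file may end up with a different mode than the original.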
Caleb
  • Looking at this a few years later, a thought occurred to me: couldn't we just do head -1 .hgignore | tee .hgignore? tee is in coreutils, and as a perk/side-effect, this also writes to STDOUT – voithos Mar 28 '14 at 14:51
  • 1
    @voithos To my knowledge tee opens and truncates the file it is writing to when it is instantiated just like everything else so it does not solve the main issue here of the race condition on reading the file contents before you truncate it with the write. – Caleb Mar 28 '14 at 17:37
  • You bring up a point that I wasn't aware of, actually - namely, that piped commands are started immediately, instead of sequentially. Is that accurate? I did, however, test it out and tee seems to do the desired thing. I've got version 8.13 on my machine. – voithos Mar 28 '14 at 17:47
  • 2
    @voithos Yes, the commands in a pipeline and all the input/output channels involved are set up before any data flows, and the commands run concurrently, so the pipeline is ready to receive data when the first one starts producing it. I suspect your test is flawed because you probably used too small a chunk of data and the whole thing was already sitting in a read buffer before the file was truncated. The tee program will truncate your files; it is not set up to double buffer them. – Caleb Mar 29 '14 at 11:32
11

When the shell gets a command line like command > file.out, the shell itself opens (and maybe creates) the file named file.out. The shell then sets file descriptor 1 (standard output) to the file descriptor it got from the open. That's how I/O redirection works: every process knows about file descriptors 0, 1 and 2.

The hard part about this is how to open file.out. Most of the time, you want file.out opened for writing at offset 0 (i.e. truncated), and this is what the shell did for you. It opened .hgignore for writing with truncation, dup'ed the file descriptor to 1, then exec'ed head. Instant file clobbering.
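The ordering described above can be observed directly. The following is a small sketch using a made-up file name (demo.txt) in a scratch directory: the no-op command `:` never writes anything, yet the file is still emptied, because the shell truncates it while setting up the redirection.

```shell
#!/bin/sh
# Hypothetical demo setup in a scratch directory.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
printf '%s\n' 'first' 'second' > demo.txt

# ':' does nothing at all; the truncation comes from the '>' redirection,
# which the shell performs before running the command.
: > demo.txt
size_after_truncate=$(( $(wc -c < demo.txt) ))

# '>>' opens the file for appending without truncating it,
# so head still finds the old contents to read.
printf '%s\n' 'first' 'second' > demo.txt
head -n 1 demo.txt >> demo.txt
lines_after_append=$(( $(wc -l < demo.txt) ))

echo "bytes after ': > demo.txt': $size_after_truncate"
echo "lines after appending: $lines_after_append"
```

This matches what the question reported: the truncating redirect leaves nothing for head to read, while the appending redirect duplicates the first line at the end.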

In bash, you can run set -o noclobber to change this behavior: with that option set, > refuses to overwrite an existing file.
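As a rough illustration of the noclobber option (again with a made-up demo.txt in a scratch directory), the redirection below fails instead of truncating the file, and the `>|` operator overrides the protection when overwriting really is intended:

```shell
#!/bin/sh
# Hypothetical demo setup in a scratch directory.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
printf '%s\n' 'keep me' > demo.txt

set -o noclobber

# With noclobber set, '>' onto an existing regular file is an error,
# so head never runs and the file is left alone. The subshell only
# serves to silence the shell's own error message.
if ( head -n 1 demo.txt > demo.txt ) 2>/dev/null; then
    clobbered=yes
else
    clobbered=no
fi
kept=$(cat demo.txt)

# '>|' explicitly overrides noclobber for this one redirection.
printf '%s\n' 'replaced' >| demo.txt
replaced=$(cat demo.txt)

set +o noclobber
echo "clobbered=$clobbered kept='$kept' replaced='$replaced'"
```

Note that noclobber only prevents the accidental truncation; it does not make head -1 .hgignore > .hgignore do what the question wanted.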

  • Aha, I see. I did think that the shell was truncating the file before running the command, but I didn't know why. Thanks for the explanation! – voithos Jun 29 '11 at 20:18
3

In

head -n 1 file > file

file is truncated before head is started, but if you write it:

head -n 1 file 1<> file

it's not, because file is opened in read-write mode without truncation. However, when head finishes writing, nothing truncates the file, so the line above would be a no-op (head would just rewrite the first line over itself and leave the other ones untouched).
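The no-op behaviour is easy to check. This sketch uses a throwaway three-line file in a scratch directory and compares it against a copy taken beforehand; the file names are illustrative only.

```shell
#!/bin/sh
# Hypothetical demo setup in a scratch directory.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
printf '%s\n' 'first' 'second' 'third' > file
cp file file.orig

# '1<>' opens file read-write on fd 1 without truncating it, so head
# overwrites the first line with identical bytes and nothing else changes.
head -n 1 file 1<> file

if cmp -s file file.orig; then
    unchanged=yes
else
    unchanged=no
fi
echo "unchanged=$unchanged"
```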

However, after head has returned and while the fd is still open, you can call another command that does the truncate.

For instance:

{ head -n 1 file; perl -e 'truncate STDOUT, tell STDOUT'; } 1<> file

What matters here is the truncate: head just moves the cursor for fd 1 to just after the first line, and perl then truncates the file at that position. head does rewrite the first line, which we didn't need it to do, but that's harmless.

With a POSIX head, we could actually get away without rewriting that first line:

{ head -n 1 > /dev/null
  perl -e 'truncate STDIN, tell STDIN'
} <> file

Here, we're using the fact that head moves the cursor position in its stdin. While head would typically read its input in big chunks to improve performance, POSIX requires it (where possible) to seek back to just after the first line if it has read beyond it. Note, however, that not all implementations do so.

Alternatively, you can use the shell's read command instead in this case:

{ read -r dummy; perl -e 'truncate STDIN, tell STDIN'; } <> file
  • 1
    Stephane, do you know of a standard or coreutils command that can truncate STDIN, similar to what you've accomplished using perl above? – iruvar Aug 27 '15 at 14:11
  • 2
    @1_CR, no. dd can truncate at any arbitrary absolute offset in the file though. So you can determine the byte offset of the second line and truncate from there with dd bs=1 seek="$offset" of=file – Stéphane Chazelas Aug 27 '15 at 14:36
1

The Real Man's solution is

ed .hgignore
$d
wq

or as a one-liner

printf '%s\n' '$d' 'wq' | ed .hgignore

Or with GNU sed:

sed -i '$d' .hgignore

(No, I'm kidding. I'd use an interactive editor. vi .hgignore GddZZ)

1

You can use Vim in Ex mode:

ex -sc '2,$d|x' .hgignore
  1. 2,$ select lines 2 until end

  2. d delete

  3. x save and close

Zombo
0

For in-place file editing you may also use the open file handle trick as shown by Jürgen Hötzel in Redirect output from sed 's/c/d/' myFile to myFile.

exec 3<.hgignore
rm .hgignore  # prevent open file from being truncated
head -1 <&3 > .hgignore

ls -l .hgignore  # note that permissions may have changed
dan55
  • 2
    And just after rm .hgignore your power fails, taking away hours of hard work. Ok, it doesn't matter for .hgignore, but why would you do something that complicated anyway? Thus my downvote: technically correct but a very bad idea. – Gilles 'SO- stop being evil' Jun 30 '11 at 20:57
  • @Gilles, maybe not so good an idea, but that's for instance what perl -i (for inplace editing) does, and I wouldn't be surprised if some implementations of sed -i did it as well (though latest version of GNU sed seems not to). – Stéphane Chazelas Feb 19 '13 at 20:20