10

I know this is sort of a duplicate of another question (Why this sort command gives me an empty file?) but I wanted to expand on the question in response to the answers given.

The command

shuf example.txt > example.txt

returns a blank file, because the shell truncates the file before shuf runs, leaving only an empty file to shuffle. However,

cat example.txt | shuf > example.txt

will produce a shuffled file as expected.

Why does the pipeline method work when the simple redirection doesn't? If the file is truncated before the commands are run, shouldn't the second method also leave an empty file?

toryan
  • 707
  • @Gilles The question is more specifically about why the command works in a different format, and this point has not been answered yet – toryan Jan 23 '14 at 19:43

6 Answers

22

The problem is that > example.txt starts writing to that file, before shuf example.txt starts reading it. So as there was no output yet, example.txt is empty, shuf reads an empty file, and as shuf makes no output in this case, the final result stays empty.

Your other command may suffer from the same issue. > example.txt may kill the file before cat example.txt starts reading it; it depends on the order the shell executes those things, and how long it takes cat to actually open the file.
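The truncation is easy to observe in isolation. Here is a minimal sketch, assuming a POSIX shell. It uses sort (the command from the linked question) in place of shuf, because sort produces no output for empty input either. The scratch directory and the file contents are made up for the demonstration.

```shell
# Work in a throwaway directory so nothing real is overwritten.
dir=$(mktemp -d) && cd "$dir"
printf '%s\n' one two three > example.txt
size_before=$(wc -c < example.txt)

# The shell opens example.txt with truncation first and only then starts
# sort, which therefore finds nothing left to read.
sort example.txt > example.txt
size_after=$(wc -c < example.txt)

echo "before: $size_before bytes, after: $size_after bytes"
```

The second size is zero: the data is gone before the command has had a chance to look at it.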

To avoid such issues entirely, you could use shuf example.txt > example.txt.shuf && mv example.txt.shuf example.txt.

Or you could go with shuf example.txt --output=example.txt instead.
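Both suggestions can be tried as follows. This is a sketch assuming GNU shuf and mktemp are available; the scratch directory, the sample data and the temporary-file template are illustrative choices, not part of the original commands.

```shell
# Throwaway directory with a small sample file.
dir=$(mktemp -d) && cd "$dir"
printf '%s\n' 1 2 3 4 5 > example.txt

# Variant 1: write to a temporary file in the same directory, then rename.
# The && chain leaves example.txt untouched if shuf fails.
tmp=$(mktemp example.txt.XXXXXX) &&
    shuf example.txt > "$tmp" &&
    mv "$tmp" example.txt

# Variant 2: let shuf open the output file itself instead of the shell.
# This relies on shuf reading its input before it opens the output.
shuf example.txt --output=example.txt

cat example.txt
```

Creating the temporary file next to the target keeps the final mv a rename within one file system. Note that mktemp creates the file with restrictive permissions, and the renamed file keeps them.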

frostschutz
  • 48,978
  • 2
    > example.txt doesn't start writing to the file, it opens the file for writing with truncation. Compare with 1<> example.txt which opens it read-write without truncation. – Stéphane Chazelas Jan 24 '14 at 16:02
  • @StephaneChazelas, opening a file for writing with truncation is usually how you start writing to a file. It's quite hard to do without opening it. ;) – frostschutz Jan 24 '14 at 16:05
  • 2
    Your wording is confusing. > file doesn't write anything and the real problem is that it truncates. If it only opened for writing (as 1<> file does), there wouldn't be any problem (at least not with shuf). – Stéphane Chazelas Jan 24 '14 at 16:10
  • If > file didn't write anything, it would work on a read-only filesystem. It only doesn't write anything when you take it literally (as in the syscall write()). You're riding on technicalities there that IMHO are way more confusing than my simplified explanation. – frostschutz Jan 24 '14 at 16:28
  • shuf 1<> file < file is deceptively neat. It works because shuf outputs the same amount of data, so all of file is overwritten. It would not work with, say, wc 1<> file < file. This, however, would: (rm file; wc > file) < file – Ole Tange Aug 13 '17 at 08:39
10

The package moreutils has a command sponge:

   sponge  reads  standard input and writes it out to the specified file.
   Unlike a shell redirect, sponge soaks up all its input before  opening
   the output file. This allows constructing pipelines that read from and
   write to the same file.

That way you can do:

shuf example.txt | sponge example.txt

(Unfortunately, the moreutils package also ships a utility named parallel that is far less useful than GNU parallel. I removed the parallel installed by moreutils.)
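If moreutils is not installed, the idea behind sponge can be approximated with a small shell function. The function name soak is hypothetical and this is only a sketch of the "read everything first, open the output last" behaviour described in the man page excerpt, not the moreutils implementation.

```shell
# soak FILE: hypothetical stand-in for sponge.
# It buffers all of standard input in a temporary file and only then
# opens FILE for writing, so FILE may also be the pipeline's input.
soak() {
    soak_tmp=$(mktemp) || return 1
    cat > "$soak_tmp" && cat "$soak_tmp" > "$1"
    soak_status=$?
    rm -f "$soak_tmp"
    return "$soak_status"
}

# Try it in a throwaway directory.
dir=$(mktemp -d) && cd "$dir"
printf '%s\n' 1 2 3 4 5 > example.txt
shuf example.txt | soak example.txt
cat example.txt
```

The first cat returns only once shuf has closed its end of the pipe, so by the time example.txt is truncated shuf has already read all of it.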

xhienne
  • 17,793
Timo
  • 6,332
2

You are simply lucky that running

cat example.txt | shuf > example.txt

doesn't empty example.txt the way this command does:

shuf example.txt > example.txt

Redirections are performed by the shell before the commands are executed, and the components of a pipeline are executed concurrently.
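The race can be made visible by slowing down the reading side on purpose. In this sketch the one-second delay is an assumption added for illustration, and the scratch directory and sample data are made up.

```shell
# Throwaway directory with a small sample file.
dir=$(mktemp -d) && cd "$dir"
printf '%s\n' 1 2 3 4 5 > example.txt

# Both sides of the pipe are started at once. The right-hand side
# truncates example.txt as soon as its redirection is set up; the
# artificial delay makes cat open the file only after that has happened.
(sleep 1; cat example.txt) | shuf > example.txt

wc -c < example.txt
```

With the delay in place the file should end up empty. Without it, the outcome depends on which process reaches the file first.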

Using the -o / --output option would be the best solution with shuf, but if you like taking (very slight) risks, here is an unconventional way to keep the file from being truncated before it is read:

shuf example.txt | (sleep 1;rm example.txt;cat > example.txt)

and this simpler and faster one, thanks to Ole's suggestion:

(rm example.txt; shuf > example.txt) < example.txt
jlliagre
  • 61,204
  • 2
    This, of course, assumes that 1 second is long enough. It would be better to use sponge or a temporary file instead. – Chris Down Jan 23 '14 at 07:22
  • 1
    @ChrisDown sponge or a temporary file would definitely be safer, especially with a remote or non-inode-based file system, but in usual cases 1 second should be long enough for shuf to open the file unless the system is very heavily loaded. – jlliagre Jan 23 '14 at 07:39
  • 1
    Try: (rm foo; shuf > foo) < foo – Ole Tange Aug 13 '17 at 08:32
1

From the GNU bash manual (see also, for the details, section 3.7 Executing Commands):

3.1.1 Shell Operation

The following is a brief description of the shell’s operation when it reads and executes a command. Basically, the shell does the following:

  1. Reads its input from a file (see Shell Scripts), from a string supplied as an argument to the -c invocation option (see Invoking Bash), or from the user’s terminal.
  2. Breaks the input into words and operators, obeying the quoting rules described in Quoting. These tokens are separated by metacharacters. Alias expansion is performed by this step (see Aliases).
  3. Parses the tokens into simple and compound commands (see Shell Commands).
  4. Performs the various shell expansions (see Shell Expansions), breaking the expanded tokens into lists of filenames (see Filename Expansion) and commands and arguments.
  5. Performs any necessary redirections (see Redirections) and removes the redirection operators and their operands from the argument list.
  6. Executes the command (see Executing Commands).
  7. Optionally waits for the command to complete and collects its exit status (see Exit Status).

Consider the situation where your file does not exist. Your second example will still create the file most of the time, because the redirection is usually performed before cat opens the file. If the redirection had not happened yet and the file therefore did not exist, cat would complain that there is no such file; most of the time it doesn't. To reproduce the shuffle you got with that second command, I needed a great many tries. So the second expression should indeed leave an empty file most of the time.
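The ordering of steps 5 and 6 can be checked with a single command rather than a pipeline, which removes the timing element. This is a sketch assuming a POSIX shell and sort; the file names are made up.

```shell
# Throwaway directory that contains no files yet.
dir=$(mktemp -d) && cd "$dir"

# missing.txt does not exist, yet sort does not complain: the redirection
# (step 5) creates the file before sort is executed (step 6).
same_status=0
sort missing.txt > missing.txt || same_status=$?

# With a different output file nothing creates absent.txt, so sort fails.
other_status=0
sort absent.txt > other.txt 2>/dev/null || other_status=$?

echo "same file: $same_status, different file: $other_status"
```

The first status is zero and the second is not, which shows that the output file already existed by the time the command tried to open its input.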

0

You can use Vim in Ex mode:

ex -sc '%!shuf' -cx example.txt
  1. % select all lines

  2. ! run command

  3. x save and close

Zombo
  • 1
  • Fails if example.txt does not end in newline. – Ole Tange Aug 13 '17 at 08:53
  • And it is dead slow. 175MB: time shuf ... = 6.2s, time ex ... > 20m. – Ole Tange Aug 13 '17 at 20:41
  • Well, it does read everything into memory, as it is a full editor (generally part of vim), so I would expect it to be just as slow as if you used vim! – anthony Oct 07 '22 at 05:05
  • On the other hand, if you need to look at a later part of the file to modify something at the beginning, then it's probably better than trying to parse the file twice, and safer too. I have used this for password file modifications to insert new users while retaining UID order. It is also useful for updating disk quotas using "edquota". – anthony Oct 07 '22 at 05:08
0

The shortest version I have found:

(rm foo && shuf > foo) < foo

This opens the file for reading, unlinks it, and then creates a new, empty file under the same name for the output. That way the original is never truncated before it has been opened, which is what you normally see when redirecting output to the same file you are reading from.
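Here is the same command with each step spelled out, as a sketch run in a scratch directory with made-up sample data:

```shell
# Throwaway directory with a small sample file.
dir=$(mktemp -d) && cd "$dir"
printf '%s\n' 1 2 3 4 5 > foo

# 1. "< foo" opens the original file for reading on standard input.
# 2. "rm foo" removes the name; the open descriptor keeps the data alive.
# 3. "> foo" creates a brand-new, empty file under the same name.
# 4. shuf reads the old data from standard input and writes the new file.
(rm foo && shuf > foo) < foo

cat foo
```

Because the output is a newly created file, it does not keep the original's hard links, and it takes its permissions from the current umask. If shuf fails after the rm, the original data is lost once the descriptor is closed.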

Ole Tange
  • 35,514
  • 2
    I'd do (rm foo&&shuf>foo)<foo, as removing foo could fail (it's the permissions on the current directory that matter) while > foo could succeed (as it's foo's permissions that matter), resulting in foo being truncated and the data lost. – Stéphane Chazelas Aug 13 '17 at 16:16
  • @StéphaneChazelas Good point – Ole Tange Aug 13 '17 at 17:56