This Q&A covers the order of execution in the shell as it relates to redirection. As I understand it, the redirection is set up first: if the output file doesn't exist it gets created before the command runs, which is why cat example.txt | shuf > example.txt doesn't complain that the file doesn't exist. Why, then, does the shuffle work about once every thousand times on my system when I run the following? (backup contains 15 static values, one per line.)

for i in $(seq 1 1000); do
    cp backup test
    echo $i
    cat test | shuf > test
    cat test
done

How can there seemingly be an exception to the rule?

  • @jofel The Q&A you refer to, and which I link to, is about the order of execution. This question is about why that order doesn't seem to be respected "sometimes". Answering the other Q doesn't answer this one. Should I have requested an edit? –  Jan 23 '14 at 09:46
  • The problem with that other answer is that it describes what happens in a particular case, without explaining that the order of execution in a pipe is actually indeterminate -- i.e. that particular case is not the only possible case. +1 for experimental tenacity ;) – goldilocks Jan 23 '14 at 10:02
  • @jofel When I asked the Q, I was certain that the rule was that redirection was happening first, whereas OP in linked Q thought the opposite i.e. that this wasn't the case. Plus his second premiss was the least probable result from the indeterminate expression. Nevertheless we were both wrong. Now that I know this, I don't mind suggesting an edit to OP to incorporate my Q ie. "this other contributor tried this and got that, what gives?" or something like that. If goldilocks copies his A there too we can have the all encompassing Q. Just suggesting if there's value in doing that, np. –  Jan 24 '14 at 02:00

1 Answer

Here:

cat test | shuf > test

The > redirection belongs to shuf alone, so the line is parsed as:

(cat test) | (shuf > test) 

and not:

(cat test | shuf) > test

Although if we used two different files, there wouldn't be any difference between these two groupings anyway.

What becomes significant when you use the same file is that the two commands on either side of the pipe execute concurrently. Those commands are cat test and shuf > test. > means "open for writing and truncate", whereas cat is going to "open for reading", read, and close. Since these things happen concurrently, the order in which their individual operations interleave is indeterminate.

There's a chance that cat will get to slurp the file in before the shell process setting up shuf > test truncates it. But that's a slim chance, since cat has more to do first: it has to be started, open the file, and read it, whereas the truncation happens as soon as the shell sets up the redirection, before shuf even runs. cat only wins if it is very lucky in relation to the scheduling of shuf > test.
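
The truncation half of this can be seen in isolation. The sketch below assumes a POSIX shell with mktemp available; the file name demo is made up for illustration. It suggests that > empties the file as soon as the shell sets up the redirection, even when the command itself never writes anything.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real gets truncated.
dir=$(mktemp -d) && cd "$dir" || exit 1

# "demo" is a hypothetical scratch file, not part of the question.
printf 'one\ntwo\nthree\n' > demo
before=$(($(wc -c < demo)))   # $(( )) strips any padding wc may add

# ':' is the shell's no-op builtin; it writes nothing at all.
# The file is emptied anyway, because the shell opens it with
# "write + truncate" while setting up the redirection.
: > demo
after=$(($(wc -c < demo)))

echo "bytes before: $before, bytes after: $after"
```

In the pipeline from the question, this same truncation is what cat has to beat.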

Moral of story: Don't do this -- use two files instead.
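
A minimal sketch of the two-file approach, assuming shuf, seq, and mktemp are available. The temporary file name pattern is illustrative, and seq 1 15 merely stands in for the 15-line file from the question. shuf writes to a second file, which replaces the original only if shuf succeeded.

```shell
#!/bin/sh
# Work in a throwaway directory.
dir=$(mktemp -d) && cd "$dir" || exit 1
seq 1 15 > test    # stand-in for the 15-line file from the question

# Write the shuffled output to a second file in the same directory...
tmp=$(mktemp ./test.XXXXXX) || exit 1

# ...and rename it over the original only if shuf succeeded.
if shuf test > "$tmp"; then
    mv "$tmp" test
else
    rm -f "$tmp"
fi

lines=$(($(wc -l < test)))
echo "test still has $lines lines"
```

Keeping the temporary file in the same directory means the mv is a plain rename rather than a copy across filesystems.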

goldilocks
  • I get it now, thank you. I thought the file being opened before everything else was a sure thing, but there's more to it than that: I hadn't realized the order was indeterminate, so this comes down to race conditions, I think. Don't know where this leads. Maybe one day we'll see :) –  Jan 23 '14 at 10:20
  • Yep, that's a race condition. – goldilocks Jan 23 '14 at 10:23
  • An alternative is to use the tool sponge. – jofel Jan 23 '14 at 12:37
  • Note that shuf test 1<> test is likely to be OK though (1<> opens the file read-write without truncating it), as are shuf -o test test and sort -Ro test test, since sort and shuf need to read the file fully before starting to write. With -o, they only open the file for writing after they have finished reading. – Stéphane Chazelas Jan 23 '14 at 23:42
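
The alternatives mentioned in these comments can be tried side by side. This is only a sketch: it assumes GNU coreutils for shuf -o, and sponge comes from the moreutils package and may not be installed, so it is run only if found. Again, seq 1 15 stands in for the 15-line file.

```shell
#!/bin/sh
# Work in a throwaway directory.
dir=$(mktemp -d) && cd "$dir" || exit 1

# shuf -o: per the comment above, shuf reads all of its input
# before it opens the output file, so in-place use should be safe.
seq 1 15 > test
shuf -o test test
after_shuf_o=$(($(wc -l < test)))

# sponge (moreutils): soaks up all of stdin, then writes the file.
# Skipped when sponge is not installed.
after_sponge=skipped
if command -v sponge >/dev/null 2>&1; then
    seq 1 15 > test
    shuf test | sponge test
    after_sponge=$(($(wc -l < test)))
fi

echo "shuf -o: $after_shuf_o lines, sponge: $after_sponge"
```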