32

From https://unix.stackexchange.com/a/458074/674

Remember to use -- when passing arbitrary arguments to commands (or use redirections where possible). So sort -- "$f1" or better sort < "$f1" instead of sort "$f1".

Why is it preferred to use -- and redirection?

Why is sort < "$f1" preferred over sort -- "$f1"?

Why is sort -- "$f1" preferred over sort "$f1"?

Thanks.

Kusalananda
  • 333,661
Tim
  • 101,790

2 Answers

57
sort "$f1"

fails for values of $f1 that start with - or, in the case of sort, for some values that start with +. This can have severe consequences, for instance with a file called -o/etc/passwd.

sort -- "$f1"

(where -- signals the end of options) addresses most of those issues but still fails for the file called - (which sort interprets as meaning its stdin instead).

sort < "$f1"

Doesn't have those issues.
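As a rough illustration of the three forms above, here is a minimal sketch run in a throwaway directory. The file names -r and - and the variable names are only picked for the demonstration; stdin is redirected from /dev/null wherever sort might otherwise wait on the terminal.

```shell
# Work in a throwaway directory so the odd file names can't hurt anything.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

printf 'b\na\n' > ./-r    # a file whose name looks like the option -r
printf 'b\na\n' > ./-     # a file literally named "-"

f1=-r
plain=$(sort "$f1" < /dev/null)   # parsed as "sort -r": the file is never read
dashdash=$(sort -- "$f1")         # -- ends the options, so ./-r is read
redirect=$(sort < "$f1")          # the shell opens ./-r itself

f1=-
dash_arg=$(sort -- "$f1" < /dev/null)   # "-" still means stdin, ./- is not read
dash_redirect=$(sort < "$f1")           # the shell opens the file named "-"

printf 'plain=[%s] dashdash=[%s] redirect=[%s]\n' "$plain" "$dashdash" "$redirect"
printf 'dash_arg=[%s] dash_redirect=[%s]\n' "$dash_arg" "$dash_redirect"

cd / && rm -rf "$tmp"
```

Only the redirection form reads the intended file in both cases.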

Here, it's the shell that opens the file. That means that if the file can't be opened, you'll get a potentially more useful error message (for instance, most shells will indicate the line number in the script), and the error messages will be consistent if you use redirections wherever possible to open files.

And in

sort < "$f1" > out

(contrary to sort -- "$f1" > out), if "$f1" can't be opened, out won't be created or truncated, and sort won't even be run.
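The difference can be checked with a small sketch like the one below. The file names out1, out2 and no-such-file are made up for the demonstration, and the exact non-zero exit statuses depend on the shell and the sort implementation.

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1

printf 'precious\n' > out1
printf 'precious\n' > out2
f1=no-such-file            # hypothetical input that does not exist
status_arg=0
status_redirect=0

# sort opens the input itself: the shell has already truncated out1 by the
# time sort discovers that the input is missing.
sort -- "$f1" > out1 2>/dev/null || status_arg=$?

# The shell opens the input: redirections are processed left to right, so the
# failing "<" aborts the command before ">" is performed and before sort runs.
{ sort < "$f1" > out2; } 2>/dev/null || status_redirect=$?

after_arg=$(cat out1)        # empty: out1 was truncated
after_redirect=$(cat out2)   # still "precious"

printf 'out1=[%s] out2=[%s]\n' "$after_arg" "$after_redirect"

cd / && rm -rf "$tmp"
```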

To clear up some possible confusion (following the comments below): that does not prevent the command from mmap()ing the file or lseek()ing inside it (not that sort does either), provided the file itself is seekable. The only difference is that the file is opened earlier, by the shell, on file descriptor 0, as opposed to later, by the command, possibly on a different file descriptor. The command can still seek or mmap that fd 0 as it pleases. That is not to be confused with cat file | cmd, where cmd's stdin is a pipe, which can't be mmapped or seeked.
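One way to see the distinction is to ask what kind of object is attached to fd 0 in each case. This sketch assumes a system where /dev/stdin refers to the process's own stdin (as on Linux and most modern Unix-likes); stdin_kind is a helper name invented for the example. It reports the file type only, which is what decides whether seeking and mmap() are possible at all.

```shell
tmp=$(mktemp -d) || exit 1
printf 'hello\n' > "$tmp/file"

# Report what kind of object is attached to stdin (fd 0).
stdin_kind() {
  if [ -p /dev/stdin ]; then
    kind=pipe
  elif [ -f /dev/stdin ]; then
    kind='regular file'
  else
    kind=other
  fi
  cat > /dev/null    # drain stdin so a writer on a pipe is not cut short
  echo "$kind"
}

via_redirect=$(stdin_kind < "$tmp/file")     # the shell opened the file on fd 0
via_pipe=$(cat "$tmp/file" | stdin_kind)     # fd 0 is a pipe fed by cat

printf 'redirect: %s\npipe: %s\n' "$via_redirect" "$via_pipe"

rm -rf "$tmp"
```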

  • 4
    Just remember that using a redirection forces sort to read the data sequentially and you cannot mmap the file. While sort may not have many problems with it, consider the performance of less <file and less file. In the first case less has to keep the whole contents of the file in memory, in the second case it's allowed to read only those parts it wants. Now imagine that file is a 100GB log file... – styrofoam fly Jul 25 '18 at 22:00
  • 9
    @styrofoamfly: It's correct that less <file keeps all the file in memory, but it is not forced to, this is a shortcoming of less. Only cat file | less is forced to. Check out less /dev/fd/0 <f, which doesn't keep the file in memory, even though it receives it on stdin. It's a common misconception that stdin in Unix is unseekable. In fact, it can be seekable, depending on the file type. – pts Jul 25 '18 at 22:22
  • @styrofoamfly Do you mean that read() reads data sequentially from a file, while mmap() reads the entire file into memory at once? – Tim Jul 26 '18 at 12:39
  • @Tim, read() reads (at the current offset, the amount specified), mmap() maps the file in memory, so that dereferencing the corresponding memory addresses may cause the data to be read from the file (if it hasn't been read already or has been evicted since it was already read, using a page fault mechanism). – Stéphane Chazelas Jul 26 '18 at 12:43
  • One should always remember that -- serving to mark the end of the option list is a GNU convention, not a guaranteed behavior. It is up to individual programs to honor it, and not all do. That's another reason to prefer redirection where that option is available. – John Bollinger Jul 26 '18 at 22:39
  • 1
    @JohnBollinger No. That dates back at least as far as getopt from SysIII in 1980, before the GNU project was started, and POSIX requires it to be supported by most standard utilities, including sort. But it's true that it's not always supported. – Stéphane Chazelas Jul 27 '18 at 05:22
  • 2
    My apologies, @StéphaneChazelas, you are right about the origin of the convention, and I will furthermore stipulate that the POSIX specification for the getopt() C function recognizes this significance of the argument --. But the main point is the one that you accept: argument handling is the domain of individual programs, and not all treat -- specially. – John Bollinger Jul 27 '18 at 12:56
18

The issue is file names beginning with a dash. sort "$f1" doesn't work if the value of f1 starts with - because the command will interpret the value as an option. This usually results in an error, but it could even cause a security hole. With sort -- "$f1", the double dash argument -- means “no options beyond this point”, so the value of f1 will not be interpreted as an option. But there is still one edge case: if the value of f1 is a dash and nothing else, then it isn't an option, it's the argument -, which means “standard input” (because the argument is an input file; for an output file it would mean “standard output”).

Using redirection avoids all of these pitfalls.

This applies to most commands, not just sort.