12

sort -o seems superfluous. What is the point of using it when we can use sort >?

Is it sometimes impossible to use shell redirection?

Kusalananda
  • 333,661
EmmaV
  • 4,067

2 Answers

22

Sort a file in-place:

sort -o file file

Using sort file >file would make the shell truncate the file called file to zero size before sort is even started. sort would then read that empty file, resulting in an empty output file no matter what the original file's contents were.
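The difference can be checked with a minimal sketch like the one below. It assumes a scratch directory created with mktemp -d (not a POSIX utility, but widely available); the variable names and sample data are made up for illustration.

```shell
# Work on a throwaway file so nothing real is touched.
workdir=$(mktemp -d)
f="$workdir/file"

# Attempt 1: redirection. The shell truncates "$f" before sort starts,
# so sort only ever sees an empty file.
printf '%s\n' pear apple mango >"$f"
sort "$f" >"$f"
if [ -s "$f" ]; then redirect_left_data=yes; else redirect_left_data=no; fi

# Attempt 2: -o. sort reads its input first and writes the output file afterwards.
printf '%s\n' pear apple mango >"$f"
sort -o "$f" "$f"
inplace_result=$(cat "$f")

printf 'data left after redirection: %s\n' "$redirect_left_data"
printf 'contents after sort -o:\n%s\n' "$inplace_result"

rm -rf "$workdir"
```

The first attempt leaves an empty file, while the second leaves the three lines in sorted order.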

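Also, in situations where commands or lists of options are automatically generated, e.g. by scripts, appending -o somefile to the end of the options can override any previously set output file on implementations where the last -o wins, which allows controlling the output file location by way of appending options. This is not universal: GNU sort rejects two different -o arguments with the error "multiple output files specified".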

sort_opt=( some list of options )

if [ ... something ... ]; then
    # We don't need to go through and delete any old use of "-o"
    # because this later option would override it.
    sort_opt+=( -o somefile.out )
fi

sort "${sort_opt[@]}" "$thefile"

There might also be instances where the sort binary executable is called directly, without a shell to do any redirection to any file.
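One everyday case of this, offered as an illustration rather than something from the answer itself, is find -exec, which executes the utility directly for each matching file. There is no shell between find and sort, so a per-file > redirection has nowhere to go, while -o works fine. The directory and file names below are hypothetical.

```shell
# Scratch directory with two unsorted files (names are illustrative only).
workdir=$(mktemp -d)
printf '%s\n' c a b >"$workdir/one.txt"
printf '%s\n' 3 1 2 >"$workdir/two.txt"

# find runs sort itself for every match; each {} becomes the file's path,
# so every file is sorted in place without involving a shell.
find "$workdir" -type f -name '*.txt' -exec sort -o {} {} \;

one_sorted=$(cat "$workdir/one.txt")
two_sorted=$(cat "$workdir/two.txt")
printf '%s\n' "$one_sorted" "$two_sorted"

rm -rf "$workdir"
```

Getting the same effect with redirection would mean starting a shell for every file (for example via sh -c) and writing to a temporary file first.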

Note that -o is a standard option whereas --output is a GNU extension.

Kusalananda
  • 1
    Is this the only reason? If so I would have expected it to be in the man page. – EmmaV Aug 19 '22 at 09:21
  • @EmmaV The manual describes the tool but seldom says very much about how to apply the tool. – Kusalananda Aug 19 '22 at 09:22
  • 3
    @EmmaV the GNU docs say "Write output to output-file instead of standard output. Normally, sort reads all input before opening output-file, so you can sort a file in place by using commands like sort -o F F and cat F | sort -o F." – muru Aug 19 '22 at 09:23
  • @EmmaV See updated answer. – Kusalananda Aug 19 '22 at 09:34
  • Most of those feel like straw man arguments to me, in that they could equally be applied to any command at all. The best one is that -o can be safely used to update a file in place - and that really isn't documented in the man page the way the corresponding sed -i writeup is, "edit files in place [...]" – Chris Davies Aug 19 '22 at 09:50
  • @roaima Isn't it interesting that neither POSIX nor any sort manual says -o is for "sorting a file in-place"? To me, this means it's a bit more versatile than that, which I think I show in the answer. – Kusalananda Aug 19 '22 at 09:59
  • 7
    The POSIX text at least hints at it: "-o output: Specify the name of an output file to be used instead of the standard output. This file can be the same as one of the input files." But then again, it's not like all options of all tools have that one determined use-case you're supposed to use them for. sort needs to buffer everything in memory anyway, so having it open the output file only after reading all input comes pretty much for free. Unlike with sed or cat, where doing that requires adding that buffering. – ilkkachu Aug 19 '22 at 10:32
  • 1
    Also sed -i doesn't even do the same thing: it creates a new file and renames it in place of the original. – ilkkachu Aug 19 '22 at 10:33
  • 8
    Note that the "> is a shell construct. Why would you want to start a shell if you just want to run a command to sort something into a file" is kind of the winning argument to me. It's portable, easy, and sure, it's for some use cases a redundant thing, but honestly, multiple easy-to-understand ways of doing the same thing aren't a bad thing per se. – Marcus Müller Aug 19 '22 at 10:57
  • @roaima Wrong man page. Check the original (or if you want it to be spelled explicitly, the next version). – Gilles 'SO- stop being evil' Aug 19 '22 at 19:00
  • @Gilles'SO-stopbeingevil' that's an interesting functionality jump from v2 to v3: sort [input [output]] with a missing output implying sort-in-place. And then contrast with current implementations where sort input writes to stdout. – Chris Davies Aug 19 '22 at 22:55
  • in many instances where sort is called directly the caller can initialise stdout how they want before exec (although popen() in write mode doesn't give that option) – Jasen Aug 21 '22 at 06:47
13

The main purpose is to be able to sort a file in place. You can't do that with redirection because sort myfile >myfile would truncate myfile before sort is even started.

The reason this is particularly useful with sort specifically is that sort implementations have traditionally been able to process large files (potentially much larger than RAM), and have used temporary files on disk to do so. The other traditional text processing utilities are, on the contrary, designed primarily to process files as a stream, line by line, and they support large files because they never need to keep more than a few lines in memory at a time.

This ability was introduced as early as Second Edition Unix. This version documents that it copies the input to a temporary file. The reason is that it can sort large files for which RAM would be insufficient. The Unix sort utility was written with large files in mind almost from the get go.

The Second Edition manual doesn't say explicitly that the input file can be the same as the output file, and I can't find the source code to confirm, but that would follow from the implementation technique. (In First Edition, the manual mentions nothing special; I can't find the source either but the binary does not produce correct output in that case.) By the way, in these early versions, the syntax is sort input output, not yet sort -o output input, because the syntax of command line options wasn't firmly established yet.

In Third Edition Unix, the ability to sort in place is explicitly mentioned: sort myfile replaces myfile with a sorted version. In Fourth Edition Unix, the syntax changes slightly: sort myfile writes the sorted output to standard output, but sort myfile myfile sorts the file in place (the manual explicitly states that “The input and output file may be the same.”). In Fifth Edition Unix, the syntax to specify the output file is the modern -o option.

Fifth Edition is the first to support merge mode (sort -m), and also the first to support multiple input files. This is related: merge mode is only useful with multiple inputs, whereas sorting multiple files together is a rare need and can be accomplished with cat … | sort with little loss of performance (since sort writes the data to a temporary file anyway). In merge mode, sort can't write to one of the input files, because in this mode it reads all the inputs one line at a time and writes the output progressively; it doesn't make a copy first.
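As a rough sketch of merge mode with a modern sort, the following merges two files that are each already sorted and writes the result to a third file, so none of the inputs is used as the output. The file names and contents are invented for the example.

```shell
workdir=$(mktemp -d)

# Each input must already be sorted for -m to produce sorted output.
printf '%s\n' apple cherry >"$workdir/a.txt"
printf '%s\n' banana date >"$workdir/b.txt"

# Merge the two sorted inputs into a separate output file.
sort -m -o "$workdir/merged.txt" "$workdir/a.txt" "$workdir/b.txt"
merged=$(cat "$workdir/merged.txt")

# The same result via a full sort of the concatenated inputs.
resorted=$(cat "$workdir/a.txt" "$workdir/b.txt" | sort)

printf '%s\n' "$merged"
rm -rf "$workdir"
```

Both approaches yield the same four lines here; merging just skips the sorting work that the inputs no longer need.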

The ability to write to the input file (except in merge mode) has been retained throughout the history of Unix and (most) clones, and is required by POSIX.