sort -o
seems superfluous. What is the point of using it when we can use sort >
?
Is it sometimes impossible to use shell redirection?
Sort a file in-place:
sort -o file file
Using sort file >file
would start by truncating the file called file
to zero size, then calling sort
with that empty file, resulting in an empty output file no matter what the original file's contents were.
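A minimal sketch of the difference, run in a scratch directory. The file names fruit.txt and fruit-redirect.txt are made up for the illustration:

```shell
# Work in a scratch directory so nothing real is overwritten.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

printf '%s\n' pear apple fig > fruit.txt
cp fruit.txt fruit-redirect.txt

# The shell truncates fruit-redirect.txt before sort starts,
# so sort reads an empty file and writes nothing back.
sort fruit-redirect.txt > fruit-redirect.txt

# With -o, sort reads its input before it writes the output file.
sort -o fruit.txt fruit.txt

wc -c < fruit-redirect.txt   # size in bytes of the redirected file
cat fruit.txt                # the file sorted in place
```

After this runs, fruit-redirect.txt is empty while fruit.txt holds its three lines in sorted order.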
Also, in situations where commands or lists of options are automatically generated by e.g. scripts, adding -o somefile
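to the end of the options would override any previously set output file, which allows controlling the output file location by way of appending options. (This depends on the implementation accepting a repeated -o; GNU sort, for one, may instead fail with "multiple output files specified" when the names differ.)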
sort_opt=( some list of options )
if [ ... something ... ]; then
# We don't need to go through and delete any old use of "-o"
# because this later option would override it.
sort_opt+=( -o somefile.out )
fi
sort "${sort_opt[@]}" "$thefile"
There might also be instances where the sort
binary executable is called directly, without a shell to do any redirection to any file.
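One such case, as a hedged sketch: find -exec runs the utility itself, with no shell in between, so a > on that command line would redirect find's output rather than the output of each sort. The directory and the file names one.txt and two.txt are made up for the illustration, and the sketch assumes a find that substitutes every {} argument, which the common implementations do.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' b a c > "$tmpdir/one.txt"
printf '%s\n' 3 1 2 > "$tmpdir/two.txt"

# find executes sort directly for each matching file.
# -o gives each sort invocation its own output file without any shell.
find "$tmpdir" -type f -name '*.txt' -exec sort -o {} {} \;

cat "$tmpdir/one.txt" "$tmpdir/two.txt"
```

The same applies when sort is started from another program through an exec-style call that takes an argument list rather than a shell command line.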
Note that -o
is a standard option whereas --output
is a GNU extension.
-o
can be safely used to update a file in place - and that really isn't documented in the man page the way it is for sed -i
, which says "edit files in place [...]"
– Chris Davies
Aug 19 '22 at 09:50
sort
manual says -o
is for "sorting a file in-place"? To me, this means it's a bit more versatile than that, which I think I show in the answer.
– Kusalananda
Aug 19 '22 at 09:59
sort
needs to buffer everything in memory anyway, so opening the output file only after reading all the input comes pretty much for free. Unlike with sed
or cat
, where doing that requires adding that buffering.
– ilkkachu
Aug 19 '22 at 10:32
sed -i
doesn't even do the same thing: it creates a new file and renames it in place of the original.
– ilkkachu
Aug 19 '22 at 10:33
">
is a shell construct. Why would you want to start a shell if you just want to run a command to sort something into a file?" is kind of the winning argument to me. It's portable and easy, and sure, for some use cases it's redundant, but honestly, multiple easy-to-understand ways of doing the same thing aren't a bad thing per se.
– Marcus Müller
Aug 19 '22 at 10:57
sort [input [output]]
with a missing output implying sort-in-place. And then contrast with current implementations where sort input
writes to stdout.
– Chris Davies
Aug 19 '22 at 22:55
The main purpose is to be able to sort a file in place. You can't do that with redirection because sort myfile >myfile
would truncate myfile
before sort
is even started.
The reason this is particularly useful with sort
specifically is that sort
implementations have traditionally been able to process large files (potentially much larger than RAM), and have used temporary files on disk to do so. The other traditional text processing utilities are, on the contrary, designed primarily to process files as a stream, line by line, and they support large files because they never need to keep more than a few lines in memory at a time.
This ability was introduced as early as Second Edition Unix, whose manual documents that sort copies the input to a temporary file. It does so in order to handle large files for which RAM would be insufficient. The Unix sort
utility was written with large files in mind almost from the get go.
The Second Edition manual doesn't say explicitly that the input file can be the same as the output file, and I can't find the source code to confirm, but that would follow from the implementation technique. (In First Edition, the manual mentions nothing special; I can't find the source either but the binary does not produce correct output in that case.) By the way, in these early versions, the syntax is sort input output
, not yet sort -o output input
, because the syntax of command line options wasn't firmly established yet.
In Third Edition Unix, the ability to sort in place is explicitly mentioned: sort myfile
replaces myfile
with a sorted version. In Fourth Edition Unix, the syntax changes slightly: sort myfile
writes the sorted output to standard output, but sort myfile myfile
sorts the file in place (the manual explicitly states that “The input and output file may be the same.”). In Fifth Edition Unix, the syntax to specify the output file is the modern -o
option.
Fifth Edition is the first to support merge mode (sort -m
), and also the first to support multiple input files. This is related: merge mode is only useful with multiple inputs, whereas sorting multiple files together is a rare need and can be accomplished with cat … | sort
with little loss of performance (since sort
writes the data to a temporary file anyway). In merge mode, sort
can't write to one of the input files, because in this mode it reads all the inputs one line at a time and writes the output progressively; it doesn't make a copy first.
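A cautious way to merge back into one of the inputs, sketched with made-up file names (sorted1.txt, sorted2.txt, merged.tmp): write the merge to a separate file, then rename it over the input. This avoids relying on how a particular implementation handles an output file that is also a merge input.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' a c e > "$tmpdir/sorted1.txt"
printf '%s\n' b d f > "$tmpdir/sorted2.txt"

# Merge the two already-sorted inputs into a separate file;
# the inputs are still being read while the output is written.
sort -m -o "$tmpdir/merged.tmp" "$tmpdir/sorted1.txt" "$tmpdir/sorted2.txt"

# Only once the merge has finished, replace the first input.
mv "$tmpdir/merged.tmp" "$tmpdir/sorted1.txt"

cat "$tmpdir/sorted1.txt"
```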
The ability to write to the input file (except in merge mode) has been retained throughout the history of Unix and (most) clones, and is required by POSIX.