In this command, sort -u removed duplicates.

curl https://en.wikipedia.org/wiki/Help:Special_page -s | grep -oP 'Special:\K[a-zA-Z0-9]*' | sort -u > special_page_names

In this command, it didn't.

curl https://en.wikipedia.org/wiki/Help:Special_page -s | grep -oP 'Special:\K[a-zA-Z0-9]*' > special_page_names
sort -u special_page_names

Why did sort -u remove duplicates only in the pipeline?
i.e., why didn't it remove duplicates when run on a regular file?

Lahor
  • In the second case, did sort print something else than you expected? What exactly? Or did you expect the tool to modify special_page_names maybe? – Kamil Maciorowski Mar 12 '22 at 16:21
  • @KamilMaciorowski sort didn't print anything I didn't expect, it's just that it didn't modify the file, although I did expect it to modify the file. – Lahor Mar 12 '22 at 16:26
  • Please edit the question and put the clarification in the question body. – Kamil Maciorowski Mar 12 '22 at 16:46

1 Answer

sort is a filter. It reads input, modifies data somehow and prints output. grep is also a filter.

Usually a filter works by reading its standard input and writing to its standard output.

In the case of … | grep … | sort -u > special_page_names, the standard input of sort comes from grep and the standard output of sort goes to special_page_names. You requested this by using | between grep and sort, and by using > special_page_names at the end.

The syntax sort -u special_page_names tells the tool to ignore its standard input (in your interactive shell this is the terminal, inherited from the shell) and read special_page_names instead. The standard output is not redirected; it is the standard output inherited from the shell, which in your case is the terminal. The data flowed from special_page_names to your terminal. So sort -u did remove the duplicates, but only in what it printed; the file itself was never modified.
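A minimal sketch of that behaviour, assuming a small sample file in place of special_page_names (the temporary directory, the file name names and the three sample lines are made up for illustration):

```shell
# Hypothetical sample data standing in for special_page_names.
demo_dir=$(mktemp -d)
printf 'Search\nRandom\nSearch\n' > "$demo_dir/names"

# sort reads the named file and writes the deduplicated lines to stdout.
deduped=$(sort -u "$demo_dir/names")
printf '%s\n' "$deduped"

# The file on disk is untouched: count its lines again.
lines_in_file=$(wc -l < "$demo_dir/names" | tr -d ' ')
echo "lines still in file: $lines_in_file"
```

The deduplicated lines appear only on standard output, while the line count of the file stays the same as before the sort.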

If you want to save the output of sort -u special_page_names to a regular file, one way is to redirect the output of sort, as in the first case. Do not redirect back to special_page_names, though; choose another file.

sort -u special_page_names > special_page_names_sorted
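The reason for choosing another file is that the shell opens and truncates the target of > before sort starts, so sort would read an already empty file. A small sketch of both variants, again using made-up sample data and file names in a temporary directory:

```shell
demo_dir=$(mktemp -d)
printf 'Search\nRandom\nSearch\n' > "$demo_dir/names"

# Anti-pattern: the shell truncates the target of > first,
# so sort finds an empty file when it starts reading.
sort -u "$demo_dir/names" > "$demo_dir/names"
size_after=$(wc -c < "$demo_dir/names" | tr -d ' ')
echo "bytes left after redirecting to the same file: $size_after"

# Safer: recreate the data and write to a different file
# (names_sorted is just an example name).
printf 'Search\nRandom\nSearch\n' > "$demo_dir/names"
sort -u "$demo_dir/names" > "$demo_dir/names_sorted"
cat "$demo_dir/names_sorted"
```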

There are tools that can modify the file they read (e.g. text editors). There are filters with an option to overwrite the file they read (e.g. sed -i). You can make sort write to the same file by specifying the file both as an option-argument to -o and an operand:

sort -u -o special_page_names special_page_names
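As a rough check of the -o form, the sketch below runs it on a throwaway file (the directory and sample lines are assumptions, not the asker's data). Unlike the > redirection, -o is handled by sort itself, and POSIX allows the output file to be one of the input files:

```shell
demo_dir=$(mktemp -d)
printf 'Search\nRandom\nSearch\n' > "$demo_dir/names"

# -o names the output file; here it is the same file that sort reads.
sort -u -o "$demo_dir/names" "$demo_dir/names"

# The file now holds the sorted, deduplicated lines.
cat "$demo_dir/names"
```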