
I can't seem to find a command that lets me delete duplicate lines from my file without creating a new file, while also preserving the order of the file's contents.

Would there be another command besides uniq and awk?

If not, I know that sed has an in-place option; I just don't know how to use it to delete duplicates.

sample.txt, with duplicates:

1
2
1
3
4
1

sample.txt, with duplicates deleted:

1
2
3
4
Nathan
  • If you're using a recent version of Ubuntu, you likely have GNU awk (gawk) by default - which does have a -i inplace option – steeldriver May 05 '20 at 19:18
  • Something to note is that any program performing this action will use a temporary file under the hood and then replace the target file once it's done. Using uniq, redirecting to a temp file, then moving that file to replace the target isn't necessarily a bad thing. Invoking awk or sed might make things more complicated when you come back to the command later. – John Moon May 05 '20 at 19:28
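
For reference, a minimal sketch of the temp-file-and-move approach described in the comment above, using the awk idiom from the answers below (sample.txt.tmp is just an illustrative name):

# write a de-duplicated copy, then replace the original
awk '!seen[$0]++' sample.txt > sample.txt.tmp &&
mv sample.txt.tmp sample.txt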

3 Answers


Using GNU awk specifically, and its recently gained ability to do in-place edits,

$ cat file
1
2
1
3
4
1
$ awk -i inplace '!seen[$0]++' file
$ cat file
1
2
3
4

Note that, as with most tools that do "in-place edits" (e.g. sed -i), this uses a temporary file behind the scenes to perform the editing. You do not, however, have to move files around manually.
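
If you are not sure whether the awk on your system is GNU awk (and therefore whether -i inplace is available), one quick check is the version banner; GNU awk identifies itself with a line starting with "GNU Awk", while other implementations print something else or may not accept --version at all:

awk --version | head -n 1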


Kusalananda

At the risk of corrupting the file if the command is aborted:

{ awk '!seen[$0]++';
  python -c 'import sys; sys.stdout.truncate(sys.stdout.tell())'; } <sample.txt 1<>sample.txt

We create a group of commands within the curly braces, redirecting standard input for the group from sample.txt while also opening sample.txt on standard output in read+write mode, without truncation, via 1<>sample.txt.
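
To see what 1<> does on its own, here is a tiny illustration (demo.txt is just a throwaway file for this example). Opening a file read+write without truncation means writes overwrite bytes starting at the current offset and leave whatever follows untouched:

$ printf 'aaaaaa\n' > demo.txt
$ echo XY 1<>demo.txt
$ cat demo.txt
XY
aaa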

awk '!seen[$0]++' is the awk idiom for removing duplicate lines whilst preserving order.
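
Spelled out, that one-liner is roughly equivalent to the following, where seen is an array keyed by the full input line:

awk '{
    if (!seen[$0]) {    # first time this exact line has appeared
        print           # keep it
        seen[$0] = 1    # remember it so later copies are skipped
    }
}'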

When the awk command ends, the file descriptor corresponding to standard output is positioned at some intermediate location within sample.txt, and sample.txt needs to be truncated down to that location. There are two ways I can think of to accomplish this:

  1. python -c 'import sys; sys.stdout.truncate(sys.stdout.tell())' is an option if python is installed

  2. On a GNU/Linux system: truncate -s "$(awk '/^pos:/{print $2}' /proc/$$/fdinfo/1)" sample.txt
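
Combining option 2 with the group from above gives a python-free variant along the same lines (a sketch in the spirit of the original command rather than a tested replacement):

{ awk '!seen[$0]++';
  truncate -s "$(awk '/^pos:/ {print $2}' /proc/$$/fdinfo/1)" sample.txt; } <sample.txt 1<>sample.txt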

iruvar

It is possible to use sort with -o

sort -u sample.txt -o sample.txt

-u is for unique (duplicate lines are dropped)

-o is for the output file name; sort is one of the few tools where it is safe for the output file to be the same as the input file.

Note, however, that sort sorts the lines, so the original order is only preserved when the input already happens to be in sorted order (as it is in the sample above).

GMaster