4

I want to remove duplicate lines from /etc/fstab, so I did this:

 awk '!NF || !seen[$0]++'   /etc/fstab > /etc/fstab.update

UUID=3de0d101-fba7-4d89-b038-58fe07295d96 /grid/sdb ext4 defaults,noatime 0 0
UUID=683ed0b3-51fe-4dc4-975e-d56c0bbaf0bc /grid/sdc ext4 defaults,noatime 0 0
UUID=1cf79946-0ba6-4cd8-baca-80c0a2693de1 /grid/sdd ext4 defaults,noatime 0 0
UUID=fa9cc6e8-4df8-4330-9144-ede46b94c49e /grid/sde ext4 defaults,noatime 0 0
UUID=3de0d101-fba7-4d89-b038-58fe07295d96 /grid/sdb ext4 defaults,noatime 0 0
UUID=683ed0b3-51fe-4dc4-975e-d56c0bbaf0bc /grid/sdc ext4 defaults,noatime 0 0

But as we can see, the last two lines are the same as the first two, except that they contain extra spaces.

Is it possible to ignore the spaces and remove the duplicate lines anyway?

yael

3 Answers

14

Force awk to rebuild the record with $1=$1. With the default field separators, this squeezes every run of spaces and tabs into a single space and drops leading and trailing whitespace.

awk '{$1=$1};!seen[$0]++'
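As a quick check of the rebuild trick, here is a small sketch on made-up input. The sample lines and the `input` and `result` variable names are illustrative assumptions, not taken from the question's fstab.

```shell
# Illustrative sample: lines 1 and 3 differ only in whitespace (spaces and a tab).
input=$(printf 'UUID=aaa /grid/sdb ext4 defaults 0 0\nUUID=bbb /grid/sdc ext4 defaults 0 0\nUUID=aaa    /grid/sdb\text4   defaults 0 0\n')

# $1=$1 makes awk rebuild $0 from its fields joined by OFS (a single space),
# so both spellings of the first entry map to the same key in seen[].
result=$(printf '%s\n' "$input" | awk '{$1=$1};!seen[$0]++')

printf '%s\n' "$result"
```

Only the first spelling of each entry should survive, and every printed line is in its normalized, single-space form.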
Quasímodo
  • Ah, no: blank lines should remain, but your solution removes them as well. "hello\n\nworld" should stay "hello\n\nworld", yet your solution also removes the empty line. I was just asking whether we could prevent empty lines from being removed. – alper Jul 10 '21 at 19:29
  • @alper Well... That's an Awk script; the whole command line to preserve blank lines is: awk 'NF{$1=$1};!NF||!seen[$0]++' filename. – Quasímodo Jul 12 '21 at 12:48
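The blank-line-preserving variant from the comment above can be tried on a toy input. This is only a sketch: the sample text and the `input` and `result` names are assumptions made for illustration.

```shell
# Illustrative sample: two blank lines, plus "a   " which duplicates "a"
# once its trailing whitespace is ignored.
input=$(printf 'a\n\nb\n\na   \nc\n')

# NF{$1=$1}  normalizes only lines that have at least one field.
# !NF        always prints blank lines, so they are never treated as duplicates.
result=$(printf '%s\n' "$input" | awk 'NF{$1=$1};!NF||!seen[$0]++')

printf '%s\n' "$result"
```

Both blank lines should be kept, while the second "a" (the one with trailing spaces) is dropped.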
6

Use tr to replace tabs with spaces and squeeze repeated spaces (-s):

 tr -s $'\t' ' ' < /etc/fstab | awk '!NF || !seen[$0]++' > /etc/fstab.update
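A minimal sketch of the same pipeline on made-up input follows. The sample lines and variable names are assumptions, and the sketch passes '\t' to tr (which tr itself interprets as a tab) rather than the shell's $'\t' quoting, so that it also runs in shells without that syntax.

```shell
# Illustrative sample: line 2 is line 1 written with tabs and repeated spaces.
input=$(printf 'a b c\na\t\tb   c\nd e f\n')

# tr first maps each tab to a space, then -s squeezes each run of spaces
# into one; awk then drops lines it has already seen.
result=$(printf '%s\n' "$input" | tr -s '\t' ' ' | awk '!NF || !seen[$0]++')

printf '%s\n' "$result"
```

Note that tr only squeezes runs of whitespace; a line that differs by leading or trailing spaces would still count as distinct with this approach.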
pLumo
5

Use this Perl one-liner to treat any amount of whitespace as a single blank:

perl -lane 'print unless $seen{"@F"}++' in.txt > out.txt

If you want to ignore whitespace completely, use:

perl -lane '$s = join "", @F; print unless $seen{$s}++' in.txt > out.txt

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.

SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches