
I can't seem to find a command that lets me delete duplicate lines from my file without creating a new file, while also preserving the order of the file's contents.

Would there be another command besides uniq and awk?

If not, I know that sed has an in-place option; I just don't know how to use it to delete duplicates.

sample.txt, with duplicates:

1
2
1
3
4
1

sample.txt, with duplicates deleted:

1
2
3
4
Nathan
  • If you're using a recent version of Ubuntu, you likely have GNU awk (gawk) by default - which does have a -i inplace option – steeldriver May 05 '20 at 19:18
  • Something to note is that any program performing this action will use a temporary file under the hood and then replace the target file once it's done. Using uniq, redirecting to a temp file, then moving that file to replace the target isn't necessarily a bad thing. Invoking awk or sed might make things more complicated when you come back to the command later. – John Moon May 05 '20 at 19:28
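
For reference, a minimal sketch of the temp-file-and-move approach described in the comment above, using the awk idiom from the answers below (sample.txt.tmp is just an illustrative name):

# write a de-duplicated copy, then replace the original
awk '!seen[$0]++' sample.txt > sample.txt.tmp &&
mv sample.txt.tmp sample.txt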

3 Answers


Using GNU awk specifically, and its recently gained ability to do in-place edits,

$ cat file
1
2
1
3
4
1
$ awk -i inplace '!seen[$0]++' file
$ cat file
1
2
3
4

Note that, as with most tools that do "in-place edits" (e.g. sed -i), this uses a temporary file behind the scenes to perform the editing. You do not, however, have to move files around manually.
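
If you are not sure whether the awk on your system is GNU awk (and therefore whether -i inplace is available), one quick check is the version banner; GNU awk identifies itself with a line starting with "GNU Awk", while other implementations print something else or may not accept --version at all:

awk --version | head -n 1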


Kusalananda

At the risk of corrupting the file if the command is aborted:

{ awk '!seen[$0]++';
  python -c 'import sys; sys.stdout.truncate(sys.stdout.tell())'; } <sample.txt 1<>sample.txt

We create a group of commands within the curly braces, redirecting standard input for the group from sample.txt while also opening sample.txt on standard output in read+write mode, without truncation, via 1<>sample.txt.
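
To see what 1<> does on its own, here is a tiny illustration (demo.txt is just a throwaway file for this example). Opening a file read+write without truncation means writes overwrite bytes starting at the current offset and leave whatever follows untouched:

$ printf 'aaaaaa\n' > demo.txt
$ echo XY 1<>demo.txt
$ cat demo.txt
XY
aaa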

awk '!seen[$0]++' is the awk idiom for removing duplicate lines whilst preserving order.
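
Spelled out, that one-liner is roughly equivalent to the following, where seen is an array keyed by the full input line:

awk '{
    if (!seen[$0]) {    # first time this exact line has appeared
        print           # keep it
        seen[$0] = 1    # remember it so later copies are skipped
    }
}'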

When the awk command ends, the file descriptor corresponding to standard output is positioned at some intermediate location within sample.txt, and sample.txt needs to be truncated down to that location. There are two ways I can think of to accomplish this:

  1. python -c 'import sys; sys.stdout.truncate(sys.stdout.tell())' is an option if python is installed

  2. On a GNU/Linux system: truncate -s "$(awk '/^pos:/{print $2}' /proc/$$/fdinfo/1)" sample.txt
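
Combining option 2 with the group from above gives a python-free variant along the same lines (a sketch in the spirit of the original command rather than a tested replacement):

{ awk '!seen[$0]++';
  truncate -s "$(awk '/^pos:/ {print $2}' /proc/$$/fdinfo/1)" sample.txt; } <sample.txt 1<>sample.txt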

iruvar

It is possible to use sort with -o

sort -u sample.txt -o sample.txt

-u is for unique (duplicate lines are dropped)

-o is for the output file name; sort is one of the few tools where it is safe for the output file to be the same as the input file.

Note, however, that sort sorts the lines, so the original order is only preserved when the input already happens to be in sorted order (as it is in the sample above).

GMaster