At the risk of corrupting the file if the command is aborted:
{ awk '!seen[$0]++';
python -c 'import sys; sys.stdout.truncate(sys.stdout.tell())'; } <sample.txt 1<>sample.txt
We create a grouping of commands within the curly braces, redirecting standard input for the group to sample.txt while also opening sample.txt on file descriptor 1 (standard output) in read+write mode, without truncation, via 1<>sample.txt.
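This is a rough Python sketch of what those two redirections set up, assuming the usual open(2) semantics. `<sample.txt` is a read-only open. `1<>sample.txt` opens the file read+write, creating it if needed but never truncating it. Each descriptor has its own offset. The helper names `open_like_shell` and `overwrite_demo` are made up for illustration.

```python
import os
import tempfile


def open_like_shell(path):
    """Approximate `<path 1<>path` with two independent descriptors."""
    rfd = os.open(path, os.O_RDONLY)                    # like `<path`
    wfd = os.open(path, os.O_RDWR | os.O_CREAT, 0o666)  # like `1<>path`: no O_TRUNC
    return rfd, wfd


def overwrite_demo():
    """Show that writes through the read+write descriptor overwrite in place."""
    fd, path = tempfile.mkstemp()
    os.write(fd, b"hello world\n")
    os.close(fd)
    rfd, wfd = open_like_shell(path)
    try:
        os.write(wfd, b"HELLO")   # write offset starts at 0; nothing is truncated
        return os.read(rfd, 100)  # read offset is separate and also starts at 0
    finally:
        os.close(rfd)
        os.close(wfd)
        os.remove(path)


print(overwrite_demo())  # b'HELLO world\n'
```

With `>sample.txt` in place of `1<>sample.txt`, the shell would empty the file before awk read a single line.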
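awk '!seen[$0]++' is the awk idiom for removing duplicate lines whilst preserving order. seen[$0]++ evaluates to 0 the first time a line is encountered, so !seen[$0]++ is true and the line is printed. On every later occurrence it evaluates to a positive count, so the line is skipped.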
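For readers less familiar with awk, here is the same filter in Python as a hypothetical `dedupe` generator. The dictionary plays the role of awk's `seen` array.

```python
from collections import defaultdict


def dedupe(lines):
    """Yield lines in order, skipping repeats, like awk '!seen[$0]++'."""
    seen = defaultdict(int)  # awk's `seen` array; missing keys start at 0
    for line in lines:
        count = seen[line]   # value of seen[$0] before the ++
        seen[line] += 1
        if count == 0:       # !0 is true in awk, so the line is printed
            yield line


print(list(dedupe(["b", "a", "b", "c", "a"])))  # ['b', 'a', 'c']
```

Like the awk version, this keeps every distinct line in memory.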
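When the awk command ends, the file descriptor corresponding to standard output is positioned at some intermediate location within sample.txt, because the deduplicated output can be shorter than the input but never longer. sample.txt therefore needs to be truncated down to this location; otherwise the tail of the original content remains after the new content.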
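The following Python sketch runs the whole pass, to show why the truncation matters. It uses hypothetical `dedupe_in_place` and `run_dedupe` helpers and a set in place of awk's counter. With `truncate=False`, the stale tail of the original file is left behind.

```python
import os
import tempfile


def dedupe_in_place(path, truncate=True):
    """Overwrite `path` with its unique lines, reading and writing the same file."""
    seen = set()
    # src stands in for stdin (<path), dst for stdout (1<>path): two offsets,
    # and mode "r+b" opens for read+write without truncating.
    with open(path, "rb") as src, open(path, "r+b") as dst:
        for line in src:
            if line not in seen:
                seen.add(line)
                dst.write(line)  # trails src, so unread input is never overwritten
        if truncate:
            dst.truncate(dst.tell())  # drop the stale tail after the new content


def run_dedupe(content, truncate=True):
    """Apply dedupe_in_place to a temporary file holding `content`."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(content)
        dedupe_in_place(path, truncate)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


print(run_dedupe(b"a\nb\na\nc\nb\n", truncate=False))  # b'a\nb\nc\nc\nb\n'
print(run_dedupe(b"a\nb\na\nc\nb\n"))                  # b'a\nb\nc\n'
```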
There are two ways I can think of to accomplish this:
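python -c 'import sys; sys.stdout.truncate(sys.stdout.tell())' is an option if python is installed. The python process inherits the group's standard output, so tell() returns the offset at which awk left it.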
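The sketch below illustrates why the one-liner sees awk's position. A child process's standard output is a duplicate of the parent's descriptor and shares its offset. Here `os.lseek` stands in for awk's output, `sys.executable` stands in for `python`, and `truncate_via_child` is a hypothetical helper.

```python
import os
import subprocess
import sys
import tempfile

ONE_LINER = "import sys; sys.stdout.truncate(sys.stdout.tell())"


def truncate_via_child(content, offset):
    """Run the one-liner with stdout on a file whose offset is `offset`."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, content)
        os.lseek(fd, offset, os.SEEK_SET)  # pretend awk stopped writing here
        # The child's fd 1 is a duplicate of fd, so it shares the same offset.
        subprocess.run([sys.executable, "-c", ONE_LINER], stdout=fd, check=True)
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(content))
    finally:
        os.close(fd)
        os.remove(path)


print(truncate_via_child(b"abcdefghij", 4))  # b'abcd'
```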
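On a GNU/Linux system, truncate -s "$(awk '/^pos:/{print $2}' /proc/$$/fdinfo/1)" sample.txt can take the place of the python command. It reads the current offset of the shell's standard output from /proc and truncates sample.txt to that size.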
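Here is the same /proc lookup in Python, as a Linux-only sketch with hypothetical helper names. The shell command uses /proc/$$ rather than /proc/self because inside $(...) "self" would be the awk process, whose standard output is the pipe back to the shell, not sample.txt.

```python
import os
import tempfile


def fd_position_from_proc(fd, pid="self"):
    """Read the `pos:` field of /proc/<pid>/fdinfo/<fd> (Linux only)."""
    with open(f"/proc/{pid}/fdinfo/{fd}") as f:
        for line in f:
            if line.startswith("pos:"):  # same match as awk '/^pos:/{print $2}'
                return int(line.split()[1])
    raise ValueError(f"no pos: line for fd {fd}")


def truncate_at_fd_position(content, offset):
    """Write `content`, seek to `offset`, then truncate the file to the /proc pos."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, content)
        os.lseek(fd, offset, os.SEEK_SET)
        pos = fd_position_from_proc(fd)
        os.truncate(path, pos)  # like `truncate -s "$pos" path`
        return pos, os.path.getsize(path)
    finally:
        os.close(fd)
        os.remove(path)


if os.path.isdir("/proc/self/fdinfo"):
    print(truncate_at_fd_position(b"0123456789", 3))  # (3, 3)
```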
gawk) by default - which does have a -i inplace option – steeldriver May 05 '20 at 19:18
uniq, redirecting to a temp file, then moving that file to replace the target isn't necessarily a bad thing. Invoking awk or sed might make things more complicated when you come back to the command later. – John Moon May 05 '20 at 19:28
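Here is a sketch of the temp-file-then-move approach from the last comment, in Python with hypothetical helper names. Plain uniq only collapses adjacent duplicate lines, so this keeps the order-preserving, set-based filter of the awk idiom. Unlike the in-place command at the top, an interrupted run leaves sample.txt intact. The cost is room for a second copy of the file.

```python
import os
import shutil
import tempfile


def dedupe_via_temp_file(path):
    """Write unique lines to a temp file next to `path`, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory, so the final os.replace is a rename within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        seen = set()
        with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
            for line in src:
                if line not in seen:
                    seen.add(line)
                    out.write(line)
        shutil.copymode(path, tmp_path)  # mkstemp creates 0600; keep original mode
        os.replace(tmp_path, path)       # original is untouched until this point
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def run_temp_demo(content):
    """Return the deduplicated content and the directory listing afterwards."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "sample.txt")
    with open(path, "wb") as f:
        f.write(content)
    dedupe_via_temp_file(path)
    with open(path, "rb") as f:
        result = f.read()
    listing = os.listdir(directory)
    shutil.rmtree(directory)
    return result, listing


print(run_temp_demo(b"a\nb\na\n"))  # (b'a\nb\n', ['sample.txt'])
```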