How to compare two files, so that the information in the first one is deleted from the second one?

Question

Basically, I had a file containing around 90 usernames.

I had to delete all of those usernames from the passwd file, which was 300 lines long in total.

I tried to come up with a way to at least filter out the duplicate usernames and print the ones that have to remain, but had no success.

Let's say that the file with the 90 usernames to remove contains:

file.txt

user1
user2
user3
user4

The passwd file contains these usernames, along with a lot more:

passwd

user31
user32
user1
user23
user2
user4
user15
user3

The usernames inside the passwd file were randomly spread inside the file, so diff -y wouldn't have done the job.

My goal here was to compare the two files, for example by running cat on file.txt and using the output to search inside passwd. The result should be either the removal of the duplicate lines or a printout of the unique lines.

Please provide example input and expected output, and at least show what you have tried and explain how it did not work as expected or intended. — DopeGhoti, Jan 17 '18 at 19:27

score 6 · Answer 1 · edited Jan 19 '18 at 10:00


I think the easiest way may be:

grep -v -x -f file_infotodelete reference_file > result_file
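As a minimal sketch, here is that approach applied to the sample data from the question. The scratch directory and the variable names are made up for illustration, and the -F option is an addition that is not in the command above: it makes grep treat each username as a fixed string rather than a regular expression.

```shell
# Recreate the question's sample files in a scratch directory (hypothetical paths).
workdir=$(mktemp -d)
printf '%s\n' user1 user2 user3 user4 > "$workdir/file.txt"
printf '%s\n' user31 user32 user1 user23 user2 user4 user15 user3 > "$workdir/passwd"

# -v  print only the lines that do NOT match
# -x  a pattern must match the whole line, so user1 does not match user15
# -F  treat patterns as fixed strings (added here, not part of the original command)
# -f  read the patterns from file.txt, one per line
remaining=$(grep -v -x -F -f "$workdir/file.txt" "$workdir/passwd")
printf '%s\n' "$remaining"

rm -r "$workdir"
```

With this input the command prints user31, user32, user23 and user15, in the order they appear in passwd. Because of -x, this only works when each line consists of nothing but the username.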

edited Jan 19 '18 at 10:00

GAD3R


answered Jan 17 '18 at 19:34

francois P


Chris Davies · Answer 2 · 2018-01-19T09:59:06.063

6

The comm command can be used to compare and contrast two sorted files:

comm <(sort file.txt) <(cut -d: -f1 /etc/passwd | sort)

The first column contains lines from the first file that do not appear in the second file.
The second column contains lines from the second file that do not appear in the first file.
The third column contains lines that appear in both files.

You can omit one or two columns from the three-column output (comm -13 ... will omit columns 1 and 3, for example).
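To illustrate the column selection, here is a small sketch using the sample usernames from the question. It writes sorted copies to temporary files instead of using process substitution, so it should also run in a plain POSIX shell. The file names are hypothetical, and LC_ALL=C is set only so that sort and comm agree on the ordering.

```shell
# Use one collation order for both sort and comm.
LC_ALL=C
export LC_ALL

workdir=$(mktemp -d)

# comm expects sorted input, so sort both lists first (file names are made up).
printf '%s\n' user1 user2 user3 user4 | sort > "$workdir/remove.sorted"
printf '%s\n' user31 user32 user1 user23 user2 user4 user15 user3 | sort > "$workdir/passwd.sorted"

# -13 suppresses column 1 (only in the removal list) and column 3 (in both),
# leaving column 2: usernames that appear only in the passwd list.
keep=$(comm -13 "$workdir/remove.sorted" "$workdir/passwd.sorted")
printf '%s\n' "$keep"

rm -r "$workdir"
```

This prints user15, user23, user31 and user32, which are the usernames that have to remain. Unlike the grep approach, the result comes out sorted rather than in the original file order.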

Not asked in your question, but requested in a follow-up comment, is how to take a list of usernames and extract the corresponding entries from /etc/passwd:

( echo root; echo sys ) | sed -r 's!(.*)!^\1:!' | grep -f - /etc/passwd
root:x:0:0:root:/root:/bin/bash
sys:x:3:3:sys:/dev:/usr/sbin/nologin
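The same anchored-pattern idea can be inverted to drop the listed users while keeping the other entries whole. The following is only a sketch: the passwd entries are invented sample data, the paths are hypothetical, and the patterns go through a temporary file rather than being read from standard input.

```shell
workdir=$(mktemp -d)

# Usernames to remove, plus invented passwd-style entries for illustration.
printf '%s\n' user1 user2 > "$workdir/file.txt"
cat > "$workdir/passwd" <<'EOF'
user31:x:1031:1031::/home/user31:/bin/sh
user1:x:1001:1001::/home/user1:/bin/sh
user15:x:1015:1015::/home/user15:/bin/sh
user2:x:1002:1002::/home/user2:/bin/sh
EOF

# Turn each username into an anchored pattern such as ^user1:
# so that user1 cannot match the user15 entry.
sed 's/.*/^&:/' "$workdir/file.txt" > "$workdir/patterns"

# -v keeps only the lines that match none of the patterns.
kept=$(grep -v -f "$workdir/patterns" "$workdir/passwd")
printf '%s\n' "$kept"

rm -r "$workdir"
```

Here only the user31 and user15 entries are printed. The usernames are used as regular expressions, so a name containing a character such as a dot would match more loosely than intended.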

edited Jan 19 '18 at 09:59

answered Jan 17 '18 at 21:03

Chris Davies


That did the job in separating the users that have to remain in the passwd file! Thank you!
However, the passwd file does not contain raw usernames, but lines such as user:test:123:123. How can I make sure that these entire lines remain in the output of the command? That is, match the duplicates by username only, but still return the whole lines as output. Can this be achieved in any way?
– George.S Jan 19 '18 at 09:38
@GeorgeS this is another straightforward pipeline, but I've updated my answer with an illustration for you. – Chris Davies Jan 19 '18 at 09:59

score 0 · Answer 3 · answered Jan 18 '18 at 02:48

0

You can use the awk one-liner below.

It prints the usernames from passwd that do not appear in file.txt; it does not modify passwd itself. I tested it and it worked fine.

awk 'NR==FNR{a[$1];next}!($1 in a){print $1}' file.txt passwd
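For real passwd entries, where the username is only the first colon-separated field, a variant along these lines may be closer to what is needed. This is a sketch under assumptions: the passwd entries are invented, the paths and the array name drop are hypothetical, and file.txt is assumed to be non-empty (otherwise NR==FNR would also hold for the second file).

```shell
workdir=$(mktemp -d)

# Usernames to remove, plus invented passwd-style entries for illustration.
printf '%s\n' user1 user2 > "$workdir/file.txt"
cat > "$workdir/passwd" <<'EOF'
user31:x:1031:1031::/home/user31:/bin/sh
user1:x:1001:1001::/home/user1:/bin/sh
user15:x:1015:1015::/home/user15:/bin/sh
user2:x:1002:1002::/home/user2:/bin/sh
EOF

# -F: splits fields on colons, so $1 is the username in passwd.
# While reading the first file (NR==FNR), remember each username.
# For the second file, print the whole line when its username was not remembered.
kept=$(awk -F: 'NR==FNR { drop[$1]; next } !($1 in drop)' "$workdir/file.txt" "$workdir/passwd")
printf '%s\n' "$kept"

rm -r "$workdir"
```

This prints the complete user31 and user15 entries. The lookup is an exact string comparison on the first field, so user1 in the list does not remove user15.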


Praveen Kumar BS

