
I have two files on my Linux machine. The first, "list.txt", contains a list of objects (2649 objects). The second, "list_interactors.txt", contains a shorter list with some of the objects from the first list (719 objects), and for each of these, additional columns hold some associated variables. I would like to obtain a list of all the objects (2649), with the associated variables added for the objects that appear in "list_interactors.txt".

Example:

file list.txt

6tyr_A_002__________
7yer_2_009__________
3erf_1_001__________
2dr5_D_2-3__________

file list_interactors.txt

6tyr_A_002__________    6tyr1_B    QRT54R   AAAAA
3erf_1_001__________    3erf2_B    QAEF6R   XXXXX

output.txt

6tyr_A_002__________    6tyr1_B    QRT54R   AAAAA
7yer_2_009__________
3erf_1_001__________    3erf2_B    QAEF6R   XXXXX
2dr5_D_2-3__________

I am not very experienced with programming. I tried to use grep with this command:

grep -f list.txt list_interactors.txt

but the output is identical to the file "list_interactors.txt".

Could you help me please?

terdon
Tommaso
    Probably the tool you are looking for is join, not grep. Check the man page – Francesco May 26 '20 at 08:37
    The behavior of grep you see occurs because the -f option reads its matching (filtering) patterns from the file. In effect, your command says "print all lines in list_interactors.txt that contain one of the strings in list.txt", which in your case is every line in list_interactors.txt. – AdminBee May 26 '20 at 09:13
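
To make the comment above concrete, here is a small sketch using the sample data from the question. The scratch directory and the file name grep_out.txt are made up for illustration; everything else is the command from the question.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the question.
printf '%s\n' 6tyr_A_002__________ 7yer_2_009__________ \
    3erf_1_001__________ 2dr5_D_2-3__________ > list.txt
printf '%s\n' \
    '6tyr_A_002__________    6tyr1_B    QRT54R   AAAAA' \
    '3erf_1_001__________    3erf2_B    QAEF6R   XXXXX' > list_interactors.txt

# Each line of list.txt is a pattern; list_interactors.txt is the text searched.
grep -f list.txt list_interactors.txt > grep_out.txt

# Only lines of list_interactors.txt can ever be printed, so the names
# without interactors (7yer..., 2dr5...) never show up in the result.
cmp -s grep_out.txt list_interactors.txt && echo "same as list_interactors.txt"
```

grep only filters the file it searches; it never merges in lines from the pattern file, which is why a join-style tool is needed here.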

4 Answers

$ join -a 1  <( sort list.txt ) <( sort list_interactors.txt )
2dr5_D_2-3__________
3erf_1_001__________ 3erf2_B QAEF6R XXXXX
6tyr_A_002__________ 6tyr1_B QRT54R AAAAA
7yer_2_009__________

This uses join to do a relational JOIN operation between the two files. The first field will be used as the join key by default.

The -a 1 option makes join output all lines in the first file, even if there is no match in the second file (it does a "left join").

The input data to join needs to be sorted, and we do this by calling sort on each file individually in two process substitutions on the command line. You could also opt for pre-sorting the files.

If your data is tab-delimited, you may want to add -t $'\t' to the start of the join command's arguments. This would make the output retain the existing tab delimiters.

Redirect the output by adding >output.txt to the end of the command if you want to store it in a file.
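
As a rough sketch of the pre-sorting variant mentioned above, assuming a POSIX shell and the sample data from the question: the scratch directory and the names list.sorted and list_interactors.sorted are made up for illustration, and LC_ALL=C is my own addition so that sort and join agree on the collation order.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the question.
printf '%s\n' 6tyr_A_002__________ 7yer_2_009__________ \
    3erf_1_001__________ 2dr5_D_2-3__________ > list.txt
printf '%s\n' \
    '6tyr_A_002__________    6tyr1_B    QRT54R   AAAAA' \
    '3erf_1_001__________    3erf2_B    QAEF6R   XXXXX' > list_interactors.txt

# Pre-sort both inputs instead of using process substitutions.
LC_ALL=C sort list.txt > list.sorted
LC_ALL=C sort list_interactors.txt > list_interactors.sorted

# Left join on the first field: every line of list.sorted is kept,
# with the extra columns appended where a match exists.
LC_ALL=C join -a 1 list.sorted list_interactors.sorted > output.txt
cat output.txt
```

Without -t, join separates the output fields with single spaces, so the original column spacing is not preserved.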

Kusalananda

If you want to keep the original order of list.txt, you can use awk:

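awk '
    # First file (list_interactors.txt): store each full line under its first field.
    FNR==NR {s[$1]=$0}
    # Second file (list.txt): print the stored line if there is one, else the bare name.
    FNR!=NR {if(s[$1]) print s[$1]; else print $0}
' list_interactors.txt list.txt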

Output:

6tyr_A_002__________    6tyr1_B    QRT54R   AAAAA
7yer_2_009__________
3erf_1_001__________    3erf2_B    QAEF6R   XXXXX
2dr5_D_2-3__________
pLumo
$ awk 'NR==FNR{a[$1]=$0; next} {print ($1 in a ? a[$1] : $0)}' list_interactors.txt list.txt
6tyr_A_002__________    6tyr1_B    QRT54R   AAAAA
7yer_2_009__________
3erf_1_001__________    3erf2_B    QAEF6R   XXXXX
2dr5_D_2-3__________
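
For readers new to awk, the one-liner above can be spread out with comments, roughly like this. This is only a sketch run against the sample data from the question; the scratch directory is made up for illustration.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the question.
printf '%s\n' 6tyr_A_002__________ 7yer_2_009__________ \
    3erf_1_001__________ 2dr5_D_2-3__________ > list.txt
printf '%s\n' \
    '6tyr_A_002__________    6tyr1_B    QRT54R   AAAAA' \
    '3erf_1_001__________    3erf2_B    QAEF6R   XXXXX' > list_interactors.txt

awk '
    # NR==FNR is only true while the first file is being read:
    # store each of its lines under the first field, then skip to the next line.
    NR == FNR { a[$1] = $0; next }
    # For list.txt: "$1 in a" tests membership without creating an empty entry.
    { print (($1 in a) ? a[$1] : $0) }
' list_interactors.txt list.txt > output.txt
cat output.txt
```

The output keeps the order of list.txt and the original column spacing, because the stored lines are printed unchanged.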
Ed Morton

A Perl one-liner can also do this, although the output is sorted by object name rather than kept in the order of list.txt:

$ perl -ane ' { chomp;$s{$F[0]}=$_; } END { print "$s{$_}\n" for sort(keys(%s))  }' list.txt list_interactors.txt 
2dr5_D_2-3__________
3erf_1_001__________    3erf2_B    QAEF6R   XXXXX
6tyr_A_002__________    6tyr1_B    QRT54R   AAAAA
7yer_2_009__________