I have a file named file1 with contents like this:

TCONS_00000011  XLOC_000003     -       u       q1:MSTRG.39|MSTRG.39.9|4|0.000000|0.000000|0.000000|7468
TCONS_00000012  XLOC_000004     -       u       q1:MSTRG.41|MSTRG.41.1|2|0.000000|0.000000|0.000000|1270
TCONS_00000013  XLOC_000003     -       u       q1:MSTRG.39|MSTRG.39.10|2|0.000000|0.000000|0.000000|6835
TCONS_00000014  XLOC_000003     -       u       q1:MSTRG.39|MSTRG.39.11|2|0.000000|0.000000|0.000000|880
TCONS_00000015  XLOC_000003     -       u       q1:MSTRG.39|MSTRG.39.12|3|0.000000|0.000000|0.000000|377
TCONS_00000016  XLOC_000005     -       u       q1:MSTRG.2|MSTRG.2.1|1|0.000000|0.000000|0.000000|709
TCONS_00000017  XLOC_000006     -       u       q1:MSTRG.4|MSTRG.4.1|1|0.000000|0.000000|0.000000|343
TCONS_00000018  XLOC_000007     -       u       q1:MSTRG.40|MSTRG.40.1|7|0.000000|0.000000|0.000000|12112
TCONS_00000019  XLOC_000007     -       u       q1:MSTRG.40|MSTRG.40.2|2|0.000000|0.000000|0.000000|310
TCONS_00000020  XLOC_000007     -       u       q1:MSTRG.40|MSTRG.40.3|3|0.000000|0.000000|0.000000|538
TCONS_00000021  XLOC_000008     -       u       q1:MSTRG.42|MSTRG.42.1|9|0.000000|0.000000|0.000000|6331
TCONS_00000022  XLOC_000008     -       u       q1:MSTRG.42|MSTRG.42.2|5|0.000000|0.000000|0.000000|1311
TCONS_00000023  XLOC_000008     -       u       q1:MSTRG.42|MSTRG.42.3|5|0.000000|0.000000|0.000000|923
TCONS_00000024  XLOC_000008     -       u       q1:MSTRG.42|MSTRG.42.4|2|0.000000|0.000000|0.000000|455
TCONS_00000025  XLOC_000009     -       u       q1:MSTRG.7|MSTRG.7.1|1|0.000000|0.000000|0.000000|232
TCONS_00000026  XLOC_000010     -       u       q1:MSTRG.6|MSTRG.6.1|1|0.000000|0.000000|0.000000|483
TCONS_00000027  XLOC_000011     -       u       q1:MSTRG.12|MSTRG.12.1|2|0.000000|0.000000|0.000000|2489
TCONS_00000028  XLOC_000012     -       u       q1:MSTRG.14|MSTRG.14.1|1|0.000000|0.000000|0.000000|7604
TCONS_00000029  XLOC_000013     -       u       q1:MSTRG.55|MSTRG.55.1|4|0.000000|0.000000|0.000000|1511

And file2 looks like this:

XLOC_000005
XLOC_000007
XLOC_000009
XLOC_000010
XLOC_000012

I want to extract every line of file1 whose second column matches an entry in file2. The output should look like this:

TCONS_00000016  XLOC_000005     -       u       q1:MSTRG.2|MSTRG.2.1|1|0.000000|0.000000|0.000000|709
TCONS_00000018  XLOC_000007     -       u       q1:MSTRG.40|MSTRG.40.1|7|0.000000|0.000000|0.000000|12112
TCONS_00000019  XLOC_000007     -       u       q1:MSTRG.40|MSTRG.40.2|2|0.000000|0.000000|0.000000|310
TCONS_00000020  XLOC_000007     -       u       q1:MSTRG.40|MSTRG.40.3|3|0.000000|0.000000|0.000000|538
TCONS_00000025  XLOC_000009     -       u       q1:MSTRG.7|MSTRG.7.1|1|0.000000|0.000000|0.000000|232
TCONS_00000026  XLOC_000010     -       u       q1:MSTRG.6|MSTRG.6.1|1|0.000000|0.000000|0.000000|483
TCONS_00000028  XLOC_000012     -       u       q1:MSTRG.14|MSTRG.14.1|1|0.000000|0.000000|0.000000|7604

How can I do this on Linux?

maven
  • @Quasímodo that's not a solution because it doesn't address partial matches and matches on a specific field. – Ed Morton Nov 11 '20 at 00:08
  • @EdMorton The -w flag of Grep covers partial matches. It does not address field matches, but as far as the sample is concerned, it is a solution. – Quasímodo Nov 11 '20 at 00:09
  • @Quasímodo best I can see none of the answers there mention -w. – Ed Morton Nov 11 '20 at 00:12
  • @EdMorton Very true, it is not even a POSIX flag. But applying a simple grep -f file2 file1 solves the problem at hand, where XLOC_... being present in any other field is out of the structure of the file, as well as partial matches. – Quasímodo Nov 11 '20 at 00:14
  • Yes, I actually tried grep -w -f file2 file1, but it didn't work. – maven Nov 11 '20 at 00:17
  • Maven, it is a nice thing to mention your attempts in the question. It shows you have given the problem a try, and it keeps contributors from pointing out approaches that you already know fail. grep -w -f file2 file1 works perfectly for me with your sample input. Your problem is solved now, but for your future questions, bear in mind that "doesn't work" is not an error message. – Quasímodo Nov 11 '20 at 00:20

1 Answer


This is probably what you want:

awk 'NR==FNR{a[$1]; next} $2 in a' file2 file1
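
For anyone who wants to see how that one-liner behaves, here is a minimal sketch run against a shortened copy of the sample data. While awk reads the first file named (file2), NR==FNR is true, so each ID is stored as an array key and the rest of the script is skipped. While it reads file1, only lines whose second field is one of those keys are printed. The scratch directory, the array name ids, and the last line of file1 (an ID placed in the fifth column) are made up for illustration and are not from the question.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
tmpdir=$(mktemp -d)

# Shortened, tab-separated stand-ins for file1 and file2 from the question.
# The last file1 line is hypothetical: it carries XLOC_000005 in column 5 only.
printf 'TCONS_00000015\tXLOC_000003\t-\tu\tq1:MSTRG.39\n'  > "$tmpdir/file1"
printf 'TCONS_00000016\tXLOC_000005\t-\tu\tq1:MSTRG.2\n'  >> "$tmpdir/file1"
printf 'TCONS_00000018\tXLOC_000007\t-\tu\tq1:MSTRG.40\n' >> "$tmpdir/file1"
printf 'TCONS_00000022\tXLOC_000008\t-\tu\tq1:MSTRG.42\n' >> "$tmpdir/file1"
printf 'TCONS_99999999\tXLOC_000099\t-\tu\tXLOC_000005\n' >> "$tmpdir/file1"
printf 'XLOC_000005\nXLOC_000007\n' > "$tmpdir/file2"

# Pass 1 (file2, where NR==FNR holds): remember each ID as an array key.
# Pass 2 (file1): print the line only if its second field is a stored key.
result=$(awk 'NR==FNR { ids[$1]; next } $2 in ids' "$tmpdir/file2" "$tmpdir/file1")
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -rf "$tmpdir"
```

Only the TCONS_00000016 and TCONS_00000018 lines are printed. The made-up last line is skipped because its ID sits in the fifth column, which is the difference from a whole-line grep -f. One caveat: if file2 is empty, NR==FNR stays true while file1 is read, so nothing is printed.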
Ed Morton