How can I use grep to match two patterns on consecutive lines? For example, one of the files I am searching contains the following text:

@<TRIPOS>MOLECULE   ← pattern
1532                ← ID
17 17
SMALL
NO_CHARGES

I need to find an exact match of the ID, which is always located one line below the pattern, and then retrieve the name of the file in which the ID was found.

I used the following command:

grep -Pzo '@<TRIPOS>MOLECULE(?:.*\n)*?\K1532' filename

but it matched files containing 1532 as well as files containing 153284. I need a command that matches the ID exactly.

1 Answer

Your pattern is looking for 1532 but doesn't say anything about what comes afterwards:

$ printf '1532\n15321\n1532foo\n' | grep -o '1532'
1532
1532
1532

Depending on what you want to do, you can limit your pattern to only match before a newline:

grep -Pzo '@<TRIPOS>MOLECULE(?:.*\n)*?\K1532\n' filename

Or, if there can be whitespace after the number before the end of the line:

grep -Pzo '@<TRIPOS>MOLECULE(?:.*\n)*?\K1532\s*\n' filename

Alternatively, if you can have other things on the same line, use \b to make sure the number occurs before a word boundary:

grep -Pzo '@<TRIPOS>MOLECULE(?:.*\n)*?\K1532\b' filename
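
Note that `(?:.*\n)*?` can skip any number of lines, so the patterns above also match a 1532 that starts a line further below the header. If the ID must sit on the line directly below the header, as the question describes, a line-based tool may be simpler. The following is a minimal sketch that swaps grep for awk; the function name find_id and the .mol2 file names are hypothetical, and it assumes the header line contains nothing but @<TRIPOS>MOLECULE apart from trailing whitespace:

```shell
# Hypothetical helper: print the name of every file in which the line
# directly below an "@<TRIPOS>MOLECULE" header is exactly the given ID.
find_id() {
    id=$1
    shift
    awk -v id="$id" '
        # Work on a copy of the line with trailing whitespace removed.
        { line = $0; sub(/[ \t\r]+$/, "", line) }
        # Forget any header seen at the end of the previous file.
        FNR == 1 { after_header = 0 }
        # Force a string comparison so that 01532 does not equal 1532;
        # print each file name only once.
        after_header && (line "") == (id "") && !seen[FILENAME]++ { print FILENAME }
        # Remember whether the current line is the header.
        { after_header = (line == "@<TRIPOS>MOLECULE") }
    ' "$@"
}
```

For example, `find_id 1532 *.mol2` would list only the files whose ID line is exactly 1532, not those containing 153284.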
terdon
  • Thank you so much for the below code. It served my purpose. grep -Pzo '@<TRIPOS>MOLECULE(?:.*\n)*?\K1532\n' filename – Chetan Munegowda Jul 06 '17 at 14:47
  • @ChetanMunegowda If this answer solved your issue, please take a moment and accept it by clicking on the check mark to the left. That will mark the question as answered and is the way thanks are expressed on the Stack Exchange sites. Also, based on your input file, you might be interested in our new [bioinformatics.se] site! – terdon Jul 06 '17 at 15:00