Find the exact match of the ID which is always located one line below pattern through grep

Question

I would like to know how to use grep to check for two matches on consecutive lines. For example, one of the files I am searching contains the following text:

@<TRIPOS>MOLECULE   ← pattern
1532                ← ID
17 17
SMALL
NO_CHARGES

I need to find an exact match of the ID, which is always located one line below the pattern, and then retrieve the name of the file in which the ID was found.

I used the following command:

grep -Pzo '@<TRIPOS>MOLECULE(?:.*\n)*?\K1532' filename

but it matched files containing 1532 as well as files containing 153284. I need a command that matches the exact ID only.

Also related: Multiline pattern match using sed, awk or grep — Scott - Слава Україні, Jul 05 '17 at 19:02
I find this question confusing. The first sentence of the question body seems to say it correctly and succinctly — you want to check whether the multi-line pattern string₁\n string₂\n appears in your data. Calling string₁ a “pattern” and string₂ an “ID” just confuses matters. And you say, “I need to … retrieve the file name from which ID was located.” but then you show a command that seems to be trying to output string₂. Please state what outcome you want: true or false for a given filename?  the matching filename from a list?  string₂? … (Cont’d) — Scott - Слава Україні, Jul 05 '17 at 19:02
(Cont’d) … If you want the output to be string₂, that’s a trivial complication. Since you already know what string₂ is, you can simplify this to (command-that-returns-true-or-false) && echo "string₂". — Scott - Слава Україні, Jul 05 '17 at 19:03
@Scott from a set of files that I have I need the file name from which "ID" was found. "ID" is always located below the "pattern". — Chetan Munegowda, Jul 06 '17 at 14:47

score 1 · Answer 1 · answered Jul 05 '17 at 13:53


Your pattern is looking for 1532 but doesn't say anything about what comes afterwards:

$ printf '1532\n15321\n1532foo\n' | grep -o '1532'
1532
1532
1532
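
As a small illustration of the same point on single lines, grep's standard -x option only accepts a match that spans the whole line. This is a sketch of the idea, not a replacement for the multi-line patterns in this answer:

```shell
# -x requires the match to cover the entire line, so only the first
# input line (exactly "1532") is printed.
printf '1532\n15321\n1532foo\n' | grep -x '1532'
```

Here only the first line is printed; 15321 and 1532foo are rejected because they contain extra characters.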

Depending on what you want to do, you can limit your pattern to only match before a newline:

grep -Pzo '@<TRIPOS>MOLECULE(?:.*\n)*?\K1532\n' filename

Or, if there can be whitespace after the number before the end of the line:

grep -Pzo '@<TRIPOS>MOLECULE(?:.*\n)*?\K1532\s*\n' filename

Alternatively, if you can have other things on the same line, use \b to make sure the number occurs before a word boundary:

grep -Pzo '@<TRIPOS>MOLECULE(?:.*\n)*?\K1532\b' filename
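
Note that (?:.*\n)*? allows any number of lines between the header and the number, and that -o prints the matched text rather than the file name. If the ID is always on the line directly below the header and the goal is a list of file names, an awk-based check may be a better fit. The following is a minimal sketch under those assumptions; the function name find_id_files and the .mol2 extension in the usage comment are hypothetical and not taken from the question.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every file in which the line
# immediately after "@<TRIPOS>MOLECULE" is exactly the given ID.
# Usage (file names are an assumption): find_id_files 1532 *.mol2
find_id_files() {
    id=$1
    shift
    for f in "$@"; do
        awk -v id="$id" '
            {
                line = $0
                sub(/[ \t\r]+$/, "", line)   # tolerate trailing blanks or CR
                # Appending "" forces a string comparison, so 01532 or
                # 1532.0 is not treated as equal to 1532.
                if (after_header && (line "") == (id "")) { found = 1; exit }
                after_header = (line == "@<TRIPOS>MOLECULE")
            }
            END { exit !found }              # exit status 0 only on a match
        ' "$f" && printf '%s\n' "$f"
    done
    return 0
}
```

Because the comparison is made against the whole line, a file whose ID is 153284 is not reported when searching for 1532, and a 1532 that appears two or more lines below the header is ignored.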


terdon


Thank you so much for the code below. It served my purpose: grep -Pzo '@<TRIPOS>MOLECULE(?:.*\n)*?\K1532\n' filename – Chetan Munegowda Jul 06 '17 at 14:47
@ChetanMunegowda If this answer solved your issue, please take a moment and accept it by clicking on the check mark to the left. That will mark the question as answered and is the way thanks are expressed on the Stack Exchange sites. Also, based on your input file, you might be interested in our new [bioinformatics.se] site! – terdon Jul 06 '17 at 15:00
