4

I need to find a string and print the line above it.

Case 1: the same line won't contain more than one match of the pattern.

For example, consider a file containing:

$cat > para
returns between the paragaraphs
italic or bold    
quotes by placing
italic

Here I need to search for italic and get the output below:

returns between the paragaraphs
quotes by placing

How can I get output like this?

don_crissti
  • 82,805
jevan
  • 41
  • I've tried to format this appropriately, but can you check that I haven't changed what you meant to say? You could probably trim this down a lot too, and make the title ask your question. You can [edit] it yourself as well. – Michael Homer Feb 20 '16 at 01:52
  • Actually I need to get similar output for a file that contains more than 1000 lines – jevan Feb 20 '16 at 02:27
  • Can you upload a longer sample? – Quora Feans Feb 20 '16 at 02:41
  • I don't think anybody cares whether any line might contain the pattern more than once.  The more interesting question is whether the pattern can appear on consecutive lines, and, if so, what output do you want? – G-Man Says 'Reinstate Monica' Feb 20 '16 at 02:52
  • no the pattern won't occur in consecutive lines – jevan Feb 20 '16 at 03:07
  • From the sample in the question, I don't see how grep -v italic would not work. Could you come up with an example that is more complex? – Kusalananda Oct 19 '18 at 09:30

3 Answers

4

If the pattern cannot occur on consecutive lines you can simply run

sed '$!N;/.*\n.*PATTERN.*/P;D' infile

I've explained here how the N;P;D cycle works. The difference is that here the first line in the pattern space is printed only if the second one matches, otherwise it's deleted.
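As a quick check, here is a minimal sketch of that command run against the sample from the question, with italic standing in for PATTERN. The temporary file and the variable names are only for illustration.

```shell
# Recreate the question's sample in a temporary file (illustrative only).
f=$(mktemp)
printf '%s\n' 'returns between the paragaraphs' 'italic or bold' \
              'quotes by placing' 'italic' > "$f"

# N appends the next input line to the pattern space,
# P prints the first line only when the second one matches,
# D drops the first line and restarts the cycle with what is left.
out=$(sed '$!N;/.*\n.*italic.*/P;D' "$f")
printf '%s\n' "$out"
rm -f "$f"
```

With this input, the two lines that precede a match should be printed: "returns between the paragaraphs" and "quotes by placing".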


If the pattern can occur on consecutive lines the above solution will print a line that matches if it's followed by another line that matches.
To ignore consecutive matches add a second condition to print the first line in the pattern space only if it doesn't match:

sed '$!N;/.*\n.*PATTERN.*/{/.*PATTERN.*\n.*/!P;};D' infile
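The difference between the two commands only shows up when matches are adjacent. A small sketch with a made-up four-line input (the file content and variable names are assumptions for illustration, again with italic as PATTERN):

```shell
# Hypothetical input with two consecutive matching lines.
f=$(mktemp)
printf '%s\n' 'a' 'italic 1' 'italic 2' 'b' > "$f"

# First command: also prints a matching line that is followed by a match.
first=$(sed '$!N;/.*\n.*italic.*/P;D' "$f")

# Second command: prints the first line only if it does not match itself.
second=$(sed '$!N;/.*\n.*italic.*/{/.*italic.*\n.*/!P;};D' "$f")

printf 'first:\n%s\nsecond:\n%s\n' "$first" "$second"
rm -f "$f"
```

Here the first command should print "a" and "italic 1", while the second should print only "a".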

Another way, using the hold buffer.
If you want to ignore consecutive matches:

sed '/PATTERN/!{              # if line doesn't match PATTERN
h                             # copy pattern space content over the hold buffer
d                             # delete pattern space
}
//{                           # if line matches PATTERN
x                             # exchange pattern space with hold space
//d                           # if line matches PATTERN delete it
}' infile

or, in one line

sed '/PATTERN/!{h;d;};//{x;//d;}' infile

If you don't want to ignore consecutive matches:

sed '/PATTERN/!{              # if line doesn't match PATTERN
h                             # copy pattern space content over the hold buffer
d                             # delete pattern space
}
//x                           # if line matches PATTERN exchange buffers
' infile 

or, in one line

sed '/PATTERN/!{h;d;};//x' infile

Though keep in mind the two solutions that use the hold buffer will print a leading empty line if the first line in your file matches. If that's a problem just add 1d after the first // check e.g.
sed '/PATTERN/!{h;d;};//{1d;x;//d;}' and respectively sed '/PATTERN/!{h;d;};//{1d;x;}'
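A minimal sketch of the leading empty line and the 1d workaround, using a made-up three-line input whose first line matches (the file and variable names are illustrative assumptions):

```shell
# Hypothetical input: the very first line already matches.
f=$(mktemp)
printf '%s\n' 'italic' 'a' 'italic' > "$f"

# Without 1d: on line 1 the still-empty hold buffer is swapped in and printed.
plain=$(sed '/italic/!{h;d;};//x' "$f")

# With 1d: a match on line 1 is deleted before the exchange happens.
fixed=$(sed '/italic/!{h;d;};//{1d;x;}' "$f")

printf 'without 1d:\n%s\nwith 1d:\n%s\n' "$plain" "$fixed"
rm -f "$f"
```

The first result should be an empty line followed by "a"; the second should be just "a".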

don_crissti
  • 82,805
4

Never use the word "pattern" in the context of matching text, as it's highly ambiguous. Always, at a minimum, use "string"-or-"regexp" and "partial"-or-"full", whichever kind of matching you mean. See https://stackoverflow.com/q/65621325/1745001 for more information.

We can't tell from your question what type of matching you want so here are some examples, all of which produce the posted expected output from the posted sample input, and any/all of which might be completely wrong for the OP's needs:

Partial Line Regexp Matching:

$ awk '/italic/{print p} {p=$0}' file
returns between the paragaraphs
quotes by placing

Partial Line String Matching:

$ awk 'index($0,"italic"){print p} {p=$0}' file
returns between the paragaraphs
quotes by placing

Partial Field Regexp Matching:

$ awk '{for (i=1; i<=NF; i++) if ($i ~ /italic/) print p} {p=$0}' file
returns between the paragaraphs
quotes by placing

Partial Field String Matching:

$ awk '{for (i=1; i<=NF; i++) if (index($i,"italic")) print p} {p=$0}' file
returns between the paragaraphs
quotes by placing

Full Field Regexp Matching:

a) Using GNU awk for word boundaries:

$ awk '/\<italic\>/{print p} {p=$0}' file
returns between the paragaraphs
quotes by placing

b) Using any awk:

$ awk '/(^|[[:space:]])italic([[:space:]]|$)/{print p} {p=$0}' file
returns between the paragaraphs
quotes by placing

Full Field String Matching:

a) With a loop:

$ awk '{for (i=1; i<=NF; i++) if ($i == "italic") print p} {p=$0}' file
returns between the paragaraphs
quotes by placing

b) With no loop and a regexp assist:

$ awk '(s=index($0,"italic")) && (s==1 || substr($0,s-1,1) ~ /[[:space:]]/) && (substr($0,s+length("italic"),1) ~ /^[[:space:]]?$/){print p} {p=$0}' file
returns between the paragaraphs
quotes by placing

All of the above obviously produce the expected output from your posted sample input and all of them would fail given different input depending on your requirements for string vs regexp and full vs partial matching.
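To make that concrete, here is a small sketch with a made-up input in which one line contains "italics" rather than "italic" (the file content and variable names are assumptions for illustration). It compares the partial line regexp match with the full field string match from above:

```shell
# Hypothetical input: "italics" contains "italic" only as a substring.
f=$(mktemp)
printf '%s\n' 'one' 'italics here' 'two' 'italic' > "$f"

# Partial line regexp match: "italics here" counts as a match.
partial=$(awk '/italic/{print p} {p=$0}' "$f")

# Full field string match: only a field that is exactly "italic" counts.
full=$(awk '{for (i=1; i<=NF; i++) if ($i == "italic") print p} {p=$0}' "$f")

printf 'partial:\n%s\nfull:\n%s\n' "$partial" "$full"
rm -f "$f"
```

The partial match should print "one" and "two", while the full field match should print only "two". Which of the two is right depends entirely on what you mean by a match.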

Ed Morton
  • 31,617
3

Using grep, then sed:

grep --no-group-separator -B 1 "italic" <yourfilename> | sed -n 1~2p

Explanation:

grep manual:

-B num
--before-context=num
Print num lines of leading context before matching lines

--no-group-separator
When -A, -B or -C are in use, do not print a separator between groups of lines.

sed:

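sed -n 1~2p prints every second line starting from the first, i.e. the first line of each pair (the first~step address form is a GNU sed extension). Similarly, sed -n 1~5p would pick the first line of every five.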
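A minimal sketch of the two stages on the sample from the question, assuming GNU grep and GNU sed (both --no-group-separator and the 1~2 address are GNU features). The temporary file and variable names are only for illustration.

```shell
# Recreate the question's sample in a temporary file (illustrative only).
f=$(mktemp)
printf '%s\n' 'returns between the paragaraphs' 'italic or bold' \
              'quotes by placing' 'italic' > "$f"

# Stage 1: each matching line together with the one line before it.
ctx=$(grep --no-group-separator -B 1 'italic' "$f")

# Stage 2: keep lines 1, 3, 5, ... which are the context lines here.
out=$(printf '%s\n' "$ctx" | sed -n '1~2p')

printf '%s\n' "$out"
rm -f "$f"
```

For this input stage 1 yields all four lines as two context/match pairs, and stage 2 should keep "returns between the paragaraphs" and "quotes by placing".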

zx8754
  • 109
Quora Feans
  • 3,866
  • You should probably use grep -v '^--$', to avoid clobbering lines from the input that contain --.  Even that will fail if a line from the input *is* --, with nothing else. – G-Man Says 'Reinstate Monica' Feb 20 '16 at 02:56
  • i didn't get you man – jevan Feb 20 '16 at 02:59
  • @don_crissti: good tip. Answer updated. – Quora Feans Feb 20 '16 at 03:06
  • @QuoraFeans - keep in mind your answer will work only if the first line in the file doesn't match otherwise the sed part will do the opposite of what's supposed to do (it will print the lines that match instead of discarding them) – don_crissti Feb 20 '16 at 10:32
  • .. | grep -v italic ? When the first line has italic ( two lines right after each other) you have a special case,,, – Walter A Feb 27 '16 at 23:10
  • @WalterA: according to the OP " the pattern won't occur in consecutive lines" – Quora Feans Feb 28 '16 at 01:04
  • @QuoraFeans then grep -v italic is an alternative, nothing wrong with the short sed command. – Walter A Feb 28 '16 at 09:08
  • @WalterA: that will only work if there 2 lines, one with the pattern and the other without. But if you have 5 lines (pattern 1st, target 5th) then you get too many not wanted lines. – Quora Feans Feb 28 '16 at 15:04
  • @QuoraFeans: My first comment started with .. |, of course I wanted to use the -B1 first: grep --no-group-separator -B 1 "italic" <yourfilename> | grep -v italic. As said, nothing wrong with sed -n 1~2p. – Walter A Feb 28 '16 at 15:09
  • @WalterA: OK, that will work in this concrete case, two lines, one is filtered out. – Quora Feans Feb 28 '16 at 16:03