15

How would you perform a grep for text that appears on two lines?

For example:

pbsnodes is a command I use that reports the utilization of a Linux cluster:

root$ pbsnodes
node1
    state = free
    procs = 2
    bar = foobar

node2
    state = free
    procs = 4
    bar = foobar

node3
    state = busy
    procs = 8
    bar = foobar

I want to count the procs that belong to nodes in state 'free'. So far I can get the total number of procs and the number of nodes in the free state separately, but I want to combine them into one command that sums the procs of the free nodes only.

In the above example, the correct answer would be 6 (2+4).

What I have

root$ NUMBEROFNODES=`pbsnodes|grep 'state = free'|wc -l`
root$ echo $NUMBEROFNODES
2

root$ NUMBEROFPROCS=`pbsnodes |grep "procs = "|awk  '{ print $3 }' | awk '{ sum+=$1 } END { print sum }'`
root$ echo $NUMBEROFPROCS
14

How can I search for every line that reads 'procs = x', but only if the line above it reads 'state = free'?

spuder
  • 18,053

8 Answers

12

If the data is always in that format, you could simply write:

awk -vRS= '$4 == "free" {n+=$7}; END {print n}'

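(Setting RS to the empty string puts awk in paragraph mode: records are separated by blank lines, and newlines also act as field separators.)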

Or:

awk -vRS= '/state *= *free/ && match($0, "procs *=") {
  n += substr($0,RSTART+RLENGTH)}; END {print n}'
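
To see why `$4` and `$7` are the right fields, it can help to print them for each paragraph. The following is a minimal sketch, assuming the exact layout shown in the question; `sample_pbsnodes`, `show_fields` and `free_procs` are hypothetical helper names, and `sample_pbsnodes` only replays the sample output in place of the real command.

```shell
#!/bin/sh
# Stand-in for the real pbsnodes command (assumption): replays the
# sample output from the question so the sketch runs anywhere.
sample_pbsnodes() {
    cat <<'EOF'
node1
    state = free
    procs = 2
    bar = foobar

node2
    state = free
    procs = 4
    bar = foobar

node3
    state = busy
    procs = 8
    bar = foobar
EOF
}

# With RS set to the empty string each blank-line-separated block is one
# record, and the words on all of its lines are numbered consecutively:
# $1 = node name, $4 = state value, $7 = procs value.
show_fields() {
    awk -v RS= '{ print $1 ": state=" $4 ", procs=" $7 }'
}

# Sum the procs of the free nodes; "n + 0" prints 0 when nothing matches.
free_procs() {
    awk -v RS= '$4 == "free" { n += $7 } END { print n + 0 }'
}

sample_pbsnodes | show_fields
sample_pbsnodes | free_procs
```

On the sample data this prints one summary line per node followed by the total, 6. The field positions shift if a node has extra lines before `procs`, which is what the second, `match()`-based command guards against.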
5
$ pbsnodes
node1
    state = free
    procs = 2
    bar = foobar

node2
    state = free
    procs = 4
    bar = foobar

node3
    state = busy
    procs = 8
    bar = foobar
$ pbsnodes | grep -A 1 free
    state = free
    procs = 2
--
    state = free
    procs = 4
$ pbsnodes | grep -A 1 free | grep procs | awk '{print $3}'
2
4
$ pbsnodes | grep -A 1 free | grep procs | awk '{print $3}' | paste -sd+ 
2+4
$ pbsnodes | grep -A 1 free | grep procs | awk '{print $3}' | paste -sd+ | bc 
6

https://en.wikipedia.org/wiki/Pipeline_(Unix)

4

Here's one way to do it using pcregrep.

$ pbsnodes | pcregrep -Mo 'state = free\n\s*procs = \K\d+'
2
4

Example

$ pbsnodes | \
    pcregrep -Mo 'state = free\n\s*procs = \K\d+' | \
    awk '{ sum+=$1 }; END { print sum }'
6
slm
  • 369,824
3

The GNU implementation of grep has two options that also print the lines before (-B) and after (-A) a match. Snippet from the man page:

   -A NUM, --after-context=NUM
          Print NUM lines of trailing context after matching lines. Places a line
          containing a group separator (--) between contiguous groups of matches.
          With the -o or --only-matching option, this has no effect and a warning
          is given.

   -B NUM, --before-context=NUM
          Print NUM lines of leading context before matching lines. Places a line
          containing a group separator (--) between contiguous groups of matches.
          With the -o or --only-matching option, this has no effect and a warning
          is given.

So in your case, you would grep for state = free and also print the following line. Combining that with the snippets from your question, you'll arrive at something like this:

usr@srv % pbsnodes | grep -A 1 'state = free' | grep "procs = " | awk  '{ print $3 }' | awk '{ sum+=$1 } END { print sum }'
6

and a bit shorter:

usr@srv % pbsnodes | grep -A 1 'state = free' | awk '{ sum+=$3 } END { print sum }'
6
binfalse
  • 5,528
3

If your data has fixed-length records (fixed length referring to the number of lines in a record), you can use sed's N command (several times), which appends the next line to the pattern space:

sed -n '/^node/{N;N;N;s/\n */;/g;p;}'

should give you output like:

node1;state = free;procs = 2;bar = foobar
node2;state = free;procs = 4;bar = foobar
node3;state = busy;procs = 8;bar = foobar

For variable record composition (e.g. with an empty separator line), you could make use of branching commands t and b, but awk is likely to get you there in a more comfortable way.
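
As a rough illustration of the awk route for variable records, here is a minimal sketch that tracks the state of the current node instead of relying on line positions. It assumes that node names start in the first column, that attribute lines are indented, and that each node's `state` line comes before its `procs` line; `sample_variable` and `free_procs_by_state` are hypothetical names, and the sample data is made up to show missing blank lines and an extra attribute.

```shell
#!/bin/sh
# Made-up input (assumption): no blank line between node1 and node2,
# and an extra attribute between state and procs in node1.
sample_variable() {
    cat <<'EOF'
node1
    state = free
    note = extra line
    procs = 2
node2
    state = busy
    procs = 8

node3
    state = free
    procs = 4
    bar = foobar
EOF
}

free_procs_by_state() {
    awk '
        # An unindented line is a node name: forget the previous state.
        /^[^[:space:]]/             { free = 0 }
        # Remember whether the current node is free.
        $1 == "state" && $2 == "="  { free = ($3 == "free") }
        # Count procs only while inside a free node.
        $1 == "procs" && free       { sum += $3 }
        END { print sum + 0 }
    '
}

sample_variable | free_procs_by_state
```

For the made-up input above this prints 6 (2 from node1 plus 4 from node3).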

peterph
  • 30,838
3

Your output format is primed for Perl's paragraph slurp:

pbsnodes|perl -n00le 'BEGIN{ $sum = 0 }
                 m{
                   state \s* = \s* free \s* \n 
                   \s* procs \s* = \s* ([0-9]+)
                 }x 
                    and $sum += $1;
                 END{ print $sum }'

Note

This only works because Perl's idea of a "paragraph" is a chunk of non-blank lines separated by one or more blank lines. If you didn't have blank lines between the node sections, this wouldn't have worked.

Joseph R.
  • 39,549
0

... and here is a Perl solution that prints the names of the free nodes:

pbsnodes | perl -lne 'if (/^\S+/) { $node = $& } elsif ( /state = free/ ) { print $node }'
0

You can use awk's getline command:

$ pbsnodes | awk 'BEGIN { freeprocs = 0 } \
                  $1=="state" && $3=="free" { getline; freeprocs+=$3 } \
                  END { print freeprocs }'

From man awk:

   getline               Set $0 from next input record; set NF, NR, FNR.

   getline <file         Set $0 from next record of file; set NF.

   getline var           Set var from next input record; set NR, FNR.

   getline var <file     Set var from next record of file.

   command | getline [var]
                         Run command piping the output either into $0 or var, as above.

   command |& getline [var]
                         Run command as a co-process piping the output either into $0 or var, as above. Co-processes are a
                         gawk extension.
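
Plain getline returns 1 when it reads a record, 0 at end of input and -1 on error, so a slightly more defensive variant can check that value before using the new $0. The following is a minimal sketch under that idea, assuming the `procs` line directly follows the `state` line as in the question; `sample_pbsnodes` and `free_procs_getline` are hypothetical names, and `sample_pbsnodes` stands in for the real command.

```shell
#!/bin/sh
# Stand-in for the real pbsnodes command (assumption): replays the
# sample output from the question.
sample_pbsnodes() {
    cat <<'EOF'
node1
    state = free
    procs = 2
    bar = foobar

node2
    state = free
    procs = 4
    bar = foobar

node3
    state = busy
    procs = 8
    bar = foobar
EOF
}

free_procs_getline() {
    awk '
        $1 == "state" && $3 == "free" {
            # Only add when a next line exists and it really is a procs line.
            if ((getline) > 0 && $1 == "procs")
                sum += $3
        }
        END { print sum + 0 }
    '
}

sample_pbsnodes | free_procs_getline
```

On the sample data this prints 6. If the input ends right after a `state = free` line, getline returns 0 and nothing is added.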